WiredTiger源码阅读（一）：Bloom过滤器

30 Dec, 2022

（我发现用英语表述算法更加自然，因此我尝试用英语写这篇博客。）

Overview

Git commit: 5ec6215c

File: src/bloom/bloom.c

bloom.c symbols

Now let’s take a deep dive!

Lifetime management

Creation

Let’s look into __wt_bloom_create at our first attempt.

/*
 * __wt_bloom_create --
 *     Creates and configures a WT_BLOOM handle, allocates a bitstring in memory to use while
 *     populating the bloom filter. count - is the expected number of inserted items factor - is the
 *     number of bits to use per inserted item k - is the number of hash values to set or test per
 *     item
 */
int
__wt_bloom_create(WT_SESSION_IMPL *session, const char *uri, const char *config, uint64_t count, uint32_t factor, uint32_t k, WT_BLOOM **bloomp) WT_GCC_FUNC_ATTRIBUTE((visibility("default")))
{
    WT_BLOOM *bloom;
    WT_DECL_RET;

    WT_RET(__bloom_init(session, uri, config, &bloom));
    WT_ERR(__bloom_setup(bloom, count, 0, factor, k));

    WT_ERR(__bit_alloc(session, bloom->m, &bloom->bitstring));

    *bloomp = bloom;
    return (0);

err:
    WT_TRET(__wt_bloom_close(bloom));
    return (ret);
}

(src/bloom/bloom.c:80,105)

This function is rather trivial and simple. So it’s very suitable for us to start with.

WT_BLOOM is an alias of struct __wt_bloom, which is defined in src/include/bloom.h as below:

struct __wt_bloom {
    const char *uri;
    char *config;
    uint8_t *bitstring; /* For in memory representation. */
    WT_SESSION_IMPL *session;
    WT_CURSOR *c;

    uint32_t k;      /* The number of hash functions used. */
    uint32_t factor; /* The number of bits per item inserted. */
    uint64_t m;      /* The number of slots in the bit string. */
    uint64_t n;      /* The number of items to be inserted. */
};

WT_DECL_RET, WT_RET, WT_ERR is a bunch of C macros that helps reduce boilerplate error-handling code. Let’s look into the definitions since we’ve never seen them before.

k, factor, m, and n are algoritim related numbers.

`WT_DECL_RET`

/* Function return value and scratch buffer declaration and initialization. */
#define WT_DECL_ITEM(i) WT_ITEM *i = NULL
#define WT_DECL_RET int ret = 0

This is a return value helper macro.

WiredTiger has a well-documented code base. Its comment has described the macro clearly.

`WT_RET`

/* Return tests. */
#define WT_RET(a)               \
    do {                        \
        int __ret;              \
        if ((__ret = (a)) != 0) \
            return (__ret);     \
    } while (0)
#define WT_RET_TRACK(a)               \
    do {                              \
        int __ret;                    \
        if ((__ret = (a)) != 0) {     \
            WT_TRACK_OP_END(session); \
            return (__ret);           \
        }                             \
    } while (0)
#define WT_RET_MSG(session, v, ...)            \
    do {                                       \
        int __ret = (v);                       \
        __wt_err(session, __ret, __VA_ARGS__); \
        return (__ret);                        \
    } while (0)
#define WT_RET_TEST(a, v) \
    do {                  \
        if (a)            \
            return (v);   \
    } while (0)
#define WT_RET_ERROR_OK(a, e)                           \
    do {                                                \
        int __ret = (a);                                \
        WT_RET_TEST(__ret != 0 && __ret != (e), __ret); \
    } while (0)

/* ... */

These are return helpers. By using these macros, boilerplate code that does ordinary error check could be simplified to single macro call.

Go is designed by C experts, so it’s very familiar with C in many aspects.

Some Gophers (i.e. Go enthusiasts) state that Go’s if err != nil { return err } idiom is good enough, since C also uses similar things to handle errors.

However, here we can observe a major difference between C and Go: macro. C preprocessor supports macros, while Go’s doesn’t.

`WT_ERR`

/* Set "ret" and branch-to-err-label tests. */
#define WT_ERR(a)             \
    do {                      \
        if ((ret = (a)) != 0) \
            goto err;         \
    } while (0)
#define WT_ERR_MSG(session, v, ...)          \
    do {                                     \
        ret = (v);                           \
        __wt_err(session, ret, __VA_ARGS__); \
        goto err;                            \
    } while (0)
#define WT_ERR_TEST(a, v, keep) \
    do {                        \
        if (a) {                \
            ret = (v);          \
            goto err;           \
        } else if (!(keep))     \
            ret = 0;            \
    } while (0)
#define WT_ERR_ERROR_OK(a, e, keep) WT_ERR_TEST((ret = (a)) != 0 && ret != (e), ret, keep)
#define WT_ERR_NOTFOUND_OK(a, keep) WT_ERR_ERROR_OK(a, WT_NOTFOUND, keep)

C is a simple language. Its simplicity ensures that its complexity is the internal of the runtime: platform details. I think C succeeds because of its ultimate flexibility.

Bloom filtering

The goal of a Bloom Filter is to filter out some elements that are not present in the set. While some items that are not presented in the target set haven’t been filter out, a Bloom Filter will reduces the time spent on looking up the set, because a Bloom Filter runs in constant worst time boundary. In another word, a bloom filter checks whether an given element is contained by the set, with possible false positive cases.

Data structure

A Bloom Filter is a n-bit bitmap. It marks the presence of each element in $k$ uniformly distributed bits. To add an element, simply set these $k$ bits to 1. To query an element, simply check if all these $k$ bits are 1. As some collisions may happen, all $k$ bits of a random element may be occasionally set to 1. From this observation, we can conclude the key property of Bloom Filter:

A Bloom Filter checks if an element is present in a set.
Some elements that are not present may be incorrectly reported as present.
Elements that are reported as not present, are definitely not present.

Insertion

A Bloom Filter adds an element in following steps:

Hash the element, obtaining hashes $h_1, h_2, …, h_k$
For all $n = 1, 2, …, k$, set $Bitmap[n] = 1$

Now let’s see the code:

/*
 * __wt_bloom_insert --
 *     Adds the given key to the Bloom filter.
 */
void
__wt_bloom_insert(WT_BLOOM *bloom, WT_ITEM *key) WT_GCC_FUNC_ATTRIBUTE((visibility("default")))
{
    uint64_t h1, h2;
    uint32_t i;

    h1 = __wt_hash_fnv64(key->data, key->size);
    h2 = __wt_hash_city64(key->data, key->size);
    for (i = 0; i < bloom->k; i++, h1 += h2)
        __bit_set(bloom->bitstring, h1 % bloom->m);
}

According to Wikipedia’s description on Bloom Filter algorithm, the data structure uses $k$ different hash functions. It’s clear that the 64-bit hash function used in WiredTiger Bloom Filter is composed with FNV64-1a (Fowler–Noll–Vo) and CityHash64 as follows:

$H_n(x) = FNV1a64(x) + n \times CityHash64(x), n = 0, 1, 2, …, k$

Now we realise that the code is a direct translation of the algorithm. In fact, these hash function calls are some hand-written inline of function void __wt_bloom_hash(WT_BLOOM *bloom, WT_ITEM *key, WT_BLOOM_HASH *bhash), which is used in query operation.

/*
 * __wt_bloom_hash --
 *     Calculate the hash values for a given key.
 */
void
__wt_bloom_hash(WT_BLOOM *bloom, WT_ITEM *key, WT_BLOOM_HASH *bhash)
{
    WT_UNUSED(bloom);

    bhash->h1 = __wt_hash_fnv64(key->data, key->size);
    bhash->h2 = __wt_hash_city64(key->data, key->size);
}

Query

To query the presence of an element, do the following steps:

Hash the element, obtaining hashes $h_1, h_2, …, h_k$
For $n = 1, 2, …, k$, if all $Bitmap[n]$ are $1$, report it as may present. Otherwise report must not present.

WiredTiger’s Bloom Filter supports both volatile (in-memory) and persistent (on-disk) storage. Let’s look at them separately.

Query the memory

The in-memory version is simple as follows:

/*
 * __wt_bloom_inmem_get --
 *     Tests whether the given key is in the Bloom filter. This can be used in place of
 *     __wt_bloom_get for Bloom filters that are memory only.
 */
int
__wt_bloom_inmem_get(WT_BLOOM *bloom, WT_ITEM *key)
{
    uint64_t h1, h2;
    uint32_t i;

    h1 = __wt_hash_fnv64(key->data, key->size);
    h2 = __wt_hash_city64(key->data, key->size);
    for (i = 0; i < bloom->k; i++, h1 += h2) {
        if (!__bit_test(bloom->bitstring, h1 % bloom->m))
            return (WT_NOTFOUND);
    }
    return (0);
}

/*
 * __bit_test --
 *	Test one bit in name.
 */
static inline bool
__bit_test(uint8_t *bitf, uint64_t bit)
{
	return ((bitf[__bit_byte(bit)] & __bit_mask(bit)) != 0);
}

				/* byte of the bitstring bit is in */
#define	__bit_byte(bit)	((bit) >> 3)

				/* mask for the bit within its byte */
#define	__bit_mask(bit)	(1 << ((bit) & 0x7))

				/* Bytes in a bitstring of nbits */
#define	__bitstr_size(nbits) (((nbits) + 7) >> 3)

Query the disk

Compared to the previous in-memory version, the on-disk version is bloated with housekeeping code.

/*
 * __wt_bloom_get --
 *     Tests whether the given key is in the Bloom filter. Returns zero if found, WT_NOTFOUND if
 *     not.
 */
int
__wt_bloom_get(WT_BLOOM *bloom, WT_ITEM *key) WT_GCC_FUNC_ATTRIBUTE((visibility("default")))
{
    WT_BLOOM_HASH bhash;

    __wt_bloom_hash(bloom, key, &bhash);
    return (__wt_bloom_hash_get(bloom, &bhash));
}

/*
 * __wt_bloom_hash_get --
 *     Tests whether the key (as given by its hash signature) is in the Bloom filter. Returns zero
 *     if found, WT_NOTFOUND if not.
 */
int
__wt_bloom_hash_get(WT_BLOOM *bloom, WT_BLOOM_HASH *bhash)
{
    WT_CURSOR *c;
    WT_DECL_RET;
    uint64_t h1, h2;
    uint32_t i;
    uint8_t bit;
    int result;

    /* Get operations are only supported by finalized bloom filters. */
    WT_ASSERT(bloom->session, bloom->bitstring == NULL);

    /* Create a cursor on the first time through. */
    c = NULL;
    WT_ERR(__bloom_open_cursor(bloom, NULL));
    c = bloom->c;

    h1 = bhash->h1;
    h2 = bhash->h2;

    result = 0;
    for (i = 0; i < bloom->k; i++, h1 += h2) {
        /*
         * Add 1 to the hash because WiredTiger tables are 1 based and the original bitstring array
         * was 0 based.
         */
        c->set_key(c, (h1 % bloom->m) + 1);
        WT_ERR(c->search(c));
        WT_ERR(c->get_value(c, &bit));

        if (bit == 0) {
            result = WT_NOTFOUND;
            break;
        }
    }
    WT_ERR(c->reset(c));
    return (result);

err:
    if (c != NULL)
        WT_TRET(c->reset(c));

    /*
     * Error handling from this function is complex. A search in the backing bit field should never
     * return WT_NOTFOUND - so translate that into a different error code and report an error. If we
     * got a WT_ROLLBACK it may be because there is a lot of cache pressure and the transaction is
     * being killed - don't report an error message in that case.
     */
    if (ret == WT_ROLLBACK || ret == WT_CACHE_FULL)
        return (ret);
    WT_RET_MSG(
      bloom->session, ret == WT_NOTFOUND ? WT_ERROR : ret, "Failed lookup in bloom filter");
}

Deletion

Due to the nature of OR operation used in the bit fields, there is no way to remove an element from Bloom Filter without recording element numbers in each bit field. WiredTiger’s Bloom Filter is bit based and does not provide deletion operation.

Merging

As Bloom Filter reports whether an element is present with possible errors, it can be seen as an ordinary set (with possible errors).

To calculate the union of two Bloom Filters, perform bitwise OR on their bitmaps. The resulting bitmap is an exact merge of the two original filters, i.e. the resulting set is exactly the same with the one created with members of the original sets.