Multiset Synchronization with Counting Cuckoo Filters
Set synchronization is a fundamental task in distributed applications and
implementations. Existing methods that synchronize simple sets are mainly based
on compact data structures such as the Bloom filter and its variants. However,
these methods cannot synchronize a pair of multisets, in which an element may
appear multiple times. To this end, in this paper we propose to leverage the
counting cuckoo filter (CCF), a novel variant of the cuckoo filter, to represent
and thereafter synchronize a pair of multisets. The cuckoo filter (CF) is a
compact hash table that uses cuckoo hashing to resolve collisions.
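To make the cuckoo-filter idea concrete, here is a minimal Python sketch of a standard cuckoo filter: each element gets a short fingerprint and two candidate buckets, the second derived from the first and the fingerprint alone (partial-key cuckoo hashing), and a full bucket triggers random evictions. The SHA-256-based hashing, 64 buckets of 4 slots, 8-bit fingerprints, the kick limit, and the class and method names are assumptions for illustration; this is not the authors' implementation.

```python
import hashlib
import random


def _hash64(data: bytes) -> int:
    # 64-bit hash taken from SHA-256 (an arbitrary choice for this sketch)
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    def __init__(self, num_buckets=64, slots_per_bucket=4, fp_bits=8, max_kicks=500):
        # A power-of-two bucket count keeps the XOR-ed alternate index in range
        assert num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.slots = slots_per_bucket
        self.fp_mask = (1 << fp_bits) - 1
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]  # each bucket holds fingerprints
        self.rng = random.Random(0)  # fixed seed: deterministic eviction choices

    def _fingerprint(self, item: str) -> int:
        return _hash64(b"fp:" + item.encode()) & self.fp_mask

    def _index(self, item: str) -> int:
        return _hash64(b"ix:" + item.encode()) & (self.num_buckets - 1)

    def _alt_index(self, i: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the other bucket depends only on (i, fp),
        # so a stored fingerprint can be relocated without the original item.
        return i ^ (_hash64(fp.to_bytes(8, "big")) & (self.num_buckets - 1))

    def insert(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.slots:
                self.buckets[i].append(fp)
                return True
        # Both candidate buckets are full: swap in our fingerprint for a random
        # victim and push the victim to its alternate bucket, repeating as needed.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = self.rng.randrange(self.slots)
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.slots:
                self.buckets[i].append(fp)
                return True
        # Treated as full; this sketch drops the last evicted fingerprint
        # (real implementations usually keep it in a "victim" stash instead).
        return False

    def contains(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]

    def delete(self, item: str) -> bool:
        # Only safe for items that were actually inserted
        fp = self._fingerprint(item)
        i1 = self._index(item)
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


if __name__ == "__main__":
    cf = CuckooFilter()
    for word in ["apple", "banana", "cherry"]:
        cf.insert(word)
    print(cf.contains("banana"))  # True: inserted items are always found
```

Because only fingerprints are stored, a lookup for an item that was never inserted can still return True (a false positive), which is the trade-off that makes the table compact.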
CF has an array of buckets, each of which has multiple slots to store element
fingerprints. Building on CF, CCF extends each slot into two fields: the
fingerprint field and the counter field. The fingerprint field records the
fingerprint of the element stored in the slot, while the counter field counts
the multiplicity of that element. With this design, CCF can represent any
multiset. To identify the elements that differ between the two multisets once
the CCFs representing the local multisets have been generated and exchanged,
we propose a query-based method and a decoding-based method. Comprehensive
evaluation results indicate that, when used to synchronize multisets, CCF
outperforms the counting Bloom filter (CBF) in both synchronization accuracy
and space efficiency, at the cost of slightly higher time consumption.
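The Python sketch below illustrates the slot layout described in this abstract and one plausible reading of the query-based idea: each slot holds a [fingerprint, counter] pair, inserting an element that is already present only increases its counter, and one party compares its local multiplicities with the counts it queries from the peer's CCF. The merge-on-insert rule, the hash construction, the parameters, and the names `CountingCuckooFilter` and `query_based_diff` are assumptions for illustration, not the paper's algorithm; the decoding-based method is not sketched because the abstract gives no details.

```python
import hashlib
import random
from collections import Counter


def _hash64(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CountingCuckooFilter:
    """Hypothetical CCF sketch: every slot is a [fingerprint, counter] pair."""

    def __init__(self, num_buckets=64, slots_per_bucket=4, fp_bits=16, max_kicks=500):
        assert num_buckets & (num_buckets - 1) == 0  # keeps XOR-ed indexes in range
        self.num_buckets = num_buckets
        self.slots = slots_per_bucket
        self.fp_mask = (1 << fp_bits) - 1
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(0)

    def _alt_index(self, i, fp):
        return i ^ (_hash64(fp.to_bytes(8, "big")) & (self.num_buckets - 1))

    def _locate(self, item):
        fp = _hash64(b"fp:" + item.encode()) & self.fp_mask
        i1 = _hash64(b"ix:" + item.encode()) & (self.num_buckets - 1)
        return fp, i1, self._alt_index(i1, fp)

    def _find(self, item):
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            for slot in self.buckets[i]:
                if slot[0] == fp:
                    return slot
        return None

    def insert(self, item, times=1):
        slot = self._find(item)
        if slot is not None:
            # Already represented: raise the multiplicity in place. Python ints are
            # unbounded; a real CCF would use a fixed-width counter field.
            slot[1] += times
            return True
        fp, i1, i2 = self._locate(item)
        entry = [fp, times]
        for i in (i1, i2):
            if len(self.buckets[i]) < self.slots:
                self.buckets[i].append(entry)
                return True
        # Cuckoo relocation moves whole slots, so counters travel with fingerprints
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = self.rng.randrange(self.slots)
            entry, self.buckets[i][j] = self.buckets[i][j], entry
            i = self._alt_index(i, entry[0])
            if len(self.buckets[i]) < self.slots:
                self.buckets[i].append(entry)
                return True
        return False  # full; the last evicted slot is dropped in this sketch

    def count(self, item):
        slot = self._find(item)
        return slot[1] if slot is not None else 0


def query_based_diff(local: Counter, remote: CountingCuckooFilter) -> Counter:
    """Copies the local side holds beyond what the remote CCF reports (hypothetical helper)."""
    extra = Counter()
    for item, n in local.items():
        missing = n - remote.count(item)
        if missing > 0:
            extra[item] = missing
    return extra


if __name__ == "__main__":
    alice = Counter({"x": 3, "y": 1, "z": 2})
    bob = Counter({"x": 1, "y": 1, "w": 5})
    bob_ccf = CountingCuckooFilter()
    for item, n in bob.items():
        bob_ccf.insert(item, n)
    print(query_based_diff(alice, bob_ccf))
```

For these inputs Alice should learn that she holds two surplus copies of x and two of z, barring an unlucky fingerprint collision; running the same query in the opposite direction yields Bob's surplus (five copies of w). Collisions are where errors enter: two elements that share a fingerprint and a bucket pair end up merged into one counter.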
Optimizing Bloom Filter: Challenges, Solutions, and Comparisons
The Bloom filter (BF) has been widely used to support membership queries, i.e.,
to judge whether a given element x is a member of a given set S. Recent years
have seen an explosion of BF designs, owing to BF's space efficiency and its
constant-time membership queries. Existing reviews and surveys mainly focus on
the applications of BF but fall short of covering current trends, and thus lack
an intrinsic understanding of the design philosophy behind the variants. To this
end, this survey provides an overview of BF and its variants, with an emphasis
on optimization techniques. We survey the existing variants along two
dimensions: performance and generalization. To improve performance, dozens of
variants aim to reduce false positives and implementation costs. In addition,
tens of variants generalize the BF framework to more scenarios by diversifying
the input sets and enriching the output functionalities. To summarize these
efforts, we conduct an in-depth study of the literature on BF optimization,
covering more than 60 variants. We uncover the design philosophy of these
variants and elaborate on how the employed optimization techniques improve BF.
Furthermore, we conduct a comprehensive analysis and qualitative comparison
from the perspective of BF components. Lastly, we highlight future trends in
BF design. To the best of our knowledge, this is the first survey to
accomplish these goals.
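As a reference point for the membership query and false positives that the surveyed variants optimize, here is a minimal Python sketch of a standard Bloom filter. Deriving the k indexes by double hashing from one SHA-256 digest is an assumption made for brevity (one common construction, not necessarily what any surveyed variant uses), and the names `BloomFilter` and `estimated_fpr` are illustrative.

```python
import hashlib
import math


class BloomFilter:
    """Standard Bloom filter: an m-bit array and k hash functions."""

    def __init__(self, m_bits=1024, k=4):
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: g_i = h1 + i * h2 (mod m) for i = 0..k-1
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Odd step, so the k positions are distinct when m is a power of two
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # False is always correct; True may be a false positive
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def estimated_fpr(m_bits: int, n_items: int, k: int) -> float:
    """Standard approximation of the false-positive rate: (1 - e^(-k n / m))^k."""
    return (1.0 - math.exp(-k * n_items / m_bits)) ** k


if __name__ == "__main__":
    bf = BloomFilter()
    for word in ["apple", "banana", "cherry"]:
        bf.add(word)
    print("banana" in bf)  # True: a Bloom filter has no false negatives
```

With m = 1024 bits, n = 100 items, and k = 4, the approximation gives a false-positive rate of roughly 1%. A counting Bloom filter, the baseline in the CCF abstract, replaces each bit with a small counter so that elements can also be deleted.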