19 research outputs found

    Adaptive cuckoo filters

    Get PDF
    We introduce the adaptive cuckoo filter (ACF), a data structure for approximate set membership that extends cuckoo filters by reacting to false positives, removing them for future queries. As an example application, in packet processing queries may correspond to flow identifiers, so a search for an element is likely to be followed by repeated searches for that element. Removing false positives can therefore significantly lower the false-positive rate. The ACF, like the cuckoo filter, uses a cuckoo hash table to store fingerprints. We allow fingerprint entries to be changed in response to a false positive in a manner designed to minimize the effect on the performance of the filter. We show that the ACF is able to significantly reduce the false-positive rate by presenting both a theoretical model for the false-positive rate and simulations using both synthetic data sets and real packet trace

    Adaptive one memory access bloom filters

    Get PDF
    Bloom filters are widely used to perform fast approximate membership checking in networking applications. The main limitation of Bloom filters is that they suffer from false positives that can only be reduced by using more memory. We suggest to take advantage of a common repetition in the identity of queried elements to adapt Bloom filters for avoiding false positives for elements that repeat upon queries. In this paper, one memory access Bloom filters are used to design an adaptation scheme that can effectively remove false positives while completing all queries in a single memory access. The proposed filters are well suited for scenarios on which the number of memory bits per element is low and thus complement existing adaptive cuckoo filters that are not efficient in that case. The evaluation results using packet traces show that the proposed adaptive Bloom filters can significantly reduce the false positive rate in networking applications with the single memory access. In particular, when using as few as four bits per element, false positive rates below 5% are achieved.This work was supported by the ACHILLES project PID2019-104207RB-I00 and the Go2Edge network RED2018-102585-T funded by the Spanish Agencia Estatal de Investigación (AEI) 10.13039/501100011033 and by the Madrid Community research project TAPIR-CM grant no. P2018/TCS-4496

    Towards Application of Cuckoo Filters in Network Security Monitoring

    Get PDF
    In this paper, we study the feasibility of applying the recently proposed cuckoo filters to improve space efficiency for set membership testing in Network Security Monitoring, focusing on the example of Threat Intelligence matching. We present conceptual insights for the practical application of cuckoo filters and provide a cuckoo filter implementation that allows runtime configuration. To evaluate the practical applicability of cuckoo filters, we integrate our implementation into the Bro Network Security Monitor, compare it to traditional data structures and conduct a brief operational evaluation. We find that cuckoo filters allow remarkable memory savings, while potential performance trade-offs, caused by introducing false positives, have to be carefully evaluated on a case-by-case basis

    Concurrent Expandable AMQs on the Basis of Quotient Filters

    Get PDF
    A quotient filter is a cache efficient Approximate Membership Query (AMQ) data structure. Depending on the fill degree of the filter most insertions and queries only need to access one or two consecutive cache lines. This makes quotient filters very fast compared to the more commonly used Bloom filters that incur multiple independent memory accesses depending on the false positive rate. However, concurrent Bloom filters are easy to implement and can be implemented lock-free while concurrent quotient filters are not as simple. Usually concurrent quotient filters work by using an external array of locks - each protecting a region of the table. Accessing this array incurs one additional memory access per operation. We propose a new locking scheme that has no memory overhead. Using this new locking scheme we achieve 1.6× times higher insertion performance and over 2.1× higher query performance than with the common external locking scheme. Another advantage of quotient filters over Bloom filters is that a quotient filter can change its capacity when it is becoming full. We implement this growing technique for our concurrent quotient filters and adapt it in a way that allows unbounded growing while keeping a bounded false positive rate. We call the resulting data structure a fully expandable quotient filter. Its design is similar to scalable Bloom filters, but we exploit some concepts inherent to quotient filters to improve the space efficiency and the query speed. Additionally, we propose several quotient filter variants that are aimed to reduce the number of status bits (2-status-bit variant) or to simplify concurrent implementations (linear probing quotient filter). The linear probing quotient filter even leads to a lock-free concurrent filter implementation. This is especially interesting, since we show that any lock-free implementation of other common quotient filter variants would incur significant overheads in the form of additional data fields or multiple passes over the accessed data. The code produced as part of this submission can be found at https://www.github.com/Toobiased/lpqfilter

    Telescoping Filter: A Practical Adaptive Filter

    Get PDF
    Filters are small, fast, and approximate set membership data structures. They are often used to filter out expensive accesses to a remote set S for negative queries (that is, filtering out queries x ? S). Filters have one-sided errors: on a negative query, a filter may say "present" with a tunable false-positive probability of ?. Correctness is traded for space: filters only use log (1/?) + O(1) bits per element. The false-positive guarantees of most filters, however, hold only for a single query. In particular, if x is a false positive, a subsequent query to x is a false positive with probability 1, not ?. With this in mind, recent work has introduced the notion of an adaptive filter. A filter is adaptive if each query is a false positive with probability ?, regardless of answers to previous queries. This requires "fixing" false positives as they occur. Adaptive filters not only provide strong false positive guarantees in adversarial environments but also improve query performance on practical workloads by eliminating repeated false positives. Existing work on adaptive filters falls into two categories. On the one hand, there are practical filters, based on the cuckoo filter, that attempt to fix false positives heuristically without meeting the adaptivity guarantee. On the other hand, the broom filter is a very complex adaptive filter that meets the optimal theoretical bounds. In this paper, we bridge this gap by designing the telescoping adaptive filter (TAF), a practical, provably adaptive filter. We provide theoretical false-positive and space guarantees for our filter, along with empirical results where we compare its performance against state-of-the-art filters. We also implement the broom filter and compare it to the TAF. Our experiments show that theoretical adaptivity can lead to improved false-positive performance on practical inputs, and can be achieved while maintaining throughput that is similar to non-adaptive filters

    Bloom Filters, Adaptivity, and the Dictionary Problem

    Get PDF
    The Bloom filter---or, more generally, an approximate membership query data structure (AMQ)---maintains a compact, probabilistic representation of a set S of keys from a universe U. An AMQ supports lookups, inserts, and (for some AMQs) deletes. A query for an x in S is guaranteed to return "present." A query for x not in S returns "absent" with probability at least 1-epsilon, where epsilon is a tunable false positive probability. If a query returns "present," but x is not in S, then x is a false positive of the AMQ. Because AMQs have a nonzero probability of false-positives, they require far less space than explicit set representations. AMQs are widely used to speed up dictionaries that are stored remotely (e.g., on disk/across a network). Most AMQs offer weak guarantees on the number of false positives they will return on a sequence of queries. The false-positive probability of epsilon holds only for a single query. It is easy for an adversary to drive an AMQ's false-positive rate towards 1 by simply repeating false positives. This paper shows what it takes to get strong guarantees on the number of false positives. We say that an AMQs is adaptive if it guarantees a false-positive probability of epsilon for every query, regardless of answers to previous queries. First, we prove that it is impossible to build a small adaptive AMQ, even when the AMQ is immediately told whenever it returns a false positive. We then show how to build an adaptive AMQ that partitions its state into a small local component and a larger remote component. In addition to being adaptive, the local component of our AMQ dominates existing AMQs in all regards. It uses optimal space up to lower-order terms and supports queries and updates in worst-case constant time, with high probability. Thus, we show that adaptivity has no cost
    corecore