31 research outputs found

    SicHash -- Small Irregular Cuckoo Tables for Perfect Hashing

    A Perfect Hash Function (PHF) is a hash function that has no collisions on a given input set. PHFs can be used for space-efficient storage of data in an array, or for determining a compact representative of each object in the set. In this paper, we present the PHF construction algorithm SicHash - Small Irregular Cuckoo Tables for Perfect Hashing. At its core, SicHash uses a known technique: it places objects in a cuckoo hash table and then stores the final hash function choice of each object in a retrieval data structure. We combine this idea with irregular cuckoo hashing, where objects can have different numbers of hash functions. Additionally, we use many small tables that we overload beyond their asymptotic maximum load factor. The most space-efficient competitors often use brute-force methods to determine the PHFs. SicHash provides a more direct construction algorithm that only rarely needs to recompute parts. Our implementation improves the state of the art in terms of space usage versus construction time for a wide range of configurations. At the same time, it provides very fast queries.
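
    The core technique described above can be sketched as follows. Keys are placed in a cuckoo table with random-walk insertion, and the index of the hash function each key finally uses is recorded. Evaluating the PHF then means looking up that index and applying the chosen hash function. This is a minimal sketch under stated assumptions, not SicHash itself. The 2/4/8 hash-function mix and its 30/50/20 split are hypothetical. A plain dict stands in for the retrieval data structure, although a real one would not store the keys. The many small overloaded tables are not modeled. The names h, num_choices, build_phf, and phf are illustrative.

```python
import hashlib
import random

def h(key: str, i: int, m: int) -> int:
    """The i-th hash function, mapping key into [0, m)."""
    digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % m

def num_choices(key: str) -> int:
    """Irregular cuckoo hashing: each key gets 2, 4, or 8 hash functions.
    The 30/50/20 split is a hypothetical choice, not SicHash's tuned values."""
    r = h(key, -1, 100)
    return 2 if r < 30 else 4 if r < 80 else 8

def build_phf(keys, load_factor=0.85, max_kicks=500, seed=0):
    """Random-walk cuckoo insertion that records each key's final hash choice.
    A dict stands in for the retrieval data structure (assumption: a real
    retrieval structure stores only the small choice values, not the keys)."""
    m = int(len(keys) / load_factor) + 1
    table = [None] * m
    choice = {}
    rng = random.Random(seed)
    for key in keys:
        cur = key
        for _ in range(max_kicks):
            slots = [h(cur, i, m) for i in range(num_choices(cur))]
            free = [i for i, s in enumerate(slots) if table[s] is None]
            if free:
                i = free[0]  # take the first free candidate cell
                table[slots[i]] = cur
                choice[cur] = i
                break
            # All candidate cells are full: displace a random occupant and re-place it next.
            i = rng.randrange(len(slots))
            cur, table[slots[i]] = table[slots[i]], cur
            choice[table[slots[i]]] = i
        else:
            raise RuntimeError("insertion failed; a real builder would retry with new seeds")
    return m, choice

def phf(key: str, m: int, choice: dict) -> int:
    """Evaluate the PHF: fetch the stored choice, then apply that hash function."""
    return h(key, choice[key], m)
```

    Every key ends up in its own table cell, and phf returns exactly that cell, so the function is collision-free on the input set. Because m is larger than the number of keys, the result is perfect but not minimal.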

    Adaptive one memory access bloom filters

    Bloom filters are widely used to perform fast approximate membership checking in networking applications. The main limitation of Bloom filters is that they suffer from false positives that can only be reduced by using more memory. We propose taking advantage of a common repetition in the identity of queried elements to adapt Bloom filters so that they avoid false positives for elements that are queried repeatedly. In this paper, one memory access Bloom filters are used to design an adaptation scheme that can effectively remove false positives while completing every query in a single memory access. The proposed filters are well suited for scenarios in which the number of memory bits per element is low, and thus complement existing adaptive cuckoo filters, which are not efficient in that case. Evaluation results using packet traces show that the proposed adaptive Bloom filters can significantly reduce the false positive rate in networking applications while keeping the single memory access. In particular, when using as few as four bits per element, false positive rates below 5% are achieved. This work was supported by the ACHILLES project PID2019-104207RB-I00 and the Go2Edge network RED2018-102585-T funded by the Spanish Agencia Estatal de Investigación (AEI) 10.13039/501100011033 and by the Madrid Community research project TAPIR-CM grant no. P2018/TCS-4496.
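
    The one memory access layout that the adaptive scheme builds on can be sketched as follows. All k bits for an element are placed inside a single 64-bit word, so an insert or a query touches exactly one word. This is a minimal sketch of the non-adaptive building block only. The adaptation scheme from the paper is not reproduced. The word size, k = 3, and the use of blake2b are assumptions, and the class name OneAccessBloomFilter is hypothetical. Python integers are not real memory words; the sketch only shows the bit layout.

```python
import hashlib

class OneAccessBloomFilter:
    """Bloom filter whose k bits for an element all fall in one 64-bit word,
    so each add or query reads a single word."""

    def __init__(self, num_words: int, k: int = 3):
        self.words = [0] * num_words
        self.k = k

    def _word_and_mask(self, key: str):
        x = int.from_bytes(hashlib.blake2b(key.encode(), digest_size=16).digest(), "little")
        word = x % len(self.words)   # which 64-bit word to access
        x //= len(self.words)
        mask = 0
        for _ in range(self.k):
            mask |= 1 << (x & 63)    # 6 hash bits pick a bit position inside the word
            x >>= 6
        return word, mask

    def add(self, key: str) -> None:
        word, mask = self._word_and_mask(key)
        self.words[word] |= mask

    def might_contain(self, key: str) -> bool:
        word, mask = self._word_and_mask(key)
        return (self.words[word] & mask) == mask
```

    With 64 words (4096 bits) and 1000 elements, this is roughly the four-bits-per-element regime the abstract mentions. Inserted elements are never reported absent.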

    Telescoping Filter: A Practical Adaptive Filter

    Filters are small, fast, and approximate set membership data structures. They are often used to filter out expensive accesses to a remote set S for negative queries (that is, filtering out queries x ∉ S). Filters have one-sided errors: on a negative query, a filter may say "present" with a tunable false-positive probability of ε. Correctness is traded for space: filters only use log(1/ε) + O(1) bits per element. The false-positive guarantees of most filters, however, hold only for a single query. In particular, if x is a false positive, a subsequent query to x is a false positive with probability 1, not ε. With this in mind, recent work has introduced the notion of an adaptive filter. A filter is adaptive if each query is a false positive with probability ε, regardless of answers to previous queries. This requires "fixing" false positives as they occur. Adaptive filters not only provide strong false-positive guarantees in adversarial environments but also improve query performance on practical workloads by eliminating repeated false positives. Existing work on adaptive filters falls into two categories. On the one hand, there are practical filters, based on the cuckoo filter, that attempt to fix false positives heuristically without meeting the adaptivity guarantee. On the other hand, the broom filter is a very complex adaptive filter that meets the optimal theoretical bounds. In this paper, we bridge this gap by designing the telescoping adaptive filter (TAF), a practical, provably adaptive filter. We provide theoretical false-positive and space guarantees for our filter, along with empirical results comparing its performance against state-of-the-art filters. We also implement the broom filter and compare it to the TAF. Our experiments show that theoretical adaptivity can lead to improved false-positive performance on practical inputs, and can be achieved while maintaining throughput similar to that of non-adaptive filters.
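
    The notion of "fixing" a false positive can be illustrated with a deliberately simplified filter. When the remote set confirms a false positive, the filter lengthens the stored fingerprint of the colliding element until it no longer matches the query, so repeated queries to x stop being false positives. This follows the general fingerprint-extension idea associated with adaptive filters such as the broom filter. It is not the TAF's encoding and carries none of its space guarantees. The class and method names, 64 buckets, and 4-bit base fingerprints are all hypothetical.

```python
import hashlib

def h64(salt: str, key: str) -> int:
    """64-bit hash of key; the salt derives independent functions."""
    return int.from_bytes(hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest(), "little")

class AdaptiveFingerprintFilter:
    """Bucketed fingerprint filter that fixes false positives by lengthening the
    colliding element's fingerprint (simplified illustration, not the TAF)."""

    def __init__(self, keys, num_buckets: int = 64, base_bits: int = 4):
        self.m = num_buckets
        self.filter = [[] for _ in range(num_buckets)]  # per bucket: [fingerprint, length]
        self.remote = [[] for _ in range(num_buckets)]  # remote set S, same order as filter
        for key in keys:
            b = self._bucket(key)
            self.filter[b].append([self._fp(key, base_bits), base_bits])
            self.remote[b].append(key)

    def _bucket(self, key: str) -> int:
        return h64("bucket", key) % self.m

    @staticmethod
    def _fp(key: str, bits: int) -> int:
        return h64("fp", key) & ((1 << bits) - 1)  # low-order `bits` bits of the hash

    def might_contain(self, x: str) -> bool:
        return any(self._fp(x, bits) == fp for fp, bits in self.filter[self._bucket(x)])

    def fix_false_positive(self, x: str) -> None:
        """Call after the remote set confirms x is absent but the filter said 'present'."""
        b = self._bucket(x)
        for i, (fp, bits) in enumerate(self.filter[b]):
            if self._fp(x, bits) != fp:
                continue
            y = self.remote[b][i]  # the colliding element, read from the remote set
            while bits < 64 and self._fp(y, bits) == self._fp(x, bits):
                bits += 1          # extend y's fingerprint until it tells y and x apart
            self.filter[b][i] = [self._fp(y, bits), bits]
```

    Lengthening a fingerprint only makes matching stricter, so a fixed query stays fixed and stored elements are still always reported present. Each fix costs one access to the remote set.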

    Adaptive cuckoo filters

    We introduce the adaptive cuckoo filter (ACF), a data structure for approximate set membership that extends cuckoo filters by reacting to false positives, removing them for future queries. As an example application, in packet processing, queries may correspond to flow identifiers, so a search for an element is likely to be followed by repeated searches for that element. Removing false positives can therefore significantly lower the false-positive rate. The ACF, like the cuckoo filter, uses a cuckoo hash table to store fingerprints. We allow fingerprint entries to be changed in response to a false positive in a manner designed to minimize the effect on the performance of the filter. We show that the ACF is able to significantly reduce the false-positive rate by presenting both a theoretical model for the false-positive rate and simulations using both synthetic data sets and real packet traces.
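
    The idea of changing fingerprint entries after a false positive can be sketched as follows. The filter keeps (fingerprint, selector) entries in a bucketed cuckoo table. A shadow table holds the full keys in the same positions. When a false positive is confirmed, each matching entry switches to the next fingerprint hash function and is recomputed from its stored key. This is a sketch under assumptions, not the ACF as published. The selector mechanism, table geometry, and the reset of selectors when entries move are illustrative choices. The class and method names are hypothetical.

```python
import hashlib
import random

def h64(*parts) -> int:
    """64-bit hash of the joined parts; different first parts give independent functions."""
    data = ":".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")

class AdaptiveCuckooFilter:
    """Cuckoo filter of (fingerprint, selector) entries plus a shadow table with the
    full keys in the same positions (assumed layout for this sketch)."""

    def __init__(self, num_buckets=256, slots=4, fp_bits=8, num_selectors=4, seed=0):
        self.m, self.b, self.f, self.s = num_buckets, slots, fp_bits, num_selectors
        self.fps = [[None] * slots for _ in range(num_buckets)]   # (fingerprint, selector)
        self.keys = [[None] * slots for _ in range(num_buckets)]  # shadow table: full keys
        self.rng = random.Random(seed)

    def _buckets(self, key):
        return h64("b1", key) % self.m, h64("b2", key) % self.m

    def _fp(self, key, sel):
        return h64("fp", sel, key) & ((1 << self.f) - 1)

    def _put(self, bk, j, key):
        self.keys[bk][j] = key
        self.fps[bk][j] = (self._fp(key, 0), 0)  # simplification: selector resets when an entry moves

    def insert(self, key, max_kicks=500) -> bool:
        for _ in range(max_kicks):
            cands = self._buckets(key)
            for bk in cands:
                for j in range(self.b):
                    if self.keys[bk][j] is None:
                        self._put(bk, j, key)
                        return True
            # Both buckets full: evict a random entry and carry it forward.
            bk, j = self.rng.choice(cands), self.rng.randrange(self.b)
            victim = self.keys[bk][j]
            self._put(bk, j, key)
            key = victim
        return False  # table too full; the last evicted key is dropped in this sketch

    def might_contain(self, x) -> bool:
        return any(e is not None and self._fp(x, e[1]) == e[0]
                   for bk in self._buckets(x) for e in self.fps[bk])

    def fix_false_positive(self, x) -> None:
        """Call after the shadow table confirms x is absent but the filter said 'present'."""
        for bk in self._buckets(x):
            for j, e in enumerate(self.fps[bk]):
                if e is not None and self._fp(x, e[1]) == e[0]:
                    sel = (e[1] + 1) % self.s
                    self.fps[bk][j] = (self._fp(self.keys[bk][j], sel), sel)
```

    A switched entry still matches its own key, so inserted elements remain present. A repeated query only rematches with probability about 2^-fp_bits per entry, which is why fixed flow identifiers mostly stop producing false positives.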

    Efficient Online Summarization of Large-Scale Dynamic Networks

    Efficient Implementation of Extended Learned Bloom Filter

    Master's thesis, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, August 2021. 김형주.
    Existing data structures aim for consistent performance regardless of the data distribution. However, research under the name of learned indexes is exploring how exploiting the data distribution can improve performance. In this study, we focus on extending and implementing the learned Bloom filter, one type of learned index. We call the result the extended learned Bloom filter: a data structure that adds a learned hash function to the structure of the learned Bloom filter. A hyperparameter α adjusts the ratio between the learned hash function and the auxiliary filter, and experiments show that the extended structure achieves a lower false positive rate than the existing learned Bloom filter. Additionally, we describe a model precision problem that arose while implementing the extended learned Bloom filter and show that it can be solved by using 64-bit floating point. Finally, we show that tuning the model improves the performance of the extended learned Bloom filter, and we examine how the learned hash function contributes to this improvement.
    Contents: Chapter 1 Introduction; Chapter 2 Background (2.1 The Bloom Filter Concept; 2.2 Applications of Bloom Filters; 2.3 Learned Bloom Filters; 2.4 Applications of Learned Bloom Filters); Chapter 3 Proposed Model (3.1 Related Work on Learned Bloom Filters; 3.2 Extended Learned Bloom Filter); Chapter 4 Implementation (4.1 Hyperparameter Search; 4.2 Model Precision; 4.3 Model Tuning; 4.4 Understanding the Learned Hash Function); Chapter 5 Experiments (5.1 Experimental Setup; 5.2 Hyperparameter Search Experiments; 5.3 Model Precision Experiments; 5.4 Model Tuning Experiments; 5.5 Experiments on Understanding the Learned Hash Function); Chapter 6 Conclusion and Future Work; References; Appendix; Abstract.
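
    The baseline learned Bloom filter that the thesis extends can be sketched as follows. A model scores each key. Scores at or above a threshold tau answer "present", and any inserted key the model scores below tau goes into a backup Bloom filter, so inserted keys are never reported absent. This is a minimal sketch of the standard structure only. The extended variant's learned hash function and the ratio controlled by α are not modeled. The toy_model function, tau = 0.5, and the 256-bit backup size are hypothetical stand-ins for a trained classifier and tuned parameters.

```python
import hashlib
from typing import Callable, Iterable

class BloomFilter:
    """Standard Bloom filter used as the backup filter."""

    def __init__(self, num_bits: int, k: int = 3):
        self.m, self.k = num_bits, k
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        for i in range(self.k):
            d = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(d, "little") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key: str) -> bool:
        return all((self.bits[p >> 3] >> (p & 7)) & 1 for p in self._positions(key))

class LearnedBloomFilter:
    """Model score >= tau answers 'present'; inserted keys the model scores below
    tau are stored in the backup filter, so there are no false negatives."""

    def __init__(self, keys: Iterable[str], model: Callable[[str], float], tau: float, backup_bits: int):
        self.model, self.tau = model, tau
        self.backup = BloomFilter(backup_bits)
        for key in keys:
            if model(key) < tau:  # model false negative: the backup filter must catch it
                self.backup.add(key)

    def __contains__(self, key: str) -> bool:
        # The same model is evaluated at build and query time, so each key's
        # score is compared against tau identically in both places.
        return self.model(key) >= self.tau or key in self.backup

def toy_model(key: str) -> float:
    """Hypothetical stand-in for a trained classifier."""
    return 0.9 if key.startswith("evil") else 0.1
```

    Keys that toy_model scores low are caught by the backup filter. Any string the model scores high is reported present whether or not it was inserted, which is the model's contribution to the false positive rate. Python's float is a 64-bit double, so this sketch does not exercise the precision issue described in the abstract.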