SicHash -- Small Irregular Cuckoo Tables for Perfect Hashing

Authors: Lehmann, Hans-Peter; Sanders, Peter; Walzer, Stefan
Publication date: 08/11/2022

A Perfect Hash Function (PHF) is a hash function that has no collisions on a given input set. PHFs can be used for space-efficient storage of data in an array, or for determining a compact representative of each object in the set. In this paper, we present the PHF construction algorithm SicHash (Small Irregular Cuckoo Tables for Perfect Hashing). At its core, SicHash uses a known technique: it places objects in a cuckoo hash table and then stores the final hash function choice of each object in a retrieval data structure. We combine the idea with irregular cuckoo hashing, where each object has a different number of hash functions. Additionally, we use many small tables that we overload beyond their asymptotic maximum load factor. The most space-efficient competitors often use brute-force methods to determine the PHFs. SicHash provides a more direct construction algorithm that only rarely needs to recompute parts. Our implementation improves the state of the art in terms of space usage versus construction time for a wide range of configurations. At the same time, it provides very fast queries.
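The known technique the abstract refers to (place keys by cuckoo hashing, then remember which hash function each key ended up using) can be sketched as follows. This is a minimal illustration and not the SicHash implementation: every key gets the same number of candidate hash functions (so it is not irregular), a single table is used, and a plain `dict` stands in for the retrieval data structure. The names `h`, `build_phf`, and `phf_query` are hypothetical.

```python
import hashlib
import random

def h(key: str, i: int, m: int) -> int:
    """i-th candidate slot of `key` in a table with m slots."""
    digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m

def build_phf(keys, m, k=3, max_kicks=1000, seed=0):
    """Place keys by random-walk cuckoo hashing.

    Returns {key: index of the hash function that key uses}, or None if
    placement fails (a real construction would retry with fresh hash functions).
    The dict is only a stand-in for a retrieval data structure, which would
    store these few bits per key without storing the keys themselves.
    """
    rng = random.Random(seed)
    table = [None] * m   # slot -> key currently occupying it
    choice = {}          # key -> chosen hash function index
    for key in keys:
        cur = key
        for _ in range(max_kicks):
            # Use a free candidate slot if one exists.
            free = [i for i in range(k) if table[h(cur, i, m)] is None]
            if free:
                i = free[0]
                table[h(cur, i, m)] = cur
                choice[cur] = i
                break
            # Otherwise evict the occupant of a random candidate slot
            # and continue by re-placing the evicted key.
            i = rng.randrange(k)
            slot = h(cur, i, m)
            victim = table[slot]
            table[slot] = cur
            choice[cur] = i
            cur = victim
        else:
            return None
    return choice

def phf_query(choice, key: str, m: int) -> int:
    """Evaluate the perfect hash function: apply the stored choice."""
    return h(key, choice[key], m)
```

Because each table slot holds at most one key, the values returned by `phf_query` are distinct for all keys in the input set, which is exactly the PHF property.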

arXiv.org e-Print Archive

KITopen

Adaptive one memory access bloom filters

Authors: Larrabeiti López, David; Reviriego Vasallo, Pedro; Rottenstreich, Ori; Sánchez-Macián, Alfonso
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/06/2022

Bloom filters are widely used to perform fast approximate membership checking in networking applications. The main limitation of Bloom filters is that they suffer from false positives that can only be reduced by using more memory. We suggest taking advantage of the common repetition in the identity of queried elements to adapt Bloom filters so that they avoid false positives for elements that repeat across queries. In this paper, one memory access Bloom filters are used to design an adaptation scheme that can effectively remove false positives while completing all queries in a single memory access. The proposed filters are well suited for scenarios in which the number of memory bits per element is low and thus complement existing adaptive cuckoo filters that are not efficient in that case. The evaluation results using packet traces show that the proposed adaptive Bloom filters can significantly reduce the false positive rate in networking applications with a single memory access. In particular, when using as few as four bits per element, false positive rates below 5% are achieved. This work was supported by the ACHILLES project PID2019-104207RB-I00 and the Go2Edge network RED2018-102585-T funded by the Spanish Agencia Estatal de Investigación (AEI) 10.13039/501100011033 and by the Madrid Community research project TAPIR-CM grant no. P2018/TCS-4496.
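A one memory access Bloom filter can be sketched as below: one hash selects a single memory word, and all of an element's bits live inside that word, so a query reads exactly one word. This is only the generic building block; the adaptation scheme from the paper is not described in the abstract and is not reproduced here. The class name `OneAccessBloom` and its parameters are hypothetical.

```python
import hashlib

class OneAccessBloom:
    """Bloom filter in which every element maps to k bits of ONE word."""

    def __init__(self, num_words: int, word_bits: int = 64, k: int = 3):
        self.words = [0] * num_words
        self.word_bits = word_bits
        self.k = k

    def _locate(self, item: str):
        d = hashlib.blake2b(item.encode(), digest_size=16).digest()
        # First half of the digest picks the word, second half picks the bits.
        word = int.from_bytes(d[:8], "big") % len(self.words)
        rest = int.from_bytes(d[8:], "big")
        mask = 0
        for _ in range(self.k):
            mask |= 1 << (rest % self.word_bits)
            rest //= self.word_bits
        return word, mask

    def add(self, item: str) -> None:
        word, mask = self._locate(item)
        self.words[word] |= mask

    def __contains__(self, item: str) -> bool:
        word, mask = self._locate(item)
        # A single word read decides the query.
        return (self.words[word] & mask) == mask
```

Usage: `f = OneAccessBloom(num_words=16); f.add("10.0.0.1"); "10.0.0.1" in f` evaluates to `True`. Absent elements may still be reported as present, which is the false-positive behaviour the paper aims to reduce.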

Universidad Carlos III de Madrid e-Archivo

Telescoping Filter: A Practical Adaptive Filter

Authors: Lee, David J.; McCauley, Samuel; Singh, Shikha; Stein, Max
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 29th Annual European Symposium on Algorithms (ESA 2021)
Publication date: 01/01/2021

Filters are small, fast, and approximate set membership data structures. They are often used to filter out expensive accesses to a remote set S for negative queries (that is, filtering out queries x ∉ S). Filters have one-sided errors: on a negative query, a filter may say "present" with a tunable false-positive probability of ε. Correctness is traded for space: filters only use log(1/ε) + O(1) bits per element. The false-positive guarantees of most filters, however, hold only for a single query. In particular, if x is a false positive, a subsequent query to x is a false positive with probability 1, not ε. With this in mind, recent work has introduced the notion of an adaptive filter. A filter is adaptive if each query is a false positive with probability ε, regardless of answers to previous queries. This requires "fixing" false positives as they occur. Adaptive filters not only provide strong false positive guarantees in adversarial environments but also improve query performance on practical workloads by eliminating repeated false positives. Existing work on adaptive filters falls into two categories. On the one hand, there are practical filters, based on the cuckoo filter, that attempt to fix false positives heuristically without meeting the adaptivity guarantee. On the other hand, the broom filter is a very complex adaptive filter that meets the optimal theoretical bounds. In this paper, we bridge this gap by designing the telescoping adaptive filter (TAF), a practical, provably adaptive filter. We provide theoretical false-positive and space guarantees for our filter, along with empirical results where we compare its performance against state-of-the-art filters. We also implement the broom filter and compare it to the TAF. Our experiments show that theoretical adaptivity can lead to improved false-positive performance on practical inputs, and can be achieved while maintaining throughput that is similar to non-adaptive filters.
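The observation that a non-adaptive filter repeats its false positives can be illustrated with a toy fingerprint filter. This sketch is not the telescoping filter; it only shows the problem that adaptive filters address. The helper names and the choice of 8-bit fingerprints are assumptions made for illustration.

```python
import hashlib
import math

def fingerprint(key: str, bits: int) -> int:
    """Deterministic `bits`-bit fingerprint of a key."""
    d = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(d, "big") % (1 << bits)

def lower_bound_bits(eps: float) -> float:
    """The log(1/eps) term of the space bound, in bits per element."""
    return math.log2(1 / eps)

BITS = 8
# Toy non-adaptive filter: the set of fingerprints of the stored members.
stored = {fingerprint(f"member{i}", BITS) for i in range(20)}

def query(key: str) -> bool:
    return fingerprint(key, BITS) in stored

# Search the negative keys for one that the filter wrongly reports as present.
first_fp = next(k for k in (f"absent{i}" for i in range(10_000)) if query(k))

# Nothing in the filter changes between queries, so the same key is a
# false positive every time it is asked again.
repeats = [query(first_fp) for _ in range(5)]
```

Here `repeats` contains only `True`: the per-query error is not ε for a key that has already been a false positive. An adaptive filter would have to change its internal state after the first error.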

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Adaptive cuckoo filters

Authors: Mitzenmacher, M.; Pontarelli, S.; Reviriego, P.
Publication date: 01/01/2020

We introduce the adaptive cuckoo filter (ACF), a data structure for approximate set membership that extends cuckoo filters by reacting to false positives, removing them for future queries. As an example application, in packet processing queries may correspond to flow identifiers, so a search for an element is likely to be followed by repeated searches for that element. Removing false positives can therefore significantly lower the false-positive rate. The ACF, like the cuckoo filter, uses a cuckoo hash table to store fingerprints. We allow fingerprint entries to be changed in response to a false positive in a manner designed to minimize the effect on the performance of the filter. We show that the ACF is able to significantly reduce the false-positive rate by presenting both a theoretical model for the false-positive rate and simulations using both synthetic data sets and real packet traces.
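The idea of changing a fingerprint entry in response to a false positive can be sketched as follows. This is a heavily simplified illustration under stated assumptions, not the ACF itself: each key has a single candidate slot (no cuckoo displacement), and the per-slot selector plus the backing table of full keys are assumptions about how a fingerprint could be recomputed. All names are hypothetical.

```python
import hashlib

def _hash(key: str, salt: str, mod: int) -> int:
    d = hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(d, "big") % mod

class TinyAdaptiveFilter:
    def __init__(self, num_slots: int, fp_bits: int = 4, num_selectors: int = 4):
        self.n = num_slots
        self.fp_mod = 1 << fp_bits
        self.num_selectors = num_selectors
        self.filter = [None] * num_slots    # slot -> (selector, fingerprint)
        self.backing = [None] * num_slots   # slot -> full key (assumed slower memory)

    def _fp(self, key: str, selector: int) -> int:
        # The selector picks which fingerprint hash function is in use.
        return _hash(key, f"fp{selector}", self.fp_mod)

    def insert(self, key: str) -> bool:
        slot = _hash(key, "slot", self.n)
        if self.backing[slot] is not None:
            return False                    # toy version: no displacement
        self.backing[slot] = key
        self.filter[slot] = (0, self._fp(key, 0))
        return True

    def query(self, key: str) -> bool:
        entry = self.filter[_hash(key, "slot", self.n)]
        return entry is not None and entry[1] == self._fp(key, entry[0])

    def report_false_positive(self, key: str) -> None:
        """Re-fingerprint the stored key with the next selector."""
        slot = _hash(key, "slot", self.n)
        stored = self.backing[slot]
        if stored is None or stored == key:
            return
        selector = (self.filter[slot][0] + 1) % self.num_selectors
        self.filter[slot] = (selector, self._fp(stored, selector))
```

After `report_false_positive(x)`, the stored key still matches its own slot, so no false negatives are introduced, while `x` matches again only if its fingerprint also collides under the new selector.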

Archivio della ricerca- Università di Roma La Sapienza

Efficient Online Summarization of Large-Scale Dynamic Networks

Authors: Jensen, Christian Søndergaard; Liu, Siyuan; Qu, Qiang; Zhu, Feida
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/2016

Crossref

VBN

Efficient Implementation of Extended Learned Bloom Filter

Author: 양수현
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/08/2021

Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2021.8. 김형주.

Existing data structures aim for consistent performance regardless of the data distribution. However, research under the name of learned indexes shows that performance can be improved by exploiting the data distribution. This study focuses on extending and implementing the learned Bloom filter, which is one type of learned index. The result is called the extended learned Bloom filter, a data structure that adds a learned hash function to the structure of the learned Bloom filter. The hyperparameter α adjusts the ratio between the learned hash function and the auxiliary filter, and experiments show that the false positive rate improves over the existing learned Bloom filter. Additionally, we introduce the model precision problem that occurred during the implementation of the extended learned Bloom filter and show that it can be solved by using 64-bit floating point. In addition, we show that the performance of the extended learned Bloom filter can be improved by tuning the model, and we try to understand how the learned hash function contributes to performance improvement.

Contents: Chapter 1 Introduction; Chapter 2 Background (2.1 The concept of Bloom filters, 2.2 Applications of Bloom filters, 2.3 Learned Bloom filters, 2.4 Applications of learned Bloom filters); Chapter 3 Proposed Model (3.1 Related work on learned Bloom filters, 3.2 Extended learned Bloom filter); Chapter 4 Implementation (4.1 Hyperparameter search, 4.2 Model precision, 4.3 Model tuning, 4.4 Understanding the learned hash function); Chapter 5 Experiments (5.1 Experimental environment, 5.2 Hyperparameter search experiment, 5.3 Model precision experiment, 5.4 Model tuning experiment, 5.5 Experiment on understanding the learned hash function); Chapter 6 Conclusion and Future Work; References; Appendix; Abstract.

SNU Open Repository and Archive