Distance Sensitive Bloom Filters Without False Negatives

Author: Goswami Mayank
Pagh Rasmus
Silvestri Francesco
Sivertsen Johan
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 02/11/2016

A Bloom filter is a widely used data structure for representing a set $S$ and answering queries of the form "Is $x \in S$?". By allowing some false positive answers (saying "yes" when the answer is in fact "no"), Bloom filters use space significantly below what is required for storing $S$. In the distance sensitive setting we work with a set $S$ of (Hamming) vectors and seek a data structure that offers a similar trade-off, but answers queries of the form "Is $x$ close to an element of $S$?" (in Hamming distance). Previous work on distance sensitive Bloom filters has accepted false positive and false negative answers. Absence of false negatives is of critical importance in many applications of Bloom filters, so it is natural to ask whether this can also be achieved in the distance sensitive setting. Our main contributions are upper and lower bounds (tight in several cases) on space usage in the distance sensitive setting where false negatives are not allowed. Comment: Published in SODA 201
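For readers new to the baseline structure, here is a minimal Python sketch of a classic exact-membership Bloom filter. It is not the distance sensitive construction from the paper. It shows why a standard Bloom filter has no false negatives: inserting an item only ever sets bits, so a later query for that item always finds all of its bits set. The bit-array size m, the number of hash positions k, and the double-hashing scheme over SHA-256 are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Classic Bloom filter: k hash positions per item over an m-bit array.

    Queries may return false positives but never false negatives, because
    the bits of an inserted item are set and never cleared.
    """

    def __init__(self, m: int, k: int) -> None:
        self.m = m                # number of bits (illustrative parameter)
        self.k = k                # number of hash positions per item
        self.bits = [False] * m

    def _positions(self, item: str) -> list[int]:
        # Assumption: derive k positions by double hashing two 64-bit
        # values taken from a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item: str) -> bool:
        # True means "possibly in S"; False means "definitely not in S".
        return all(self.bits[p] for p in self._positions(item))


bf = BloomFilter(m=1024, k=5)
for word in ["alpha", "beta", "gamma"]:
    bf.add(word)
print(bf.might_contain("beta"))   # always True for an inserted item
print(bf.might_contain("delta"))  # usually False; True would be a false positive
```

The space saving comes from storing only m bits instead of the items themselves; the price is that unrelated items can collide on all k bits.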

arXiv.org e-Print Archive

Crossref

The IT University of Copenhagen's Repository

MPG.PuRe

Archivio istituzionale della ricerca - Università di Padova

Scalable Techniques for Similarity Search

Author: Nagireddy Siddartha Reddy
Publication venue: SJSU ScholarWorks
Publication date: 01/10/2015

Document similarity search is closely related to the nearest neighbour problem and has applications in various domains. To determine the similarity or dissimilarity of documents, they first need to be converted into sets of shingles: each document is converted into k-shingles, k being the length of each shingle. Similarity is calculated using the Jaccard distance between sets and recorded in a characteristic matrix; the cost of processing this matrix is significantly high, especially when the sets are large. In this project we explore various approaches, such as min-hashing, LSH and Bloom filters, to decrease the matrix size and improve the time complexity. Min-hashing creates a signature matrix that is significantly smaller than the characteristic matrix. We look into a min-hashing implementation and its pros and cons, and also explore Locality Sensitive Hashing, Bloom filters and their advantages.
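The shingling, Jaccard and min-hashing steps described above can be sketched in Python as follows. This is a hedged illustration, not the project's code: character-level shingles, the CRC32 mapping from shingles to integers, the hash family h(x) = (a*x + b) mod p with p = 2^61 - 1, and a signature length of 200 are all assumptions chosen for the example. Each signature corresponds to one column of the signature matrix, and the fraction of agreeing positions estimates the Jaccard similarity (Jaccard distance is 1 minus that value).

```python
import random
import zlib


def shingles(text: str, k: int) -> set[str]:
    """Return the set of character k-shingles (substrings of length k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity |A & B| / |A | B|; distance is 1 minus this."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(s: set[str], seeds: list[tuple[int, int]], prime: int) -> list[int]:
    """One min-hash value per (a, b) pair, using h(x) = (a*x + b) mod prime.

    The set s must be non-empty (min() of an empty sequence raises).
    """
    # Assumption: map shingles to integers with CRC32 (deterministic across runs).
    ids = [zlib.crc32(x.encode("utf-8")) for x in s]
    return [min((a * x + b) % prime for x in ids) for a, b in seeds]


def estimate_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of signature positions that agree, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


PRIME = (1 << 61) - 1  # Mersenne prime, larger than the CRC32 range
rng = random.Random(0)
seeds = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(200)]

d1 = shingles("the quick brown fox jumps over the lazy dog", 5)
d2 = shingles("the quick brown fox jumped over a lazy dog", 5)
print(jaccard(d1, d2))  # exact similarity from the full sets
print(estimate_jaccard(minhash_signature(d1, seeds, PRIME),
                       minhash_signature(d2, seeds, PRIME)))  # estimate from 200 values
```

The signatures have a fixed length regardless of document size, which is what makes the signature matrix much smaller than the characteristic matrix; LSH would then band these signatures to find candidate pairs.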

SJSU ScholarWorks

Usable Secure Private Search

Author: Bellovin Steven Michael
Cui Ang
Liu Bin
Malkin Tal G.
Raykova Mariana Petrova
Stolfo Salvatore
Vo Binh D.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2012

Real-world applications commonly require untrusting parties to share sensitive information securely. This article describes a secure anonymous database search (SADS) system that provides exact keyword match capability. Using a new reroutable encryption and the ideas of Bloom filters and deterministic encryption, SADS lets multiple parties efficiently execute exact-match queries over distributed encrypted databases in a controlled manner. This article further considers a more general search setting allowing similarity searches, going beyond existing work that considers similarity in terms of error tolerance and Hamming distance. This article presents a general framework, built on the cryptographic and privacy-preserving guarantees of the SADS primitive, for engineering usable private secure search systems.

Columbia University Academic Commons

A neural data structure for novelty detection

Author: Dasgupta S.
Navlakha S.
Sheehan T. C.
Stevens C. F.
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 18/12/2018

Novelty detection is a fundamental biological problem that organisms must solve to determine whether a given stimulus departs from those previously experienced. In computer science, this problem is solved efficiently using a data structure called a Bloom filter. We found that the fruit fly olfactory circuit evolved a variant of a Bloom filter to assess the novelty of odors. Compared with a traditional Bloom filter, the fly adjusts novelty responses based on two additional features: the similarity of an odor to previously experienced odors and the time elapsed since the odor was last experienced. We elaborate and validate a framework to predict novelty responses of fruit flies to given pairs of odors. We also translate insights from the fly circuit to develop a class of distance- and time-sensitive Bloom filters that outperform prior filters when evaluated on several biological and computational datasets. Overall, our work illuminates the algorithmic basis of an important neurobiological problem and offers strategies for novelty detection in computational systems.
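The two extra features named in this abstract (similarity awareness and time awareness) can be illustrated with a small Python sketch. It is hypothetical and only loosely modeled on the description above; it is not the authors' published model. The assumptions: each input vector is expanded by a fixed sparse random projection, only the k most active expanded units form its "tag", each tag position carries a familiarity weight that is reset to 0 when an input is observed, and tick() moves every weight back toward 1 so that familiarity fades over time. The class name, its parameters and the recovery rule are all illustrative.

```python
import random


class DecayingNoveltyFilter:
    """Hypothetical distance- and time-sensitive novelty filter (illustrative only).

    Novelty is the mean familiarity weight over an input's tag:
    1.0 = never seen, 0.0 = just seen. Similar inputs tend to share tag
    positions, so they also tend to look familiar.
    """

    def __init__(self, dim, expanded_dim, k, samples_per_row, recovery, seed=0):
        rng = random.Random(seed)
        # Assumption: each expanded unit sums a fixed random subset of input coordinates.
        self.rows = [rng.sample(range(dim), samples_per_row) for _ in range(expanded_dim)]
        self.k = k
        self.recovery = recovery  # fraction of the gap to 1.0 restored per tick
        self.weights = [1.0] * expanded_dim

    def _tag(self, x):
        activity = [sum(x[j] for j in row) for row in self.rows]
        # Winner-take-all: indices of the k most active units (ties broken by index).
        return sorted(range(len(activity)), key=lambda i: (-activity[i], i))[: self.k]

    def novelty(self, x):
        tag = self._tag(x)
        return sum(self.weights[i] for i in tag) / self.k

    def observe(self, x):
        # Mark every position in the input's tag as fully familiar.
        for i in self._tag(x):
            self.weights[i] = 0.0

    def tick(self):
        # Time sensitivity: familiarity fades as weights recover toward 1.0.
        self.weights = [w + self.recovery * (1.0 - w) for w in self.weights]


f = DecayingNoveltyFilter(dim=50, expanded_dim=2000, k=50, samples_per_row=6, recovery=0.1)
rng = random.Random(1)
odor = [rng.random() for _ in range(50)]
print(f.novelty(odor))  # 1.0: nothing observed yet
f.observe(odor)
print(f.novelty(odor))  # 0.0: every tag weight was just reset
f.tick()
print(f.novelty(odor))  # close to 0.1 after one recovery step
```

Unlike a classic Bloom filter, which answers a yes/no membership question, this sketch returns a graded score, so a slightly perturbed input or an input seen long ago yields an intermediate novelty value.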

Cold Spring Harbor Laboratory Institutional Repository

Data Leak Detection As a Service: Challenges and Solutions

Author: Shu Xiaokui
Yao Danfeng (Daphne)
Publication date: 01/01/2012

We describe a network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not require the data owner to reveal the content of the sensitive data. Instead, only a small number of specialized digests is needed. Our technique, referred to as the fuzzy fingerprint, can be used to detect accidental data leaks due to human errors or application flaws. The privacy-preserving feature of our algorithms minimizes the exposure of sensitive data and enables the data owner to safely delegate the detection to others. We describe how cloud providers can offer their customers data-leak detection as an add-on service with strong privacy guarantees. We perform an extensive experimental evaluation of the privacy, efficiency, accuracy and noise tolerance of our techniques. Our evaluation results under various data-leak scenarios and setups show that our method can support accurate detection with a very small number of false alarms, even when the presentation of the data has been transformed. The results also indicate that detection accuracy does not degrade when partial digests are used. We further provide a quantifiable method to measure the privacy guarantee offered by our fuzzy fingerprint framework.

Computer Science Technical Reports @Virginia Tech