
18 research outputs found

Privacy Aware Data Deduplication for Side Channel in Cloud Storage

Author: Conti Mauro
Gochhayat Sarada Prasad
Lu Chun-Shien
Yu Chia-Mu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Archivio istituzionale della ricerca - Università di Padova

Securely measuring the overlap between private datasets with cryptosets

Author: Matlock Matthew
Rozenblit Leon
Swamidass S. Joshua
Publication venue: Digital Commons@Becker
Publication date: 01/01/2015

Many scientific questions are best approached by sharing data, collected by different groups or across large collaborative networks, into a combined analysis. Unfortunately, some of the most interesting and powerful datasets, such as health records, genetic data, and drug discovery data, cannot be freely shared because they contain sensitive information. In many situations, knowing whether private datasets overlap determines whether it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which cannot be broken even with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with stable accuracy, and secure.

Crossref

Directory of Open Access Journals

Digital Commons@Becker

PubMed Central
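The cryptoset idea can be illustrated with a minimal Python sketch. It assumes, as an illustration rather than the authors' exact construction, that a cryptoset is a fixed-length vector of counts obtained by hashing each identifier into one of L buckets with a hash both parties share (SHA-256 here). Overlap is then estimated with a simple moment estimator: a shared identifier always lands in the same bucket in both vectors, while any other pair of identifiers collides with probability 1/L, so E[sum of x_i * y_i] = c + (|A||B| - c)/L, which can be solved for the overlap c. The function names `cryptoset` and `estimate_overlap` are hypothetical.

```python
import hashlib


def cryptoset(items, length):
    """Fixed-length vector of bucket counts (assumed cryptoset layout)."""
    counts = [0] * length
    for item in set(items):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        # Both parties must use the same hash so shared items hit the same bucket.
        counts[int.from_bytes(digest[:8], "big") % length] += 1
    return counts


def estimate_overlap(cs_a, cs_b):
    """Moment estimator of |A & B| from two cryptosets of equal length L.

    E[dot] = c + (|A||B| - c) / L  =>  c = (dot - |A||B| / L) / (1 - 1 / L)
    """
    if len(cs_a) != len(cs_b):
        raise ValueError("cryptosets must have the same length")
    length = len(cs_a)
    dot = sum(x * y for x, y in zip(cs_a, cs_b))
    size_a, size_b = sum(cs_a), sum(cs_b)  # each count vector sums to its set size
    return (dot - size_a * size_b / length) / (1 - 1 / length)


if __name__ == "__main__":
    # Hypothetical example: two ID ranges that share 2500 identifiers.
    cs_a = cryptoset(range(0, 5000), 8192)
    cs_b = cryptoset(range(2500, 7500), 8192)
    print(round(estimate_overlap(cs_a, cs_b)))
```

Shorter vectors put more identifiers in each bucket, which hides more about individual records but makes the estimate noisier; the length used here favours accuracy for a quick demonstration.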

Quotient Hash Tables - Efficiently Detecting Duplicates in Streaming Data

Author: Géraud Rémi
Lombard-Platet Marius
Naccache David
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/01/2019

This article presents the Quotient Hash Table (QHT), a new data structure for duplicate detection in unbounded streams. QHTs stem from a corrected analysis of streaming quotient filters (SQFs), resulting in a 33% reduction in memory usage for equal performance. We provide a new and thorough analysis of both algorithms, with results of interest to other existing constructions. We also introduce an optimised version of our new data structure, dubbed Queued QHT with Duplicates (QQHTD). Finally, we discuss the effect of adversarial inputs on hash-based duplicate filters similar to QHT.
Comment: Shorter version was accepted at SIGAPP SAC '1

arXiv.org e-Print Archive

Crossref
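The duplicate-detection loop described above can be sketched in Python under a simplified layout: each item is hashed once, the hash is split into a bucket index (the remainder) and a short fingerprint (low bits of the quotient), and each bucket keeps a fixed number of fingerprints with oldest-first eviction, loosely in the spirit of the queued variant. The class name and parameters are hypothetical, and the paper's exact slot layout and replacement policy are not reproduced. Eviction means a repeated item can be missed (false negative), and fingerprint collisions mean a new item can be reported as a duplicate (false positive).

```python
import hashlib
from collections import deque


class QuotientHashTableSketch:
    """Bucketed fingerprint table for streaming duplicate detection (assumed layout)."""

    def __init__(self, num_buckets, slots_per_bucket, fingerprint_bits):
        self.num_buckets = num_buckets
        self.fp_mask = (1 << fingerprint_bits) - 1
        # deque(maxlen=...) drops its oldest entry when a new one is appended to a full bucket.
        self.buckets = [deque(maxlen=slots_per_bucket) for _ in range(num_buckets)]

    def _locate(self, item):
        digest = hashlib.blake2b(repr(item).encode("utf-8"), digest_size=16).digest()
        h = int.from_bytes(digest, "big")
        index = h % self.num_buckets  # remainder selects the bucket
        fingerprint = (h // self.num_buckets) & self.fp_mask  # quotient bits form the fingerprint
        return index, fingerprint

    def seen_before(self, item):
        """Return True if item is reported as a duplicate; otherwise record it."""
        index, fp = self._locate(item)
        bucket = self.buckets[index]
        if fp in bucket:
            return True
        bucket.append(fp)
        return False
```

With a single two-slot bucket, feeding `x`, `y` and `z` evicts the fingerprint of `x`, so a later `x` is most likely no longer reported as a duplicate; this is the memory/accuracy trade-off that bounded duplicate filters accept.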

Multiple Set Matching with Bloom Matrix and Bloom Vector

Author: Concas Francesco
Hoque Mohammad Ashraful
Lu Jiaheng
Tarkoma Sasu
Xu Pengfei
Publication date: 01/03/2020

A Bloom filter is a space-efficient probabilistic data structure for checking whether an element is a member of a set. Given multiple sets, a standard Bloom filter is not sufficient when looking for the sets to which an element, or a set of input elements, belongs. An example is searching a large text corpus for documents containing given keywords: this is essentially a multiple set matching problem, where the input is one or more keywords and the result is a set of candidate documents. This article solves the multiple set matching problem by proposing two efficient Bloom multifilters, called Bloom Matrix and Bloom Vector, which generalize the standard Bloom filter. Both structures are space-efficient and answer queries with a set of identifiers for multiple set matching problems. Their space efficiency can be optimized according to the distribution of labels among the sets (uniform or Zipf), and Bloom Vector exploits a Zipf distribution for further space reduction. Both structures are much more space-efficient than the state of the art, Bloofi. The results also show that a Lookup operation on Bloom Matrix is significantly faster than on Bloom Vector and Bloofi.
Peer reviewed

Helsingin yliopiston digitaalinen arkisto
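The multiple set matching query above can be sketched in Python with a bit-sliced layout. This layout is an assumption for illustration, not necessarily the paper's Bloom Matrix: the structure keeps m rows, each a bitmask over set identifiers; adding element e to set j sets bit j in the k rows chosen by e's hashes, and a lookup ANDs those k rows to obtain the candidate set identifiers. The class and method names are hypothetical.

```python
import hashlib


class BloomMatrixSketch:
    """Rows of bitmasks over set ids; k hashed rows are ANDed on lookup (assumed layout)."""

    def __init__(self, num_rows, num_hashes):
        if num_hashes < 1:
            raise ValueError("need at least one hash function")
        self.num_rows = num_rows
        self.num_hashes = num_hashes
        self.rows = [0] * num_rows  # Python ints used as arbitrary-width bitmasks

    def _row_indices(self, element):
        # Derive k row indices by salting one hash with the function number.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{element}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_rows

    def add(self, element, set_id):
        for r in self._row_indices(element):
            self.rows[r] |= 1 << set_id

    def lookup(self, element):
        """Candidate set ids: never misses a true set, may include false positives."""
        mask = -1  # all bits set; narrowed by each AND below
        for r in self._row_indices(element):
            mask &= self.rows[r]
        return {i for i in range(mask.bit_length()) if mask >> i & 1}
```

For the keyword example, each document would be a set identifier and each keyword an element: after `add("bloom", 0)` and `add("bloom", 2)`, `lookup("bloom")` returns at least `{0, 2}`.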

A structural query system for Han characters

Author: Skala Matthew
Publication date: 01/01/2016

The IT University of Copenhagen's Repository

A Novel Accuracy and Similarity Search Structure Based on Parallel Bloom Filters

Author: Chunyan Shuai
Hengcheng Yang
Siqi Li
Xin Ouyang
Zheng Chen
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2016

In high-dimensional spaces, exact and similarity search at low computing and storage cost remains a difficult research topic, and there is a trade-off between efficiency and accuracy. In this paper, we propose a new structure, Similar-PBF-PHT, to represent the items of a high-dimensional set and retrieve exact and similar items. Similar-PBF-PHT contains three parts: parallel Bloom filters (PBFs), parallel hash tables (PHTs), and a bit matrix. Experiments show that Similar-PBF-PHT is effective for membership queries and K-nearest neighbors (K-NN) search. For exact queries, Similar-PBF-PHT has a low false positive probability (FPP) for hits and acceptable memory cost. For K-NN queries, the average overall ratio and rank-i ratio of the Hamming distance are accurate, and the ratios of the Euclidean distance are acceptable. Retrieving exact and similar items costs CPU time rather than I/O time, and the structure can handle different data formats, not only numerical values.

Crossref

Directory of Open Access Journals

PubMed Central
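For the membership-query part, a Python sketch of one common reading of "parallel Bloom filters" follows: k independent bit arrays, each addressed by its own hash function, with an item reported as present only if its bit is set in every array. This layout is an assumption; the paper's PHTs, bit matrix and K-NN search are not reproduced, and the class name is hypothetical. Items are taken to be tuples of feature values.

```python
import hashlib


class ParallelBloomFilterSketch:
    """k independent bit arrays, one hash function each (assumed layout)."""

    def __init__(self, bits_per_filter, num_filters):
        self.bits_per_filter = bits_per_filter
        self.filters = [bytearray((bits_per_filter + 7) // 8) for _ in range(num_filters)]

    def _positions(self, item):
        # Filter f uses a hash salted with f, so the k probes are independent.
        for f in range(len(self.filters)):
            digest = hashlib.sha256(f"{f}|{item!r}".encode("utf-8")).digest()
            yield f, int.from_bytes(digest[:8], "big") % self.bits_per_filter

    def add(self, item):
        for f, pos in self._positions(item):
            self.filters[f][pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        """False means definitely absent; True may be a false positive."""
        return all(self.filters[f][pos // 8] >> (pos % 8) & 1 for f, pos in self._positions(item))
```

Because each array is probed independently, the k lookups could run in parallel, although this sketch performs them sequentially.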