Approximate Two-Party Privacy-Preserving String Matching with Linear Complexity
Consider two parties who want to compare their strings, e.g., genomes, but do
not want to reveal them to each other. We present a system for
privacy-preserving matching of strings, which differs from existing systems by
providing a deterministic approximation instead of an exact distance. It is
efficient (linear complexity), non-interactive and does not involve a third
party, which makes it particularly suitable for cloud computing. We extend our
protocol such that it mitigates the iterated differential attacks proposed by
Goodrich. Furthermore, an implementation of the system is evaluated and compared
against current privacy-preserving string matching algorithms.
Comment: 6 pages, 4 figures
EsPRESSo: Efficient Privacy-Preserving Evaluation of Sample Set Similarity
Electronic information is increasingly often shared among entities without
complete mutual trust. To address related security and privacy issues, a few
cryptographic techniques have emerged that support privacy-preserving
information sharing and retrieval. One interesting open problem in this context
involves two parties that need to assess the similarity of their datasets, but
are reluctant to disclose their actual content. This paper presents an
efficient and provably-secure construction supporting the privacy-preserving
evaluation of sample set similarity, where similarity is measured as the
Jaccard index. We present two protocols: the first securely computes the
(Jaccard) similarity of two sets, and the second approximates it, using MinHash
techniques, with lower complexities. We show that our novel protocols are
attractive in many compelling applications, including document/multimedia
similarity, biometric authentication, and genetic tests. In the process, we
demonstrate that our constructions are appreciably more efficient than prior
work.
Comment: A preliminary version of this paper was published in the Proceedings
of the 7th ESORICS International Workshop on Digital Privacy Management (DPM
2012). This is the full version, appearing in the Journal of Computer
Security.
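To make the two quantities named in this abstract concrete, here is a minimal plain-text sketch of the exact Jaccard index and a MinHash estimate of it. It is not the paper's protocol and has no privacy protection at all; the function names, the use of seeded SHA-256 as the hash family, and the default of 128 hash functions are assumptions made for illustration only.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard index |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def _seeded_hash(item, seed):
    """Hypothetical hash family: SHA-256 of 'seed:item', truncated to 64 bits."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over the set."""
    items = set(items)
    if not items:
        raise ValueError("cannot build a MinHash signature of an empty set")
    return [min(_seeded_hash(x, seed) for x in items)
            for seed in range(num_hashes)]


def minhash_estimate(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates the Jaccard index."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    set_a = range(0, 100)
    set_b = range(50, 150)
    print("exact:   ", jaccard(set_a, set_b))
    print("estimate:", minhash_estimate(minhash_signature(set_a),
                                        minhash_signature(set_b)))
```

The signature length is fixed by `num_hashes` regardless of set size, which is the source of the lower complexity the abstract attributes to the approximate variant; the estimate's error shrinks as `num_hashes` grows.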
A Taxonomy of Privacy-Preserving Record Linkage Techniques
The process of identifying which records in two or more databases correspond to the same entity is an important aspect of data quality activities such as data pre-processing and data integration. Known as record linkage, data matching or entity resolution, this process has attracted interest from researchers in fields such as databases and data warehousing, data mining, information systems, and machine learning. Record linkage has various challenges, including scalability to large databases, accurate matching and classification, and privacy and confidentiality. The latter challenge arises because personal identifying data, such as names, addresses and dates of birth of individuals, are commonly used in the linkage process. When databases are linked across organizations, the issue of how to protect the privacy and confidentiality of such sensitive information is crucial to successful application of record linkage. In this paper we present an overview of techniques that allow the linking of databases between organizations while at the same time preserving the privacy of these data. Known as 'privacy-preserving record linkage' (PPRL), various such techniques have been developed. We present a taxonomy of PPRL techniques to characterize these techniques along 15 dimensions, and conduct a survey of PPRL techniques. We then highlight shortcomings of current techniques and discuss avenues for future research.
An Efficient Two-Party Protocol for Approximate Matching in Private Record Linkage
The task of linking multiple databases with the aim to identify records that refer to the same entity is occurring increasingly in many application areas. If unique identifiers for the entities are not available in all the databases to be linked, techniques that calculate approximate similarities between records must be used for the identification of matching pairs of records. Often, the records to be linked contain personal information such as names and addresses. In many applications, the exchange of attribute values that contain such personal details between organisations is not allowed due to privacy concerns. The linking of records between databases without revealing the actual attribute values in these records is the research problem known as 'privacy-preserving record linkage' (PPRL). While various approaches have been proposed to deal with privacy within the record linkage process, a viable solution that is well applicable to real-world conditions needs to address the major aspect of scalability of linking very large databases while preserving security and linkage quality. We propose a novel two-party protocol for PPRL that addresses scalability, security and quality/accuracy. The protocol is based on (1) the use of reference values that are available to both database owners, and allows them to individually calculate the similarities between their attribute values and the reference values; and (2) the binning of these calculated similarity values to allow their secure exchange between the two database owners. Experiments on a real-world database with nearly two million records yield linkage results that have a linear scalability to large databases and high linkage accuracy, allowing for approximate matching in the privacy-preserving context.
Since the protocol has a low computational burden and allows quality approximate matching while still preserving the privacy of the databases that are matched, the protocol can be useful for many real-world applications requiring PPRL.
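The two ingredients named in this abstract, similarities against shared reference values and binning of those similarities, can be illustrated with a rough sketch. This is not the authors' protocol and it omits every security step; the bigram Dice similarity, the number of bins, the reference list, and the agreement score are all assumptions chosen only to show the idea that each owner can compute a coarse profile locally and exchange that profile instead of the raw value.

```python
def bigrams(s):
    """Set of character bigrams of a lower-cased string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice coefficient on bigram sets (an assumed similarity function)."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def binned_profile(value, references, num_bins=5):
    """Similarity of `value` to each shared reference value, reduced to a bin index."""
    profile = []
    for ref in references:
        sim = dice(value, ref)
        # Map [0, 1] onto bins 0..num_bins-1; a similarity of 1.0 goes in the top bin.
        profile.append(min(int(sim * num_bins), num_bins - 1))
    return tuple(profile)


def profile_agreement(p, q):
    """Fraction of reference values for which two profiles fall in the same bin."""
    if len(p) != len(q):
        raise ValueError("profiles must be built from the same reference list")
    return sum(1 for a, b in zip(p, q) if a == b) / len(p)


if __name__ == "__main__":
    # Hypothetical reference values known to both database owners.
    references = ["smith", "miller", "johnson", "taylor"]
    owner_a = binned_profile("smyth", references)
    owner_b = binned_profile("smith", references)
    print(owner_a, owner_b, profile_agreement(owner_a, owner_b))
```

Each profile costs one similarity computation per reference value, so building profiles is linear in the number of records for a fixed reference list, in line with the linear scalability the abstract reports.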
Learning Character Strings via Mastermind Queries, with a Case Study Involving mtDNA
We study the degree to which a character string, Q, leaks details about
itself any time it engages in comparison protocols with strings provided by a
querier, Bob, even if those protocols are cryptographically guaranteed to
produce no additional information other than the scores that assess the degree
to which Q matches strings offered by Bob. We show that such scenarios allow
Bob to play variants of the game of Mastermind with Q so as to learn the
complete identity of Q. We show that there are a number of efficient
implementations for Bob to employ in these Mastermind attacks, depending on
knowledge he has about the structure of Q, which show how quickly he can
determine Q. Indeed, we show that Bob can discover Q using a number of
rounds of test comparisons that is much smaller than the length of Q, under
reasonable assumptions regarding the types of scores that are returned by the
cryptographic protocols and whether he can use knowledge about the distribution
that Q comes from. We also provide the results of a case study we performed
on a database of mitochondrial DNA, showing the vulnerability of existing
real-world DNA data to the Mastermind attack.
Comment: Full version of related paper appearing in IEEE Symposium on Security
and Privacy 2009, "The Mastermind Attack on Genomic Data." This version
corrects the proofs of what are now Theorems 2 and 4.
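The leakage described in this abstract can be demonstrated with a deliberately naive sketch. It assumes the comparison protocol returns only the number of positions at which the guess matches the hidden string, and that the alphabet is known (here "ACGT"). The strategy below is not one of the paper's attacks and uses far more queries than the results claimed there; `make_oracle` and `recover` are hypothetical names introduced for illustration.

```python
def make_oracle(secret):
    """Simulate a comparison protocol that reveals only a match score."""
    calls = {"count": 0}

    def score(guess):
        if len(guess) != len(secret):
            raise ValueError("guess must have the same length as the secret")
        calls["count"] += 1
        # Number of positions where the guess agrees with the hidden string.
        return sum(1 for s, g in zip(secret, guess) if s == g)

    return score, calls


def recover(score, length, alphabet="ACGT"):
    """Recover the hidden string one position at a time from scores alone."""
    guess = [alphabet[0]] * length
    base = score("".join(guess))
    for i in range(length):
        for sym in alphabet[1:]:
            trial = guess.copy()
            trial[i] = sym
            s = score("".join(trial))
            if s > base:
                # The new symbol is correct at position i.
                guess[i] = sym
                base = s
                break
            if s < base:
                # The original symbol was already correct at position i.
                break
            # Equal score: neither symbol is correct, try the next one.
    return "".join(guess)


if __name__ == "__main__":
    oracle, calls = make_oracle("GATTACA")
    print(recover(oracle, 7), "recovered in", calls["count"], "queries")
```

This baseline needs at most 1 + n(|A| - 1) queries for a string of length n over alphabet A, which already shows that score-only answers determine the whole string; the abstract's point is that better strategies need far fewer rounds.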