Search CORE

227 research outputs found

Learning Character Strings via Mastermind Queries, with a Case Study Involving mtDNA

Author: Goodrich Michael T.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/04/2010
Field of study

We study the degree to which a character string,

Q

, leaks details about itself any time it engages in comparison protocols with a strings provided by a querier, Bob, even if those protocols are cryptographically guaranteed to produce no additional information other than the scores that assess the degree to which

Q

matches strings offered by Bob. We show that such scenarios allow Bob to play variants of the game of Mastermind with

Q

so as to learn the complete identity of

Q

. We show that there are a number of efficient implementations for Bob to employ in these Mastermind attacks, depending on knowledge he has about the structure of

Q

, which show how quickly he can determine

Q

. Indeed, we show that Bob can discover

Q

using a number of rounds of test comparisons that is much smaller than the length of

Q

, under reasonable assumptions regarding the types of scores that are returned by the cryptographic protocols and whether he can use knowledge about the distribution that

Q

comes from. We also provide the results of a case study we performed on a database of mitochondrial DNA, showing the vulnerability of existing real-world DNA data to the Mastermind attack.Comment: Full version of related paper appearing in IEEE Symposium on Security and Privacy 2009, "The Mastermind Attack on Genomic Data." This version corrects the proofs of what are now Theorems 2 and 4

arXiv.org e-Print Archive

Crossref

Nonadaptive Mastermind Algorithms for String and Vector Databases, with Case Studies

Author: Asuncion Arthur U.
Goodrich Michael T.
Publication venue
Publication date: 01/01/2010
Field of study

In this paper, we study sparsity-exploiting Mastermind algorithms for attacking the privacy of an entire database of character strings or vectors, such as DNA strings, movie ratings, or social network friendship data. Based on reductions to nonadaptive group testing, our methods are able to take advantage of minimal amounts of privacy leakage, such as contained in a single bit that indicates if two people in a medical database have any common genetic mutations, or if two people have any common friends in an online social network. We analyze our Mastermind attack algorithms using theoretical characterizations that provide sublinear bounds on the number of queries needed to clone the database, as well as experimental tests on genomic information, collaborative filtering data, and online social networks. By taking advantage of the generally sparse nature of these real-world databases and modulating a parameter that controls query sparsity, we demonstrate that relatively few nonadaptive queries are needed to recover a large majority of each database

arXiv.org e-Print Archive

CiteSeerX

Approximate Two-Party Privacy-Preserving String Matching with Linear Complexity

Author: Beck Martin
Kerschbaum Florian
Publication venue
Publication date: 12/02/2013
Field of study

Consider two parties who want to compare their strings, e.g., genomes, but do not want to reveal them to each other. We present a system for privacy-preserving matching of strings, which differs from existing systems by providing a deterministic approximation instead of an exact distance. It is efficient (linear complexity), non-interactive and does not involve a third party which makes it particularly suitable for cloud computing. We extend our protocol, such that it mitigates iterated differential attacks proposed by Goodrich. Further an implementation of the system is evaluated and compared against current privacy-preserving string matching algorithms.Comment: 6 pages, 4 figure

arXiv.org e-Print Archive

Crossref

On the Query Complexity of Black-Peg AB-Mastermind

Author: El Ouali Mourad
Glazik Christian
Sauerland Volkmar
Srivastav Anand
Publication venue: 'MDPI AG'
Publication date: 01/01/2018
Field of study

Mastermind is a two players zero sum game of imperfect information. Starting with Erd˝os and Rényi (1963), its combinatorics have been studied to date by several authors, e.g., Knuth (1977), Chvátal (1983), Goodrich (2009). The ﬁrst player, called “codemaker”, chooses a secret code and the second player, called “codebreaker”, tries to break the secret code by making as few guesses as possible, exploiting information that is given by the codemaker after each guess. For variants that allow color repetition, Doerr et al. (2016) showed optimal results. In this paper, we consider the so called Black-Peg variant of Mastermind, where the only information concerning a guess is the number of positions in which the guess coincides with the secret code. More precisely, we deal with a special version of the Black-Peg game with n holes and k ≥ n colors where no repetition of colors is allowed. We present upper and lower bounds on the number of guesses necessary to break the secret code. For the case k = n, the secret code can be algorithmically identiﬁed within less than (n − 3)dlog 2 ne + 5 2 n − 1 queries. This result improves the result of Ker-I Ko and Shia-Chung Teng (1985) by almost a factor of 2. For the case k > n, we prove an upper bound of (n − 2)dlog 2 ne + k + 1. Furthermore, we prove a new lower bound of n for the case k = n, which improves the recent n − log log(n) bound of Berger et al. (2016). We then generalize this lower bound to k queries for the case k ≥ n

Multidisciplinary Digital Publishing Institute

OceanRep

MACAU: Open Access Repository of Kiel University

Privacy in the Genomic Era

Author: Ayday Erman
Clayton Ellen W.
Fellay Jacques
Gunter Carl A.
Hubaux Jean-Pierre
Malin Bradley A.
Naveed Muhammad
Wang XiaoFeng
Publication venue
Publication date: 01/01/2015
Field of study

Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly-detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy; notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While the computer scientists have addressed data privacy for various data types, there has been less attention dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state-of-the-art regarding privacy attacks on genomic data and strategies for mitigating such attacks, as well as contextualizing these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Bilkent University Institutional Repository

PubMed Central

Cryptology ePrint Archive

Secure and Efficient Comparisons between Untrusted Parties

Author: Beck Martin
Publication venue
Publication date: 11/09/2018
Field of study

A vast number of online services is based on users contributing their personal information. Examples are manifold, including social networks, electronic commerce, sharing websites, lodging platforms, and genealogy. In all cases user privacy depends on a collective trust upon all involved intermediaries, like service providers, operators, administrators or even help desk staff. A single adversarial party in the whole chain of trust voids user privacy. Even more, the number of intermediaries is ever growing. Thus, user privacy must be preserved at every time and stage, independent of the intrinsic goals any involved party. Furthermore, next to these new services, traditional offline analytic systems are replaced by online services run in large data centers. Centralized processing of electronic medical records, genomic data or other health-related information is anticipated due to advances in medical research, better analytic results based on large amounts of medical information and lowered costs. In these scenarios privacy is of utmost concern due to the large amount of personal information contained within the centralized data. We focus on the challenge of privacy-preserving processing on genomic data, specifically comparing genomic sequences. The problem that arises is how to efficiently compare private sequences of two parties while preserving confidentiality of the compared data. It follows that the privacy of the data owner must be preserved, which means that as little information as possible must be leaked to any party participating in the comparison. Leakage can happen at several points during a comparison. The secured inputs for the comparing party might leak some information about the original input, or the output might leak information about the inputs. In the latter case, results of several comparisons can be combined to infer information about the confidential input of the party under observation. Genomic sequences serve as a use-case, but the proposed solutions are more general and can be applied to the generic field of privacy-preserving comparison of sequences. The solution should be efficient such that performing a comparison yields runtimes linear in the length of the input sequences and thus producing acceptable costs for a typical use-case. To tackle the problem of efficient, privacy-preserving sequence comparisons, we propose a framework consisting of three main parts. a) The basic protocol presents an efficient sequence comparison algorithm, which transforms a sequence into a set representation, allowing to approximate distance measures over input sequences using distance measures over sets. The sets are then represented by an efficient data structure - the Bloom filter -, which allows evaluation of certain set operations without storing the actual elements of the possibly large set. This representation yields low distortion for comparing similar sequences. Operations upon the set representation are carried out using efficient, partially homomorphic cryptographic systems for data confidentiality of the inputs. The output can be adjusted to either return the actual approximated distance or the result of an in-range check of the approximated distance. b) Building upon this efficient basic protocol we introduce the first mechanism to reduce the success of inference attacks by detecting and rejecting similar queries in a privacy-preserving way. This is achieved by generating generalized commitments for inputs. This generalization is done by treating inputs as messages received from a noise channel, upon which error-correction from coding theory is applied. This way similar inputs are defined as inputs having a hamming distance of their generalized inputs below a certain predefined threshold. We present a protocol to perform a zero-knowledge proof to assess if the generalized input is indeed a generalization of the actual input. Furthermore, we generalize a very efficient inference attack on privacy-preserving sequence comparison protocols and use it to evaluate our inference-control mechanism. c) The third part of the framework lightens the computational load of the client taking part in the comparison protocol by presenting a compression mechanism for partially homomorphic cryptographic schemes. It reduces the transmission and storage overhead induced by the semantically secure homomorphic encryption schemes, as well as encryption latency. The compression is achieved by constructing an asymmetric stream cipher such that the generated ciphertext can be converted into a ciphertext of an associated homomorphic encryption scheme without revealing any information about the plaintext. This is the first compression scheme available for partially homomorphic encryption schemes. Compression of ciphertexts of fully homomorphic encryption schemes are several orders of magnitude slower at the conversion from the transmission ciphertext to the homomorphically encrypted ciphertext. Indeed our compression scheme achieves optimal conversion performance. It further allows to generate keystreams offline and thus supports offloading to trusted devices. This way transmission-, storage- and power-efficiency is improved. We give security proofs for all relevant parts of the proposed protocols and algorithms to evaluate their security. A performance evaluation of the core components demonstrates the practicability of our proposed solutions including a theoretical analysis and practical experiments to show the accuracy as well as efficiency of approximations and probabilistic algorithms. Several variations and configurations to detect similar inputs are studied during an in-depth discussion of the inference-control mechanism. A human mitochondrial genome database is used for the practical evaluation to compare genomic sequences and detect similar inputs as described by the use-case. In summary we show that it is indeed possible to construct an efficient and privacy-preserving (genomic) sequences comparison, while being able to control the amount of information that leaves the comparison. To the best of our knowledge we also contribute to the field by proposing the first efficient privacy-preserving inference detection and control mechanism, as well as the first ciphertext compression system for partially homomorphic cryptographic systems

Technische Universität Dresden: Qucosa

Critical analysis and comparison of data protection techniques for genomics data sets

Author: Naro Daniel
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2016
Field of study

This work reviews the current literature on protecting genomic information. The goal is to provide insight on how to define a secure file format for such data. We compare the published ideas to the requirements defined by MPEG. We also propose new ideas to secure such data

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC