5 research outputs found

    How Much Does GenoGuard Really "Guard"? An Empirical Analysis of Long-Term Security for Genomic Data

    Get PDF
    Due to its hereditary nature, genomic data is not only linked to its owner but to that of close relatives as well. As a result, its sensitivity does not really degrade over time; in fact, the relevance of a genomic sequence is likely to be longer than the security provided by encryption. This prompts the need for specialized techniques providing long-term security for genomic data, yet the only available tool for this purpose is GenoGuard~\citehuang_genoguard:_2015. By relying on \em Honey Encryption, GenoGuard is secure against an adversary that can brute force all possible keys; i.e., whenever an attacker tries to decrypt using an incorrect password, she will obtain an incorrect but plausible looking decoy sequence. In this paper, we set to analyze the real-world security guarantees provided by GenoGuard; specifically, assess how much more information does access to a ciphertext encrypted using GenoGuard yield, compared to one that was not. Overall, we find that, if the adversary has access to side information in the form of partial information from the target sequence, the use of GenoGuard does appreciably increase her power in determining the rest of the sequence. We show that, in the case of a sequence encrypted using an easily guessable (low-entropy) password, the adversary is able to rule out most decoy sequences, and obtain the target sequence with just 2.5% of it available as side information. In the case of a harder-to-guess (high-entropy) password, we show that the adversary still obtains, on average, better accuracy in guessing the rest of the target sequences than using state-of-the-art genomic sequence inference methods, obtaining up to 15% improvement in accuracy

    Evaluating Methods for Privacy-Preserving Data Sharing in Genomics

    Get PDF
    The availability of genomic data is often essential to progress in biomedical re- search, personalized medicine, drug development, etc. However, its extreme sensitivity makes it problematic, if not outright impossible, to publish or share it. In this dissertation, we study and build systems that are geared towards privacy preserving genomic data sharing. We first look at the Matchmaker Exchange, a platform that connects multiple distributed databases through an API and allows researchers to query for genetic variants in other databases through the network. However, queries are broadcast to all researchers that made a similar query in any of the connected databases, which can lead to a reluctance to use the platform, due to loss of privacy or competitive advantage. In order to overcome this reluctance, we propose a framework to support anonymous querying on the platform. Since genomic data’s sensitivity does not degrade over time, we analyze the real-world guarantees provided by the only tool available for long term genomic data storage. We find that the system offers low security when the adversary has access to side information, and we support our claims by empirical evidence. We also study the viability of synthetic data for privacy preserving data sharing. Since for genomic data research, the utility of the data provided is of the utmost importance, we first perform a utility evaluation on generative models for different types of datasets (i.e., financial data, images, and locations). Then, we propose a privacy evaluation framework for synthetic data. We then perform a measurement study assessing state-of-the-art generative models specifically geared for human genomic data, looking at both utility and privacy perspectives. Overall, we find that there is no single approach for generating synthetic data that performs well across the board from both utility and privacy perspectives

    On Secure Cloud Computing for Genomic Data: From Storage to Analysis

    Get PDF
    Although privacy is generally considered to be the right of an individual or group to control information about themselves, such a right has become challenging to protect in the digital era, this is exemplified by the case of cloud-based genomic computing. Despite the rapid progress in understanding, producing, and using genomic information, the practice of genomic data protection remains a fairly underdeveloped area. One of the indisputable reasons is that most nonexpert individuals do not realize the sensitive nature of their genomic data, unless it has been used against them. Many commercial organizations take advantage of their customers by taking control of personal genomic information, if customers want to benefit from services such as genetic analysis; even worse, these organizations often do not enforce proper protection, which could result in embarrassing data breaches. In this thesis, we investigate the potential threats of cloud- based genomic computing systems and propose various countermeasures by taking into account the functionality requirement. We begin with the most basic system where only symmetric encryption is needed for the cloud storage of genomic data, and we propose a new solution that protects the data against brute-force attacks that threaten the security of password-based encryption in direct-to-consumer companies. The solution employs honey encryption, where plaintext messages need to be transformed to a different space with uniform distribution on elements. We present a novel distribution-transformation encoder. We provide formal security proof of our solution. We analyze the scenario where efficient searching on encrypted data is necessary. We propose a system that provides fast retrieval on encrypted compressed data and that enables individuals to authorize access to fine-grained regions during data retrieval. Our solution addresses three critical dimensions in platforms that use large genomic data: encryption, compression, and efficient data retrieval. Compared with a previous de facto standard solution for storing aligned genomic data, our solution uses 18% less storage. To enable complicated data analysis, we focus on a proposal for secure quality-control of genomic data by using secure multi-party computation based on garbled circuits. Our proposal is for aggregated genomic data sharing, where researchers want to collaborate to perform large-scale genome-wide association studies in order to identify significant genetic variants for certain diseases. Data quality control is the very first stage of such a collaboration and remains a driving factor for further steps. We investigate the feasibility of advanced cryptographic techniques in the data protection of this phase. We demonstrate that for certain protocols, our solution is efficient and scalable. With the advent of precision medicine based on genomic data, the future of big data has become clearly inseparable from cloud-based genomic computing. It is important to continuously re-evaluate the standards of cloud-based genomic computing as novel technologies are developed, security threats arise, and more complex genomic analyses become possible. This is not only a battle against cyber criminals, but also against rigid and ignorant practices. Integrative solutions that carefully consider the use and misuse of personal genomic data are essential for ensuring secure, effective storage and maximizing utility in treating and preventing disease

    Sociotechnical Safeguards for Genomic Data Privacy

    Get PDF
    Recent developments in a variety of sectors, including health care, research and the direct-to-consumer industry, have led to a dramatic increase in the amount of genomic data that are collected, used and shared. This state of affairs raises new and challenging concerns for personal privacy, both legally and technically. This Review appraises existing and emerging threats to genomic data privacy and discusses how well current legal frameworks and technical safeguards mitigate these concerns. It concludes with a discussion of remaining and emerging challenges and illustrates possible solutions that can balance protecting privacy and realizing the benefits that result from the sharing of genetic information

    How Much Does GenoGuard Really "Guard"? : An Empirical Analysis of Long-Term Security for Genomic Data

    No full text
    Due to its hereditary nature, genomic data is not only linked to its owner but to that of close relatives as well. As a result, its sensitivity does not really degrade over time; in fact, the relevance of a genomic sequence is likely to be longer than the security provided by encryption. This prompts the need for specialized techniques providing long-term security for genomic data, yet the only available tool for this purpose is GenoGuard (Huang et al., 2015). By relying on Honey Encryption, GenoGuard is secure against an adversary that can brute force all possible keys; i.e., whenever an attacker tries to decrypt using an incorrect password, she will obtain an incorrect but plausible looking decoy sequence. In this paper, we set to analyze the real-world security guarantees provided by GenoGuard; specifically, assess how much more information does access to a ciphertext encrypted using GenoGuard yield, compared to one that was not. Overall, we find that, if the adversary has access to side information in the form of partial information from the target sequence, the use of GenoGuard does appreciably increase her power in determining the rest of the sequence. We show that, in the case of a sequence encrypted using an easily guessable (low-entropy) password, the adversary is able to rule out most decoy sequences, and obtain the target sequence with just 2.5\% of it available as side information. In the case of a harder-to-guess (high-entropy) password, we show that the adversary still obtains, on average, better accuracy in guessing the rest of the target sequences than using state-of-the-art genomic sequence inference methods, obtaining up to 15% improvement in accuracy
    corecore