Privacy in the Genomic Era
Genome sequencing technology has advanced at a rapid pace and it is now
possible to generate highly-detailed genotypes inexpensively. The collection
and analysis of such data has the potential to support various applications,
including personalized medical services. While the benefits of the genomics
revolution are trumpeted by the biomedical community, the increased
availability of such data has major implications for personal privacy; notably
because the genome has certain essential features, which include (but are not
limited to) (i) an association with traits and certain diseases, (ii)
identification capability (e.g., forensics), and (iii) revelation of family
relationships. Moreover, direct-to-consumer DNA testing increases the
likelihood that genome data will be made available in less regulated
environments, such as the Internet and for-profit companies. The problem of
genome data privacy thus resides at the crossroads of computer science,
medicine, and public policy. While computer scientists have addressed data
privacy for various data types, there has been less attention dedicated to
genomic data. Thus, the goal of this paper is to provide a systematization of
knowledge for the computer science community. In doing so, we address some of
the (sometimes erroneous) beliefs of this field and we report on a survey we
conducted about genome data privacy with biomedical specialists. Then, after
characterizing the genome privacy problem, we review the state-of-the-art
regarding privacy attacks on genomic data and strategies for mitigating such
attacks, as well as contextualizing these attacks from the perspective of
medicine and public policy. This paper concludes with an enumeration of the
challenges for genome data privacy and presents a framework to systematize the
analysis of threats and the design of countermeasures as the field moves
forward.
I-GWAS: Privacy-Preserving Interdependent Genome-Wide Association Studies
Genome-wide Association Studies (GWASes) identify genomic variations that are
statistically associated with a trait, such as a disease, in a group of
individuals. Unfortunately, careless sharing of GWAS statistics might give rise
to privacy attacks. Several works have attempted to reconcile secure processing with
privacy-preserving releases of GWASes. However, we highlight that these
approaches remain vulnerable if GWASes utilize overlapping sets of individuals
and genomic variations. In such conditions, we show that even when relying on
state-of-the-art techniques for protecting releases, an adversary could
reconstruct the genomic variations of up to 28.6% of participants, and that the
released statistics of up to 92.3% of the genomic variations would enable
membership inference attacks. We introduce I-GWAS, a novel framework that
securely computes and releases the results of multiple possibly interdependent
GWASes. I-GWAS continuously releases privacy-preserving and noise-free GWAS
results as new genomes become available.
Evaluating Methods for Privacy-Preserving Data Sharing in Genomics
The availability of genomic data is often essential to progress in biomedical research, personalized medicine, drug development, etc. However, its extreme sensitivity makes it problematic, if not outright impossible, to publish or share it. In this dissertation, we study and build systems that are geared towards privacy-preserving genomic data sharing. We first look at the Matchmaker Exchange, a platform that connects multiple distributed databases through an API and allows researchers to query for genetic variants in other databases through the network. However, queries are broadcast to all researchers that made a similar query in any of the connected databases, which can lead to a reluctance to use the platform, due to loss of privacy or competitive advantage. In order to overcome this reluctance, we propose a framework to support anonymous querying on the platform. Since genomic data's sensitivity does not degrade over time, we analyze the real-world guarantees provided by the only tool available for long-term genomic data storage. We find that the system offers low security when the adversary has access to side information, and we support our claims by empirical evidence. We also study the viability of synthetic data for privacy-preserving data sharing. Since for genomic data research, the utility of the data provided is of the utmost importance, we first perform a utility evaluation on generative models for different types of datasets (i.e., financial data, images, and locations). Then, we propose a privacy evaluation framework for synthetic data. We then perform a measurement study assessing state-of-the-art generative models specifically geared for human genomic data, looking at both utility and privacy perspectives. Overall, we find that there is no single approach for generating synthetic data that performs well across the board from both utility and privacy perspectives.
Systematizing Genome Privacy Research: A Privacy-Enhancing Technologies Perspective
Rapid advances in human genomics are enabling researchers to gain a better
understanding of the role of the genome in our health and well-being,
stimulating hope for more effective and cost efficient healthcare. However,
this also prompts a number of security and privacy concerns stemming from the
distinctive characteristics of genomic data. To address them, a new research
community has emerged and produced a large number of publications and
initiatives.
In this paper, we rely on a structured methodology to contextualize and
provide a critical analysis of the current knowledge on privacy-enhancing
technologies used for testing, storing, and sharing genomic data, using a
representative sample of the work published in the past decade. We identify and
discuss limitations, technical challenges, and issues faced by the community,
focusing in particular on those that are inherently tied to the nature of the
problem and are harder for the community alone to address. Finally, we report
on the importance and difficulty of the identified challenges based on an
online survey of genome data privacy experts.
Comment: To appear in the Proceedings on Privacy Enhancing Technologies
(PoPETs), Vol. 2019, Issue
Secure and Distributed Assessment of Privacy-Preserving Releases of GWAS
Genome-wide association studies (GWAS) identify correlations between
genetic variants and an observable characteristic such as a disease. Previous
works presented privacy-preserving distributed algorithms for a federation of
genome data holders that spans multiple institutional and legislative domains
to securely compute GWAS results. However, these algorithms have limited
applicability, since they still require a centralized instance to decide
whether GWAS results can be safely disclosed, which violates privacy
regulations, such as GDPR. In this work, we introduce GenDPR, a distributed
middleware that leverages Trusted Execution Environments (TEEs) to securely
determine a subset of the potential GWAS statistics that can be safely
released. GenDPR achieves the same accuracy as centralized solutions, but
requires transferring significantly less data because TEEs exchange only
intermediate results and no genomes. Additionally, GenDPR can be configured to
tolerate all-but-one honest-but-curious federation members colluding with the
aim of exposing the genomes of correct members.
Cryptographic solutions for genomic privacy
With the help of rapidly developing technology, DNA sequencing is becoming less expensive. As a consequence, the research in genomics has gained speed in paving the way to personalized (genomic) medicine, and geneticists need large collections of human genomes to further increase this speed. Furthermore, individuals are using their genomes to learn about their (genetic) predispositions to diseases, their ancestries, and even their (genetic) compatibilities with potential partners. This trend has also caused the launch of health-related websites and online social networks (OSNs), in which individuals share their genomic data (e.g., OpenSNP or 23andMe). On the other hand, genomic data carries much sensitive information about its owner. By analyzing the DNA of an individual, it is now possible to learn about his disease predispositions (e.g., for Alzheimer's or Parkinson's), ancestries, and physical attributes. The threat to genomic privacy is magnified by the fact that a person's genome is correlated to his family members' genomes, thus leading to interdependent privacy risks. In this work, focusing on our existing and ongoing work on genomic privacy, we will first highlight one serious threat for genomic privacy. Then, we will present the high-level descriptions of our cryptographic solutions to protect the privacy of genomic data. © International Financial Cryptography Association 2016