1,048 research outputs found

    Privacy preserving protocol for detecting genetic relatives using rare variants.

    Get PDF
    MotivationHigh-throughput sequencing technologies have impacted many areas of genetic research. One such area is the identification of relatives from genetic data. The standard approach for the identification of genetic relatives collects the genomic data of all individuals and stores it in a database. Then, each pair of individuals is compared to detect the set of genetic relatives, and the matched individuals are informed. The main drawback of this approach is the requirement of sharing your genetic data with a trusted third party to perform the relatedness test.ResultsIn this work, we propose a secure protocol to detect the genetic relatives from sequencing data while not exposing any information about their genomes. We assume that individuals have access to their genome sequences but do not want to share their genomes with anyone else. Unlike previous approaches, our approach uses both common and rare variants which provide the ability to detect much more distant relationships securely. We use a simulated data generated from the 1000 genomes data and illustrate that we can easily detect up to fifth degree cousins which was not possible using the existing methods. We also show in the 1000 genomes data with cryptic relationships that our method can detect these individuals.AvailabilityThe software is freely available for download at http://genetics.cs.ucla.edu/crypto/

    Privacy-Preserving Genetic Relatedness Test

    Get PDF
    An increasing number of individuals are turning to Direct-To-Consumer (DTC) genetic testing to learn about their predisposition to diseases, traits, and/or ancestry. DTC companies like 23andme and Ancestry.com have started to offer popular and affordable ancestry and genealogy tests, with services allowing users to find unknown relatives and long-distant cousins. Naturally, access and possible dissemination of genetic data prompts serious privacy concerns, thus motivating the need to design efficient primitives supporting private genetic tests. In this paper, we present an effective protocol for privacy-preserving genetic relatedness test (PPGRT), enabling a cloud server to run relatedness tests on input an encrypted genetic database and a test facility's encrypted genetic sample. We reduce the test to a data matching problem and perform it, privately, using searchable encryption. Finally, a performance evaluation of hamming distance based PP-GRT attests to the practicality of our proposals.Comment: A preliminary version of this paper appears in the Proceedings of the 3rd International Workshop on Genome Privacy and Security (GenoPri'16

    Systematizing Genome Privacy Research: A Privacy-Enhancing Technologies Perspective

    Full text link
    Rapid advances in human genomics are enabling researchers to gain a better understanding of the role of the genome in our health and well-being, stimulating hope for more effective and cost efficient healthcare. However, this also prompts a number of security and privacy concerns stemming from the distinctive characteristics of genomic data. To address them, a new research community has emerged and produced a large number of publications and initiatives. In this paper, we rely on a structured methodology to contextualize and provide a critical analysis of the current knowledge on privacy-enhancing technologies used for testing, storing, and sharing genomic data, using a representative sample of the work published in the past decade. We identify and discuss limitations, technical challenges, and issues faced by the community, focusing in particular on those that are inherently tied to the nature of the problem and are harder for the community alone to address. Finally, we report on the importance and difficulty of the identified challenges based on an online survey of genome data privacy expertsComment: To appear in the Proceedings on Privacy Enhancing Technologies (PoPETs), Vol. 2019, Issue

    Privacy in the Genomic Era

    Get PDF
    Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly-detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy; notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While the computer scientists have addressed data privacy for various data types, there has been less attention dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state-of-the-art regarding privacy attacks on genomic data and strategies for mitigating such attacks, as well as contextualizing these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward

    Metaseq: Privacy Preserving Meta-analysis of Sequencing-based Association Studies

    Get PDF
    Human genetics recently transitioned from GWAS to studies based on NGS data. For GWAS, small effects dictated large sample sizes, typically made possible through meta-analysis by exchanging summary statistics across consortia. NGS studies groupwise-test for association of multiple potentially-causal alleles along each gene. They are subject to similar power constraints and therefore likely to resort to meta-analysis as well. The problem arises when considering privacy of the genetic information during the data-exchange process. Many scoring schemes for NGS association rely on the frequency of each variant thus requiring the exchange of identity of the sequenced variant. As such variants are often rare, potentially revealing the identity of their carriers and jeopardizing privacy. We have thus developed MetaSeq, a protocol for meta-analysis of genome-wide sequencing data by multiple collaborating parties, scoring association for rare variants pooled per gene across all parties. We tackle the challenge of tallying frequency counts of rare, sequenced alleles, for meta-analysis of sequencing data without disclosing the allele identity and counts, thereby protecting sample identity. This apparent paradoxical exchange of information is achieved through cryptographic means. The key idea is that parties encrypt identity of genes and variants. When they transfer information about frequency counts in cases and controls, the exchanged data does not convey the identity of a mutation and therefore does not expose carrier identity. The exchange relies on a 3rd party, trusted to follow the protocol although not trusted to learn about the raw data. We show applicability of this method to publicly available exomesequencing data from multiple studies, simulating phenotypic information for powerful metaanalysis. The MetaSeq software is publicly available as open source

    Secure and Distributed Assessment of Privacy-Preserving Releases of GWAS

    Full text link
    Genome-wide association studies (GWAS) identify correlations between the genetic variants and an observable characteristic such as a disease. Previous works presented privacy-preserving distributed algorithms for a federation of genome data holders that spans multiple institutional and legislative domains to securely compute GWAS results. However, these algorithms have limited applicability, since they still require a centralized instance to decide whether GWAS results can be safely disclosed, which is in violation to privacy regulations, such as GDPR. In this work, we introduce GenDPR, a distributed middleware that leverages Trusted Execution Environments (TEEs) to securely determine a subset of the potential GWAS statistics that can be safely released. GenDPR achieves the same accuracy as centralized solutions, but requires transferring significantly less data because TEEs only exchange intermediary results but no genomes. Additionally, GenDPR can be configured to tolerate all-but-one honest-but-curious federation members colluding with the aim to expose genomes of correct members

    I-GWAS: Privacy-Preserving Interdependent Genome-Wide Association Studies

    Full text link
    Genome-wide Association Studies (GWASes) identify genomic variations that are statistically associated with a trait, such as a disease, in a group of individuals. Unfortunately, careless sharing of GWAS statistics might give rise to privacy attacks. Several works attempted to reconcile secure processing with privacy-preserving releases of GWASes. However, we highlight that these approaches remain vulnerable if GWASes utilize overlapping sets of individuals and genomic variations. In such conditions, we show that even when relying on state-of-the-art techniques for protecting releases, an adversary could reconstruct the genomic variations of up to 28.6% of participants, and that the released statistics of up to 92.3% of the genomic variations would enable membership inference attacks. We introduce I-GWAS, a novel framework that securely computes and releases the results of multiple possibly interdependent GWASes. I-GWAS continuously releases privacy-preserving and noise-free GWAS results as new genomes become available

    Statistical Methods and Privacy Preserving Protocols for Combining Genetic Data with Electronic Health Records

    Full text link
    In recent years, electronic health records (EHR) have been combined with genetic data to uncover disease biology and accelerate generation of hypotheses for drug development and treatment strategies. The goal of this dissertation is to develop novel statistical models that can address the challenges of analyzing ‘imperfect’ EHR data and to propose privacy-preserving methods that enable sensitive individual-level data sharing across EHR studies and other large genetic studies. In Chapter II, we propose a statistical method to address misclassified clinical outcomes, a common challenge in EHR data. One essential step of EHR-based genome-wide association studies is constructing a cohort of cases and controls for a specific disease from billing codes and other clinical or administrative data. Nearly always, a perfect strategy for deriving disease phenotypes from billing codes is not available, resulting in some incorrect case/control labels. Here, we propose a method to estimate the misclassification of case/control status by examining genotype information of dozens of disease associated loci. Through simulation and application to the Michigan Genomics Initiative data, we demonstrate that the method enables the evaluation of new EHR-based phenotype definition schemes and provides accurate estimates of disease association measures when phenotypes are misclassified. In Chapter III and IV, we focus on identifying overlapping samples between studies, a common challenge when aggregating information across datasets. We particularly focus on identifying duplicate or related samples when sharing the underlying individual level genetic data is restricted. We propose methods that do not require disclosure of individual identities but that can still identify genetic relatives across datasets. In Chapter III, we show that by grouping genotypes into segments and calculating summary statistics within each segment, we are able to obscure and encode individual-level genetic information. Relatives can be inferred with the coded genotypes using a likelihood model. Simulation and application to the Trans-Omics for Precision Medicine (TOPMed) program data demonstrate the utility and security of the method. In Chapter IV, we extend the method further, with a strategy that guarantees stronger encryption and is expected to work across heterogeneous populations. This secure protocol can infer genetic relatives among people of diverse ethnic backgrounds. The method works by combining a cryptographic technique, homomorphic encryption, with the robust relationship inference method previously described by Manichaikul et al (2010). Through simulations, we show that our method's performance is identical to that of implementations that use the original unencrypted genotypes. Our protocol scales well in computing time and is protected from several possible attacks. The secure protocol was again applied to TOPMed dataset. Securely identifying related samples will facilitate combination of results across datasets when there are restrictions to sharing the underlying individual level data. In conclusion, the methods developed here well enhance use of EHR data and genome data to improve accuracy of case/control status as well as decrease inclusion of relatives across studies when desired.PHDBiostatisticsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/153415/1/xtzhao_1.pd

    Analyzing the Privacy and Societal Challenges Stemming from the Rise of Personal Genomic Testing

    Get PDF
    Progress in genomics is enabling researchers to better understand the role of the genome in our health and well-being, stimulating hope for more effective and cost efficient healthcare. At the same time, the rapid cost drop of genome sequencing has enabled the emergence of a booming market for direct-to-consumer (DTC) genetic testing. Nowadays, companies like 23andMe and AncestryDNA provide affordable health, genealogy, and ancestry reports, and have already tested tens of millions of customers. How- ever, while this technology has the potential to transform society by improving people’s lives, it also harbors dangers as it prompts important privacy and societal concerns. In this thesis, we shed light on these issues using a mixed-methods approach. We start by conducting a technical investigation of the limitations on privacy-enhancing technologies used for testing, storing, and sharing genomic data. We rely on a structured methodology to contextualize and provide a critical analysis of the current state-of-the-art and we identify and discuss ten open problems faced by the community. We then focus on the societal aspects of DTC genetic testing by conducting two large-scale analyses of the genetic testing discourse focusing on both mainstream and fringe social networks, specifically, Twitter, Reddit, and 4chan. Our analyses show that DTC genetic testing is a popular topic of discussion on all platforms. However, these discussions often include highly toxic language expressed through hateful and racist comments and openly antisemitic rhetoric, often conveyed through memes. Overall, our findings highlight that the rise in popularity of this new technology is accompanied by several societal implications that are unlikely to be addressed by only one research field and rather require a multi-disciplinary approach

    Homomorphic Encryption for Machine Learning in Medicine and Bioinformatics

    Get PDF
    Machine learning techniques are an excellent tool for the medical community to analyzing large amounts of medical and genomic data. On the other hand, ethical concerns and privacy regulations prevent the free sharing of this data. Encryption methods such as fully homomorphic encryption (FHE) provide a method evaluate over encrypted data. Using FHE, machine learning models such as deep learning, decision trees, and naive Bayes have been implemented for private prediction using medical data. FHE has also been shown to enable secure genomic algorithms, such as paternity testing, and secure application of genome-wide association studies. This survey provides an overview of fully homomorphic encryption and its applications in medicine and bioinformatics. The high-level concepts behind FHE and its history are introduced. Details on current open-source implementations are provided, as is the state of FHE for privacy-preserving techniques in machine learning and bioinformatics and future growth opportunities for FHE
    • …
    corecore