5 research outputs found

    Data Oblivious Genome Variants Search on Intel SGX

    We show how to build a practical, private data oblivious genome variants search using Intel SGX. More precisely, we consider the problem posed in Track 2 of the iDash Privacy and Security Workshop 2017 competition, which was to search for variants with high $\chi^{2}$ statistic among certain genetic data over two populations. The winning solution of this iDash competition (developed by Carpov and Tortech) is extremely efficient, but not memory oblivious, which potentially made it vulnerable to a whole host of memory- and cache-based side channel attacks on SGX. In this paper, we adapt a framework in which we can exactly quantify this leakage. We provide a memory oblivious implementation with reasonable information leakage at the cost of some efficiency. Our solution is roughly an order of magnitude slower than the non-memory oblivious implementation, but still practical and much more efficient than naive memory-oblivious solutions--it solves the iDash problem in approximately 5 minutes. In order to do this, we develop novel definitions and models for oblivious dictionary merging, which may be of independent theoretical interest.
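    For orientation, the statistic being searched for can be sketched in a few lines. This is a minimal illustration, assuming a standard Pearson chi-squared test on a 2x2 table of allele counts (two populations by two alleles); the function names, the input layout, and the top-k ranking are hypothetical and are not taken from the paper or the competition code. The sketch is deliberately plain Python and is not memory oblivious, which is exactly the property the paper sets out to add.

```python
def chi2_allelic(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-squared statistic for a 2x2 allele-count table.

    Rows are the two populations, columns are alternate/reference allele counts.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column total is zero, so the statistic is undefined;
        # treat the variant as uninformative.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def top_variants(counts, k):
    """Return the ids of the k variants with the highest chi-squared statistic.

    `counts` maps a variant id to (case_alt, case_ref, control_alt, control_ref).
    """
    scored = [(chi2_allelic(*c), vid) for vid, c in counts.items()]
    return [vid for _, vid in sorted(scored, reverse=True)[:k]]
```

    A call such as `top_variants({"rs1": (20, 0, 0, 20), "rs2": (10, 10, 10, 10)}, 1)` ranks `rs1` first, because its allele counts differ completely between the two populations while those of `rs2` are identical.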

    Defending Our Public Biological Databases as a Global Critical Infrastructure

    Progress in modern biology is being driven, in part, by the large amounts of freely available data in public resources such as the International Nucleotide Sequence Database Collaboration (INSDC), the world's primary database of biological sequence (and related) information. INSDC and similar databases have dramatically increased the pace of fundamental biological discovery and enabled a host of innovative therapeutic, diagnostic, and forensic applications. However, as high-value, openly shared resources with a high degree of assumed trust, these repositories share compelling similarities to the early days of the Internet. Consequently, as public biological databases continue to increase in size and importance, we expect that they will face the same threats as undefended cyberspace. There is a unique opportunity, before a significant breach and loss of trust occurs, to ensure they evolve with quality and security as a design philosophy rather than costly “retrofitted” mitigations. This Perspective surveys some potential quality assurance and security weaknesses in existing open genomic and proteomic repositories, describes methods to mitigate the likelihood of both intentional and unintentional errors, and offers recommendations for risk mitigation based on lessons learned from cybersecurity.

    Secure and Distributed Assessment of Privacy-Preserving Releases of GWAS

    Genome-wide association studies (GWAS) identify correlations between genetic variants and an observable characteristic such as a disease. Previous works presented privacy-preserving distributed algorithms for a federation of genome data holders that spans multiple institutional and legislative domains to securely compute GWAS results. However, these algorithms have limited applicability, since they still require a centralized instance to decide whether GWAS results can be safely disclosed, which is in violation of privacy regulations such as GDPR. In this work, we introduce GenDPR, a distributed middleware that leverages Trusted Execution Environments (TEEs) to securely determine a subset of the potential GWAS statistics that can be safely released. GenDPR achieves the same accuracy as centralized solutions, but requires transferring significantly less data because TEEs only exchange intermediary results but no genomes. Additionally, GenDPR can be configured to tolerate all-but-one honest-but-curious federation members colluding with the aim to expose genomes of correct members.

    Secure, privacy-preserving and practical collaborative Genome-Wide Association Studies

    Understanding the interplay between genomics and human health is a crucial step for the advancement and development of our society. The Genome-Wide Association Study (GWAS) is one of the most popular methods for discovering correlations between genomic variations and a particular phenotype (i.e., an observable trait such as a disease). Leveraging genome data from multiple institutions worldwide is nowadays essential to produce more powerful findings by operating GWAS at larger scale. However, this raises several security and privacy risks, not only in the computation of such statistics, but also in the public release of GWAS results. To that end, several solutions in the literature have adopted cryptographic approaches to allow secure and privacy-preserving processing of genome data for federated analysis. However, conducting federated GWAS in a secure and privacy-preserving manner is not enough, since the public releases of GWAS results might be vulnerable to known genomic privacy attacks, such as recovery and membership attacks. This thesis explores possible solutions to enable end-to-end privacy-preserving federated GWAS in line with data privacy regulations such as GDPR. The aim is to secure the public release of the results of Genome-Wide Association Studies (GWASes) that are dynamically updated as new genomes become available, that might overlap in their genomes and in the locations considered within the genome, that can withstand internal threats such as colluding members in the federation, and that are computed in a distributed manner without shipping actual genome data. In pursuing these goals, the thesis makes the contributions described below. First, the thesis proposes DyPS, a Trusted Execution Environment (TEE)-based framework that reconciles efficient and secure genome data outsourcing with privacy-preserving data processing inside TEE enclaves to assess and create private releases of dynamic GWAS. 
In particular, DyPS presents the conditions for the creation of safe dynamic releases, certifying that the theoretical complexity of the solution space that an external probabilistic polynomial-time (p.p.t.) adversary or a group of colluders (up to all-but-one parties) would need to infer when launching recovery attacks on observed GWAS statistics is large enough. In addition, DyPS executes an exhaustive verification algorithm along with a likelihood-ratio test to measure the probability of identifying individuals in studies, thereby also protecting individuals against membership inference attacks. Only the safe genome data (i.e., genomes and SNPs) that DyPS selects are further used for the computation and release of GWAS results. The remaining (unsafe) data is kept secluded and protected inside the enclave until it can eventually be used. Our results show that if dynamic releases are not properly evaluated, up to 8% of genomes could be exposed to genomic privacy attacks. Moreover, the experiments show that DyPS' TEE-based architecture can accommodate the computational resources demanded by our algorithms and presents practical running times for larger-scale GWAS. Second, the thesis offers I-GWAS, which identifies the new conditions for safe releases when data overlaps among multiple GWASes (e.g., the same individuals participating in several studies). Indeed, it is shown that adversaries might leverage information about overlapping data to make both recovery and membership attacks feasible again, even if the releases are produced following the conditions for safe single-GWAS releases. Our experiments show that up to 28.6% of the genetic variants of participants could be inferred during recovery attacks, and that 92.3% of these variants would enable membership attacks by adversaries observing overlapping studies; I-GWAS withholds these variants. 
Last but not least, the thesis presents GenDPR, which extends our protocols so that the privacy-verification algorithms can be run in a distributed fashion among the federation members without requiring genome data to be outsourced across boundaries. Further, GenDPR can also cope with collusion among participants while selecting genome data that can be used to create safe releases. Additionally, GenDPR produces the same privacy guarantees as centralized architectures, i.e., it correctly identifies and selects the same data in need of protection as centralized approaches do. In the end, the thesis presents a homogenized framework that combines DyPS, I-GWAS and GenDPR, offering a usable approach for conducting practical GWAS. The method chosen for protection is of a statistical nature: it ensures that the theoretical complexity of attacks remains high and uses likelihood-ratio tests to withhold releases of statistics that would impose membership inference risks on participants, even as adversaries gain additional information over time. The thesis also relates the findings to other techniques that can be leveraged to protect releases, such as Differential Privacy. The proposed solutions leverage Intel SGX as the Trusted Execution Environment to perform selected critical operations in a performant manner; however, the work translates equally well to other trusted execution environments and other schemes, such as Homomorphic Encryption.
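The likelihood-ratio test mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the exact statistic or thresholds, so the code below assumes one common form from the genomic privacy literature: each SNP contributes the log-ratio of the allele probability under the released study frequencies versus under a reference population. The function name, the binary allele encoding, and the input layout are all hypothetical and are not taken from DyPS, I-GWAS or GenDPR.

```python
import math


def lr_statistic(genotype, study_freq, ref_freq):
    """Log likelihood-ratio that an individual belongs to the study pool.

    genotype:   list of 0/1 values, one per SNP (1 = allele present); assumed encoding
    study_freq: allele frequencies released for the study, each strictly in (0, 1)
    ref_freq:   allele frequencies of a reference population, each strictly in (0, 1)
    """
    total = 0.0
    for x, p_study, p_ref in zip(genotype, study_freq, ref_freq):
        # Log-probability of the observed allele under the study frequencies
        # minus its log-probability under the reference frequencies.
        total += x * math.log(p_study / p_ref)
        total += (1 - x) * math.log((1 - p_study) / (1 - p_ref))
    return total
```

Under this assumed form, a larger value suggests the individual's genotype is better explained by the study frequencies than by the reference, so a release policy could withhold SNPs until the statistic stays below a chosen threshold for every participant. How that threshold is calibrated is not stated in the abstract.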