547 research outputs found

    PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database

    Searchable symmetric encryption (SSE) has been used to protect the confidentiality of genomic data while providing substring search and range queries on a sequence of genomic data, but it has not been studied for protecting single nucleotide polymorphism (SNP)-phenotype data. In this article, we propose a novel model, PrivGenDB, for securely storing and efficiently conducting different queries on genomic data outsourced to an honest-but-curious cloud server. To instantiate PrivGenDB, we use SSE to ensure confidentiality while conducting different types of queries on encrypted genomic data, phenotype and other information of individuals to help analysts/clinicians in their analysis/care. To the best of our knowledge, the PrivGenDB construction is the first SSE-based approach ensuring the confidentiality of shared SNP-phenotype data through encryption while making the computation/query process efficient and scalable for biomedical research and care. Furthermore, it supports a variety of query types on genomic data, including count queries, Boolean queries, and k'-out-of-k match queries. Finally, the PrivGenDB model handles datasets containing both genotype and phenotype, and it also supports storing and managing other metadata, like gender and ethnicity, privately. Experimental evaluations on a dataset with 5,000 records and 1,000 SNPs demonstrate that a count/Boolean query and a k'-out-of-k match query over 40 SNPs take approximately 4.3s and 86.4μs, respectively, outperforming the existing schemes.

    PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database

    Privacy and security issues, raised by the sensitivity of this type of data, limit query executions over genomics datasets, notably single nucleotide polymorphisms (SNPs). Therefore, it is important to ensure that executing queries on these datasets does not reveal sensitive information, such as the identity of the individuals and their genetic traits, to a data server. In this paper, we propose and present a novel model, which we call PrivGenDB, to ensure the confidentiality of SNP-phenotype data while executing queries. The confidentiality in PrivGenDB is enabled by its system architecture and the search functionality provided by searchable symmetric encryption (SSE). To the best of our knowledge, the PrivGenDB construction is the first SSE-based approach ensuring the confidentiality of SNP-phenotype data, as the current SSE-based approaches for genomic data are limited to substring search and range queries on a sequence of genomic data. In addition, a new data encoding mechanism is proposed and incorporated in the PrivGenDB model. This enables PrivGenDB to handle datasets containing both genotype and phenotype and also to support storing and managing other metadata, like gender and ethnicity, privately. Furthermore, different queries used for genomic data analysis, namely Count, Boolean, Negation and k′-out-of-k match queries, are supported and executed by PrivGenDB. The execution of these queries on genomic data in PrivGenDB is efficient and scalable for biomedical research and services, as demonstrated by the analytical and empirical analysis presented in this paper. Specifically, our empirical studies on a dataset with 5000 entries (records) containing 1000 SNPs demonstrate that a count/Boolean query and a k′-out-of-k match query over 40 SNPs take approximately 4.3s and 86.4μs, respectively, outperforming the existing schemes.
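
    The query types named in this abstract can be illustrated on plaintext data. The following is a minimal sketch of their semantics only: it involves no encryption and is not the PrivGenDB construction. The record layout, the function names, and the reading of a k′-out-of-k match as "at least k′ of the k predicates hold" are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical record layout: SNP identifier -> genotype, e.g. {"rs1": "AG"}.
Record = Dict[str, str]


def matched_predicates(record: Record, predicates: Dict[str, str]) -> int:
    """Number of (SNP, genotype) predicates that the record satisfies."""
    return sum(1 for snp, genotype in predicates.items()
               if record.get(snp) == genotype)


def boolean_query(records: List[Record], predicates: Dict[str, str]) -> List[int]:
    """Indices of records satisfying ALL predicates (a conjunction)."""
    k = len(predicates)
    return [i for i, r in enumerate(records)
            if matched_predicates(r, predicates) == k]


def count_query(records: List[Record], predicates: Dict[str, str]) -> int:
    """Number of records satisfying all predicates."""
    return len(boolean_query(records, predicates))


def negation_query(records: List[Record], snp: str, genotype: str) -> List[int]:
    """Indices of records whose genotype at `snp` is NOT `genotype`."""
    return [i for i, r in enumerate(records) if r.get(snp) != genotype]


def k_prime_out_of_k_query(records: List[Record],
                           predicates: Dict[str, str],
                           k_prime: int) -> List[int]:
    """Indices of records satisfying at least k_prime of the k predicates
    (assumed interpretation of a k'-out-of-k match)."""
    return [i for i, r in enumerate(records)
            if matched_predicates(r, predicates) >= k_prime]


records = [
    {"rs1": "AG", "rs2": "CC", "rs3": "TT"},
    {"rs1": "AG", "rs2": "CT", "rs3": "TT"},
    {"rs1": "GG", "rs2": "CT", "rs3": "TA"},
]
predicates = {"rs1": "AG", "rs2": "CC", "rs3": "TT"}
```

    In an SSE-based system, the server would answer such queries over encrypted indexes without seeing the genotypes; the sketch only fixes what each query is expected to return.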

    Arithmetics of Ciphertexts under Homomorphic Encryption

    Thesis (Ph.D.), Seoul National University Graduate School, Department of Mathematical Sciences, February 2017. 천정희.

    Privacy homomorphism is an important concept for encrypting clear data while allowing one to perform operations on encrypted data without decryption. Although the use of fully homomorphic encryption schemes theoretically allows for the secure evaluation of any function, the evaluation cost is still far from being practical for many functions and no secure solutions have been developed to satisfy the efficiency requirements. In this thesis, the foundation of our simple framework is a set of optimized circuits for the following operations: equality, greater-than comparison and integer addition. We first focus on the applications of homomorphic encryption for private query processing on encrypted databases. In particular, we construct a unified framework to efficiently and privately process queries with search and compute operations by applying the underlying circuit primitives. Since genomic data contains numerous distinguishing features and sensitive personal information, privacy-preserving genome analysis in a cloud computing environment has become a major issue in bioinformatics. We present a method to perform the exact edit distance algorithm on encrypted data to obtain an encrypted result. We also describe how to privately compute the approximate edit distance between encrypted DNA sequences. Finally, we create a homomorphic security system for searching for a set of biomarkers in encrypted genomes. We propose an efficient method to securely search for a position matching a biomarker and extract the DNA sequence information at that position without complicated computation such as comparison.

    Contents:
    1 Introduction
        1.1 Contributions
            1.1.1 Private Database Query Processing
            1.1.2 Secure Genome Analysis
    2 Preliminaries
        2.1 Practical Homomorphic Encryption
            2.1.1 The BGV-Type Scheme
            2.1.2 The YASHE Scheme
            2.1.3 The Ring-GSW Scheme
        2.2 Human Genome Comparison
    3 Primitive Arithmetic Circuits under Homomorphic Encryption
        3.1 Binary Arithmetic Circuits
            3.1.1 Equality Circuit
            3.1.2 Greater-Than Comparison Circuit
            3.1.3 Integer Addition Circuit
            3.1.4 Maximum & Minimum Circuits
        3.2 Arithmetic Circuits over the Integers
            3.2.1 Calibrating Binary Circuit Primitives
            3.2.2 Arithmetic Circuits over the Integers Based on Fermat's Little Theorem
            3.2.3 Arithmetic Circuits over the Integers Based on Lagrange Interpolation Formula
    4 Private Database Query Processing
        4.1 General-Purpose Search-and-Compute
            4.1.1 A High-level Overview of Our Approach
            4.1.2 Security Evaluation
        4.2 Applications to Encrypted Databases
            4.2.1 Search Queries
            4.2.2 Search-and-Compute Queries
            4.2.3 Join Queries
        4.3 Performance Improvement
            4.3.1 Larger Message Spaces with Lazy Carry Processing
            4.3.2 Calibrating Circuit Primitives
        4.4 Implementation and Discussion
            4.4.1 Adjusting the Parameters
            4.4.2 Experiments for Search Queries
            4.4.3 Experiments for Search-and-Sum
            4.4.4 Experiments for Search-and-Count
        4.5 Handling Join Query
    5 Secure Genome Analysis Based on Homomorphic Encryption
        5.1 Exact Edit Distance Algorithm
            5.1.1 Encrypted Edit Distance Algorithm
            5.1.2 Optimizations Based on Block Computation
            5.1.3 Optimization of Encrypted Edit Distance Algorithm Based on Pathfinding Method
            5.1.4 Implementation
        5.2 Approximate Edit Distance Algorithm
            5.2.1 Encoding Genomic Data
            5.2.2 Secure DNA Sequence Comparison with Bit-sliced Implementation
            5.2.3 Secure DNA Sequence Comparison with Integer-based Implementation
            5.2.4 Implementation
        5.3 Secure Searching of Biomarkers
            5.3.1 Privacy-Preserving Database Searching and Extraction
            5.3.2 Secure Searching of Biomarkers
            5.3.3 Optimization Techniques
            5.3.4 Implementation
    6 Conclusions
    Bibliography
    Abstract (in Korean)
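
    For reference, the exact edit distance that the thesis evaluates on encrypted data is, in the clear, the standard dynamic program sketched below. This is only the plaintext recurrence (Wagner-Fischer), not the thesis's homomorphic circuit; the function name is hypothetical. The minimum and equality steps in the inner loop are the kind of operations the abstract's comparison and equality circuits would have to replace when the inputs are ciphertexts.

```python
def edit_distance(a: str, b: str) -> int:
    """Plaintext Levenshtein distance via the Wagner-Fischer dynamic program."""
    # prev[j] holds the distance between the first i-1 characters of `a`
    # and the first j characters of `b`; row 0 is the cost of j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string: i deletions
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)  # equality test
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))  # minimum of three
        prev = curr
    return prev[len(b)]
```

    The table has (len(a) + 1) × (len(b) + 1) cells, each needing one equality test and a three-way minimum, which gives a sense of why an encrypted evaluation is costly.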

    Privacy in the Genomic Era

    Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly-detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, there has been less attention dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state-of-the-art regarding privacy attacks on genomic data and strategies for mitigating such attacks, as well as contextualizing these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.

    Secure Similar Sequence Query on Outsourced Genomic Data

    The growing availability of genomic data is unlocking research potential in genomic-data analysis. It is of great importance to outsource genomic-analysis tasks to clouds to leverage their powerful computational resources over large-scale genomic sequences. However, the remote placement of the data raises personal-privacy concerns, and it is challenging to evaluate data-analysis functions on outsourced genomic data securely and efficiently. In this work, we study the secure similar-sequence-query (SSQ) problem over outsourced genomic data, which has not been fully investigated. To address the challenges of security and efficiency, we propose two protocols in mixed form, which combine two-party secure secret sharing, garbled circuits, and partially homomorphic encryption and use them to jointly fulfill the secure SSQ function. In addition, our protocols support multi-user queries over a joint genomic data set collected from multiple data owners, making our solution scalable. We formally prove the security of the protocols under the semi-honest adversary model, and theoretically analyze their performance. We use extensive experiments over a real-world dataset on a commercial cloud platform to validate the efficacy of our proposed solution, and demonstrate the performance improvements compared with state-of-the-art works.

    Efficient Privacy-preserving Whole-Genome Variant Queries

    MOTIVATION: Diagnosis and treatment decisions on genomic data have become widespread as the cost of genome sequencing decreases gradually. In this context, disease–gene association studies are of great importance. However, genomic data are very sensitive when compared to other data types and contain information about individuals and their relatives. Many studies have shown that this information can be obtained from the query-response pairs on genomic databases. In this work, we propose a method that uses secure multi-party computation to query genomic databases in a privacy-protected manner. The proposed solution privately outsources genomic data from arbitrarily many sources to two non-colluding proxies and allows genomic databases to be safely stored in semi-honest cloud environments. It provides data privacy, query privacy and output privacy by using XOR-based sharing and, unlike previous solutions, it allows queries to run efficiently on hundreds of thousands of genomic data. RESULTS: We measure the performance of our solution with parameters similar to real-world applications. It is possible to query a genomic database with 3 000 000 variants with five genomic query predicates in under 400 ms. Querying 1 048 576 genomes, each containing 1 000 000 variants, for the presence of five different query variants can be achieved in approximately 6 min with a small amount of dedicated hardware and connectivity. These execution times are in the right range to enable real-world applications in medical research and healthcare. Unlike previous studies, it is possible to query multiple databases with response times fast enough for practical application. To the best of our knowledge, this is the first solution that provides this performance for querying large-scale genomic data. AVAILABILITY AND IMPLEMENTATION: https://gitlab.com/DIFUTURE/privacy-preserving-variant-queries. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
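
    The XOR-based sharing mentioned in this abstract can be sketched as follows. This is a minimal illustration of the sharing primitive alone, assuming two proxies that each hold one share; it is not the paper's query protocol, and the function names are hypothetical.

```python
import secrets
from typing import Tuple


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return bytes(a ^ b for a, b in zip(x, y))


def share(data: bytes) -> Tuple[bytes, bytes]:
    """Split `data` into two XOR shares, one per (assumed) proxy.

    The first share is uniformly random, so either share on its own
    is independent of `data`.
    """
    share0 = secrets.token_bytes(len(data))
    share1 = xor_bytes(data, share0)
    return share0, share1


def reconstruct(share0: bytes, share1: bytes) -> bytes:
    """Recombine the two shares into the original data."""
    return xor_bytes(share0, share1)


# Example: a variant-presence bit vector packed into bytes (illustrative data).
variants = bytes([0b10110010, 0b00001111])
s0, s1 = share(variants)

# XOR is linear: each proxy can XOR its shares of two values locally and
# obtain a valid share of the XOR of those values, without communication.
other = bytes([0b11110000, 0b10101010])
t0, t1 = share(other)
combined = reconstruct(xor_bytes(s0, t0), xor_bytes(s1, t1))
```

    As long as the two proxies do not collude, neither learns the underlying data; operations other than XOR (for example AND) require interaction between the parties in secure multi-party computation.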