PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database
Searchable symmetric encryption (SSE) has been used to protect the
confidentiality of genomic data while providing substring search and range
queries on a sequence of genomic data, but it has not been studied for
protecting single nucleotide polymorphism (SNP)-phenotype data. In this
article, we propose a novel model, PrivGenDB, for securely storing and
efficiently conducting different queries on genomic data outsourced to an
honest-but-curious cloud server. To instantiate PrivGenDB, we use SSE to ensure
confidentiality while conducting different types of queries on encrypted
genomic data, phenotype and other information of individuals to help
analysts/clinicians in their analysis/care. To the best of our knowledge,
PrivGenDB construction is the first SSE-based approach ensuring the
confidentiality of shared SNP-phenotype data through encryption while making
the computation/query process efficient and scalable for biomedical research
and care. Furthermore, it supports a variety of query types on genomic data,
including count queries, Boolean queries, and k'-out-of-k match queries.
Finally, the PrivGenDB model handles datasets containing both genotype and
phenotype data, and it also supports storing and managing other metadata, such
as gender and ethnicity, privately. Computer evaluations on a dataset with 5,000 records
and 1,000 SNPs demonstrate that a count/Boolean query and a k'-out-of-k match
query over 40 SNPs take approximately 4.3 s and 86.4 μs, respectively,
outperforming the existing schemes.
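To make the SSE idea concrete, here is a toy single-keyword encrypted inverted index in Python. It is a minimal sketch under stated assumptions, not PrivGenDB's construction: HMAC-SHA256 stands in for a PRF, record ids are masked with a PRF-derived pad, and keyword strings such as `"rs1=1"` are a hypothetical "SNP id = genotype value" encoding. The server only sees pseudorandom tokens and masked ids; Boolean and k'-out-of-k queries would need considerably more elaborate SSE machinery than this.

```python
import hashlib
import hmac
import os

# Toy SSE-style inverted index (illustrative only; NOT PrivGenDB's scheme and
# NOT secure for real deployments: it leaks result sizes and access patterns).
# HMAC-SHA256 is used as a PRF; keywords like "rs1=1" are a hypothetical encoding.

def prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def _mask(key_e: bytes, token: bytes, i: int, block: bytes) -> bytes:
    # XOR an 8-byte block with a PRF-derived pad unique to (token, position).
    pad = prf(key_e, token + i.to_bytes(4, "big"))[:8]
    return bytes(a ^ b for a, b in zip(block, pad))

def build_index(key_t: bytes, key_e: bytes,
                postings: dict[str, list[int]]) -> dict[bytes, list[bytes]]:
    """Client side: map PRF(keyword) -> list of masked record ids."""
    index = {}
    for keyword, ids in postings.items():
        token = prf(key_t, keyword.encode())
        index[token] = [_mask(key_e, token, i, rid.to_bytes(8, "big"))
                        for i, rid in enumerate(ids)]
    return index

def search_token(key_t: bytes, keyword: str) -> bytes:
    """Client side: the server only ever sees this pseudorandom token."""
    return prf(key_t, keyword.encode())

def server_lookup(index: dict[bytes, list[bytes]], token: bytes) -> list[bytes]:
    """Server side: a plain dictionary lookup over opaque bytes."""
    return index.get(token, [])

def client_decrypt(key_e: bytes, token: bytes, entries: list[bytes]) -> list[int]:
    """Client side: unmask the returned entries back to record ids."""
    return [int.from_bytes(_mask(key_e, token, i, c), "big")
            for i, c in enumerate(entries)]

kt, ke = os.urandom(32), os.urandom(32)
idx = build_index(kt, ke, {"rs1=1": [3, 7], "gender=F": [7, 9]})
tok = search_token(kt, "rs1=1")
print(client_decrypt(ke, tok, server_lookup(idx, tok)))  # [3, 7]
```

Keeping the token key and the masking key separate means the server can match tokens without being able to unmask the stored ids.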
PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database
Privacy and security concerns, raised by the sensitivity of genomic data, notably single nucleotide polymorphisms (SNPs), limit query execution over genomic datasets. It is therefore important to ensure that executing queries on these datasets does not reveal sensitive information, such as the identity of the individuals and their genetic traits, to a data server. In this paper, we propose a novel model, which we call PrivGenDB, to ensure the confidentiality of SNP-phenotype data while executing queries. Confidentiality in PrivGenDB is enabled by its system architecture and by the search functionality provided by searchable symmetric encryption (SSE). To the best of our knowledge, PrivGenDB is the first SSE-based construction ensuring the confidentiality of SNP-phenotype data, as current SSE-based approaches for genomic data are limited to substring search and range queries on a sequence of genomic data. In addition, a new data encoding mechanism is proposed and incorporated in the PrivGenDB model. It enables PrivGenDB to handle datasets containing both genotype and phenotype data and to store and manage other metadata, such as gender and ethnicity, privately. Furthermore, PrivGenDB supports and executes several query types used in genomic data analysis, namely Count, Boolean, Negation and k′-out-of-k match queries. The execution of these queries in PrivGenDB is efficient and scalable for biomedical research and services, as demonstrated by the analytical and empirical analyses presented in this paper. Specifically, our empirical studies on a dataset with 5000 entries (records) containing 1000 SNPs demonstrate that a count/Boolean query and a k′-out-of-k match query over 40 SNPs take approximately 4.3 s and 86.4 μs, respectively, outperforming the existing schemes.
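For readers unfamiliar with these query types, the following Python sketch gives one plausible plaintext reading of Count, Boolean-with-Negation and k′-out-of-k match semantics. It is an assumption-laden illustration, not PrivGenDB's algorithm (which runs over SSE-encrypted data): records are hypothetical attribute-to-value dictionaries, with genotypes encoded as 0/1/2 minor-allele counts alongside phenotype and metadata.

```python
from collections.abc import Sequence

# Plaintext reference semantics only (illustrative). Hypothetical encoding:
# each record maps attributes (SNP ids, phenotype, metadata) to values,
# with genotypes as 0/1/2 minor-allele counts.
Criteria = Sequence[tuple[str, object]]

def matches(rec: dict, attr: str, value: object) -> bool:
    return rec.get(attr) == value

def count_query(db: list[dict], criteria: Criteria) -> int:
    """Number of records satisfying every (attribute, value) criterion."""
    return sum(all(matches(r, a, v) for a, v in criteria) for r in db)

def boolean_query(db: list[dict], include: Criteria,
                  exclude: Criteria = ()) -> list[int]:
    """Indices of records matching all `include` and none of `exclude` (negation)."""
    return [i for i, r in enumerate(db)
            if all(matches(r, a, v) for a, v in include)
            and not any(matches(r, a, v) for a, v in exclude)]

def k_prime_of_k_match(db: list[dict], criteria: Criteria, k_prime: int) -> list[int]:
    """Indices of records matching at least k_prime of the k given criteria."""
    return [i for i, r in enumerate(db)
            if sum(matches(r, a, v) for a, v in criteria) >= k_prime]

db = [
    {"rs1": 0, "rs2": 1, "rs3": 2, "phenotype": "case", "gender": "F"},
    {"rs1": 1, "rs2": 1, "rs3": 0, "phenotype": "control", "gender": "M"},
    {"rs1": 0, "rs2": 2, "rs3": 2, "phenotype": "case", "gender": "M"},
]
print(count_query(db, [("rs1", 0), ("phenotype", "case")]))            # 2
print(boolean_query(db, [("phenotype", "case")], [("gender", "M")]))   # [0]
print(k_prime_of_k_match(db, [("rs1", 0), ("rs2", 1), ("rs3", 2)], 2))  # [0, 2]
```

A k′-out-of-k match generalizes an exact (k-out-of-k) match, tolerating up to k − k′ mismatching SNPs.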
Arithmetics of Ciphertexts under Homomorphic Encryption
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Mathematical Sciences, February 2017. Advisor: 천정희 (Jung Hee Cheon).
Privacy homomorphism is an important concept for encrypting clear data while allowing one to perform operations on encrypted data without decryption. Although fully homomorphic encryption schemes theoretically allow the secure evaluation of any function, the evaluation cost is still far from practical for many functions, and no secure solution has yet been developed that satisfies the efficiency requirements.
In this thesis, the foundation of our simple framework is a set of optimized circuits for the following operations: equality, greater-than comparison and integer addition. We first focus on the applications of homomorphic encryption to private query processing on encrypted databases. In particular, we construct a unified framework to efficiently and privately process queries with search and compute operations by applying the underlying circuit primitives.
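As a hedged illustration of what "circuit primitives" means here, the Python sketch below evaluates textbook binary circuits for equality, greater-than comparison and ripple-carry addition on plaintext bits, using only XOR (addition mod 2) and AND (multiplication mod 2), the operations a binary homomorphic scheme evaluates on ciphertexts. It also shows the standard Fermat's-little-theorem equality test over the integers mod a prime p. These are generic constructions, not necessarily the optimized circuits developed in the thesis; helper names such as `to_bits` are hypothetical.

```python
# Bits are ints in {0, 1}, least significant bit first. Only XOR (^) and AND (&)
# are used, mirroring what a binary HE scheme can evaluate on encrypted bits.

def to_bits(v: int, n: int) -> list[int]:
    return [(v >> i) & 1 for i in range(n)]

def from_bits(bits: list[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))

def eq_circuit(x: list[int], y: list[int]) -> int:
    # EQ(x, y) = prod_i (1 XOR x_i XOR y_i)
    out = 1
    for xi, yi in zip(x, y):
        out &= 1 ^ xi ^ yi
    return out

def gt_circuit(x: list[int], y: list[int]) -> int:
    # Scan LSB -> MSB: gt = (x_i AND NOT y_i) XOR ((x_i == y_i) AND gt).
    # The two terms are mutually exclusive, so XOR acts as OR.
    gt = 0
    for xi, yi in zip(x, y):
        gt = (xi & (1 ^ yi)) ^ ((1 ^ xi ^ yi) & gt)
    return gt

def add_circuit(x: list[int], y: list[int]) -> list[int]:
    # Ripple-carry adder returning n + 1 bits; carry = MAJ(x_i, y_i, c).
    out, carry = [], 0
    for xi, yi in zip(x, y):
        out.append(xi ^ yi ^ carry)
        carry = (xi & yi) ^ (carry & (xi ^ yi))
    out.append(carry)
    return out

def fermat_eq(x: int, y: int, p: int) -> int:
    # For prime p and x, y in [0, p): 1 - (x - y)^(p-1) mod p is 1 iff x == y.
    return (1 - pow(x - y, p - 1, p)) % p

print(eq_circuit(to_bits(5, 4), to_bits(5, 4)))              # 1
print(gt_circuit(to_bits(9, 4), to_bits(4, 4)))              # 1
print(from_bits(add_circuit(to_bits(13, 4), to_bits(6, 4))))  # 19
print(fermat_eq(3, 3, 7), fermat_eq(3, 5, 7))                # 1 0
```

The multiplicative depth of these circuits (for example, the chain of ANDs in `eq_circuit`) is what drives homomorphic evaluation cost, which is why optimizing them matters.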
Since genomic data contain numerous distinguishing features and sensitive personal information, privacy-preserving genome analysis in a cloud computing environment has become a major issue in bioinformatics. We present a method to perform the exact edit distance algorithm on encrypted data to obtain an encrypted result. We also describe how to privately compute the approximate edit distance between encrypted DNA sequences. Finally, we create a homomorphic security system for searching for a set of biomarkers in encrypted genomes. We propose an efficient method to securely search for a position matching a biomarker and extract the information of DNA sequences at that position without complicated computation such as comparison.
1 Introduction
1.1 Contributions
1.1.1 Private Database Query Processing
1.1.2 Secure Genome Analysis
2 Preliminaries
2.1 Practical Homomorphic Encryption
2.1.1 The BGV-Type Scheme
2.1.2 The YASHE Scheme
2.1.3 The Ring-GSW Scheme
2.2 Human Genome Comparison
3 Primitive Arithmetic Circuits under Homomorphic Encryption
3.1 Binary Arithmetic Circuits
3.1.1 Equality Circuit
3.1.2 Greater-Than Comparison Circuit
3.1.3 Integer Addition Circuit
3.1.4 Maximum & Minimum Circuits
3.2 Arithmetic Circuits over the Integers
3.2.1 Calibrating Binary Circuit Primitives
3.2.2 Arithmetic Circuits over the Integers Based on Fermat's Little Theorem
3.2.3 Arithmetic Circuits over the Integers Based on Lagrange Interpolation Formula
4 Private Database Query Processing
4.1 General-Purpose Search-and-Compute
4.1.1 A High-level Overview of Our Approach
4.1.2 Security Evaluation
4.2 Applications to Encrypted Databases
4.2.1 Search Queries
4.2.2 Search-and-Compute Queries
4.2.3 Join Queries
4.3 Performance Improvement
4.3.1 Larger Message Spaces with Lazy Carry Processing
4.3.2 Calibrating Circuit Primitives
4.4 Implementation and Discussion
4.4.1 Adjusting the Parameters
4.4.2 Experiments for Search Queries
4.4.3 Experiments for Search-and-Sum
4.4.4 Experiments for Search-and-Count
4.5 Handling Join Query
5 Secure Genome Analysis Based on Homomorphic Encryption
5.1 Exact Edit Distance Algorithm
5.1.1 Encrypted Edit Distance Algorithm
5.1.2 Optimizations Based on Block Computation
5.1.3 Optimization of Encrypted Edit Distance Algorithm Based on Pathfinding Method
5.1.4 Implementation
5.2 Approximate Edit Distance Algorithm
5.2.1 Encoding Genomic Data
5.2.2 Secure DNA Sequence Comparison with Bit-sliced Implementation
5.2.3 Secure DNA Sequence Comparison with Integer-based Implementation
5.2.4 Implementation
5.3 Secure Searching of Biomarkers
5.3.1 Privacy-Preserving Database Searching and Extraction
5.3.2 Secure Searching of Biomarkers
5.3.3 Optimization Techniques
5.3.4 Implementation
6 Conclusions
Bibliography
Abstract (in Korean)
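For orientation, the standard way to compute the exact edit distance mentioned in the abstract above is the Wagner-Fischer dynamic program, sketched below in plaintext Python. We assume, without confirmation from the abstract, that the encrypted algorithm follows the same recurrence; this sketch shows neither the encryption nor the block-computation or pathfinding optimizations listed in the contents. Under homomorphic encryption, each character comparison and each `min` would have to be expressed as an arithmetic circuit, which is where the cost comes from.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the Wagner-Fischer DP, keeping two rows."""
    # prev[j] = distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1]

print(edit_distance("ACGTTGCA", "ACGTGCA"))  # 1
```

Every cell depends on three neighbours, so an encrypted version must evaluate on the order of len(a) × len(b) encrypted comparisons and minimums in sequence.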
Privacy in the Genomic Era
Genome sequencing technology has advanced at a rapid pace and it is now
possible to generate highly-detailed genotypes inexpensively. The collection
and analysis of such data has the potential to support various applications,
including personalized medical services. While the benefits of the genomics
revolution are trumpeted by the biomedical community, the increased
availability of such data has major implications for personal privacy; notably
because the genome has certain essential features, which include (but are not
limited to) (i) an association with traits and certain diseases, (ii)
identification capability (e.g., forensics), and (iii) revelation of family
relationships. Moreover, direct-to-consumer DNA testing increases the
likelihood that genome data will be made available in less regulated
environments, such as the Internet and for-profit companies. The problem of
genome data privacy thus resides at the crossroads of computer science,
medicine, and public policy. While computer scientists have addressed data
privacy for various data types, there has been less attention dedicated to
genomic data. Thus, the goal of this paper is to provide a systematization of
knowledge for the computer science community. In doing so, we address some of
the (sometimes erroneous) beliefs of this field and we report on a survey we
conducted about genome data privacy with biomedical specialists. Then, after
characterizing the genome privacy problem, we review the state-of-the-art
regarding privacy attacks on genomic data and strategies for mitigating such
attacks, as well as contextualizing these attacks from the perspective of
medicine and public policy. This paper concludes with an enumeration of the
challenges for genome data privacy and presents a framework to systematize the
analysis of threats and the design of countermeasures as the field moves
forward.
Secure Similar Sequence Query on Outsourced Genomic Data
The growing availability of genomic data is unlocking research potential in genomic-data analysis. Outsourcing genomic-analysis tasks to clouds is important for leveraging their powerful computational resources on large-scale genomic sequences. However, the remote placement of the data raises personal-privacy concerns, and it is challenging to evaluate data-analysis functions on outsourced genomic data securely and efficiently. In this work, we study the secure similar-sequence-query (SSQ) problem over outsourced genomic data, which has not been fully investigated. To address the challenges of security and efficiency, we propose two protocols in mixed form, which combine two-party secret sharing, garbled circuits, and partially homomorphic encryption and use them jointly to fulfill the secure SSQ function. In addition, our protocols support multi-user queries over a joint genomic dataset collected from multiple data owners, making our solution scalable. We formally prove the security of the protocols under the semi-honest adversary model and theoretically analyze their performance. We use extensive experiments over a real-world dataset on a commercial cloud platform to validate the efficacy of our proposed solution and demonstrate performance improvements compared with state-of-the-art works.
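The SSQ protocols combine secret sharing with garbled circuits and partially homomorphic encryption. As a hedged illustration of only the first ingredient, the sketch below shows two-party additive secret sharing modulo a hypothetical ring size `MOD = 2**32`: each server holds one uniformly random-looking share, and public linear combinations (here a weighted sum) are computed locally without interaction. This is not the paper's protocol; similarity thresholds and comparisons are not linear, which is broadly why mixed protocols bring in garbled circuits or homomorphic encryption, none of which is shown here.

```python
import secrets

MOD = 2 ** 32  # hypothetical ring size for the shares

def share(x: int) -> tuple[int, int]:
    """Split x into two additive shares; each alone is uniformly random."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD

def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % MOD

def weighted_sum(shared: list[tuple[int, int]], weights: list[int]) -> tuple[int, int]:
    """Each server combines its own shares with public weights, no interaction."""
    s0 = sum(w * s[0] for w, s in zip(weights, shared)) % MOD
    s1 = sum(w * s[1] for w, s in zip(weights, shared)) % MOD
    return s0, s1

values = [3, 5, 7]
shared = [share(v) for v in values]
print(reconstruct(*weighted_sum(shared, [1, 2, 3])))  # 34
```

Linearity is what makes local evaluation possible: the sum of the servers' results equals the weighted sum of the original values mod `MOD`.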
Efficient Privacy-preserving Whole-Genome Variant Queries
MOTIVATION: Diagnosis and treatment decisions based on genomic data have become widespread as the cost of genome sequencing decreases. In this context, disease–gene association studies are of great importance. However, genomic data are very sensitive compared to other data types and contain information about individuals and their relatives. Many studies have shown that this information can be obtained from query-response pairs on genomic databases. In this work, we propose a method that uses secure multi-party computation to query genomic databases in a privacy-protected manner. The proposed solution privately outsources genomic data from arbitrarily many sources to two non-colluding proxies and allows genomic databases to be safely stored in semi-honest cloud environments. It provides data privacy, query privacy and output privacy by using XOR-based sharing, and, unlike previous solutions, it allows queries to run efficiently on hundreds of thousands of genomes. RESULTS: We measure the performance of our solution with parameters similar to real-world applications. It is possible to query a genomic database with 3 000 000 variants with five genomic query predicates in under 400 ms. Querying 1 048 576 genomes, each containing 1 000 000 variants, for the presence of five different query variants can be completed in approximately 6 min with a small amount of dedicated hardware and connectivity. These execution times are in the right range to enable real-world applications in medical research and healthcare. Unlike previous studies, it is possible to query multiple databases with response times fast enough for practical application. To the best of our knowledge, this is the first solution that provides this performance for querying large-scale genomic data. AVAILABILITY AND IMPLEMENTATION: https://gitlab.com/DIFUTURE/privacy-preserving-variant-queries. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
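XOR-based sharing, which the abstract names, can be illustrated with a minimal two-proxy sketch, assuming each genome is encoded as a hypothetical 0/1 presence vector over a fixed variant list. Each proxy stores one share, answers a lookup with its own shares, and the querier XORs the two answers. This is a toy, not the authors' protocol: the queried positions travel in the clear, so it offers no query privacy, and counting matches across many genomes would require interactive protocols between the proxies.

```python
import secrets

def xor_share(bits: list[int]) -> tuple[list[int], list[int]]:
    """Split a 0/1 vector into two XOR shares, one per non-colluding proxy."""
    mask = [secrets.randbits(1) for _ in bits]
    return mask, [b ^ m for b, m in zip(bits, mask)]

def proxy_lookup(share_vec: list[int], positions: list[int]) -> list[int]:
    """Each proxy answers with its own (random-looking) shares at the positions."""
    return [share_vec[p] for p in positions]

def client_combine(ans0: list[int], ans1: list[int]) -> tuple[list[int], bool]:
    """XOR the answers to recover the variant bits; 'all present' is their AND."""
    bits = [a ^ b for a, b in zip(ans0, ans1)]
    return bits, all(bits)

genome = [0, 1, 1, 0, 1, 0, 0, 1]  # hypothetical presence vector
s0, s1 = xor_share(genome)
print(client_combine(proxy_lookup(s0, [1, 2, 4]), proxy_lookup(s1, [1, 2, 4])))  # ([1, 1, 1], True)
print(client_combine(proxy_lookup(s0, [0, 1]), proxy_lookup(s1, [0, 1])))        # ([0, 1], False)
```

Neither share alone reveals anything about the genome, because each mask bit is uniformly random.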