Privacy in the Genomic Era
Genome sequencing technology has advanced at a rapid pace and it is now
possible to generate highly-detailed genotypes inexpensively. The collection
and analysis of such data has the potential to support various applications,
including personalized medical services. While the benefits of the genomics
revolution are trumpeted by the biomedical community, the increased
availability of such data has major implications for personal privacy; notably
because the genome has certain essential features, which include (but are not
limited to) (i) an association with traits and certain diseases, (ii)
identification capability (e.g., forensics), and (iii) revelation of family
relationships. Moreover, direct-to-consumer DNA testing increases the
likelihood that genome data will be made available in less regulated
environments, such as the Internet and for-profit companies. The problem of
genome data privacy thus resides at the crossroads of computer science,
medicine, and public policy. While computer scientists have addressed data
privacy for various data types, there has been less attention dedicated to
genomic data. Thus, the goal of this paper is to provide a systematization of
knowledge for the computer science community. In doing so, we address some of
the (sometimes erroneous) beliefs of this field and we report on a survey we
conducted about genome data privacy with biomedical specialists. Then, after
characterizing the genome privacy problem, we review the state-of-the-art
regarding privacy attacks on genomic data and strategies for mitigating such
attacks, as well as contextualizing these attacks from the perspective of
medicine and public policy. This paper concludes with an enumeration of the
challenges for genome data privacy and presents a framework to systematize the
analysis of threats and the design of countermeasures as the field moves
forward.
Lower Bounds for Oblivious Near-Neighbor Search
We prove an $\Omega(d \lg n / (\lg\lg n)^2)$ lower bound on the dynamic
cell-probe complexity of statistically oblivious
approximate-near-neighbor search (ANN) over the $d$-dimensional
Hamming cube. For the natural setting of $d = \Theta(\log n)$, our result
implies an $\tilde{\Omega}(\lg^2 n)$ lower bound, which is a quadratic
improvement over the highest (non-oblivious) cell-probe lower bound for
ANN. This is the first super-logarithmic unconditional
lower bound for ANN against general (non black-box) data structures.
We also show that any oblivious static data structure for
decomposable search problems (like ANN) can be obliviously dynamized
with $O(\log n)$ overhead in update and query time, strengthening a classic
result of Bentley and Saxe (Algorithmica, 1980).
Comment: 28 pages
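The dynamization mentioned in the abstract above builds on the classic Bentley-Saxe "logarithmic method". The following is a minimal sketch of that classic, non-oblivious method only; it is not the oblivious construction from the paper. The class names `StaticSortedSet` and `LogarithmicMethod` are hypothetical, and exact membership stands in for ANN as a simple decomposable search problem (the answer over a union of sets is the OR of the answers over the parts).

```python
from bisect import bisect_left


class StaticSortedSet:
    """Hypothetical static structure: built once, answers membership only."""

    def __init__(self, items):
        self.items = sorted(items)

    def contains(self, x):
        i = bisect_left(self.items, x)
        return i < len(self.items) and self.items[i] == x


class LogarithmicMethod:
    """Classic (non-oblivious) Bentley-Saxe dynamization, insert-only.

    levels[i] is either None or a static structure holding exactly 2**i
    items, so the occupied levels mirror the binary representation of
    the number of inserted items.
    """

    def __init__(self):
        self.levels = []

    def insert(self, x):
        carry = [x]
        for i in range(len(self.levels)):
            if self.levels[i] is None:
                # Empty slot: rebuild a static structure here and stop.
                self.levels[i] = StaticSortedSet(carry)
                return
            # Occupied slot: absorb its items and carry to the next level.
            carry.extend(self.levels[i].items)
            self.levels[i] = None
        self.levels.append(StaticSortedSet(carry))

    def contains(self, x):
        # Decomposability: combine the per-level answers with OR.
        return any(lvl is not None and lvl.contains(x) for lvl in self.levels)

    def level_sizes(self):
        return [0 if lvl is None else len(lvl.items) for lvl in self.levels]


if __name__ == "__main__":
    ds = LogarithmicMethod()
    for value in range(5):
        ds.insert(value)
    print(ds.level_sizes(), ds.contains(3), ds.contains(7))
```

A query touches at most one static structure per level, which is where the logarithmic overhead in query time comes from; an oblivious variant would additionally need to hide which levels are touched and rebuilt, which this sketch does not attempt.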
PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database
Searchable symmetric encryption (SSE) has been used to protect the
confidentiality of genomic data while providing substring search and range
queries on a sequence of genomic data, but it has not been studied for
protecting single nucleotide polymorphism (SNP)-phenotype data. In this
article, we propose a novel model, PrivGenDB, for securely storing and
efficiently conducting different queries on genomic data outsourced to an
honest-but-curious cloud server. To instantiate PrivGenDB, we use SSE to ensure
confidentiality while conducting different types of queries on encrypted
genomic data, phenotype and other information of individuals to help
analysts/clinicians in their analysis/care. To the best of our knowledge,
the PrivGenDB construction is the first SSE-based approach ensuring the
confidentiality of shared SNP-phenotype data through encryption while making
the computation/query process efficient and scalable for biomedical research
and care. Furthermore, it supports a variety of query types on genomic data,
including count queries, Boolean queries, and k'-out-of-k match queries.
Finally, the PrivGenDB model handles datasets containing both genotype and
phenotype, and it also supports storing and managing other metadata like gender
and ethnicity privately. Computer evaluations on a dataset with 5,000 records
and 1,000 SNPs demonstrate that a count/Boolean query and a k'-out-of-k match
query over 40 SNPs take approximately 4.3 s and 86.4 µs, respectively,
outperforming existing schemes.
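To make the three query types named in the abstract above concrete, here is a minimal plaintext sketch. It involves no encryption and is not the PrivGenDB construction; it only illustrates one plausible reading of what the queries compute. The record layout, the function names, and the exact semantics (a Boolean query as a conjunction returning matching IDs, a count query returning their number, and a k'-out-of-k query returning records that satisfy at least k' of the k conditions) are all assumptions made for illustration.

```python
def num_matches(record, conditions):
    """Count how many (SNP, genotype) conditions a record satisfies."""
    return sum(
        1 for snp, genotype in conditions.items()
        if record["snps"].get(snp) == genotype
    )


def boolean_query(db, conditions):
    """IDs of records satisfying every condition (assumed conjunction)."""
    return [r["id"] for r in db if num_matches(r, conditions) == len(conditions)]


def count_query(db, conditions):
    """Number of records satisfying every condition."""
    return len(boolean_query(db, conditions))


def k_out_of_k_query(db, conditions, k_prime):
    """IDs of records satisfying at least k_prime of the k conditions."""
    return [r["id"] for r in db if num_matches(r, conditions) >= k_prime]


if __name__ == "__main__":
    # Toy, made-up records: hypothetical SNP identifiers and genotypes.
    db = [
        {"id": 1, "snps": {"snp_a": "AG", "snp_b": "CC", "snp_c": "TT"}},
        {"id": 2, "snps": {"snp_a": "AG", "snp_b": "CT", "snp_c": "TT"}},
        {"id": 3, "snps": {"snp_a": "AA", "snp_b": "CT", "snp_c": "GT"}},
    ]
    conditions = {"snp_a": "AG", "snp_b": "CC", "snp_c": "TT"}
    print(boolean_query(db, conditions))
    print(count_query(db, conditions))
    print(k_out_of_k_query(db, conditions, k_prime=2))
```

In an SSE-based scheme the server would evaluate such predicates over encrypted indexes using search tokens rather than over plaintext records as done here.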
Constructive Privacy for Shared Genetic Data
The need to share genetic data, for instance in genome-wide association studies, is growing steadily. In parallel, serious privacy concerns arise from multi-party access to genetic information. Several techniques, such as encryption, have been proposed as solutions for the privacy-preserving sharing of genomes. However, existing programming means do not support guarantees for privacy properties or the performance optimization of genetic applications involving shared data. We propose two contributions in this context. First, we present new cloud-based architectures for genetic applications that are motivated by the needs of geneticists. Second, we propose a model and implementation for the composition of watermarking with encryption, fragmentation, and client-side computations for the secure and privacy-preserving sharing of genetic data in the cloud.