Privacy in the Genomic Era
Genome sequencing technology has advanced at a rapid pace and it is now
possible to generate highly detailed genotypes inexpensively. The collection
and analysis of such data has the potential to support various applications,
including personalized medical services. While the benefits of the genomics
revolution are trumpeted by the biomedical community, the increased
availability of such data has major implications for personal privacy; notably
because the genome has certain essential features, which include (but are not
limited to) (i) an association with traits and certain diseases, (ii)
identification capability (e.g., forensics), and (iii) revelation of family
relationships. Moreover, direct-to-consumer DNA testing increases the
likelihood that genome data will be made available in less regulated
environments, such as the Internet and for-profit companies. The problem of
genome data privacy thus resides at the crossroads of computer science,
medicine, and public policy. While computer scientists have addressed data
privacy for various data types, there has been less attention dedicated to
genomic data. Thus, the goal of this paper is to provide a systematization of
knowledge for the computer science community. In doing so, we address some of
the (sometimes erroneous) beliefs of this field and we report on a survey we
conducted about genome data privacy with biomedical specialists. Then, after
characterizing the genome privacy problem, we review the state-of-the-art
regarding privacy attacks on genomic data and strategies for mitigating such
attacks, as well as contextualizing these attacks from the perspective of
medicine and public policy. This paper concludes with an enumeration of the
challenges for genome data privacy and presents a framework to systematize the
analysis of threats and the design of countermeasures as the field moves
forward.
Routes for breaching and protecting genetic privacy
We are entering the era of ubiquitous genetic information for research,
clinical care, and personal curiosity. Sharing these datasets is vital for
rapid progress in understanding the genetic basis of human diseases. However,
one growing concern is the ability to protect the genetic privacy of the data
originators. Here, we technically map threats to genetic privacy and discuss
potential mitigation strategies for privacy-preserving dissemination of genetic
data. Comment: Draft for comment
SANNS: Scaling Up Secure Approximate k-Nearest Neighbors Search
The k-Nearest Neighbor Search (k-NNS) is the backbone of several
cloud-based services such as recommender systems, face recognition, and
database search on text and images. In these services, the client sends the
query to the cloud server and receives the response, so both the query and the
response are revealed to the service provider. Such data disclosures are
unacceptable in several scenarios due to the sensitivity of the data and/or
privacy laws.
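For reference, the computation that SANNS protects is the k-NNS query itself. The sketch below is an insecure, plaintext linear scan, assuming Euclidean distance as the similarity measure; in the cloud setting described above, the server running it would see both the query and the result. The names `knn_linear_scan` and `squared_euclidean` are hypothetical.

```python
import heapq
from typing import List, Sequence

Vector = Sequence[float]


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance; it ranks neighbors the same way as the true distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_linear_scan(database: List[Vector], query: Vector, k: int) -> List[int]:
    """Indices of the k entries closest to `query`, nearest first (ties broken by index)."""
    # One pass over every entry; heapq.nsmallest keeps at most k candidates at a time.
    scored = ((squared_euclidean(v, query), i) for i, v in enumerate(database))
    return [i for _, i in heapq.nsmallest(k, scored)]


# Plaintext setting: whoever runs this sees `query` and the returned indices.
db = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.0, 1.0]]
print(knn_linear_scan(db, [0.0, 0.0], k=2))  # [0, 3]
```

Squared distance is used because it avoids a square root without changing the ranking.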
In this paper, we introduce SANNS, a system for secure k-NNS that keeps the
client's query and the search result confidential. SANNS comprises two
protocols: an optimized linear scan and a protocol based on a novel
sublinear-time clustering-based algorithm. We prove the security of both
protocols in the standard semi-honest model. The protocols are built upon
several state-of-the-art cryptographic primitives such as lattice-based
additively homomorphic encryption, distributed oblivious RAM, and garbled
circuits. We provide several contributions to each of these primitives that
are applicable to other secure computation tasks. Both of our protocols rely
on a new circuit for approximate top-k selection from a list of numbers,
built from comparators. We have implemented our proposed system and performed
extensive experiments on four datasets in two different computation
environments, demonstrating more than … faster response time compared to
optimally implemented protocols from the prior work. Moreover, SANNS is the
first work that scales to a database of 10 million entries, pushing the limit
by more than two orders of magnitude. Comment: 18 pages, to appear at USENIX Security Symposium 202
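The abstract does not spell out how approximate top-k selection works. One simple scheme, assumed here for illustration and written in plaintext rather than as a garbled circuit, shuffles the candidates into bins, keeps only each bin's minimum, and runs an exact top-k over the survivors. The function `approx_top_k_smallest` and its `num_bins` and `seed` parameters are hypothetical, not SANNS's actual interface.

```python
import heapq
import random
from typing import List, Sequence


def approx_top_k_smallest(values: Sequence[float], k: int, num_bins: int,
                          seed: int = 0) -> List[int]:
    """Approximate indices of the k smallest values, smallest first.

    Indices are shuffled into `num_bins` bins, only each bin's minimum survives,
    and an exact top-k runs over the survivors. If two true top-k items share a
    bin, one of them is lost; that is the approximation error.
    """
    if not 0 < k <= num_bins:
        raise ValueError("need 0 < k <= num_bins")
    indices = list(range(len(values)))
    random.Random(seed).shuffle(indices)
    bins = [indices[b::num_bins] for b in range(num_bins)]

    def rank(i: int):
        return (values[i], i)  # compare by value, break ties by index

    # Per-bin minimum: in a circuit this would be a chain of comparators per bin.
    survivors = [min(b, key=rank) for b in bins if b]
    # Exact selection runs only over the num_bins survivors, not all values.
    return heapq.nsmallest(k, survivors, key=rank)


values = [5.0, 1.0, 4.0, 2.0, 3.0]
print(approx_top_k_smallest(values, k=2, num_bins=5))  # one value per bin, so exact: [1, 3]
print(approx_top_k_smallest(values, k=2, num_bins=3))  # may miss a true top-2 item
```

The design trade-off is the number of bins: more bins relative to k lowers the chance of losing a true neighbor, at the cost of a larger exact selection step.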
PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database
Searchable symmetric encryption (SSE) has been used to protect the
confidentiality of genomic data while providing substring search and range
queries on a sequence of genomic data, but it has not been studied for
protecting single nucleotide polymorphism (SNP)-phenotype data. In this
article, we propose a novel model, PrivGenDB, for securely storing and
efficiently conducting different queries on genomic data outsourced to an
honest-but-curious cloud server. To instantiate PrivGenDB, we use SSE to ensure
confidentiality while conducting different types of queries on encrypted
genomic data, phenotype and other information of individuals to help
analysts/clinicians in their analysis/care. To the best of our knowledge,
the PrivGenDB construction is the first SSE-based approach ensuring the
confidentiality of shared SNP-phenotype data through encryption while making
the computation/query process efficient and scalable for biomedical research
and care. Furthermore, it supports a variety of query types on genomic data,
including count queries, Boolean queries, and k'-out-of-k match queries.
Finally, the PrivGenDB model handles datasets containing both genotype and
phenotype data, and it also supports storing and managing other metadata, like
gender and ethnicity, privately. Evaluations on a dataset with 5,000 records
and 1,000 SNPs demonstrate that a count/Boolean query and a k'-out-of-k match
query over 40 SNPs take approximately 4.3 s and 86.4 μs, respectively,
outperforming the existing schemes.
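To make the three query types concrete, the sketch below builds a toy keyword index in which the client replaces each (attribute, value) condition with an HMAC token, so the server matches tokens without seeing SNP values in the clear. This is an illustrative assumption, not PrivGenDB's construction: record IDs are stored unencrypted and the server learns which records match each token, leakage that real SSE schemes work to reduce. All function names are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Condition = Tuple[str, str]  # (attribute, value), e.g. ("rs123", "AG") or ("gender", "F")
Index = Dict[bytes, Set[int]]


def token(key: bytes, cond: Condition) -> bytes:
    """Client side: deterministic search token for one condition."""
    return hmac.new(key, f"{cond[0]}={cond[1]}".encode(), hashlib.sha256).digest()


def build_index(key: bytes, records: Dict[int, Dict[str, str]]) -> Index:
    """Client side: map each token to the IDs of the records that satisfy it."""
    index: Index = defaultdict(set)
    for rid, attrs in records.items():
        for attr, value in attrs.items():
            index[token(key, (attr, value))].add(rid)
    return dict(index)


def count_and(index: Index, tokens: Iterable[bytes]) -> int:
    """Server side: count query for the Boolean AND of the supplied conditions."""
    sets = [index.get(t, set()) for t in tokens]
    return len(set.intersection(*sets)) if sets else 0


def k_out_of_k_match(index: Index, tokens: List[bytes], k_prime: int) -> Set[int]:
    """Server side: IDs of records matching at least k_prime of the k supplied conditions."""
    hits: Dict[int, int] = defaultdict(int)
    for t in tokens:
        for rid in index.get(t, set()):
            hits[rid] += 1
    return {rid for rid, n in hits.items() if n >= k_prime}


key = b"client-secret"  # hypothetical key held only by the client
records = {
    1: {"rs1": "AA", "rs2": "AG", "gender": "F"},
    2: {"rs1": "AA", "rs2": "GG", "gender": "M"},
    3: {"rs1": "AG", "rs2": "AG", "gender": "F"},
}
idx = build_index(key, records)  # uploaded to the server
print(count_and(idx, [token(key, ("rs1", "AA")), token(key, ("gender", "F"))]))  # 1
conds = [("rs1", "AA"), ("rs2", "AG"), ("gender", "F")]
print(sorted(k_out_of_k_match(idx, [token(key, c) for c in conds], k_prime=2)))  # [1, 3]
```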
Algorithms that Remember: Model Inversion Attacks and Data Protection Law
Many individuals are concerned about the governance of machine learning
systems and the prevention of algorithmic harms. The EU's recent General Data
Protection Regulation (GDPR) has been seen as a core tool for achieving better
governance of this area. While the GDPR does apply to the use of models in some
limited situations, most of its provisions relate to the governance of personal
data, while models have traditionally been seen as intellectual property. We
present recent work from the information security literature around 'model
inversion' and 'membership inference' attacks, which indicate that the process
of turning training data into machine-learned systems is not one-way, and
demonstrate how this could lead some models to be legally classified as
personal data. Taking this as a probing experiment, we explore the different
rights and obligations this would trigger and their utility, and posit future
directions for algorithmic governance and regulation. Comment: 15 pages, 1 figure
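To make the idea of a model that "remembers" its training data concrete, the sketch below runs a confidence-threshold membership inference test, one simple form of the attacks this line of work discusses, against a deliberately memorizing toy model. `MemorizingModel`, `infer_membership`, and the 0.9 threshold are illustrative assumptions; attacks on real trained models typically calibrate the decision rule with held-out or shadow data.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


class MemorizingModel:
    """Hypothetical overfit model: confidence decays with distance to the nearest training point."""

    def __init__(self, training_data: List[Point]):
        self.training_data = [tuple(p) for p in training_data]

    def confidence(self, x: Point) -> float:
        nearest = min(math.dist(x, p) for p in self.training_data)
        return math.exp(-nearest)


def infer_membership(model: MemorizingModel, x: Point, threshold: float = 0.9) -> bool:
    """Attacker with query access only: guess 'member' when the model is unusually confident."""
    return model.confidence(x) >= threshold


model = MemorizingModel([(0.0, 0.0), (3.0, 4.0)])
print(infer_membership(model, (0.0, 0.0)))    # True: a training point
print(infer_membership(model, (10.0, 10.0)))  # False: far from all training points
```

The attack never reads the training set directly; the model's outputs alone leak who was in it, which is the sense in which training is "not one-way".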
Anonymization and Risk
Perfect anonymization of data sets that contain personal information has failed. But the process of protecting data subjects in shared information remains integral to privacy practice and policy. While the deidentification debate has been vigorous and productive, there is no clear direction for policy. As a result, the law has been slow to adopt a holistic approach to protecting data subjects when data sets are released to others. Currently, the law is focused on whether an individual can be identified within a given set. We argue that the best way to move data release policy past the alleged failures of anonymization is to focus on the process of minimizing risk of reidentification and sensitive attribute disclosure, not preventing harm. Process-based data release policy, which resembles the law of data security, will help us move past the limitations of focusing on whether data sets have been “anonymized.” It draws upon different tactics to protect the privacy of data subjects, including accurate deidentification rhetoric, contracts prohibiting reidentification and sensitive attribute disclosure, data enclaves, and query-based strategies to match required protections with the level of risk. By focusing on process, data release policy can better balance privacy and utility where nearly all data exchanges carry some risk.
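Reidentification risk, the quantity this process-based approach aims to minimize, is often estimated from quasi-identifiers. A minimal sketch, assuming the common convention that a record's risk is 1 divided by the number of records sharing its quasi-identifier values, and that k-anonymity requires every such group to have at least k members; the choice of quasi-identifiers and the function names are hypothetical. In keeping with the argument above, such a score measures residual risk rather than certifying a data set as "anonymized".

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def reidentification_risk(records: List[Record], quasi_identifiers: Sequence[str]) -> List[float]:
    """Per-record risk 1/|class|, where a class groups records sharing all quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]


def is_k_anonymous(records: List[Record], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True when every equivalence class on the quasi-identifiers has at least k records."""
    sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(n >= k for n in sizes.values())


rows = [
    {"zip": "02139", "age": "30-39", "dx": "flu"},
    {"zip": "02139", "age": "30-39", "dx": "asthma"},
    {"zip": "02140", "age": "40-49", "dx": "flu"},
]
qi = ["zip", "age"]
print(reidentification_risk(rows, qi))  # [0.5, 0.5, 1.0]: the third record is unique
print(is_k_anonymous(rows, qi, k=2))    # False
```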
Lower Bounds for Oblivious Near-Neighbor Search
We prove an … lower bound on the dynamic
cell-probe complexity of statistically oblivious
approximate-near-neighbor search (ANN) over the d-dimensional
Hamming cube. For the natural setting of …, our result
implies an … lower bound, which is a quadratic
improvement over the highest (non-oblivious) cell-probe lower bound for
ANN. This is the first super-logarithmic
lower bound for ANN against general (non-black-box) data structures.
We also show that any oblivious data structure for
decomposable search problems (like ANN) can be obliviously dynamized
with … overhead in update and query time, strengthening a classic
result of Bentley and Saxe (Algorithmica, 1980). Comment: 28 pages
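For readers unfamiliar with the Bentley and Saxe result being strengthened, the sketch below shows the classic, non-oblivious logarithmic method applied to a decomposable search problem: nearest neighbor in Hamming distance, where the answer over a union of sets is the better of the answers over each part. It illustrates only the 1980 technique, not the paper's oblivious construction; the brute-force static structure and all class names are hypothetical stand-ins.

```python
from typing import List, Optional


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit strings encoded as ints."""
    return bin(a ^ b).count("1")


class StaticNN:
    """Static nearest-neighbor structure (brute force); rebuilt from scratch, never updated."""

    def __init__(self, points: List[int]):
        self.points = list(points)

    def query(self, q: int) -> Optional[int]:
        return min(self.points, key=lambda p: (hamming(p, q), p), default=None)


class LogarithmicMethod:
    """Level i is either empty or a static structure holding exactly 2**i points."""

    def __init__(self) -> None:
        self.levels: List[Optional[StaticNN]] = []

    def insert(self, p: int) -> None:
        carry = [p]
        i = 0
        # Like incrementing a binary counter: merge full levels until an empty slot appears.
        while i < len(self.levels) and self.levels[i] is not None:
            carry.extend(self.levels[i].points)
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = StaticNN(carry)

    def query(self, q: int) -> Optional[int]:
        # Decomposability: combine per-level answers by keeping the closest one.
        answers = [s.query(q) for s in self.levels if s is not None]
        return min(answers, key=lambda p: (hamming(p, q), p), default=None)


lm = LogarithmicMethod()
pts = [0b0000, 0b1111, 0b1010, 0b0110, 0b0001]
for p in pts:
    lm.insert(p)
print([len(s.points) if s is not None else 0 for s in lm.levels])  # [1, 0, 4]
print(lm.query(0b0011))  # 1
```

With n points there are O(log n) levels, so a query calls O(log n) static structures, and each point is rebuilt into a higher level at most O(log n) times over its lifetime.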