Privacy-Preserving Data Sharing for Genome-Wide Association Studies
Traditional statistical methods for confidentiality protection of statistical
databases do not scale well to deal with GWAS (genome-wide association studies)
databases, especially in terms of guarantees regarding protection from linkage
to external information. The more recent concept of differential privacy,
introduced by the cryptographic community, is an approach that provides a
rigorous definition of privacy with meaningful privacy guarantees in the
presence of arbitrary external information, although the guarantees come at a
serious price in terms of data utility. Building on such notions, we propose
new methods to release aggregate GWAS data without compromising an individual's
privacy. We present methods for releasing differentially private minor allele
frequencies, chi-square statistics and p-values. We compare these approaches on
simulated data and on a GWAS study of canine hair length involving 685 dogs. We
also propose a privacy-preserving method for finding genome-wide associations
based on a differentially-private approach to penalized logistic regression.
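The core primitive behind releasing differentially private allele frequencies is the Laplace mechanism: add noise calibrated to how much one individual can change the statistic. Below is a minimal sketch (not the authors' exact method); the function name is illustrative, and the sensitivity bound 1/n follows from one individual contributing at most 2 of the 2n alleles.

```python
import numpy as np

def private_minor_allele_freq(genotypes, epsilon, rng=None):
    """Release an epsilon-differentially-private minor allele frequency.

    genotypes: per-individual minor-allele counts (0, 1, or 2).
    epsilon:   privacy budget for this single release.

    Changing one individual's genotype shifts the frequency by at most
    2 / (2n) = 1/n, so Laplace noise with scale 1/(n * epsilon) gives
    epsilon-differential privacy for this one statistic.
    """
    if rng is None:
        rng = np.random.default_rng()
    g = np.asarray(genotypes)
    n = len(g)
    maf = g.sum() / (2 * n)                        # true minor allele frequency
    noisy = maf + rng.laplace(scale=1.0 / (n * epsilon))
    return float(np.clip(noisy, 0.0, 0.5))         # clamp to the valid MAF range
```

With a generous budget the noise is negligible; with small epsilon the released frequency can deviate substantially, which is the utility price the abstract mentions.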
Enabling Privacy-Preserving GWAS in Heterogeneous Human Populations
The projected increase of genotyping in the clinic and the rise of large
genomic databases have led to the possibility of using patient medical data to
perform genome-wide association studies (GWAS) on a larger scale and at a lower
cost than ever before. Due to privacy concerns, however, access to this data is
limited to a few trusted individuals, greatly reducing its impact on biomedical
research. Privacy preserving methods have been suggested as a way of allowing
more people access to this precious data while protecting patients. In
particular, there has been growing interest in applying the concept of
differential privacy to GWAS results. Unfortunately, previous approaches for
performing differentially private GWAS are based on rather simple statistics
that have some major limitations. In particular, they do not correct for
population stratification, a major issue when dealing with the genetically
diverse populations present in modern GWAS. To address this concern we
introduce a novel computational framework for performing GWAS that tailors
ideas from differential privacy to protect private phenotype information, while
at the same time correcting for population stratification. This framework
allows us to produce privacy-preserving GWAS results based on two of the most
commonly used GWAS statistics: EIGENSTRAT and linear mixed model (LMM) based
statistics. We test our differentially private statistics, PrivSTRAT and
PrivLMM, on both simulated and real GWAS datasets and find that they are able
to protect privacy while returning meaningful GWAS results.
Comment: To be presented at RECOMB 201
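A standard building block for privately reporting which SNPs score highest (the step PrivSTRAT and PrivLMM must ultimately protect) is the exponential mechanism. The sketch below is generic, not the authors' algorithm: the function name is illustrative, and the sensitivity bound on the association score is assumed to be supplied, since deriving it for EIGENSTRAT- or LMM-adjusted statistics is the hard part of the paper.

```python
import numpy as np

def private_top_snp(scores, sensitivity, epsilon, rng=None):
    """Select one high-scoring SNP via the exponential mechanism.

    scores:      association score per SNP (higher = stronger signal).
    sensitivity: assumed bound on how much one individual's data can
                 change any single score (must be derived separately
                 for the statistic actually in use).

    SNP i is chosen with probability proportional to
    exp(epsilon * scores[i] / (2 * sensitivity)).
    """
    if rng is None:
        rng = np.random.default_rng()
    s = np.asarray(scores, dtype=float)
    logits = epsilon * s / (2.0 * sensitivity)
    logits -= logits.max()                  # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(s), p=probs))
```

For large epsilon the mechanism almost always returns the true top SNP; shrinking epsilon spreads probability over weaker candidates, trading accuracy for privacy.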
Scalable Privacy-Preserving Data Sharing Methodology for Genome-Wide Association Studies
The protection of privacy of individual-level information in genome-wide
association study (GWAS) databases has been a major concern of researchers
following the publication of an "attack" on GWAS data by Homer et al. (2008).
Traditional statistical methods for confidentiality and privacy protection of
statistical databases do not scale well to deal with GWAS data, especially in
terms of guarantees regarding protection from linkage to external information.
The more recent concept of differential privacy, introduced by the
cryptographic community, is an approach that provides a rigorous definition of
privacy with meaningful privacy guarantees in the presence of arbitrary
external information, although the guarantees may come at a serious price in
terms of data utility. Building on such notions, Uhler et al. (2013) proposed
new methods to release aggregate GWAS data without compromising an individual's
privacy. We extend the methods developed in Uhler et al. (2013) for releasing
differentially-private chi-square statistics by allowing for an arbitrary number
cases and controls, and for releasing differentially-private allelic test
statistics. We also provide a new interpretation by assuming the controls' data
are known, which is a realistic assumption because some GWAS use publicly
available data as controls. We assess the performance of the proposed methods
through a risk-utility analysis on a real data set consisting of DNA samples
collected by the Wellcome Trust Case Control Consortium and compare the methods
with the differentially-private release mechanism proposed by Johnson and
Shmatikov (2013).
Comment: 28 pages, 2 figures, source code available upon request.
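To make the allelic-test extension concrete, here is a minimal sketch of releasing a noisy 1-degree-of-freedom allelic chi-square statistic. This is not the paper's mechanism: the function name is illustrative, and the sensitivity of the statistic, whose derivation for arbitrary case/control counts is the paper's technical contribution, is taken as a given input.

```python
import numpy as np

def private_allelic_test(case_counts, control_counts, sensitivity,
                         epsilon, rng=None):
    """Release a Laplace-noised allelic (1-df) chi-square statistic.

    case_counts / control_counts: (minor, major) allele counts.
    sensitivity: assumed bound on how much one individual can change
                 the statistic (NOT derived here).
    """
    if rng is None:
        rng = np.random.default_rng()
    a, b = case_counts        # minor, major alleles among cases
    c, d = control_counts     # minor, major alleles among controls
    n = a + b + c + d
    # Standard chi-square statistic for the 2x2 allele-count table.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2 + rng.laplace(scale=sensitivity / epsilon)
```

The known-controls interpretation in the abstract would change what "one individual" can alter, and hence the sensitivity passed in, but not the release mechanism itself.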
Differentially Private Model Selection with Penalized and Constrained Likelihood
In statistical disclosure control, the goal of data analysis is twofold: The
released information must provide accurate and useful statistics about the
underlying population of interest, while minimizing the potential for an
individual record to be identified. In recent years, the notion of differential
privacy has received much attention in theoretical computer science, machine
learning, and statistics. It provides a rigorous and strong notion of
protection for individuals' sensitive information. A fundamental question is
how to incorporate differential privacy into traditional statistical inference
procedures. In this paper we study model selection in multivariate linear
regression under the constraint of differential privacy. We show that model
selection procedures based on penalized least squares or likelihood can be made
differentially private by a combination of regularization and randomization,
and propose two algorithms to do so. We show that our private procedures are
consistent under essentially the same conditions as the corresponding
non-private procedures. We also find that under differential privacy, the
procedure becomes more sensitive to the tuning parameters. We illustrate and
evaluate our method using simulation studies and two real data examples.
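The "regularization and randomization" recipe can be illustrated with a generic sketch (not the authors' two algorithms): score each candidate model by a penalized residual sum of squares and select among them with the exponential mechanism. The function name, the simple complexity penalty, and the supplied sensitivity bound are all assumptions for illustration.

```python
import numpy as np

def private_model_select(X, y, candidates, epsilon, sensitivity,
                         lam=1.0, rng=None):
    """Randomized selection among candidate predictor subsets.

    candidates:  list of predictor-index subsets to compare.
    lam:         penalty per included predictor (regularization).
    sensitivity: assumed bound on one record's effect on a model's
                 score; deriving valid bounds is the paper's subject.
    """
    if rng is None:
        rng = np.random.default_rng()
    scores = []
    for subset in candidates:
        Xs = X[:, subset]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)   # least-squares fit
        rss = float(np.sum((y - Xs @ beta) ** 2))
        scores.append(-(rss + lam * len(subset)))       # higher = better
    s = np.asarray(scores)
    logits = epsilon * s / (2.0 * sensitivity)
    logits -= logits.max()                              # numerical stability
    p = np.exp(logits)
    p /= p.sum()
    return candidates[int(rng.choice(len(candidates), p=p))]
```

The randomization makes the choice of model itself private, while the penalty keeps the selected model small; as the abstract notes, the outcome becomes noticeably more sensitive to tuning parameters such as lam and epsilon than in the non-private setting.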