Privacy-Preserving Data Sharing for Genome-Wide Association Studies
Traditional statistical methods for confidentiality protection of statistical
databases do not scale well to GWAS (genome-wide association study) databases,
especially in terms of guarantees regarding protection from linkage
to external information. The more recent concept of differential privacy,
introduced by the cryptographic community, is an approach which provides a
rigorous definition of privacy with meaningful privacy guarantees in the
presence of arbitrary external information, although the guarantees come at a
serious price in terms of data utility. Building on such notions, we propose
new methods to release aggregate GWAS data without compromising an individual's
privacy. We present methods for releasing differentially private minor allele
frequencies, chi-square statistics and p-values. We compare these approaches on
simulated data and on a GWAS study of canine hair length involving 685 dogs. We
also propose a privacy-preserving method for finding genome-wide associations
based on a differentially private approach to penalized logistic regression.
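The Laplace mechanism that underlies such differentially private releases can be sketched as follows. This is a generic illustration with a hypothetical function name and a textbook sensitivity bound (changing one individual shifts the minor-allele count by at most 2, hence the frequency by at most 1/N), not the paper's exact calibration:

```python
import numpy as np

def dp_minor_allele_frequency(minor_count, n_individuals, epsilon, rng=None):
    """Release a minor allele frequency via the Laplace mechanism.

    Changing one individual's genotype shifts the minor-allele count by at
    most 2, so the frequency (count / 2N) changes by at most 1/N; that
    sensitivity scales the noise. Generic sketch, not the paper's method.
    """
    if rng is None:
        rng = np.random.default_rng()
    maf = minor_count / (2 * n_individuals)
    sensitivity = 1.0 / n_individuals
    noisy = maf + rng.laplace(0.0, sensitivity / epsilon)
    return min(max(noisy, 0.0), 1.0)  # clamp to a valid frequency

# Example: 120 minor alleles among 685 individuals, epsilon = 1
print(dp_minor_allele_frequency(120, 685, epsilon=1.0))
```

Smaller epsilon means stronger privacy but noisier frequencies, which is the utility price the abstract mentions.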
Supporting Regularized Logistic Regression Privately and Efficiently
As one of the most popular statistical and machine learning models, logistic
regression with regularization has found wide adoption in biomedicine, social
sciences, information technology, and other fields. These domains often involve
data from human subjects that fall under strict privacy regulations.
Increasing concerns over data privacy make it more and more difficult to
coordinate and conduct large-scale collaborative studies, which typically rely
on cross-institution data sharing and joint analysis. Our work focuses on
safeguarding regularized logistic regression, a machine learning model widely
used across disciplines that has nonetheless not been examined from a data
security and privacy perspective. We consider a common use scenario
of multi-institution collaborative studies, such as in the form of research
consortia or networks as widely seen in genetics, epidemiology, social
sciences, etc. To make our privacy-enhancing solution practical, we demonstrate
a non-conventional and computationally efficient method leveraging distributed
computing and strong cryptography to provide comprehensive protection over
individual-level and summary data. Extensive empirical evaluation on several
studies validated the privacy guarantees, efficiency and scalability of our
proposal. We also discuss the practical implications of our solution for
large-scale studies and applications from various disciplines, including
genetic and biomedical studies, smart grids, and network analysis.
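The idea of protecting individual-level contributions in a multi-institution study can be illustrated with additive secret sharing, where each site splits its local gradient of the regularized logistic loss into random shares that only sum correctly in aggregate. This is a conceptual sketch over real numbers; the paper's actual construction uses strong cryptography (e.g. finite-field arithmetic), which this toy version does not reproduce:

```python
import numpy as np

def share(vector, n_parties, rng):
    """Split a vector into additive shares that sum back to the vector."""
    shares = [rng.normal(size=vector.shape) for _ in range(n_parties - 1)]
    shares.append(vector - sum(shares))
    return shares

# Each site holds a local gradient; no single party ever sees a site's
# raw gradient, only sums of shares, yet the aggregate is exact.
rng = np.random.default_rng(0)
site_gradients = [rng.normal(size=3) for _ in range(4)]
all_shares = [share(g, 4, rng) for g in site_gradients]
partial_sums = [sum(s[j] for s in all_shares) for j in range(4)]
aggregate = sum(partial_sums)
assert np.allclose(aggregate, sum(site_gradients))
```

Because the aggregate gradient is exact, the fitted model matches the pooled analysis, unlike approaches that trade accuracy for privacy.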
Enabling Privacy-Preserving GWAS in Heterogeneous Human Populations
The projected increase of genotyping in the clinic and the rise of large
genomic databases have led to the possibility of using patient medical data to
perform genome-wide association studies (GWAS) on a larger scale and at a lower
cost than ever before. Due to privacy concerns, however, access to this data is
limited to a few trusted individuals, greatly reducing its impact on biomedical
research. Privacy-preserving methods have been suggested as a way of allowing
more people access to this precious data while protecting patients. In
particular, there has been growing interest in applying the concept of
differential privacy to GWAS results. Unfortunately, previous approaches for
performing differentially private GWAS are based on rather simple statistics
that have some major limitations. In particular, they do not correct for
population stratification, a major issue when dealing with the genetically
diverse populations present in modern GWAS. To address this concern we
introduce a novel computational framework for performing GWAS that tailors
ideas from differential privacy to protect private phenotype information, while
at the same time correcting for population stratification. This framework
allows us to produce privacy-preserving GWAS results based on two of the most
commonly used GWAS statistics: EIGENSTRAT and linear mixed model (LMM) based
statistics. We test our differentially private statistics, PrivSTRAT and
PrivLMM, on both simulated and real GWAS datasets and find that they are able
to protect privacy while returning meaningful GWAS results.Comment: To be presented at RECOMB 201
DPWeka: Achieving Differential Privacy in WEKA
Organizations in the government, commercial, and non-profit sectors collect and store large amounts of sensitive data, including medical, financial, and personal information. They use data mining methods to formulate business strategies that yield long-term and short-term financial benefits. While analyzing such data, the private information of the individuals in the data must be protected for moral and legal reasons. Current practices such as redacting sensitive attributes, releasing only aggregate values, and query auditing do not provide sufficient protection against an adversary armed with auxiliary information. Differential privacy, a privacy-protection framework, provides mathematical guarantees against adversarial attacks even in the presence of such background information.
Existing platforms for differential privacy employ specific mechanisms for limited applications of data mining. Additionally, widely used data mining tools do not contain differentially private data mining algorithms. As a result, for analyzing sensitive data, the cognizance of differentially private methods is currently limited outside the research community.
This thesis examines various mechanisms to realize differential privacy in practice and investigates methods to integrate them with a popular machine learning toolkit, WEKA. We present DPWeka, a package that provides differential privacy capabilities to WEKA for practical data mining. DPWeka includes a suite of differentially private algorithms supporting a variety of data mining tasks, including attribute selection and regression analysis. It has provisions for users to control privacy and model parameters, such as the privacy mechanism, privacy budget, and other algorithm-specific variables. We evaluate the private algorithms on real-world datasets, such as genetic data and census data, to demonstrate the practical applicability of DPWeka.
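Differentially private attribute selection of the kind mentioned above is often built on the exponential mechanism, which can be sketched as follows. The function name, interface, and utility scores here are hypothetical illustrations, not DPWeka's actual API:

```python
import numpy as np

def dp_select_attribute(scores, epsilon, sensitivity, rng=None):
    """Pick an attribute via the exponential mechanism.

    `scores` maps each candidate attribute to a utility value (e.g. an
    information-gain score); higher-utility attributes are exponentially
    more likely to be chosen, with epsilon controlling how sharply.
    Generic mechanism sketch, not DPWeka's own implementation.
    """
    if rng is None:
        rng = np.random.default_rng()
    names = list(scores)
    utility = np.array([scores[k] for k in names], dtype=float)
    logits = epsilon * utility / (2 * sensitivity)
    probs = np.exp(logits - logits.max())  # subtract max for stability
    probs /= probs.sum()
    return names[rng.choice(len(names), p=probs)]
```

As epsilon grows, the choice concentrates on the best-scoring attribute; as it shrinks, the selection approaches uniform, protecting individual records at the cost of utility.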
Homomorphic Encryption for Machine Learning in Medicine and Bioinformatics
Machine learning techniques are an excellent tool for the medical community for analyzing large amounts of medical and genomic data. On the other hand, ethical concerns and privacy regulations prevent the free sharing of this data. Encryption methods such as fully homomorphic encryption (FHE) provide a way to evaluate functions over encrypted data. Using FHE, machine learning models such as deep learning, decision trees, and naive Bayes have been implemented for private prediction on medical data. FHE has also been shown to enable secure genomic algorithms, such as paternity testing, and the secure application of genome-wide association studies. This survey provides an overview of fully homomorphic encryption and its applications in medicine and bioinformatics. The high-level concepts behind FHE and its history are introduced. Details on current open-source implementations are provided, as is the state of FHE for privacy-preserving techniques in machine learning and bioinformatics, along with future growth opportunities for FHE.
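The core idea of computing on encrypted data can be demonstrated with a toy Paillier cryptosystem, which is additively homomorphic (multiplying ciphertexts adds plaintexts). This is a pedagogical sketch only: Paillier is not *fully* homomorphic, and the tiny demo primes below are nowhere near a secure key size:

```python
from math import gcd
import random

# Toy Paillier cryptosystem: additively homomorphic, NOT fully
# homomorphic, and NOT secure at this key size (demo primes only).
p, q = 293, 433                       # real deployments use ~1024-bit primes
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
g = n + 1
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)  # inverse of L(g^lam) mod n

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Homomorphic addition: multiplying ciphertexts adds the plaintexts.
c = encrypt(20) * encrypt(22) % n2
print(decrypt(c))  # 42
```

FHE schemes extend this idea to both addition and multiplication, which is what allows whole models such as decision trees to be evaluated on ciphertexts.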
sPLINK: a hybrid federated tool as a robust alternative to meta-analysis in genome-wide association studies
Meta-analysis has been established as an effective approach to combining summary statistics of several genome-wide association studies (GWAS). However, the accuracy of meta-analysis can be attenuated in the presence of cross-study heterogeneity. We present sPLINK, a hybrid federated and user-friendly tool that performs privacy-aware GWAS on distributed datasets while preserving the accuracy of the results. sPLINK is robust against heterogeneous distributions of data across cohorts, whereas meta-analysis loses considerable accuracy in such scenarios. sPLINK achieves practical runtime and acceptable network usage for chi-square and linear/logistic regression tests.
Peer reviewed.
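Why a federated chi-square test can match pooled analysis exactly, where meta-analysis of per-study statistics may not, can be sketched as follows: each cohort contributes only aggregate contingency counts, and summing those counts reproduces the centralized statistic. This is a conceptual illustration, not sPLINK's actual protocol, which adds further privacy protections:

```python
import numpy as np

def federated_chi_square(local_tables):
    """Chi-square test of association from per-cohort contingency tables.

    Each cohort shares only its aggregate case/control x allele counts;
    summing the tables yields exactly the pooled-analysis statistic,
    regardless of how heterogeneously the data are split across cohorts.
    Returns (statistic, degrees of freedom).
    """
    table = np.sum(local_tables, axis=0).astype(float)
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row @ col / table.sum()
    stat = ((table - expected) ** 2 / expected).sum()
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return stat, dof
```

A p-value follows from the chi-square survival function with `dof` degrees of freedom; the key property is that the statistic is invariant to how the rows of data are partitioned across sites.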
Sharing Privacy-sensitive Access to Neuroimaging and Genetics Data: A Review and Preliminary Validation
The growth of data sharing initiatives for neuroimaging and genomics represents an exciting opportunity to confront the “small N” problem that plagues contemporary neuroimaging studies while further understanding the role genetic markers play in the function of the brain. When it is possible, open data sharing provides the most benefits. However, some data cannot be shared at all due to privacy concerns and/or risk of re-identification. Sharing other data sets is hampered by the proliferation of complex data use agreements (DUAs), which preclude truly automated data mining. These DUAs arise because of concerns about privacy and confidentiality for subjects; though many do permit direct access to data, they often require a cumbersome approval process that can take months. An alternative approach is to share only data derivatives such as statistical summaries; the challenge here is to reformulate computational methods to quantify the privacy risks associated with sharing the results of those computations. For example, a derived map of gray matter is often as identifiable as a fingerprint, so alternative approaches to accessing data are needed. This paper reviews the relevant literature on differential privacy, a framework for measuring and tracking privacy loss in these settings, and demonstrates the feasibility of using this framework to calculate statistics on data distributed at many sites while still providing privacy.
A Multi-site Resting State fMRI Study on the Amplitude of Low Frequency Fluctuations in Schizophrenia
Background: This multi-site study compares resting state fMRI amplitude of low frequency fluctuations (ALFF) and fractional ALFF (fALFF) between patients with schizophrenia (SZ) and healthy controls (HC). Methods: Eyes-closed resting fMRI scans (5:38 min; n = 306, 146 SZ) were collected from 6 Siemens 3T scanners and one GE 3T scanner. Imaging data were pre-processed using an SPM pipeline. Power in the low frequency band (0.01–0.08 Hz) was calculated both for the original pre-processed data as well as for the pre-processed data after regressing out the six rigid-body motion parameters, mean white matter (WM) and cerebral spinal fluid (CSF) signals. Both original and regressed ALFF and fALFF measures were modeled with site, diagnosis, age, and diagnosis × age interactions. Results: Regressing out motion and non-gray matter signals significantly decreased fALFF throughout the brain as well as ALFF in the cortical edge, but significantly increased ALFF in subcortical regions. Regression had little effect on site, age, and diagnosis effects on ALFF, other than to reduce diagnosis effects in subcortical regions. There were significant effects of site across the brain in all the analyses, largely due to vendor differences. HC showed greater ALFF in the occipital, posterior parietal, and superior temporal lobe, while SZ showed smaller clusters of greater ALFF in the frontal and temporal/insular regions as well as in the caudate, putamen, and hippocampus. HC showed greater fALFF compared with SZ in all regions, though subcortical differences were only significant for original fALFF. Conclusions: SZ show greater eyes-closed resting state low frequency power in frontal cortex, and less power in posterior lobes than do HC; fALFF, however, is lower in SZ than HC throughout the cortex. These effects are robust to multi-site variability. 
Regressing out physiological noise signals significantly affects both total ALFF and fALFF measures, but does not change the pattern of case/control differences.
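The ALFF and fALFF measures analyzed above can be computed for a single voxel time series roughly as follows. Exact normalizations vary across pipelines (the study used an SPM pipeline), so treat this as an illustrative sketch rather than the paper's implementation:

```python
import numpy as np

def alff_falff(ts, tr, band=(0.01, 0.08)):
    """ALFF and fALFF for one voxel time series (illustrative sketch).

    ALFF is taken here as the mean FFT amplitude within the low-frequency
    band (0.01-0.08 Hz by default); fALFF is the in-band amplitude sum
    divided by the amplitude summed over the full frequency range, i.e.
    the fraction of power that is low-frequency.
    """
    ts = ts - ts.mean()                    # remove DC offset
    amp = np.abs(np.fft.rfft(ts))          # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(len(ts), d=tr) # Hz, given repetition time tr
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    alff = amp[in_band].mean()
    falff = amp[in_band].sum() / amp.sum()
    return alff, falff
```

A pure 0.05 Hz oscillation sampled at TR = 2 s yields fALFF near 1, while a 0.2 Hz oscillation yields fALFF near 0, matching the intuition that fALFF indexes the low-frequency fraction of total power.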