
    Supporting Regularized Logistic Regression Privately and Efficiently

    As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, the social sciences, information technology, and beyond. These domains often involve data from human subjects that are subject to strict privacy regulations. Increasing concerns over data privacy make it more and more difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work focuses on safeguarding regularized logistic regression, a model widely used across disciplines yet not previously investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as the research consortia or networks widely seen in genetics, epidemiology, and the social sciences. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluation on several studies validated the privacy guarantees, efficiency, and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, and network analysis.
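    For reference, the underlying model — written here with an L2 (ridge) penalty, one common choice of regularizer; the abstract does not specify which penalty the framework uses — minimizes the penalized negative log-likelihood:

    ```latex
    \min_{\beta} \; -\sum_{i=1}^{n} \Big[ y_i \, x_i^{\top}\beta - \log\!\big(1 + e^{x_i^{\top}\beta}\big) \Big] \;+\; \frac{\lambda}{2}\,\|\beta\|_2^2
    ```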

    False Discovery Rate Control for High Dimensional Dependent Data with an Application to Large-Scale Genetic Association Studies

    Large-scale genetic association studies are increasingly utilized for identifying novel susceptibility variants for complex traits, but there is little consensus on analysis methods for such data. The most commonly used methods are single-SNP or haplotype analysis with Bonferroni correction for multiple comparisons. Since the SNPs in typical GWAS are often in linkage disequilibrium (LD), at least locally, Bonferroni correction often leads to conservative error control and therefore lower statistical power. Motivated by the analysis of data from genetic association studies, we consider the problem of false discovery rate (FDR) control under the high dimensional multivariate normal model. Using the compound decision rule framework, we develop an optimal joint oracle procedure and propose a marginal procedure to approximate it. We show that the marginal plug-in procedure is asymptotically optimal under mild conditions. Our results indicate that the multiple testing procedure developed under the independence model is not only valid but also asymptotically optimal for high dimensional multivariate normal data under weak dependence. We evaluate various procedures using simulation studies and demonstrate the application of the proposed procedure to a genome-wide association study of neuroblastoma (NB). The proposed procedure identified a few more genetic variants potentially associated with NB than the standard p-value-based FDR controlling procedure did.
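    As a rough illustration of what a marginal plug-in procedure of this kind can look like, the sketch below implements a generic local-FDR rule (estimate the marginal density, compute per-test local fdr under a theoretical null, reject while the running average stays below the target level). This is not necessarily the authors' exact estimator, and the function name `marginal_lfdr_procedure` is hypothetical.

    ```python
    # Generic marginal (local-FDR) plug-in procedure; illustrative only.
    import numpy as np
    from scipy.stats import norm, gaussian_kde

    def marginal_lfdr_procedure(z, alpha=0.05, pi0=None):
        """Reject hypotheses whose running-average local fdr stays below alpha."""
        f = gaussian_kde(z)(z)              # marginal density estimate at each z
        f0 = norm.pdf(z)                    # theoretical null density
        if pi0 is None:                     # crude null-proportion estimate
            pi0 = min(1.0, np.mean(np.abs(z) < 1.0) / (norm.cdf(1) - norm.cdf(-1)))
        lfdr = np.clip(pi0 * f0 / f, 0.0, 1.0)
        order = np.argsort(lfdr)
        running_avg = np.cumsum(lfdr[order]) / np.arange(1, len(z) + 1)
        k = np.max(np.nonzero(running_avg <= alpha)[0]) + 1 if np.any(running_avg <= alpha) else 0
        reject = np.zeros(len(z), dtype=bool)
        reject[order[:k]] = True
        return reject

    # Simulated example: 10,000 tests, 200 of them non-null
    rng = np.random.default_rng(1)
    z = rng.normal(size=10_000)
    z[:200] += 3.5
    print(marginal_lfdr_procedure(z).sum(), "rejections")
    ```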

    Sample size and power analysis for sparse signal recovery in genome-wide association studies

    Genome-wide association studies have successfully identified hundreds of novel genetic variants associated with many complex human diseases. However, there is a lack of rigorous work on evaluating the statistical power for identifying these variants. In this paper, we consider sparse signal identification in genome-wide association studies and present two analytical frameworks for detailed analysis of the statistical power for detecting and identifying the disease-associated variants. We present an explicit sample size formula for achieving a given false non-discovery rate while controlling the false discovery rate based on an optimal procedure. Sparse genetic variant recovery is also considered, and a boundary condition is established in terms of sparsity and signal strength for almost exact recovery of both disease-associated and non-disease-associated variants. A data-adaptive procedure is proposed to achieve this bound. The analytical results are illustrated with a genome-wide association study of neuroblastoma.
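    The analytical formulas themselves are not reproduced in the abstract, but the quantities they trade off can be illustrated with a small simulation: the empirical false discovery rate and false non-discovery rate of a Benjamini-Hochberg rule as the per-variant sample size grows. This is an illustrative sketch only; the function `simulate_fdr_fnr` and its parameter choices are hypothetical, not the paper's procedure.

    ```python
    # Empirical FDR/FNR of a BH rule at increasing sample sizes; illustrative only.
    import numpy as np
    from scipy.stats import norm

    def simulate_fdr_fnr(n, m=5_000, m1=50, effect=0.25, alpha=0.05, rng=None):
        """m variants, m1 of them carrying a standardized effect; z-tests at sample size n."""
        if rng is None:
            rng = np.random.default_rng(0)
        mu = np.zeros(m)
        mu[:m1] = effect
        z = rng.normal(loc=mu * np.sqrt(n), scale=1.0)   # z-statistics at sample size n
        p = 2 * norm.sf(np.abs(z))
        # Benjamini-Hochberg step-up procedure
        order = np.argsort(p)
        thresh = alpha * np.arange(1, m + 1) / m
        below = np.nonzero(p[order] <= thresh)[0]
        k = below.max() + 1 if below.size else 0
        reject = np.zeros(m, dtype=bool)
        reject[order[:k]] = True
        fdr = reject[m1:].sum() / max(reject.sum(), 1)        # false discoveries / discoveries
        fnr = (~reject[:m1]).sum() / max((~reject).sum(), 1)  # missed signals / non-discoveries
        return fdr, fnr

    for n in (200, 500, 1000, 2000):
        print(n, simulate_fdr_fnr(n))
    ```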

    Computational efficiency on evaluation datasets.


    Overview of our secure framework for regularized logistic regression.

    Each institution (possessing private data) locally computes summary statistics from its own data and submits encrypted aggregates following a strong cryptographic scheme [30]. The Computation Centers securely aggregate the encryptions and conduct model estimation, from which model adjustment feedback is sent back as needed. This iterative process continues until the model converges.
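    A minimal sketch of this iterative flow, with plain summation standing in for the encrypted aggregation of [30]; the function names (`local_gradient`, `collaborative_fit`) and the choice of an L2 penalty are assumptions for illustration, not the paper's implementation.

    ```python
    # Illustrative aggregate-and-update loop for collaborative regularized
    # logistic regression: each site contributes only an aggregate gradient.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def local_gradient(X, y, beta):
        """Gradient of the logistic log-loss on one institution's data."""
        return X.T @ (sigmoid(X @ beta) - y)

    def collaborative_fit(datasets, lam=0.1, lr=0.5, n_iter=1000):
        """Center sums per-site gradients (stand-in for encrypted aggregation),
        adds the L2 penalty, and broadcasts the updated coefficients."""
        p = datasets[0][0].shape[1]
        n_total = sum(X.shape[0] for X, _ in datasets)
        beta = np.zeros(p)
        for _ in range(n_iter):
            grad = sum(local_gradient(X, y, beta) for X, y in datasets)  # aggregate
            grad = grad / n_total + lam * beta                           # regularize
            beta -= lr * grad                                            # adjustment feedback
        return beta

    # Toy run with three simulated "institutions"
    rng = np.random.default_rng(0)
    beta_true = np.array([1.5, -2.0, 0.0, 0.5])
    datasets = []
    for _ in range(3):
        X = rng.normal(size=(200, 4))
        y = rng.binomial(1, sigmoid(X @ beta_true))
        datasets.append((X, y))
    print(collaborative_fit(datasets))
    ```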