Search CORE

2,278 research outputs found

Supporting Regularized Logistic Regression Privately and Efficiently

Author: Li Wenfa
Liu Hongzhe
Xie Wei
Yang Peng
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 30/09/2015
Field of study

As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Increasing concerns over data privacy make it more and more difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely-used machine learning model in various disciplines while at the same time has not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributing computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluation on several studies validated the privacy guarantees, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework

Author: Forte Simone
Jaggi Martin
Jordan Michael I.
Smith Virginia
Publication venue
Publication date: 02/06/2016
Field of study

Despite the importance of sparsity in many large-scale applications, there are few methods for distributed optimization of sparsity-inducing objectives. In this paper, we present a communication-efficient framework for L1-regularized optimization in the distributed environment. By viewing classical objectives in a more general primal-dual setting, we develop a new class of methods that can be efficiently distributed and applied to common sparsity-inducing models, such as Lasso, sparse logistic regression, and elastic net-regularized problems. We provide theoretical convergence guarantees for our framework, and demonstrate its efficiency and flexibility with a thorough experimental comparison on Amazon EC2. Our proposed framework yields speedups of up to 50x as compared to current state-of-the-art methods for distributed L1-regularized optimization

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

CoCoA: A General Framework for Communication-Efficient Distributed Optimization

Author: Forte Simone
Jaggi Martin
Jordan Michael I.
Ma Chenxin
Smith Virginia
Takac Martin
Publication venue
Publication date: 21/06/2017
Field of study

The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning. We present a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing. We extend the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach in handling non-strongly-convex regularizers and non-smooth loss functions. The resulting framework has markedly improved performance over state-of-the-art methods, as we illustrate with an extensive set of experiments on real distributed datasets

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Repository for Publications and Research Data