An Aggregation Method for Sparse Logistic Regression
Regularized logistic regression has now become a workhorse of data
mining and bioinformatics: it is widely used for many classification problems,
particularly ones with many features. However, regularization typically
selects too many features, so that false positives are unavoidable.
In this paper, we demonstrate and analyze an aggregation method for sparse
logistic regression in high dimensions. This approach linearly combines the
estimators from a suitable set of logistic models with different underlying
sparsity patterns, and can balance predictive ability and model
interpretability. The numerical performance of our proposed aggregation method
is then investigated using simulation studies. We also analyze a published
genome-wide case-control dataset to further evaluate the usefulness of the
aggregation method in multilocus association mapping.
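
A minimal sketch of the aggregation idea, assuming exponential weighting over an L1 path (the data, regularization grid, and weighting scheme here are illustrative assumptions, not the paper's exact estimator): fit L1-penalized logistic regressions at several regularization strengths, so the fitted models carry different sparsity patterns, then linearly combine their predicted probabilities with weights derived from held-out log-likelihood.

```python
# Illustrative sketch, not the paper's estimator: aggregate sparse logistic
# fits along an L1 path with exponential weights from held-out log-likelihood.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 50))
beta = np.zeros(50)
beta[:5] = 1.5                                  # sparse ground truth
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

models, loglik = [], []
for C in np.logspace(-2, 1, 10):                # grid yields varying sparsity patterns
    m = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    m.fit(X_tr, y_tr)
    p = np.clip(m.predict_proba(X_val)[:, 1], 1e-12, 1 - 1e-12)
    loglik.append(np.sum(y_val * np.log(p) + (1 - y_val) * np.log(1 - p)))
    models.append(m)

loglik = np.asarray(loglik)
w = np.exp(loglik - loglik.max())               # exponential weights
w /= w.sum()                                    # coefficients of the linear combination
p_agg = sum(wk * m.predict_proba(X_val)[:, 1] for wk, m in zip(w, models))
print("aggregated validation accuracy:", np.mean((p_agg > 0.5) == y_val))
```

Under this weighting, models whose sparsity pattern predicts well dominate the combination, while poorly fitting models receive negligible weight.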
L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework
Despite the importance of sparsity in many large-scale applications, there
are few methods for distributed optimization of sparsity-inducing objectives.
In this paper, we present a communication-efficient framework for
L1-regularized optimization in the distributed environment. By viewing
classical objectives in a more general primal-dual setting, we develop a new
class of methods that can be efficiently distributed and applied to common
sparsity-inducing models, such as Lasso, sparse logistic regression, and
elastic net-regularized problems. We provide theoretical convergence guarantees
for our framework, and demonstrate its efficiency and flexibility with a
thorough experimental comparison on Amazon EC2. Our proposed framework yields
speedups of up to 50x compared to current state-of-the-art methods for
distributed L1-regularized optimization.
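
The following toy, single-process simulation conveys the communication pattern rather than the paper's primal-dual algorithm (the feature partitioning, local solver, and averaging step are all assumptions): each "worker" owns a block of features, runs several local coordinate-descent passes on a lasso objective, and only an aggregated update vector is synchronized once per round.

```python
# Toy simulation of feature-partitioned, communication-efficient lasso.
# Illustrative only: workers do extra local computation per round so that
# only one synchronization of shared state is needed per round.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(1)
n, d, K, lam = 200, 60, 4, 0.1
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:6] = 1.0                           # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=n)

blocks = np.array_split(np.arange(d), K)   # feature partition across K "workers"
w = np.zeros(d)
r = y - X @ w                              # shared residual, synced each round

for rnd in range(100):                     # communication rounds
    deltas = np.zeros(d)
    for blk in blocks:                     # conceptually runs in parallel
        r_local = r.copy()                 # worker's snapshot of shared state
        w_local = w.copy()
        for _ in range(3):                 # several local passes per round:
            for j in blk:                  # more local compute, less communication
                xj = X[:, j]
                rho = xj @ (r_local + xj * w_local[j]) / n
                w_new = soft_threshold(rho, lam) / (xj @ xj / n)
                r_local += xj * (w_local[j] - w_new)
                w_local[j] = w_new
        deltas[blk] = w_local[blk] - w[blk]
    w += deltas / K                        # conservative averaging of local updates
    r = y - X @ w                          # one shared vector exchanged per round
print("recovered support:", np.flatnonzero(np.abs(w) > 1e-3))
```

The conservative 1/K averaging keeps the combined update safe when local blocks overlap in their effect on the residual; more aggressive aggregation trades stability for fewer rounds.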
CoCoA: A General Framework for Communication-Efficient Distributed Optimization
The scale of modern datasets necessitates the development of efficient
distributed optimization methods for machine learning. We present a
general-purpose framework for distributed computing environments, CoCoA, that
has an efficient communication scheme and is applicable to a wide variety of
problems in machine learning and signal processing. We extend the framework to
cover general non-strongly-convex regularizers, including L1-regularized
problems like lasso, sparse logistic regression, and elastic net
regularization, and show how earlier work can be derived as a special case. We
provide convergence guarantees for the class of convex regularized loss
minimization objectives, leveraging a novel approach to handling
non-strongly-convex regularizers and non-smooth loss functions. The resulting
framework has markedly improved performance over state-of-the-art methods, as
we illustrate with an extensive set of experiments on real distributed
datasets.
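
One way to see how lasso, sparse logistic regression, and elastic net fit a single framework is that each regularizer enters only through its proximal operator. The single-machine sketch below (an illustration of this plug-in structure, not the CoCoA implementation; the function names and data are hypothetical) solves one least-squares objective under three regularizers by swapping the prox.

```python
# Illustrative sketch of the prox plug-in idea: lasso, ridge, and elastic
# net differ only in the proximal operator passed to one generic solver.
import numpy as np

def prox_l1(w, t):                        # lasso: soft-thresholding
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def prox_l2(w, t):                        # ridge: multiplicative shrinkage
    return w / (1.0 + t)

def prox_elastic_net(w, t, alpha=0.5):    # mix of L1 and L2 penalties
    return prox_l1(w, alpha * t) / (1.0 + (1.0 - alpha) * t)

def prox_grad(X, y, prox_g, lam, steps=500):
    """Minimize (1/2n)||Xw - y||^2 + lam*g(w) by proximal gradient."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n     # Lipschitz constant of the smooth part
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = prox_g(w - grad / L, lam / L)
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))
y = X[:, 0] - X[:, 1] + 0.05 * rng.normal(size=100)
for name, prox in [("lasso", prox_l1), ("ridge", prox_l2),
                   ("elastic net", prox_elastic_net)]:
    w = prox_grad(X, y, prox, lam=0.05)
    print(name, "nonzeros:", np.sum(np.abs(w) > 1e-3))
```

In a distributed setting, the same plug-in would appear inside each machine's local subproblem, which is one reading of how earlier special cases arise from a single general framework.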
- …