25,562 research outputs found
Distributed Coordinate Descent Method for Learning with Big Data
In this paper we develop and analyze Hydra: HYbriD cooRdinAte descent method
for solving loss minimization problems with big data. We initially partition
the coordinates (features) and assign each partition to a different node of a
cluster. At every iteration, each node picks a random subset of the coordinates
from those it owns, independently of the other nodes, and in parallel
computes and applies updates to the selected coordinates based on a simple
closed-form formula. We give bounds on the number of iterations sufficient to
approximately solve the problem with high probability, and show how it depends
on the data and on the partitioning. We perform numerical experiments with a
LASSO instance described by a 3TB matrix.
Comment: 11 two-column pages, 1 algorithm, 4 figures, 4 tables
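A minimal, single-process sketch of the update pattern this abstract describes (coordinates partitioned across "nodes", each node sampling a subset of its own coordinates and applying a closed-form soft-thresholding update in parallel) is given below for a LASSO objective. All names, the damping factor beta, and the step sizes are illustrative assumptions, not the authors' Hydra implementation.

import numpy as np

def soft_threshold(v, t):
    # Closed-form prox of the l1 penalty.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def hydra_like_lasso(A, b, lam, n_nodes=4, subset_size=2, iters=200, beta=2.0, seed=0):
    """Toy simulation of partitioned, parallel coordinate descent for
    0.5*||Ax - b||^2 + lam*||x||_1 (not the authors' code); beta is an
    illustrative damping factor accounting for parallel updates."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    residual = A @ x - b                      # kept in sync after each sweep
    partitions = np.array_split(rng.permutation(n), n_nodes)
    col_norms = (A ** 2).sum(axis=0)          # per-coordinate curvature estimates
    for _ in range(iters):
        updates = {}
        for part in partitions:               # conceptually runs on separate nodes
            chosen = rng.choice(part, size=min(subset_size, len(part)), replace=False)
            for j in chosen:
                g = A[:, j] @ residual        # partial derivative of the smooth part
                L = beta * col_norms[j] + 1e-12
                updates[j] = soft_threshold(x[j] - g / L, lam / L) - x[j]
        for j, d in updates.items():          # apply all updates "in parallel"
            x[j] += d
            residual += d * A[:, j]
    return x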
Distributed Linear Regression with Compositional Covariates
With the availability of extraordinarily large data sets, developing
distributed statistical methodology and computing for such data has become
increasingly crucial in the big data era. In this paper, we focus on
the distributed sparse penalized linear log-contrast model in massive
compositional data. In particular, two distributed optimization techniques
under centralized and decentralized topologies are proposed for solving the two
different constrained convex optimization problems. Both proposed
algorithms are based on the frameworks of the Alternating Direction Method of
Multipliers (ADMM) and the Coordinate Descent Method of Multipliers (CDMM, Lin et
al., 2014, Biometrika). It is worth emphasizing that, in the decentralized
topology, we introduce a distributed coordinate-wise descent algorithm based on
Group ADMM (GADMM, Elgabli et al., 2020, Journal of Machine Learning Research)
for obtaining a communication-efficient regularized estimation.
Correspondingly, the convergence theories of the proposed algorithms are
rigorously established under some regularity conditions. Numerical experiments
on both synthetic and real data are conducted to evaluate our proposed
algorithms.
Comment: 35 pages, 2 figures
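The sketch below shows the generic consensus-ADMM template for a sparse penalized regression whose data are split across nodes under a centralized topology, which is the standard pattern the abstract's ADMM-based algorithms build on. It is not the paper's exact CDMM or GADMM update and omits the log-contrast (zero-sum) constraint; lam and rho are illustrative tuning parameters.

import numpy as np

def consensus_admm_lasso(blocks, lam, rho=1.0, iters=100):
    """Consensus ADMM for a lasso whose rows are split into data blocks
    (one (A_k, b_k) pair per node). Generic template, not the paper's method."""
    n = blocks[0][0].shape[1]
    K = len(blocks)
    # Pre-factorize the local ridge systems (A_k^T A_k + rho I).
    factors = [np.linalg.cholesky(A.T @ A + rho * np.eye(n)) for A, _ in blocks]
    Atb = [A.T @ b for A, b in blocks]
    x = [np.zeros(n) for _ in range(K)]
    u = [np.zeros(n) for _ in range(K)]
    z = np.zeros(n)
    for _ in range(iters):
        for k, (L, rhs) in enumerate(zip(factors, Atb)):
            # Local primal update: ridge regression pulled toward the consensus z.
            x[k] = np.linalg.solve(L.T, np.linalg.solve(L, rhs + rho * (z - u[k])))
        # Central consensus update: soft-thresholded average (l1 prox).
        v = np.mean([x[k] + u[k] for k in range(K)], axis=0)
        z = np.sign(v) * np.maximum(np.abs(v) - lam / (rho * K), 0.0)
        for k in range(K):
            u[k] += x[k] - z                  # dual (scaled multiplier) ascent
    return z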
Hybrid Random/Deterministic Parallel Algorithms for Nonconvex Big Data Optimization
We propose a decomposition framework for the parallel optimization of the sum
of a differentiable (possibly nonconvex) function and a nonsmooth (possibly
nonseparable) convex one. The latter term is usually employed to enforce
structure in the solution, typically sparsity. The main contribution of this
work is a novel parallel, hybrid random/deterministic decomposition
scheme wherein, at each iteration, a subset of (block) variables is updated at
the same time by minimizing local convex approximations of the original
nonconvex function. To tackle huge-scale problems, the (block) variables
to be updated are chosen according to a mixed random and deterministic
procedure, which captures the advantages of both pure deterministic and random
update-based schemes. Almost sure convergence of the proposed scheme is
established. Numerical results show that on huge-scale problems the proposed
hybrid random/deterministic algorithm outperforms both random and deterministic
schemes.
Comment: The order of the authors is alphabetical
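One way to picture the mixed selection rule is sketched below: a few blocks are chosen greedily from optimality-violation scores (the deterministic part) and a few more uniformly at random. The rule and its parameters are illustrative assumptions, not the paper's exact selection strategy.

import numpy as np

def hybrid_block_selection(scores, n_greedy, n_random, rng):
    """Mixed random/deterministic choice of blocks to update: the n_greedy
    blocks with the largest scores plus n_random blocks drawn uniformly
    from the rest. Illustrative sketch only."""
    greedy = np.argsort(scores)[::-1][:n_greedy]
    remaining = np.setdiff1d(np.arange(len(scores)), greedy)
    random_part = rng.choice(remaining, size=min(n_random, len(remaining)), replace=False)
    return np.concatenate([greedy, random_part])

In a full solver, scores would typically measure how strongly each block violates its block-wise optimality condition, and the selected blocks would then be updated in parallel by minimizing local convex surrogates of the nonconvex objective.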
A Distributed Asynchronous Method of Multipliers for Constrained Nonconvex Optimization
This paper presents a fully asynchronous and distributed approach for
tackling optimization problems in which both the objective function and the
constraints may be nonconvex. In the considered network setting each node is
active upon triggering of a local timer and has access only to a portion of the
objective function and to a subset of the constraints. In the proposed
technique, based on the method of multipliers, each node performs, when it
wakes up, either a descent step on a local augmented Lagrangian or an ascent
step on the local multiplier vector. Nodes realize when to switch from the
descent step to the ascent one through an asynchronous distributed logic-AND,
which detects when all the nodes have reached a predefined tolerance in the
minimization of the augmented Lagrangian. It is shown that the resulting
distributed algorithm is equivalent to a block coordinate descent for the
minimization of the global augmented Lagrangian. This allows one to extend the
properties of the centralized method of multipliers to the considered
distributed framework. Two application examples are presented to validate the
proposed approach: a distributed source localization problem and the parameter
estimation of a neural network.
Comment: arXiv admin note: substantial text overlap with arXiv:1803.0648
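A skeleton of the per-node wake-up logic described in this abstract is sketched below: on each local timer event the node either takes a descent step on its local augmented Lagrangian or, once the distributed logic-AND reports that every node is within tolerance, an ascent step on its local multipliers. The class and the callables it receives (grad_local_lagrangian, constraint_value) are placeholders, not the paper's API.

import numpy as np

class AsyncNodeSketch:
    """Per-node state for an asynchronous method-of-multipliers scheme
    (illustrative skeleton, not the authors' implementation)."""

    def __init__(self, x0, mu0, rho, tol, step):
        self.x, self.mu = x0, mu0
        self.rho, self.tol, self.step = rho, tol, step
        self.done_flag = False                 # this node's input to the logic-AND

    def wake_up(self, grad_local_lagrangian, constraint_value, all_nodes_done):
        if not all_nodes_done:
            # Primal phase: descent step on the local augmented Lagrangian.
            g = grad_local_lagrangian(self.x, self.mu, self.rho)
            self.x = self.x - self.step * g
            self.done_flag = np.linalg.norm(g) <= self.tol
        else:
            # Dual phase: ascent step on the local multiplier vector.
            self.mu = self.mu + self.rho * constraint_value(self.x)
            self.done_flag = False             # restart the next primal phase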
- …