1,825 research outputs found
Fast ADMM Algorithm for Distributed Optimization with Adaptive Penalty
We propose new methods to speed up convergence of the Alternating Direction
Method of Multipliers (ADMM), a common optimization tool in the context of
large scale and distributed learning. The proposed method accelerates
convergence by automatically deciding the constraint penalty needed for
parameter consensus in each iteration. In addition, we propose an
extension of the method that adaptively determines the maximum number of
iterations to update the penalty. We show that this approach effectively leads
to an adaptive, dynamic network topology underlying the distributed
optimization. The utility of the new penalty update schemes is demonstrated on
both synthetic and real data, including a computer vision application of
distributed structure from motion.
Comment: 8-page manuscript, 2-page appendix, 5 figures
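As a sketch of the setting, here is plain consensus ADMM with a residual-balancing penalty update — a standard generic adaptive-rho rule, not the paper's specific scheme; the toy objective, the hyperparameters mu and tau, and the function name are all illustrative:

```python
import numpy as np

def consensus_admm(a, rho=1.0, mu=10.0, tau=2.0, n_iter=100):
    """Consensus ADMM for min sum_i 0.5*(x_i - a_i)^2 s.t. x_i = z,
    with residual-balancing updates of the penalty rho (a generic
    adaptive scheme; the paper's update rule differs in its details)."""
    n = len(a)
    x = np.zeros(n)
    z = 0.0
    u = np.zeros(n)                            # scaled dual variables
    for _ in range(n_iter):
        x = (a + rho * (z - u)) / (1.0 + rho)  # local x-updates
        z_old = z
        z = np.mean(x + u)                     # consensus z-update
        u = u + x - z                          # dual ascent
        r = np.linalg.norm(x - z)              # primal residual norm
        s = rho * np.sqrt(n) * abs(z - z_old)  # dual residual norm
        if r > mu * s:                         # primal lagging: raise rho
            rho *= tau
            u /= tau                           # keep scaled duals consistent
        elif s > mu * r:                       # dual lagging: lower rho
            rho /= tau
            u *= tau
    return z
```

Here `z` converges to the mean of `a`, the minimizer of the separable objective; the appeal of an adaptive penalty is that a single rule replaces per-problem hand-tuning of rho.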
Adaptive Relaxed ADMM: Convergence Theory and Practical Implementation
Many modern computer vision and machine learning applications rely on solving
difficult optimization problems that involve non-differentiable objective
functions and constraints. The alternating direction method of multipliers
(ADMM) is a widely used approach to solve such problems. Relaxed ADMM is a
generalization of ADMM that often achieves better performance, but its
efficiency depends strongly on algorithm parameters that must be chosen by an
expert user. We propose an adaptive method that automatically tunes the key
algorithm parameters to achieve optimal performance without user oversight.
Inspired by recent work on adaptivity, the proposed adaptive relaxed ADMM
(ARADMM) is derived by assuming a Barzilai-Borwein style linear gradient. A
detailed convergence analysis of ARADMM is provided, and numerical results on
several applications demonstrate fast practical convergence.
Comment: CVPR 201
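For concreteness, relaxation in ADMM replaces the freshly computed x with alpha*x + (1 - alpha)*z in the z- and dual-updates. A minimal sketch on a toy l1 problem, with alpha and rho fixed by hand (exactly the parameters ARADMM tunes automatically); the function name and problem data are illustrative:

```python
import numpy as np

def relaxed_admm_l1(a, lam=0.5, rho=1.0, alpha=1.5, n_iter=200):
    """Relaxed ADMM for min 0.5*||x - a||^2 + lam*||x||_1, split as
    x = z.  alpha in (0, 2) is the relaxation parameter; alpha = 1
    recovers standard ADMM.  alpha and rho are fixed here by hand."""
    x = np.zeros_like(a)
    z = np.zeros_like(a)
    u = np.zeros_like(a)                      # scaled dual variables
    for _ in range(n_iter):
        x = (a + rho * (z - u)) / (1.0 + rho)  # prox of the quadratic
        x_hat = alpha * x + (1 - alpha) * z    # over-relaxation step
        v = x_hat + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # soft-threshold
        u = u + x_hat - z                      # dual update
    return z
```

The exact solution is soft-thresholding of `a` at level `lam`, so e.g. `a = [2, -0.2, 1]` with `lam = 0.5` yields `[1.5, 0, 0.5]`; values of alpha around 1.5–1.8 are commonly reported to converge faster than alpha = 1.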
Local-Aggregate Modeling for Big-Data via Distributed Optimization: Applications to Neuroimaging
Technological advances have led to a proliferation of structured big data
that have matrix-valued covariates. We are specifically motivated to build
predictive models for multi-subject neuroimaging data based on each subject's
brain imaging scans. This is an ultra-high-dimensional problem that consists of
a matrix of covariates (brain locations by time points) for each subject; few
methods currently exist to fit supervised models directly to this tensor data.
We propose a novel modeling and algorithmic strategy to apply generalized
linear models (GLMs) to this massive tensor data in which one set of variables
is associated with locations. Our method begins by fitting GLMs to each
location separately, and then builds an ensemble by blending information across
locations through regularization with what we term an aggregating penalty. Our
so-called Local-Aggregate Model can be fit in a completely distributed manner
over the locations using an Alternating Direction Method of Multipliers (ADMM)
strategy, and thus greatly reduces the computational burden. Furthermore, we
propose to select the appropriate model through a novel sequence of faster
algorithmic solutions that is similar to regularization paths. We
demonstrate both the computational and predictive modeling advantages of our
methods via simulations and an EEG classification problem.
Comment: 41 pages, 5 figures and 3 tables
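A schematic of the local-then-aggregate idea, with Gaussian GLMs (i.e. ridge regression) per location followed by closed-form shrinkage toward the cross-location mean; the paper's aggregating penalty and its distributed ADMM fit are more general, and `lam`/`gamma` here are illustrative hyperparameters:

```python
import numpy as np

def local_aggregate_ridge(X, Y, lam=1.0, gamma=1.0):
    """X: list of (samples, features) design matrices, one per location;
    Y: list of matching response vectors.  Fit a ridge GLM per location,
    then blend each local coefficient vector toward the mean across
    locations -- the minimizer of
        sum_l ||b_l - B_l||^2 + gamma * ||b_l - b_bar||^2
    with b_bar held at the cross-location mean of the local fits."""
    p = X[0].shape[1]
    B = np.stack([np.linalg.solve(Xl.T @ Xl + lam * np.eye(p), Xl.T @ Yl)
                  for Xl, Yl in zip(X, Y)])     # local step, per location
    b_bar = B.mean(axis=0)
    return (B + gamma * b_bar) / (1.0 + gamma)  # aggregate step
```

`gamma = 0` keeps the purely local fits, while `gamma -> inf` forces a single shared coefficient vector, mimicking how an aggregating penalty borrows strength across locations.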
FAASTA: A fast solver for total-variation regularization of ill-conditioned problems with application to brain imaging
The total variation (TV) penalty, as many other analysis-sparsity problems,
does not lead to separable factors or a proximal operator with a closed-form
expression, such as soft thresholding for the ℓ1 penalty. As a result,
in a variational formulation of an inverse problem or statistical learning
estimation, it leads to challenging non-smooth optimization problems that are
often solved with elaborate single-step first-order methods. When the data-fit
term arises from empirical measurements, as in brain imaging, it is often very
ill-conditioned and without simple structure. In this situation, in proximal
splitting methods, the computation cost of the gradient step can easily dominate
each iteration. Thus it is beneficial to minimize the number of gradient
steps. We present fAASTA, a variant of FISTA, that relies on an internal solver
for the TV proximal operator, and refines its tolerance to balance the
computational cost of the gradient and the proximal steps. We give benchmarks
and illustrations on "brain decoding": recovering brain maps from noisy
measurements to predict observed behavior. The algorithm as well as the
empirical study of convergence speed are valuable for any non-exact
proximal operator, in particular analysis-sparsity problems.
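The outer loop here is plain FISTA with the prox passed as a callback; in fAASTA that callback would be an iterative TV-prox solver whose tolerance is tightened as the outer iterations progress. Below, to keep the sketch self-contained and exact, the ℓ1 prox (soft-thresholding) stands in for the TV prox; the problem data are illustrative:

```python
import numpy as np

def fista(grad_f, prox_g, L, x0, n_iter=200):
    """FISTA for min f(x) + g(x): 1/L gradient steps on smooth f, then
    the prox of g.  In fAASTA the TV prox has no closed form, so prox_g
    is itself an inner iterative solver with an adaptive tolerance."""
    x = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(n_iter):
        x_new = prox_g(y - grad_f(y) / L, 1.0 / L)     # forward-backward step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x

def soft(v, thr):                      # exact l1 prox (soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

# toy problem: min 0.5*||A x - b||^2 + lam*||x||_1
A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, 1.0])
lam = 0.1
L_const = np.linalg.norm(A.T @ A, 2)   # Lipschitz constant of the gradient
x_hat = fista(lambda x: A.T @ (A @ x - b),
              lambda v, step: soft(v, lam * step),
              L_const, np.zeros(2))
```

Swapping `soft` for an inexact inner solver is where the tolerance-balancing question arises: a loose inner tolerance wastes outer gradient steps, a tight one wastes inner iterations.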