Super-Linear Convergence of Dual Augmented-Lagrangian Algorithm for Sparsity Regularized Estimation
We analyze the convergence behaviour of a recently proposed algorithm for
regularized estimation called Dual Augmented Lagrangian (DAL). Our analysis is
based on a new interpretation of DAL as a proximal minimization algorithm. We
theoretically show under some conditions that DAL converges super-linearly in a
non-asymptotic and global sense. Due to a special modelling of sparse
estimation problems in the context of machine learning, the assumptions we make
are milder and more natural than those made in conventional analysis of
augmented Lagrangian algorithms. In addition, the new interpretation enables us
to generalize DAL to a wide variety of sparse estimation problems. We
experimentally confirm our analysis on a large-scale $\ell_1$-regularized
logistic regression problem and extensively compare the efficiency of the DAL
algorithm to previously proposed algorithms on both synthetic and benchmark
datasets.
Comment: 51 pages, 9 figures
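As a rough illustration of the proximal viewpoint mentioned in the abstract above, the following Python sketch implements the proximal operator of the L1 norm (soft-thresholding), the basic building block of proximal methods for sparse estimation. It is not the DAL algorithm itself; the function name `soft_threshold` and the parameter `lam` are illustrative assumptions and do not come from the paper.

```python
def soft_threshold(values, lam):
    """Proximal operator of lam * ||x||_1, applied elementwise.

    Each entry is shrunk toward zero by lam; entries whose magnitude
    is at most lam become exactly zero, which is what produces sparsity.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    result = []
    for v in values:
        if v > lam:
            result.append(v - lam)
        elif v < -lam:
            result.append(v + lam)
        else:
            result.append(0.0)
    return result


if __name__ == "__main__":
    print(soft_threshold([3.0, -0.5, -2.0], 1.0))
```

Entries with magnitude below the threshold are set exactly to zero, which is why proximal steps of this kind yield sparse iterates.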
Sampling constrained probability distributions using Spherical Augmentation
Statistical models with constrained probability distributions are abundant in
machine learning. Some examples include regression models with norm constraints
(e.g., Lasso), probit, many copula models, and latent Dirichlet allocation
(LDA). Bayesian inference involving probability distributions confined to
constrained domains could be quite challenging for commonly used sampling
algorithms. In this paper, we propose a novel augmentation technique that
handles a wide range of constraints by mapping the constrained domain to a
sphere in the augmented space. By moving freely on the surface of this sphere,
sampling algorithms handle constraints implicitly and generate proposals that
remain within boundaries when mapped back to the original space. Our proposed
method, called Spherical Augmentation, provides a mathematically natural and
computationally efficient framework for sampling from constrained probability
distributions. We show the advantages of our method over state-of-the-art
sampling algorithms, such as exact Hamiltonian Monte Carlo, using several
examples including truncated Gaussian distributions, Bayesian Lasso, Bayesian
bridge regression, reconstruction of a quantized stationary Gaussian process, and
LDA for topic modeling.
Comment: 41 pages, 13 figures
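The idea of mapping a constrained domain to a sphere in an augmented space can be sketched for one simple case. The abstract does not spell out the mapping, so the Python example below assumes the constrained domain is the unit ball in D dimensions and adds one extra coordinate so that every point lands on the unit sphere in D+1 dimensions. The function names `ball_to_sphere` and `sphere_to_ball` are hypothetical and chosen only for illustration.

```python
import math


def ball_to_sphere(theta):
    """Lift a point in the unit ball to the upper half of the unit sphere.

    Assumption: the constraint is ||theta||_2 <= 1. The extra coordinate
    sqrt(1 - ||theta||^2) makes the augmented point have unit norm.
    """
    sq_norm = sum(t * t for t in theta)
    if sq_norm > 1.0:
        raise ValueError("theta lies outside the unit ball")
    # max() guards against tiny negative values from rounding.
    extra = math.sqrt(max(0.0, 1.0 - sq_norm))
    return list(theta) + [extra]


def sphere_to_ball(point):
    """Map back to the original space by dropping the extra coordinate."""
    return list(point[:-1])


if __name__ == "__main__":
    lifted = ball_to_sphere([0.6, 0.0])
    print(lifted, sphere_to_ball(lifted))
```

Any move along the sphere maps back to a point with norm at most one, so under this assumption a sampler never needs an explicit boundary check.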
An Extragradient-Based Alternating Direction Method for Convex Minimization
In this paper, we consider the problem of minimizing the sum of two convex
functions subject to linear linking constraints. The classical alternating
direction type methods usually assume that the two convex functions have
relatively easy proximal mappings. However, many problems arising from
statistics, image processing and other fields are structured so that one
of the two functions has an easy proximal mapping, while the other is smooth and
convex but does not have an easy proximal mapping. Therefore, the classical
alternating direction methods cannot be applied. To deal with the difficulty,
we propose in this paper an alternating direction method based on
extragradients. Under the assumption that the smooth function has a Lipschitz
continuous gradient, we prove that the proposed method returns an
$\epsilon$-optimal solution within $O(1/\epsilon)$ iterations. We apply the
proposed method to solve a new statistical model called fused logistic
regression. Our numerical experiments show that the proposed method performs
very well when solving the test problems. We also test the
proposed method on the lasso problem arising from statistics and
compare the results with several existing efficient solvers for this problem;
the results are very encouraging.
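To make the term "extragradient" concrete, the Python sketch below applies the classical extragradient step to a toy saddle-point problem, min over x and max over y of x*y. This is not the alternating direction method proposed in the paper; the toy objective, the step size `eta`, and all function names are assumptions made for illustration. Each iteration first takes a trial (look-ahead) gradient step, then updates the original point using the gradient evaluated at the trial point.

```python
def extragradient_step(x, y, eta):
    """One extragradient step for f(x, y) = x * y (min in x, max in y)."""
    # Look-ahead step using gradients at the current point.
    x_half = x - eta * y
    y_half = y + eta * x
    # Actual update using gradients evaluated at the look-ahead point.
    return x - eta * y_half, y + eta * x_half


def gradient_step(x, y, eta):
    """Plain simultaneous gradient descent-ascent, for comparison."""
    return x - eta * y, y + eta * x


def distance_to_saddle(step, x=1.0, y=1.0, eta=0.5, iters=100):
    """Run `step` repeatedly; return the distance to the saddle point (0, 0)."""
    for _ in range(iters):
        x, y = step(x, y, eta)
    return (x * x + y * y) ** 0.5


if __name__ == "__main__":
    print("extragradient:", distance_to_saddle(extragradient_step))
    print("plain gradient:", distance_to_saddle(gradient_step))
```

On this toy problem the plain gradient iteration spirals away from the saddle point, while the extragradient iteration contracts toward it for this step size.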
Distributed Basis Pursuit
We propose a distributed algorithm for solving the optimization problem Basis
Pursuit (BP). BP finds the least L1-norm solution of the underdetermined linear
system Ax = b and is used, for example, in compressed sensing for
reconstruction. Our algorithm solves BP on a distributed platform such as a
sensor network, and is designed to minimize the communication between nodes.
The algorithm only requires the network to be connected, has no notion of a
central processing node, and no node has access to the entire matrix A at any
time. We consider two scenarios in which either the columns or the rows of A
are distributed among the compute nodes. Our algorithm, named D-ADMM, is a
decentralized implementation of the alternating direction method of
multipliers. We show through numerical simulation that our algorithm requires
considerably less communication between the nodes than the state-of-the-art
algorithms.
Comment: Preprint of the journal version of the paper; IEEE Transactions on
Signal Processing, Vol. 60, Issue 4, April, 201
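For orientation, the Python sketch below solves a tiny Basis Pursuit instance, minimize ||x||_1 subject to Ax = b, with the standard centralized alternating direction method of multipliers. It is not the distributed D-ADMM algorithm described above. To stay dependency-free it assumes that A has a single row `a`, so the projection onto the constraint set has a closed form; the function names, the penalty `rho`, and the iteration count are illustrative assumptions.

```python
def soft_threshold(v, kappa):
    """Shrink v toward zero by kappa (proximal operator of kappa * |v|)."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def basis_pursuit_single_row(a, b, rho=1.0, iters=200):
    """Centralized ADMM for: minimize ||x||_1 subject to a . x = b.

    Assumption: A is a single nonzero row `a`, so projecting onto the
    constraint set {x : a . x = b} is an explicit formula.
    """
    n = len(a)
    norm_sq = sum(ai * ai for ai in a)
    z = [0.0] * n  # sparse copy of the variable
    u = [0.0] * n  # scaled dual variable
    for _ in range(iters):
        # x-update: project z - u onto the hyperplane a . x = b.
        v = [zi - ui for zi, ui in zip(z, u)]
        t = (sum(ai * vi for ai, vi in zip(a, v)) - b) / norm_sq
        x = [vi - t * ai for vi, ai in zip(v, a)]
        # z-update: soft-thresholding promotes sparsity.
        z = [soft_threshold(xi + ui, 1.0 / rho) for xi, ui in zip(x, u)]
        # Dual update: accumulate the disagreement between x and z.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z


if __name__ == "__main__":
    print(basis_pursuit_single_row([1.0, 2.0], 2.0))
```

For `a = [1, 2]` and `b = 2`, the least L1-norm solution places all the weight on the coordinate with the largest coefficient, so the iterates approach (0, 1).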