The Incremental Proximal Method: A Probabilistic Perspective
In this work, we highlight a connection between the incremental proximal
method and stochastic filters. We begin by showing that the proximal operators
coincide with, and hence can be realized by, Bayes updates. We give the explicit
form of the updates for the linear regression problem and show that there is a
one-to-one correspondence between the proximal operator of the least-squares
regression and the Bayes update when the prior and the likelihood are Gaussian.
We then carry out this observation to a general sequential setting: We consider
the incremental proximal method, which is an algorithm for large-scale
optimization, and show that, for a linear-quadratic cost function, it can
naturally be realized by the Kalman filter. We then discuss the implications of
this idea for nonlinear optimization problems where proximal operators are in
general not realizable. In such settings, we argue that the extended Kalman
filter can provide a systematic way to derive practical procedures.
Comment: Presented at ICASSP, 15-20 April 2018
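To see the correspondence in its simplest case, the following sketch (written for this listing, not code from the paper; the function names are illustrative) checks numerically that the proximal operator of a least-squares cost with step $\lambda$ equals the posterior mean of a Bayes update with Gaussian prior $\mathcal{N}(x, \lambda I)$ and unit-variance Gaussian likelihood:

    import numpy as np

    def prox_least_squares(x, A, y, lam):
        # prox_{lam * f}(x) for f(theta) = 0.5 * ||y - A theta||^2:
        # minimize 0.5*||y - A theta||^2 + (1/(2*lam)) * ||theta - x||^2
        d = x.shape[0]
        return np.linalg.solve(A.T @ A + np.eye(d) / lam, A.T @ y + x / lam)

    def bayes_posterior_mean(x, A, y, lam):
        # Posterior mean for prior theta ~ N(x, lam*I) and likelihood y ~ N(A theta, I)
        d = x.shape[0]
        prior_precision = np.eye(d) / lam
        posterior_cov = np.linalg.inv(prior_precision + A.T @ A)
        return posterior_cov @ (prior_precision @ x + A.T @ y)

    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 3))
    y = rng.standard_normal(20)
    x = rng.standard_normal(3)
    assert np.allclose(prox_least_squares(x, A, y, 0.7),
                       bayes_posterior_mean(x, A, y, 0.7))

Both routines end up solving the same linear system, which is exactly the one-to-one correspondence the abstract describes for the Gaussian case.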
Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey
We survey incremental methods for minimizing a sum $f = \sum_{i=1}^m f_i(x)$
consisting of a large number of convex component functions $f_i$. Our methods
consist of iterations applied to single components, and have proved very
effective in practice. We introduce a unified algorithmic framework for a
variety of such methods, some involving gradient and subgradient iterations,
which are known, and some involving combinations of subgradient and proximal
methods, which are new and offer greater flexibility in exploiting the special
structure of $f_i$. We provide an analysis of the convergence and rate of
convergence properties of these methods, including the advantages offered by
randomization in the selection of components. We also survey applications in
inference/machine learning, signal processing, and large-scale and distributed
optimization.
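As a minimal illustration of the incremental idea (my own sketch, not taken from the survey; the quadratic components and diminishing step size are assumptions made for concreteness), one pass of an incremental proximal method visits one component at a time and applies its proximal operator to the current iterate:

    import numpy as np

    def incremental_proximal_pass(x, A, b, lam):
        """One cyclic pass of an incremental proximal method for
        minimizing sum_i 0.5 * (a_i^T x - b_i)^2 (a least-squares example)."""
        for a_i, b_i in zip(A, b):
            # Closed-form prox of one quadratic component:
            # argmin_z 0.5*(a_i^T z - b_i)^2 + (1/(2*lam)) * ||z - x||^2
            x = x - lam * a_i * (a_i @ x - b_i) / (1.0 + lam * (a_i @ a_i))
        return x

    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 5))
    b = rng.standard_normal(200)
    x = np.zeros(5)
    for k in range(50):
        x = incremental_proximal_pass(x, A, b, lam=1.0 / (k + 1))  # diminishing step

Each update touches only a single component, which is what makes such methods attractive when the number of components is very large.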
A probabilistic incremental proximal gradient method
In this paper, we propose a probabilistic optimization method, named
probabilistic incremental proximal gradient (PIPG) method, by developing a
probabilistic interpretation of the incremental proximal gradient algorithm. We
explicitly model the update rules of the incremental proximal gradient method
and develop a systematic approach to propagate the uncertainty of the solution
estimate over iterations. The PIPG algorithm takes the form of Bayesian
filtering updates for a state-space model constructed by using the cost
function. Our framework makes it possible to utilize well-known exact or
approximate Bayesian filters, such as Kalman or extended Kalman filters, to
solve large-scale regularized optimization problems.
Comment: 5 pages, includes an extra numerical experiment
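To fix ideas, here is the deterministic update that PIPG reinterprets probabilistically: a gradient step on one data-fit component followed by the proximal operator of the regularizer. The least-squares components, the $\ell_1$ regularizer, and the function names below are illustrative choices, not the paper's:

    import numpy as np

    def soft_threshold(v, t):
        # Proximal operator of t * ||.||_1
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def incremental_prox_grad(x, A, b, reg, lam, n_passes=50):
        """Incremental proximal gradient for
        sum_i 0.5*(a_i^T x - b_i)^2 + reg * ||x||_1 (illustration only)."""
        for _ in range(n_passes):
            for a_i, b_i in zip(A, b):
                grad_i = a_i * (a_i @ x - b_i)          # gradient of one component
                x = soft_threshold(x - lam * grad_i, lam * reg)
            lam *= 0.99                                  # slowly decreasing step size
        return x

The paper's contribution is to attach an uncertainty estimate to iterates of this kind by casting the recursion as Bayesian filtering in a cost-derived state-space model.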
Sparse Regularization in Marketing and Economics
Sparse alpha-norm regularization has many data-rich applications in Marketing
and Economics. Alpha-norm, in contrast to lasso and ridge regularization, jumps
to a sparse solution. This feature is attractive for ultra high-dimensional
problems that occur in demand estimation and forecasting. The alpha-norm
objective is nonconvex and requires coordinate descent and proximal operators
to find the sparse solution. We study a typical marketing demand forecasting
problem, grocery store sales for salty snacks, that has many dummy variables as
controls. The key predictors of demand include price, equivalized volume,
promotion, flavor, scent, and brand effects. By comparing with many commonly
used machine learning methods, alpha-norm regularization achieves its goal of
providing accurate out-of-sample estimates for the promotion lift effects.
Finally, we conclude with directions for future research.
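The "jump to a sparse solution" is easiest to see in the limiting alpha = 0 case, whose proximal operator is hard thresholding. The sketch below illustrates that limiting case only; the general alpha-norm proximal operator used in the paper does not have this simple closed form:

    import numpy as np

    def hard_threshold(x, lam):
        """Proximal operator of lam * ||x||_0 (the alpha = 0 limit):
        each coordinate is either kept exactly or set to zero, so the
        solution jumps to sparsity instead of shrinking smoothly as in
        lasso or ridge."""
        out = x.copy()
        out[np.abs(x) <= np.sqrt(2.0 * lam)] = 0.0
        return out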
The proximal point method revisited
In this short survey, I revisit the role of the proximal point method in
large scale optimization. I focus on three recent examples: a proximally guided
subgradient method for weakly convex stochastic approximation, the prox-linear
algorithm for minimizing compositions of convex functions and smooth maps, and
Catalyst generic acceleration for regularized Empirical Risk Minimization.
Comment: 11 pages, submitted to SIAG/OPT Views and News
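For orientation, the update the survey revisits is the classical proximal point step: given a closed function $f$ and parameters $\nu_k > 0$,

    x_{k+1} = \operatorname*{argmin}_x \Big\{ f(x) + \tfrac{1}{2\nu_k}\,\|x - x_k\|^2 \Big\}.

Each of the three examples above can be read as applying this step to a suitable model or smoothing of the original objective.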
An Incremental Gradient Method for Large-scale Distributed Nonlinearly Constrained Optimization
Motivated by applications arising from sensor networks and machine learning,
we consider the problem of minimizing a finite sum of nondifferentiable convex
functions where each component function is associated with an agent and a
hard-to-project constraint set. Among well-known avenues to address finite sum
problems is the class of incremental gradient (IG) methods where a single
component function is selected at each iteration in a cyclic or randomized
manner. When the problem is constrained, the existing IG schemes (including
projected IG, proximal IAG, and SAGA) require a projection step onto the
feasible set at each iteration. Consequently, the performance of these schemes
is afflicted with costly projections when the problem includes: (1) nonlinear
constraints, or (2) a large number of linear constraints. Our focus in this
paper lies in addressing both of these challenges. We develop an algorithm
called averaged iteratively regularized incremental gradient (aIR-IG) that does
not involve any hard-to-project computation. Under mild assumptions, we derive
non-asymptotic rates of convergence for both suboptimality and infeasibility
metrics. Numerically, we show that the proposed scheme outperforms the standard
projected IG methods on distributed soft-margin support vector machine
problems.
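For contrast, here is a sketch of the standard baseline the paper improves on, not of aIR-IG itself; the Euclidean-ball constraint is chosen only so that the projection can be written in two lines, whereas the paper targets sets for which this step is expensive:

    import numpy as np

    def project_ball(x, radius=1.0):
        # Projection onto a Euclidean ball; trivial here, costly in general.
        nrm = np.linalg.norm(x)
        return x if nrm <= radius else x * (radius / nrm)

    def projected_ig_pass(x, A, b, step):
        """One cyclic pass of projected incremental gradient for
        min sum_i 0.5*(a_i^T x - b_i)^2  subject to  ||x|| <= 1.
        Every component update is followed by a projection, which is the
        operation that becomes costly with nonlinear or very many linear
        constraints."""
        for a_i, b_i in zip(A, b):
            x = project_ball(x - step * a_i * (a_i @ x - b_i))
        return x

The aIR-IG scheme avoids this per-iteration projection altogether by handling the constraints through iterative regularization.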
Efficiency of minimizing compositions of convex functions and smooth maps
We consider global efficiency of algorithms for minimizing a sum of a convex
function and a composition of a Lipschitz convex function with a smooth map.
The basic algorithm we rely on is the prox-linear method, which in each
iteration solves a regularized subproblem formed by linearizing the smooth map.
When the subproblems are solved exactly, the method has efficiency
$\mathcal{O}(\varepsilon^{-2})$, akin to gradient descent for smooth
minimization. We show that when the subproblems can only be solved by
first-order methods, a simple combination of smoothing, the prox-linear method,
and a fast-gradient scheme yields an algorithm with complexity
$\widetilde{\mathcal{O}}(\varepsilon^{-3})$. The technique readily extends to
minimizing an average of $m$ composite functions, with complexity
$\widetilde{\mathcal{O}}(m/\varepsilon^{2} + \sqrt{m}/\varepsilon^{3})$ in
expectation. We round off the paper with an inertial prox-linear method that
automatically accelerates in the presence of convexity.
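Written out, with notation chosen here ($g$ convex, $h$ convex and Lipschitz, $c$ a smooth map, and $t > 0$ a step parameter), the prox-linear subproblem solved in each iteration is

    x_{k+1} = \operatorname*{argmin}_x \Big\{ g(x) + h\big(c(x_k) + \nabla c(x_k)(x - x_k)\big) + \tfrac{1}{2t}\,\|x - x_k\|^2 \Big\},

which is convex because the smooth map $c$ has been replaced by its linearization at $x_k$.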
Accelerating Stochastic Composition Optimization
Consider the stochastic composition optimization problem where the objective
is a composition of two expected-value functions. We propose a new stochastic
first-order method, namely the accelerated stochastic compositional proximal
gradient (ASC-PG) method, which updates based on queries to the sampling oracle
using two different timescales. The ASC-PG is the first proximal gradient
method for the stochastic composition problem that can deal with nonsmooth
regularization penalty. We show that the ASC-PG exhibits faster convergence
than the best known algorithms, and that it achieves the optimal sample-error
complexity in several important special cases. We further demonstrate the
application of ASC-PG to reinforcement learning and conduct numerical
experiments.
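Spelled out in notation of my choosing, the problem class is

    \min_x \; \mathbb{E}_v\big[\, f_v\big( \mathbb{E}_w[\, g_w(x) \,] \big) \,\big] + R(x),

where $R$ is a possibly nonsmooth regularizer. The difficulty is that an unbiased sample of the inner function does not yield an unbiased sample of the gradient of the composition, which is why the method queries the sampling oracle on two different timescales: one sequence tracks the inner expectation while another updates the solution estimate.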
Forward-Backward-Half Forward Algorithm for Solving Monotone Inclusions
Tseng's algorithm finds a zero of the sum of a maximally monotone operator
and a monotone continuous operator by evaluating the latter twice per
iteration. In this paper, we modify Tseng's algorithm for finding a zero of the
sum of three operators, where we add a cocoercive operator to the inclusion.
Since the sum of a cocoercive and a monotone-Lipschitz operator is monotone and
Lipschitz, we could use Tseng's method to solve this problem, but doing so
would evaluate both operators twice per iteration without taking advantage of
the cocoercivity of one of them. Instead, in our approach, although the
continuous monotone operator must still be evaluated twice, we exploit the
cocoercivity of the other operator by evaluating it only once per iteration.
Moreover, when the cocoercive or the continuous-monotone operator is zero, the
method reduces to Tseng's splitting or to the forward-backward splitting,
respectively, thereby unifying both algorithms. In addition, we provide a
preconditioned version of the proposed method that includes non-self-adjoint
linear operators in the computation of the resolvents and of the single-valued
operators involved. This approach also allows us to extend previous variable
metric versions of Tseng's and forward-backward methods and to simplify their
conditions on the underlying metrics. We also exploit the case in which the
non-self-adjoint linear operators are block-triangular in the primal-dual
product space for solving primal-dual composite monotone inclusions, obtaining
Gauss-Seidel-type algorithms that generalize several primal-dual methods
available in the literature. Finally, we explore applications to the obstacle
problem, Empirical Risk Minimization, distributed optimization, and nonlinear
programming, and we illustrate the performance of the method via numerical
simulations.
Comment: 34 pages, title changed
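One way to realize the scheme described above, sketched here in notation of my choosing and without the paper's precise step-size conditions: for the inclusion $0 \in Ax + Bx + Cx$, with $A$ maximally monotone, $B$ monotone and continuous, $C$ cocoercive, and $\gamma > 0$,

    z_k = J_{\gamma A}\big( x_k - \gamma (B x_k + C x_k) \big), \qquad
    x_{k+1} = z_k + \gamma \big( B x_k - B z_k \big),

so the continuous operator $B$ is evaluated twice per iteration (at $x_k$ and $z_k$) while the cocoercive operator $C$ is evaluated only once, at $x_k$. Setting $C = 0$ recovers a Tseng-type forward-backward-forward step, and setting $B = 0$ recovers the forward-backward step, consistent with the unification claimed above.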
Optimization Methods for Large-Scale Machine Learning
This paper provides a review and commentary on the past, present, and future
of numerical optimization algorithms in the context of machine learning
applications. Through case studies on text classification and the training of
deep neural networks, we discuss how optimization problems arise in machine
learning and what makes them challenging. A major theme of our study is that
large-scale machine learning represents a distinctive setting in which the
stochastic gradient (SG) method has traditionally played a central role while
conventional gradient-based nonlinear optimization techniques typically falter.
Based on this viewpoint, we present a comprehensive theory of a
straightforward, yet versatile SG algorithm, discuss its practical behavior,
and highlight opportunities for designing algorithms with improved performance.
This leads to a discussion about the next generation of optimization methods
for large-scale machine learning, including an investigation of two main
streams of research on techniques that diminish noise in the stochastic
directions and methods that make use of second-order derivative approximations.
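A minimal sketch of the stochastic gradient iteration at the center of this discussion, written for a least-squares loss for concreteness (the loss choice, step size, and function name are mine, not the paper's):

    import numpy as np

    def sgd_least_squares(A, b, step=0.01, n_epochs=20, seed=0):
        """Plain stochastic gradient for min_w (1/(2n)) * ||A w - b||^2:
        each step uses the gradient of the loss on a single randomly
        chosen example rather than the full sum over all n examples."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        w = np.zeros(d)
        for _ in range(n_epochs):
            for i in rng.permutation(n):
                w -= step * A[i] * (A[i] @ w - b[i])
        return w

The noise-reduction and second-order streams of research mentioned above modify exactly this update, either by averaging out the per-example noise or by preconditioning the step with curvature information.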