The Incremental Proximal Method: A Probabilistic Perspective
In this work, we highlight a connection between the incremental proximal
method and stochastic filters. We begin by showing that the proximal operators
coincide with, and hence can be realized by, Bayes updates. We give the explicit
form of the updates for the linear regression problem and show that there is a
one-to-one correspondence between the proximal operator of the least-squares
regression and the Bayes update when the prior and the likelihood are Gaussian.
We then carry out this observation to a general sequential setting: We consider
the incremental proximal method, which is an algorithm for large-scale
optimization, and show that, for a linear-quadratic cost function, it can
naturally be realized by the Kalman filter. We then discuss the implications of
this idea for nonlinear optimization problems where proximal operators are in
general not realizable. In such settings, we argue that the extended Kalman
filter can provide a systematic way to derive practical procedures.
Comment: Presented at ICASSP, 15-20 April 2018
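To see the correspondence in its simplest case, the following sketch (written for this listing, not code from the paper; the function names are illustrative) checks numerically that the proximal operator of a least-squares cost with step $\lambda$ equals the posterior mean of a Bayes update with Gaussian prior $\mathcal{N}(x, \lambda I)$ and unit-variance Gaussian likelihood:

    import numpy as np

    def prox_least_squares(x, A, y, lam):
        # prox_{lam * f}(x) for f(theta) = 0.5 * ||y - A theta||^2:
        # minimize 0.5*||y - A theta||^2 + (1/(2*lam)) * ||theta - x||^2
        d = x.shape[0]
        return np.linalg.solve(A.T @ A + np.eye(d) / lam, A.T @ y + x / lam)

    def bayes_posterior_mean(x, A, y, lam):
        # Posterior mean for prior theta ~ N(x, lam*I) and likelihood y ~ N(A theta, I)
        d = x.shape[0]
        prior_precision = np.eye(d) / lam
        posterior_cov = np.linalg.inv(prior_precision + A.T @ A)
        return posterior_cov @ (prior_precision @ x + A.T @ y)

    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 3))
    y = rng.standard_normal(20)
    x = rng.standard_normal(3)
    assert np.allclose(prox_least_squares(x, A, y, 0.7),
                       bayes_posterior_mean(x, A, y, 0.7))

Both routines end up solving the same linear system, which is exactly the one-to-one correspondence the abstract describes for the Gaussian case.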
Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey
We survey incremental methods for minimizing a sum $f = \sum_{i=1}^m f_i(x)$
consisting of a large number of convex component functions $f_i$. Our methods
consist of iterations applied to single components, and have proved very
effective in practice. We introduce a unified algorithmic framework for a
variety of such methods, some involving gradient and subgradient iterations,
which are known, and some involving combinations of subgradient and proximal
methods, which are new and offer greater flexibility in exploiting the special
structure of $f_i$. We provide an analysis of the convergence and rate of
convergence properties of these methods, including the advantages offered by
randomization in the selection of components. We also survey applications in
inference/machine learning, signal processing, and large-scale and distributed
optimization.
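As a minimal illustration of the incremental idea (my own sketch, not taken from the survey; the quadratic components and diminishing step size are assumptions made for concreteness), one pass of an incremental proximal method visits one component at a time and applies its proximal operator to the current iterate:

    import numpy as np

    def incremental_proximal_pass(x, A, b, lam):
        """One cyclic pass of an incremental proximal method for
        minimizing sum_i 0.5 * (a_i^T x - b_i)^2 (a least-squares example)."""
        for a_i, b_i in zip(A, b):
            # Closed-form prox of one quadratic component:
            # argmin_z 0.5*(a_i^T z - b_i)^2 + (1/(2*lam)) * ||z - x||^2
            x = x - lam * a_i * (a_i @ x - b_i) / (1.0 + lam * (a_i @ a_i))
        return x

    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 5))
    b = rng.standard_normal(200)
    x = np.zeros(5)
    for k in range(50):
        x = incremental_proximal_pass(x, A, b, lam=1.0 / (k + 1))  # diminishing step

Each update touches only a single component, which is what makes such methods attractive when the number of components is very large.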
A probabilistic incremental proximal gradient method
In this paper, we propose a probabilistic optimization method, named
probabilistic incremental proximal gradient (PIPG) method, by developing a
probabilistic interpretation of the incremental proximal gradient algorithm. We
explicitly model the update rules of the incremental proximal gradient method
and develop a systematic approach to propagate the uncertainty of the solution
estimate over iterations. The PIPG algorithm takes the form of Bayesian
filtering updates for a state-space model constructed by using the cost
function. Our framework makes it possible to utilize well-known exact or
approximate Bayesian filters, such as Kalman or extended Kalman filters, to
solve large-scale regularized optimization problems.
Comment: 5 pages, includes an extra numerical experiment
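To fix ideas, here is the deterministic update that PIPG reinterprets probabilistically: a gradient step on one data-fit component followed by the proximal operator of the regularizer. The least-squares components, the $\ell_1$ regularizer, and the function names below are illustrative choices, not the paper's:

    import numpy as np

    def soft_threshold(v, t):
        # Proximal operator of t * ||.||_1
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def incremental_prox_grad(x, A, b, reg, lam, n_passes=50):
        """Incremental proximal gradient for
        sum_i 0.5*(a_i^T x - b_i)^2 + reg * ||x||_1 (illustration only)."""
        for _ in range(n_passes):
            for a_i, b_i in zip(A, b):
                grad_i = a_i * (a_i @ x - b_i)          # gradient of one component
                x = soft_threshold(x - lam * grad_i, lam * reg)
            lam *= 0.99                                  # slowly decreasing step size
        return x

The paper's contribution is to attach an uncertainty estimate to iterates of this kind by casting the recursion as Bayesian filtering in a cost-derived state-space model.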
Sparse Regularization in Marketing and Economics
Sparse alpha-norm regularization has many data-rich applications in Marketing
and Economics. Alpha-norm, in contrast to lasso and ridge regularization, jumps
to a sparse solution. This feature is attractive for ultra high-dimensional
problems that occur in demand estimation and forecasting. The alpha-norm
objective is nonconvex and requires coordinate descent and proximal operators
to find the sparse solution. We study a typical marketing demand forecasting
problem, grocery store sales for salty snacks, that has many dummy variables as
controls. The key predictors of demand include price, equivalized volume,
promotion, flavor, scent, and brand effects. By comparing with many commonly
used machine learning methods, alpha-norm regularization achieves its goal of
providing accurate out-of-sample estimates for the promotion lift effects.
Finally, we conclude with directions for future research.
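The "jump to a sparse solution" is easiest to see in the limiting alpha = 0 case, whose proximal operator is hard thresholding. The sketch below illustrates that limiting case only; the general alpha-norm proximal operator used in the paper does not have this simple closed form:

    import numpy as np

    def hard_threshold(x, lam):
        """Proximal operator of lam * ||x||_0 (the alpha = 0 limit):
        each coordinate is either kept exactly or set to zero, so the
        solution jumps to sparsity instead of shrinking smoothly as in
        lasso or ridge."""
        out = x.copy()
        out[np.abs(x) <= np.sqrt(2.0 * lam)] = 0.0
        return out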
The proximal point method revisited
In this short survey, I revisit the role of the proximal point method in
large scale optimization. I focus on three recent examples: a proximally guided
subgradient method for weakly convex stochastic approximation, the prox-linear
algorithm for minimizing compositions of convex functions and smooth maps, and
Catalyst generic acceleration for regularized Empirical Risk Minimization.
Comment: 11 pages, submitted to SIAG/OPT Views and News
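For orientation, the update the survey revisits is the classical proximal point step: given a closed function $f$ and parameters $\nu_k > 0$,

    x_{k+1} = \operatorname*{argmin}_x \Big\{ f(x) + \tfrac{1}{2\nu_k}\,\|x - x_k\|^2 \Big\}.

Each of the three examples above can be read as applying this step to a suitable model or smoothing of the original objective.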
An Incremental Gradient Method for Large-scale Distributed Nonlinearly Constrained Optimization
Motivated by applications arising from sensor networks and machine learning,
we consider the problem of minimizing a finite sum of nondifferentiable convex
functions where each component function is associated with an agent and a
hard-to-project constraint set. Among well-known avenues to address finite sum
problems is the class of incremental gradient (IG) methods where a single
component function is selected at each iteration in a cyclic or randomized
manner. When the problem is constrained, the existing IG schemes (including
projected IG, proximal IAG, and SAGA) require a projection step onto the
feasible set at each iteration. Consequently, the performance of these schemes
is afflicted with costly projections when the problem includes: (1) nonlinear
constraints, or (2) a large number of linear constraints. Our focus in this
paper lies in addressing both of these challenges. We develop an algorithm
called averaged iteratively regularized incremental gradient (aIR-IG) that does
not involve any hard-to-project computation. Under mild assumptions, we derive
non-asymptotic rates of convergence for both suboptimality and infeasibility
metrics. Numerically, we show that the proposed scheme outperforms the standard
projected IG methods on distributed soft-margin support vector machine
problems.
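For contrast, here is a sketch of the standard baseline the paper improves on, not of aIR-IG itself; the Euclidean-ball constraint is chosen only so that the projection can be written in two lines, whereas the paper targets sets for which this step is expensive:

    import numpy as np

    def project_ball(x, radius=1.0):
        # Projection onto a Euclidean ball; trivial here, costly in general.
        nrm = np.linalg.norm(x)
        return x if nrm <= radius else x * (radius / nrm)

    def projected_ig_pass(x, A, b, step):
        """One cyclic pass of projected incremental gradient for
        min sum_i 0.5*(a_i^T x - b_i)^2  subject to  ||x|| <= 1.
        Every component update is followed by a projection, which is the
        operation that becomes costly with nonlinear or very many linear
        constraints."""
        for a_i, b_i in zip(A, b):
            x = project_ball(x - step * a_i * (a_i @ x - b_i))
        return x

The aIR-IG scheme avoids this per-iteration projection altogether by handling the constraints through iterative regularization.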
Efficiency of minimizing compositions of convex functions and smooth maps
We consider global efficiency of algorithms for minimizing a sum of a convex
function and a composition of a Lipschitz convex function with a smooth map.
The basic algorithm we rely on is the prox-linear method, which in each
iteration solves a regularized subproblem formed by linearizing the smooth map.
When the subproblems are solved exactly, the method has efficiency
$\mathcal{O}(\varepsilon^{-2})$, akin to gradient descent for smooth
minimization. We show that when the subproblems can only be solved by
first-order methods, a simple combination of smoothing, the prox-linear method,
and a fast-gradient scheme yields an algorithm with complexity
$\widetilde{\mathcal{O}}(\varepsilon^{-3})$. The technique readily extends to
minimizing an average of $m$ composite functions, with complexity
$\widetilde{\mathcal{O}}(m/\varepsilon^{2} + \sqrt{m}/\varepsilon^{3})$ in
expectation. We round off the paper with an inertial prox-linear method that
automatically accelerates in the presence of convexity.
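Written out, with notation chosen here ($g$ convex, $h$ convex and Lipschitz, $c$ a smooth map, and $t > 0$ a step parameter), the prox-linear subproblem solved in each iteration is

    x_{k+1} = \operatorname*{argmin}_x \Big\{ g(x) + h\big(c(x_k) + \nabla c(x_k)(x - x_k)\big) + \tfrac{1}{2t}\,\|x - x_k\|^2 \Big\},

which is convex because the smooth map $c$ has been replaced by its linearization at $x_k$.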
Accelerating Stochastic Composition Optimization
Consider the stochastic composition optimization problem where the objective
is a composition of two expected-value functions. We propose a new stochastic
first-order method, namely the accelerated stochastic compositional proximal
gradient (ASC-PG) method, which updates based on queries to the sampling oracle
using two different timescales. The ASC-PG is the first proximal gradient
method for the stochastic composition problem that can deal with nonsmooth
regularization penalty. We show that the ASC-PG exhibits faster convergence
than the best known algorithms, and that it achieves the optimal sample-error
complexity in several important special cases. We further demonstrate the
application of ASC-PG to reinforcement learning and conduct numerical
experiments.
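Spelled out in notation of my choosing, the problem class is

    \min_x \; \mathbb{E}_v\big[\, f_v\big( \mathbb{E}_w[\, g_w(x) \,] \big) \,\big] + R(x),

where $R$ is a possibly nonsmooth regularizer. The difficulty is that an unbiased sample of the inner function does not yield an unbiased sample of the gradient of the composition, which is why the method queries the sampling oracle on two different timescales: one sequence tracks the inner expectation while another updates the solution estimate.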
Forward-Backward-Half Forward Algorithm for Solving Monotone Inclusions
Tseng's algorithm finds a zero of the sum of a maximally monotone operator
and a monotone continuous operator by evaluating the latter twice per
iteration. In this paper, we modify Tseng's algorithm for finding a zero of the
sum of three operators, where we add a cocoercive operator to the inclusion.
Since the sum of a cocoercive and a monotone-Lipschitz operator is monotone and
Lipschitz, we could use Tseng's method to solve this problem, but doing so
would evaluate both operators twice per iteration without taking advantage of
the cocoercivity of one of them. Instead, in our approach, although the
continuous monotone operator must still be evaluated twice, we exploit the
cocoercivity of the other operator by evaluating it only once per iteration.
Moreover, when the cocoercive or the continuous-monotone operator is zero, the
method reduces to Tseng's splitting or to the forward-backward splitting,
respectively, thereby unifying both algorithms. In addition, we provide a
preconditioned version of the proposed method that includes non-self-adjoint
linear operators in the computation of the resolvents and of the single-valued
operators involved. This approach also allows us to extend previous variable
metric versions of Tseng's and forward-backward methods and to simplify their
conditions on the underlying metrics. We also exploit the case in which the
non-self-adjoint linear operators are block-triangular in the primal-dual
product space for solving primal-dual composite monotone inclusions, obtaining
Gauss-Seidel-type algorithms that generalize several primal-dual methods
available in the literature. Finally, we explore applications to the obstacle
problem, Empirical Risk Minimization, distributed optimization, and nonlinear
programming, and we illustrate the performance of the method via numerical
simulations.
Comment: 34 pages, title changed
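One way to realize the scheme described above, sketched here in notation of my choosing and without the paper's precise step-size conditions: for the inclusion $0 \in Ax + Bx + Cx$, with $A$ maximally monotone, $B$ monotone and continuous, $C$ cocoercive, and $\gamma > 0$,

    z_k = J_{\gamma A}\big( x_k - \gamma (B x_k + C x_k) \big), \qquad
    x_{k+1} = z_k + \gamma \big( B x_k - B z_k \big),

so the continuous operator $B$ is evaluated twice per iteration (at $x_k$ and $z_k$) while the cocoercive operator $C$ is evaluated only once, at $x_k$. Setting $C = 0$ recovers a Tseng-type forward-backward-forward step, and setting $B = 0$ recovers the forward-backward step, consistent with the unification claimed above.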
Optimization Methods for Large-Scale Machine Learning
This paper provides a review and commentary on the past, present, and future
of numerical optimization algorithms in the context of machine learning
applications. Through case studies on text classification and the training of
deep neural networks, we discuss how optimization problems arise in machine
learning and what makes them challenging. A major theme of our study is that
large-scale machine learning represents a distinctive setting in which the
stochastic gradient (SG) method has traditionally played a central role while
conventional gradient-based nonlinear optimization techniques typically falter.
Based on this viewpoint, we present a comprehensive theory of a
straightforward, yet versatile SG algorithm, discuss its practical behavior,
and highlight opportunities for designing algorithms with improved performance.
This leads to a discussion about the next generation of optimization methods
for large-scale machine learning, including an investigation of two main
streams of research on techniques that diminish noise in the stochastic
directions and methods that make use of second-order derivative approximations.
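A minimal sketch of the stochastic gradient iteration at the center of this discussion, written for a least-squares loss for concreteness (the loss choice, step size, and function name are mine, not the paper's):

    import numpy as np

    def sgd_least_squares(A, b, step=0.01, n_epochs=20, seed=0):
        """Plain stochastic gradient for min_w (1/(2n)) * ||A w - b||^2:
        each step uses the gradient of the loss on a single randomly
        chosen example rather than the full sum over all n examples."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        w = np.zeros(d)
        for _ in range(n_epochs):
            for i in rng.permutation(n):
                w -= step * A[i] * (A[i] @ w - b[i])
        return w

The noise-reduction and second-order streams of research mentioned above modify exactly this update, either by averaging out the per-example noise or by preconditioning the step with curvature information.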