
    The Incremental Proximal Method: A Probabilistic Perspective

    In this work, we highlight a connection between the incremental proximal method and stochastic filters. We begin by showing that the proximal operators coincide with, and hence can be realized by, Bayes updates. We give the explicit form of the updates for the linear regression problem and show that there is a one-to-one correspondence between the proximal operator of least-squares regression and the Bayes update when the prior and the likelihood are Gaussian. We then carry this observation over to a general sequential setting: we consider the incremental proximal method, which is an algorithm for large-scale optimization, and show that, for a linear-quadratic cost function, it can naturally be realized by the Kalman filter. We then discuss the implications of this idea for nonlinear optimization problems where proximal operators are in general not realizable. In such settings, we argue that the extended Kalman filter can provide a systematic way to derive practical procedures. Comment: Presented at ICASSP, 15-20 April 2018.
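
    To make the claimed correspondence concrete, here is a minimal sketch (not the authors' code) for a single scalar observation $y \approx a^\top x$: the closed-form proximal step on the squared residual and the posterior mean under a Gaussian prior $N(x_k, \lambda I)$ with unit observation noise coincide. The function names and the unit noise variance are illustrative assumptions.

```python
import numpy as np

def prox_least_squares(x_prev, a, y, lam):
    """Proximal step on f(x) = 0.5 * (y - a @ x)**2 with step size lam,
    in closed form via the Sherman-Morrison identity."""
    return x_prev + lam * a * (y - a @ x_prev) / (1.0 + lam * (a @ a))

def bayes_update_mean(x_prev, a, y, lam):
    """Posterior mean for prior N(x_prev, lam * I) and likelihood
    y ~ N(a @ x, 1): a rank-one (Kalman-style) update."""
    gain = lam * a / (1.0 + lam * (a @ a))   # Kalman gain
    return x_prev + gain * (y - a @ x_prev)

rng = np.random.default_rng(0)
a, y, x0, lam = rng.normal(size=3), 1.7, np.zeros(3), 0.5
assert np.allclose(prox_least_squares(x0, a, y, lam),
                   bayes_update_mean(x0, a, y, lam))
```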

    Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey

    We survey incremental methods for minimizing a sum $\sum_{i=1}^m f_i(x)$ consisting of a large number of convex component functions $f_i$. Our methods consist of iterations applied to single components, and have proved very effective in practice. We introduce a unified algorithmic framework for a variety of such methods, some involving gradient and subgradient iterations, which are known, and some involving combinations of subgradient and proximal methods, which are new and offer greater flexibility in exploiting the special structure of $f_i$. We provide an analysis of the convergence and rate of convergence properties of these methods, including the advantages offered by randomization in the selection of components. We also survey applications in inference/machine learning, signal processing, and large-scale and distributed optimization.
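
    As a sketch of the cyclic variant covered by such surveys, the iteration below applies one component's proximal operator at a time; the closed-form per-component prox in the example is for least-squares components, and all names here are illustrative rather than taken from the paper.

```python
import numpy as np

def incremental_proximal(prox_list, x0, step, n_passes=50):
    """Cyclic incremental proximal method: apply the proximal operator of a
    single component f_i at each inner iteration (randomized orders are
    also commonly analyzed)."""
    x = x0
    for _ in range(n_passes):
        for prox_i in prox_list:
            x = prox_i(x, step)
    return x

# Example: minimize sum_i 0.5 * (y_i - A[i] @ x)**2 with per-component
# prox steps available in closed form.
rng = np.random.default_rng(1)
A, y = rng.normal(size=(20, 3)), rng.normal(size=20)
proxes = [
    (lambda x, lam, a=a_i, yi=y_i:
        x + lam * a * (yi - a @ x) / (1.0 + lam * (a @ a)))
    for a_i, y_i in zip(A, y)
]
x_hat = incremental_proximal(proxes, np.zeros(3), step=0.1)
```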

    A probabilistic incremental proximal gradient method

    In this paper, we propose a probabilistic optimization method, named probabilistic incremental proximal gradient (PIPG) method, by developing a probabilistic interpretation of the incremental proximal gradient algorithm. We explicitly model the update rules of the incremental proximal gradient method and develop a systematic approach to propagate the uncertainty of the solution estimate over iterations. The PIPG algorithm takes the form of Bayesian filtering updates for a state-space model constructed by using the cost function. Our framework makes it possible to utilize well-known exact or approximate Bayesian filters, such as Kalman or extended Kalman filters, to solve large-scale regularized optimization problems.Comment: 5 pages, includes an extra numerical experimen
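
    The distinguishing feature relative to a point estimate is that a covariance is propagated alongside the mean. The generic rank-one Kalman update below illustrates that idea under an identity state transition and a scalar observation, which are assumptions for this sketch rather than the paper's exact state-space model.

```python
import numpy as np

def kalman_step(mean, cov, a, y, obs_var=1.0):
    """One rank-one Kalman update for a scalar observation y ~ N(a @ x, obs_var)
    under an identity state transition: returns the updated mean and covariance,
    so the solution estimate carries an uncertainty with it."""
    s = a @ cov @ a + obs_var              # innovation variance
    gain = cov @ a / s                     # Kalman gain
    mean = mean + gain * (y - a @ mean)
    cov = cov - np.outer(gain, cov @ a)    # Joseph form omitted for brevity
    return mean, cov
```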

    Sparse Regularization in Marketing and Economics

    Sparse alpha-norm regularization has many data-rich applications in Marketing and Economics. Alpha-norm, in contrast to lasso and ridge regularization, jumps to a sparse solution. This feature is attractive for ultra-high-dimensional problems that occur in demand estimation and forecasting. The alpha-norm objective is nonconvex and requires coordinate descent and proximal operators to find the sparse solution. We study a typical marketing demand forecasting problem, grocery store sales for salty snacks, that has many dummy variables as controls. The key predictors of demand include price, equivalized volume, promotion, flavor, scent, and brand effects. By comparing with many commonly used machine learning methods, alpha-norm regularization achieves its goal of providing accurate out-of-sample estimates for the promotion lift effects. Finally, we conclude with directions for future research.
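
    For intuition about why such penalties "jump" to sparse solutions while the lasso shrinks smoothly, the sketch below contrasts the hard-thresholding proximal operator of the $\ell_0$ penalty (the $\alpha \to 0$ limit, used here purely as an illustration, not the paper's estimator) with the soft-thresholding prox of the $\ell_1$ penalty.

```python
import numpy as np

def prox_l0(v, lam):
    """Prox of lam * ||x||_0: hard thresholding. Small entries jump straight
    to zero while large entries are kept untouched, giving exactly-sparse
    solutions with no shrinkage bias on the survivors."""
    return np.where(np.abs(v) > np.sqrt(2.0 * lam), v, 0.0)

def prox_l1(v, lam):
    """Prox of lam * ||x||_1: soft thresholding (lasso). Every entry is
    shrunk toward zero, so selected coefficients are biased."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
```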

    The proximal point method revisited

    In this short survey, I revisit the role of the proximal point method in large-scale optimization. I focus on three recent examples: a proximally guided subgradient method for weakly convex stochastic approximation, the prox-linear algorithm for minimizing compositions of convex functions and smooth maps, and Catalyst generic acceleration for regularized Empirical Risk Minimization. Comment: 11 pages, submitted to SIAG/OPT Views and News.
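
    For reference, the basic iteration being revisited is the classical proximal point update (standard notation, with $\lambda_k > 0$ a stepsize):

    $$x_{k+1} \;=\; \operatorname{prox}_{\lambda_k f}(x_k) \;=\; \operatorname*{argmin}_{x}\Big\{\, f(x) + \tfrac{1}{2\lambda_k}\,\|x - x_k\|^2 \Big\}.$$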

    An Incremental Gradient Method for Large-scale Distributed Nonlinearly Constrained Optimization

    Motivated by applications arising from sensor networks and machine learning, we consider the problem of minimizing a finite sum of nondifferentiable convex functions where each component function is associated with an agent and a hard-to-project constraint set. Among well-known avenues to address finite-sum problems is the class of incremental gradient (IG) methods, where a single component function is selected at each iteration in a cyclic or randomized manner. When the problem is constrained, the existing IG schemes (including projected IG, proximal IAG, and SAGA) require a projection step onto the feasible set at each iteration. Consequently, the performance of these schemes is afflicted with costly projections when the problem includes: (1) nonlinear constraints, or (2) a large number of linear constraints. Our focus in this paper lies in addressing both of these challenges. We develop an algorithm called averaged iteratively regularized incremental gradient (aIR-IG) that does not involve any hard-to-project computation. Under mild assumptions, we derive non-asymptotic rates of convergence for both suboptimality and infeasibility metrics. Numerically, we show that the proposed scheme outperforms the standard projected IG methods on distributed soft-margin support vector machine problems.
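
    For contrast, here is a minimal sketch of the standard projected incremental gradient baseline mentioned above (not the proposed aIR-IG scheme); the per-iteration projection is exactly the step that becomes costly for nonlinear or numerous constraints. All names are illustrative.

```python
import numpy as np

def projected_ig(grads, project, x0, stepsize, n_epochs=100):
    """Standard projected incremental gradient: take one component
    (sub)gradient step per inner iteration, then project back onto the
    feasible set. stepsize(k) is a diminishing stepsize rule."""
    x, k = x0, 0
    for _ in range(n_epochs):
        for grad_i in grads:               # cyclic component selection
            x = project(x - stepsize(k) * grad_i(x))
            k += 1
    return x

# Example of a cheap projection: nonnegativity constraints.
project_nonneg = lambda z: np.maximum(z, 0.0)
```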

    Efficiency of minimizing compositions of convex functions and smooth maps

    We consider the global efficiency of algorithms for minimizing a sum of a convex function and a composition of a Lipschitz convex function with a smooth map. The basic algorithm we rely on is the prox-linear method, which in each iteration solves a regularized subproblem formed by linearizing the smooth map. When the subproblems are solved exactly, the method has efficiency $\mathcal{O}(\varepsilon^{-2})$, akin to gradient descent for smooth minimization. We show that when the subproblems can only be solved by first-order methods, a simple combination of smoothing, the prox-linear method, and a fast-gradient scheme yields an algorithm with complexity $\widetilde{\mathcal{O}}(\varepsilon^{-3})$. The technique readily extends to minimizing an average of $m$ composite functions, with complexity $\widetilde{\mathcal{O}}(m/\varepsilon^{2}+\sqrt{m}/\varepsilon^{3})$ in expectation. We round off the paper with an inertial prox-linear method that automatically accelerates in the presence of convexity.
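
    In the standard composite model $\min_x\, g(x) + h(c(x))$, with $g$ convex, $h$ Lipschitz convex, and $c$ smooth (notation assumed here, following the usual presentation of the prox-linear method), each iteration solves the linearized, regularized subproblem

    $$x_{k+1} \in \operatorname*{argmin}_{x}\; g(x) + h\big(c(x_k) + \nabla c(x_k)(x - x_k)\big) + \frac{1}{2t}\,\|x - x_k\|^2,$$

    where $t > 0$ plays the role of a stepsize. This subproblem is convex, and it is what the abstract refers to as being solved either exactly or by first-order methods.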

    Accelerating Stochastic Composition Optimization

    Consider the stochastic composition optimization problem where the objective is a composition of two expected-value functions. We propose a new stochastic first-order method, namely the accelerated stochastic compositional proximal gradient (ASC-PG) method, which updates based on queries to the sampling oracle using two different timescales. ASC-PG is the first proximal gradient method for the stochastic composition problem that can deal with a nonsmooth regularization penalty. We show that ASC-PG exhibits faster convergence than the best known algorithms, and that it achieves the optimal sample-error complexity in several important special cases. We further demonstrate the application of ASC-PG to reinforcement learning and conduct numerical experiments.
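
    The problem class can be written as follows (one common formulation, with the regularizer $R$ and the sampling indices $v, w$ as notational assumptions; the paper's exact two-timescale update is not reproduced here):

    $$\min_x\; \mathbb{E}_v\Big[ f_v\big( \mathbb{E}_w[\, g_w(x)\,] \big) \Big] + R(x),$$

    where only noisy samples of the inner map, its Jacobian, and the outer gradient are available through the sampling oracle.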

    Forward-Backward-Half Forward Algorithm for Solving Monotone Inclusions

    Tseng's algorithm finds a zero of the sum of a maximally monotone operator and a monotone continuous operator by evaluating the latter twice per iteration. In this paper, we modify Tseng's algorithm to find a zero of the sum of three operators, where we add a cocoercive operator to the inclusion. Since the sum of a cocoercive and a monotone-Lipschitz operator is monotone and Lipschitz, Tseng's method could be applied to this problem, but it would evaluate both operators twice per iteration without taking advantage of the cocoercivity of one of them. In our approach, although the continuous monotone operator must still be evaluated twice, we exploit the cocoercivity of the other operator by evaluating it only once per iteration. Moreover, when the cocoercive or the continuous monotone operator is zero, the method reduces to Tseng's splitting or to forward-backward splitting, respectively, unifying both algorithms. In addition, we provide a preconditioned version of the proposed method that allows non-self-adjoint linear operators in the computation of the resolvents and the single-valued operators involved. This approach also lets us extend previous variable-metric versions of Tseng's and forward-backward methods and simplify their conditions on the underlying metrics. We further exploit the case when the non-self-adjoint linear operators are block-triangular in the primal-dual product space for solving primal-dual composite monotone inclusions, obtaining Gauss-Seidel-type algorithms that generalize several primal-dual methods available in the literature. Finally, we explore applications to the obstacle problem, Empirical Risk Minimization, distributed optimization, and nonlinear programming, and we illustrate the performance of the method via numerical simulations. Comment: 34 pages, title changed.
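
    A minimal sketch of the kind of step described (not necessarily the paper's exact scheme): for the inclusion $0 \in Ax + Bx + Cx$, the resolvent of $A$ is applied once, the continuous monotone operator $B$ is evaluated twice in Tseng's fashion, and the cocoercive operator $C$ only once. Setting $C = 0$ or $B = 0$ recovers Tseng's splitting or a forward-backward step, respectively.

```python
def half_forward_step(x, res_A, B, C, gamma):
    """One splitting step for 0 in Ax + Bx + Cx: res_A is the resolvent
    J_{gamma A}, B is monotone and Lipschitz (evaluated twice), and C is
    cocoercive (evaluated once). A sketch under these assumptions, not a
    verbatim transcription of the paper's algorithm."""
    x_bar = res_A(x - gamma * (B(x) + C(x)))   # backward step on A
    return x_bar + gamma * (B(x) - B(x_bar))   # forward correction on B only
```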

    Optimization Methods for Large-Scale Machine Learning

    This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
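
    As a reference point for the discussion, here is a minimal sketch of the plain SG iteration at the center of the review; the sampling scheme, stepsize rule, and names are illustrative.

```python
import numpy as np

def sgd(grad_i, n_components, x0, stepsize, n_iters=1000, seed=0):
    """Plain stochastic gradient (SG): sample one component per iteration
    and step along its negative gradient with a diminishing stepsize."""
    rng = np.random.default_rng(seed)
    x = x0
    for k in range(n_iters):
        i = rng.integers(n_components)         # sampled component index
        x = x - stepsize(k) * grad_i(x, i)
    return x

# Example: least squares, f_i(x) = 0.5 * (y[i] - A[i] @ x)**2.
A = np.random.default_rng(2).normal(size=(100, 5))
y = A @ np.ones(5)
x_hat = sgd(lambda x, i: -(y[i] - A[i] @ x) * A[i], 100,
            np.zeros(5), stepsize=lambda k: 0.5 / (k + 10))
```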