
    A stochastic proximal alternating method for non-smooth non-convex optimization

    We introduce SPRING, a novel stochastic proximal alternating linearized minimization algorithm for solving a class of non-smooth and non-convex optimization problems. Large-scale imaging problems are becoming increasingly prevalent due to advances in data acquisition and computational capabilities. Motivated by the success of stochastic optimization methods, we propose a stochastic variant of the proximal alternating linearized minimization (PALM) algorithm \cite{bolte2014proximal}. We provide global convergence guarantees, demonstrating that our proposed method with variance-reduced stochastic gradient estimators, such as SAGA \cite{SAGA} and SARAH \cite{sarah}, achieves state-of-the-art oracle complexities. We also demonstrate the efficacy of our algorithm via several numerical examples including sparse non-negative matrix factorization, sparse principal component analysis, and blind image deconvolution. Comment: 28 pages, 11-page appendix
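
    As a concrete illustration of the kind of update such a method performs, the sketch below instantiates a stochastic PALM-style iteration for sparse non-negative matrix factorization, one of the test problems mentioned above. This is a minimal sketch, not the authors' SPRING implementation: a plain mini-batch gradient stands in for the SAGA/SARAH variance-reduced estimators required by the analysis, and the step sizes are simple Lipschitz estimates.

        import numpy as np

        def prox_l1_nonneg(X, t):
            # Prox of t*||.||_1 plus the non-negativity constraint: shift and clip at zero.
            return np.maximum(X - t, 0.0)

        def stochastic_palm_nmf(A, rank, lam=0.1, n_iters=200, batch=32, seed=0):
            # Sketch for  min_{U,V >= 0}  0.5*||A - U V||_F^2 + lam*(||U||_1 + ||V||_1).
            rng = np.random.default_rng(seed)
            m, n = A.shape
            U, V = rng.random((m, rank)), rng.random((rank, n))
            for _ in range(n_iters):
                # Block 1: proximal-gradient step on U from a mini-batch of columns of A.
                cols = rng.choice(n, size=min(batch, n), replace=False)
                grad_U = (U @ V[:, cols] - A[:, cols]) @ V[:, cols].T * (n / len(cols))
                step_U = 1.0 / (np.linalg.norm(V, 2) ** 2 + 1e-12)   # ~ 1/Lipschitz constant
                U = prox_l1_nonneg(U - step_U * grad_U, step_U * lam)
                # Block 2: proximal-gradient step on V from a mini-batch of rows of A.
                rows = rng.choice(m, size=min(batch, m), replace=False)
                grad_V = U[rows].T @ (U[rows] @ V - A[rows]) * (m / len(rows))
                step_V = 1.0 / (np.linalg.norm(U, 2) ** 2 + 1e-12)
                V = prox_l1_nonneg(V - step_V * grad_V, step_V * lam)
            return U, V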

    Uniform exponential convergence of sample average random functions under general sampling with applications in stochastic programming

    Sample average approximation (SAA) is one of the most popular methods for solving stochastic optimization and equilibrium problems. Research on SAA has mostly focused on the case when sampling is independent and identically distributed (iid), with exceptions (Dai et al. (2000) [9], Homem-de-Mello (2008) [16]). In this paper we study SAA with general sampling (including iid and non-iid sampling) for solving nonsmooth stochastic optimization problems, stochastic Nash equilibrium problems, and stochastic generalized equations. To this end, we first derive the uniform exponential convergence of the sample average of a class of lower semicontinuous random functions and then apply it to a nonsmooth stochastic minimization problem. Exponential convergence of estimators of both optimal solutions and M-stationary points (characterized by Mordukhovich limiting subgradients (Mordukhovich (2006) [23], Rockafellar and Wets (1998) [32])) is established under mild conditions. We also use the uniform convergence result to establish the exponential rate of convergence of statistical estimators of a stochastic Nash equilibrium problem and of estimators of the solutions to a stochastic generalized equation problem.
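
    As a toy illustration of the SAA idea (not of the paper's general-sampling framework), the snippet below replaces the nonsmooth stochastic objective E|x - xi| by its sample average and solves the resulting deterministic surrogate; under iid sampling the SAA solutions approach the true minimizer, the median of xi, and the paper quantifies how fast such estimators converge under far more general sampling schemes.

        import numpy as np
        from scipy.optimize import minimize_scalar

        def saa_solution(samples):
            # Minimize the SAA surrogate (1/N) * sum_i |x - xi_i| over the sample range.
            obj = lambda x: np.mean(np.abs(x - samples))
            return minimize_scalar(obj, bounds=(samples.min(), samples.max()),
                                   method="bounded").x

        rng = np.random.default_rng(1)
        for N in (10, 100, 1000, 10000):
            xi = rng.exponential(scale=2.0, size=N)   # iid sampling, the simplest special case
            print(N, saa_solution(xi))                # approaches the true median 2*ln(2) ~ 1.386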

    Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization

    Majorization-minimization algorithms consist of iteratively minimizing a majorizing surrogate of an objective function. Because of its simplicity and its wide applicability, this principle has been very popular in statistics and in signal processing. In this paper, we intend to make this principle scalable. We introduce a stochastic majorization-minimization scheme which is able to deal with large-scale or possibly infinite data sets. When applied to convex optimization problems under suitable assumptions, we show that it achieves an expected convergence rate of $O(1/\sqrt{n})$ after $n$ iterations, and of $O(1/n)$ for strongly convex functions. Equally important, our scheme almost surely converges to stationary points for a large class of non-convex problems. We develop several efficient algorithms based on our framework. First, we propose a new stochastic proximal gradient method, which experimentally matches state-of-the-art solvers for large-scale $\ell_1$-logistic regression. Second, we develop an online DC programming algorithm for non-convex sparse estimation. Finally, we demonstrate the effectiveness of our approach for solving large-scale structured matrix factorization problems. Comment: accepted for publication at Neural Information Processing Systems (NIPS) 2013. This is the 9-page version followed by 16 pages of appendices. The title has changed compared to the first technical report.
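
    For concreteness, the sketch below shows a plain stochastic proximal-gradient pass for $\ell_1$-regularized logistic regression, the application mentioned above. It is only in the spirit of the scheme described here (the paper's method averages majorizing surrogates rather than taking bare proximal steps), and the batch size and the 1/sqrt(t) step-size decay are illustrative choices.

        import numpy as np

        def soft_threshold(w, t):
            # Prox of t*||.||_1.
            return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

        def stochastic_prox_logreg(X, y, lam=1e-3, n_epochs=5, batch=64, step0=1.0, seed=0):
            # Stochastically minimize (1/n) sum_i log(1 + exp(-y_i * x_i.w)) + lam*||w||_1, y_i in {-1,+1}.
            rng = np.random.default_rng(seed)
            n, d = X.shape
            w, t = np.zeros(d), 0
            for _ in range(n_epochs):
                for idx in np.array_split(rng.permutation(n), max(1, n // batch)):
                    t += 1
                    step = step0 / np.sqrt(t)
                    margin = y[idx] * (X[idx] @ w)
                    grad = -(X[idx].T @ (y[idx] / (1.0 + np.exp(margin)))) / len(idx)
                    w = soft_threshold(w - step * grad, step * lam)   # proximal (soft-threshold) step
            return w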

    Communication-Efficient Gradient Descent-Accent Methods for Distributed Variational Inequalities: Unified Analysis and Local Updates

    Distributed and federated learning algorithms and techniques are associated primarily with minimization problems. However, with the increasing prevalence of minimax optimization and variational inequality problems in machine learning, the necessity of designing efficient distributed/federated learning approaches for these problems is becoming more apparent. In this paper, we provide a unified convergence analysis of communication-efficient local training methods for distributed variational inequality problems (VIPs). Our approach is based on a general key assumption on the stochastic estimates that allows us to propose and analyze several novel local training algorithms under a single framework for solving a class of structured non-monotone VIPs. We present the first local gradient descent-accent algorithms with provably improved communication complexity for solving distributed variational inequalities on heterogeneous data. The general algorithmic framework recovers state-of-the-art algorithms and their sharp convergence guarantees when the setting is specialized to minimization or minimax optimization problems. Finally, we demonstrate the strong performance of the proposed algorithms compared to state-of-the-art methods when solving federated minimax optimization problems.
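
    The snippet below is a bare-bones sketch of local stochastic gradient descent-ascent with periodic averaging, intended only to illustrate the communication pattern (several local updates per client, then one round of server averaging). The concrete algorithms, assumptions, and step-size rules analyzed in the paper differ; grad_x and grad_y stand for user-supplied stochastic gradient oracles evaluated on each client's local data.

        import numpy as np

        def local_sgda(grad_x, grad_y, x0, y0, n_clients=4, rounds=50, local_steps=10,
                       lr_x=0.01, lr_y=0.01, seed=0):
            rng = np.random.default_rng(seed)
            x, y = np.asarray(x0, float), np.asarray(y0, float)
            for _ in range(rounds):                  # one round = local work + one communication
                xs, ys = [], []
                for client in range(n_clients):
                    xc, yc = x.copy(), y.copy()
                    for _ in range(local_steps):     # local steps, no communication
                        xc = xc - lr_x * grad_x(client, xc, yc, rng)   # descent in the min variable
                        yc = yc + lr_y * grad_y(client, xc, yc, rng)   # ascent in the max variable
                    xs.append(xc)
                    ys.append(yc)
                x, y = np.mean(xs, axis=0), np.mean(ys, axis=0)        # server averages local iterates
            return x, y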

    Stochastic Frank-Wolfe for Composite Convex Minimization

    A broad class of convex optimization problems can be formulated as a semidefinite program (SDP), minimization of a convex function over the positive-semidefinite cone subject to some affine constraints. The majority of classical SDP solvers are designed for the deterministic setting where problem data is readily available. In this setting, generalized conditional gradient methods (aka Frank-Wolfe-type methods) provide scalable solutions by leveraging the so-called linear minimization oracle instead of the projection onto the semidefinite cone. Most problems in machine learning and modern engineering applications, however, contain some degree of stochasticity. In this work, we propose the first conditional-gradient-type method for solving stochastic optimization problems under affine constraints. Our method guarantees an $\mathcal{O}(k^{-1/3})$ convergence rate in expectation on the objective residual and $\mathcal{O}(k^{-5/12})$ on the feasibility gap.
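
    To make the template concrete, here is a generic stochastic Frank-Wolfe loop over the probability simplex, using a momentum-averaged gradient estimator and the classical linear minimization oracle (pick the best vertex). The paper's method additionally handles affine constraints and the semidefinite cone via smoothing, which is omitted here; the $k^{-2/3}$ averaging schedule is an illustrative choice, and stoch_grad(x, rng) is assumed to return an unbiased gradient estimate at x.

        import numpy as np

        def stochastic_frank_wolfe(stoch_grad, d, n_iters=500, seed=0):
            rng = np.random.default_rng(seed)
            x = np.full(d, 1.0 / d)                  # start at the barycenter of the simplex
            g = np.zeros(d)                          # running (averaged) gradient estimate
            for k in range(1, n_iters + 1):
                rho = 1.0 / k ** (2.0 / 3.0)         # averaging weight
                g = (1.0 - rho) * g + rho * stoch_grad(x, rng)
                s = np.zeros(d)
                s[np.argmin(g)] = 1.0                # linear minimization oracle over the simplex
                gamma = 2.0 / (k + 2.0)              # standard Frank-Wolfe step size
                x = (1.0 - gamma) * x + gamma * s
            return x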

    Accelerated Primal-dual Scheme for a Class of Stochastic Nonconvex-concave Saddle Point Problems

    Stochastic nonconvex-concave min-max saddle point problems appear in many machine learning and control problems, including distributionally robust optimization, generative adversarial networks, and adversarial learning. In this paper, we consider a class of nonconvex saddle point problems where the objective function satisfies the Polyak-{\L}ojasiewicz condition with respect to the minimization variable and is concave with respect to the maximization variable. The existing methods for solving nonconvex-concave saddle point problems often suffer from slow convergence and/or contain multiple loops. Our main contribution lies in proposing a novel single-loop accelerated primal-dual algorithm with new convergence rate results appearing for the first time in the literature, to the best of our knowledge. In particular, in the stochastic regime, we demonstrate a convergence rate of $\mathcal{O}(\epsilon^{-4})$ to find an $\epsilon$-gap solution, which can be improved to $\mathcal{O}(\epsilon^{-2})$ in the deterministic setting.
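
    For readers unfamiliar with the terminology, the display below spells out the problem class and the Polyak-Łojasiewicz (PL) condition referred to above; the notation is illustrative and not taken verbatim from the paper.

        \[
          \min_{x \in \mathbb{R}^n} \; \max_{y \in \mathcal{Y}} \; f(x,y) \;=\; \mathbb{E}_{\xi}\big[ F(x,y;\xi) \big],
        \]
        where $f(x,\cdot)$ is concave for every fixed $x$ and, for every fixed $y$, $f(\cdot,y)$ satisfies the PL inequality
        \[
          \tfrac{1}{2}\,\big\| \nabla_x f(x,y) \big\|^2 \;\ge\; \mu \Big( f(x,y) - \min_{x'} f(x',y) \Big), \qquad \mu > 0.
        \]
        One common notion of an $\epsilon$-gap solution $(\bar{x},\bar{y})$ then requires
        \[
          \max_{y \in \mathcal{Y}} f(\bar{x}, y) \;-\; \min_{x} f(x, \bar{y}) \;\le\; \epsilon.
        \]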

    A stochastic two-step inertial Bregman proximal alternating linearized minimization algorithm for nonconvex and nonsmooth problems

    In this paper, for solving a broad class of large-scale nonconvex and nonsmooth optimization problems, we propose a stochastic two-step inertial Bregman proximal alternating linearized minimization (STiBPALM) algorithm with variance-reduced stochastic gradient estimators, and we show that SAGA and SARAH are examples of such variance-reduced gradient estimators. Under expectation conditions involving the Kurdyka-Łojasiewicz property and suitable conditions on the parameters, we show that the sequence generated by the proposed algorithm converges to a critical point, and we also provide the general convergence rate. Numerical experiments on sparse nonnegative matrix factorization and blind image deblurring are presented to demonstrate the performance of the proposed algorithm. Comment: arXiv admin note: text overlap with arXiv:2002.12266 by other authors
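
    To indicate what such an update looks like, the display below gives one common form of a two-step inertial Bregman proximal step for a single block, with illustrative notation (smooth coupling term $H$, nonsmooth regularizer $f$ for the $x$-block, Bregman distance $D_\phi$, and a variance-reduced stochastic gradient $\tilde{\nabla}$ such as SAGA or SARAH); the exact placement of the inertial terms and the parameter conditions in the paper may differ.

        \[
          z^k \;=\; x^k + \alpha_k\,(x^k - x^{k-1}) + \beta_k\,(x^{k-1} - x^{k-2}),
        \]
        \[
          x^{k+1} \;\in\; \operatorname*{arg\,min}_{x} \Big\{ f(x) + \big\langle \tilde{\nabla}_x H(z^k, y^k),\, x - z^k \big\rangle + \tfrac{1}{\lambda_k}\, D_{\phi}(x, z^k) \Big\},
        \]
        and the $y$-block is updated analogously with the roles of the two blocks exchanged.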