4,240 research outputs found

    Catalyst Acceleration for Gradient-Based Non-Convex Optimization

    Get PDF
    We introduce a generic scheme to solve nonconvex optimization problems using gradient-based algorithms originally designed for minimizing convex functions. Even though these methods may originally require convexity to operate, the proposed approach allows one to use them on weakly convex objectives, which covers a large class of non-convex functions typically appearing in machine learning and signal processing. In general, the scheme is guaranteed to produce a stationary point with a worst-case efficiency typical of first-order methods, and when the objective turns out to be convex, it automatically accelerates in the sense of Nesterov and achieves near-optimal convergence rate in function values. These properties are achieved without assuming any knowledge about the convexity of the objective, by automatically adapting to the unknown weak convexity constant. We conclude the paper by showing promising experimental results obtained by applying our approach to incremental algorithms such as SVRG and SAGA for sparse matrix factorization and for learning neural networks

    Diffusion Approximations for Online Principal Component Estimation and Global Convergence

    Full text link
    In this paper, we propose to adopt the diffusion approximation tools to study the dynamics of Oja's iteration which is an online stochastic gradient descent method for the principal component analysis. Oja's iteration maintains a running estimate of the true principal component from streaming data and enjoys less temporal and spatial complexities. We show that the Oja's iteration for the top eigenvector generates a continuous-state discrete-time Markov chain over the unit sphere. We characterize the Oja's iteration in three phases using diffusion approximation and weak convergence tools. Our three-phase analysis further provides a finite-sample error bound for the running estimate, which matches the minimax information lower bound for principal component analysis under the additional assumption of bounded samples.Comment: Appeared in NIPS 201

    Distributed Big-Data Optimization via Block-Iterative Convexification and Averaging

    Full text link
    In this paper, we study distributed big-data nonconvex optimization in multi-agent networks. We consider the (constrained) minimization of the sum of a smooth (possibly) nonconvex function, i.e., the agents' sum-utility, plus a convex (possibly) nonsmooth regularizer. Our interest is in big-data problems wherein there is a large number of variables to optimize. If treated by means of standard distributed optimization algorithms, these large-scale problems may be intractable, due to the prohibitive local computation and communication burden at each node. We propose a novel distributed solution method whereby at each iteration agents optimize and then communicate (in an uncoordinated fashion) only a subset of their decision variables. To deal with non-convexity of the cost function, the novel scheme hinges on Successive Convex Approximation (SCA) techniques coupled with i) a tracking mechanism instrumental to locally estimate gradient averages; and ii) a novel block-wise consensus-based protocol to perform local block-averaging operations and gradient tacking. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Finally, numerical results show the effectiveness of the proposed algorithm and highlight how the block dimension impacts on the communication overhead and practical convergence speed

    Block stochastic gradient iteration for convex and nonconvex optimization

    Full text link
    The stochastic gradient (SG) method can minimize an objective function composed of a large number of differentiable functions, or solve a stochastic optimization problem, to a moderate accuracy. The block coordinate descent/update (BCD) method, on the other hand, handles problems with multiple blocks of variables by updating them one at a time; when the blocks of variables are easier to update individually than together, BCD has a lower per-iteration cost. This paper introduces a method that combines the features of SG and BCD for problems with many components in the objective and with multiple (blocks of) variables. Specifically, a block stochastic gradient (BSG) method is proposed for solving both convex and nonconvex programs. At each iteration, BSG approximates the gradient of the differentiable part of the objective by randomly sampling a small set of data or sampling a few functions from the sum term in the objective, and then, using those samples, it updates all the blocks of variables in either a deterministic or a randomly shuffled order. Its convergence for both convex and nonconvex cases are established in different senses. In the convex case, the proposed method has the same order of convergence rate as the SG method. In the nonconvex case, its convergence is established in terms of the expected violation of a first-order optimality condition. The proposed method was numerically tested on problems including stochastic least squares and logistic regression, which are convex, as well as low-rank tensor recovery and bilinear logistic regression, which are nonconvex
    • …
    corecore