Scaled stochastic gradient descent for low-rank matrix completion
The paper looks at a scaled variant of the stochastic gradient descent
algorithm for the matrix completion problem. Specifically, we propose a novel
matrix-scaling of the partial derivatives that acts as an efficient
preconditioning for the standard stochastic gradient descent algorithm. This
proposed matrix-scaling provides a trade-off between local and global second
order information. It also resolves the issue of scale invariance that exists
in matrix factorization models. The overall computational complexity is linear
in the number of known entries, so the approach extends to a large-scale setup.
Numerical comparisons show that the proposed algorithm competes favorably with
state-of-the-art algorithms on various benchmarks.
Comment: Accepted to IEEE CDC 201
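A minimal sketch of the kind of scaled update the abstract describes, assuming a factorization X ≈ L R^T and one uniformly sampled observed entry (i, j) per step; the inverse Gram matrices act as the matrix scaling (preconditioner). The names, step size and ridge term eps below are illustrative, not taken from the paper.

```python
import numpy as np

def scaled_sgd_step(L, R, i, j, x_ij, lr=0.1, eps=1e-8):
    """One stochastic step on observed entry (i, j) of X ≈ L @ R.T.

    The plain SGD partial derivative w.r.t. row L[i] is e * R[j], where e is
    the residual on the sampled entry; here it is right-multiplied by the
    inverse Gram matrix (R.T @ R)^{-1}, which acts as a matrix scaling
    (preconditioner), and symmetrically for R[j].
    """
    r = L.shape[1]
    e = L[i] @ R[j] - x_ij                                # residual on (i, j)
    grad_Li, grad_Rj = e * R[j], e * L[i]                 # Euclidean partials
    scale_L = np.linalg.inv(R.T @ R + eps * np.eye(r))    # scaling matrices
    scale_R = np.linalg.inv(L.T @ L + eps * np.eye(r))
    L[i] = L[i] - lr * grad_Li @ scale_L                  # preconditioned updates
    R[j] = R[j] - lr * grad_Rj @ scale_R
    return L, R
```

Recomputing the full Gram matrices at every step is only for clarity; a practical implementation would maintain or approximate them incrementally so that the overall cost stays linear in the number of observed entries.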
A Riemannian gossip approach to decentralized matrix completion
In this paper, we propose novel gossip algorithms for the low-rank
decentralized matrix completion problem. The proposed approach operates on the
Riemannian Grassmann manifold, which allows local matrix completion by different
agents while achieving asymptotic consensus on the global low-rank factors. The
resulting approach is scalable and parallelizable. Our numerical experiments
show the good performance of the proposed algorithms on various benchmarks.
Comment: Under review
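As a rough illustration of the consensus ingredient only (not the paper's Riemannian algorithm): each agent holds an orthonormal column-space factor, and two neighbouring agents mix their factors and retract back onto the manifold with a QR factorization. The paper instead moves along Grassmann geodesics while also fitting the locally observed entries; the alignment, step size and names below are illustrative.

```python
import numpy as np

def gossip_consensus_step(U_a, U_b, step=0.5):
    """Pull two agents' orthonormal factors U_a, U_b (both n x r) together.

    A convex combination of orthonormal matrices is generally not
    orthonormal, so a QR factorization is used as a retraction back onto
    the manifold.  U_b is first rotated to best match U_a (orthogonal
    Procrustes) so that the subspace, not an arbitrary basis, is averaged.
    """
    W, _, Vt = np.linalg.svd(U_b.T @ U_a)
    U_b_aligned = U_b @ (W @ Vt)                  # Procrustes alignment

    Q_a, _ = np.linalg.qr((1 - step) * U_a + step * U_b_aligned)
    Q_b, _ = np.linalg.qr((1 - step) * U_b_aligned + step * U_a)
    return Q_a, Q_b
```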
Stochastic Proximal Gradient Descent for Nuclear Norm Regularization
In this paper, we utilize stochastic optimization to reduce the space
complexity of convex composite optimization with a nuclear norm regularizer,
where the variable is a matrix of size m x n. By constructing a low-rank
estimate of the gradient, we propose an iterative algorithm based on stochastic
proximal gradient descent (SPGD), and take the last iterate of SPGD as the
final solution. The main advantage of the proposed algorithm is that its space
complexity is O(m + n), in contrast to the O(mn) space complexity of most
previous algorithms. Theoretical analysis establishes convergence rates for both
general convex functions and strongly convex functions.
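For context, the proximal operator of the nuclear norm, which every SPGD iteration has to apply, is soft-thresholding of the singular values. The sketch below shows that operator and one generic proximal step; the paper's low-rank gradient estimator, which is what actually delivers the reduced space complexity, is not reproduced here, and all names are illustrative.

```python
import numpy as np

def prox_nuclear(Z, tau):
    """Proximal operator of tau * ||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def spgd_step(X, stoch_grad, lam, lr):
    """One SPGD-style step: gradient step on the smooth part, then the prox.

    stoch_grad is any (ideally low-rank) unbiased estimate of the gradient of
    the smooth loss at X; lam is the nuclear norm weight.
    """
    return prox_nuclear(X - lr * stoch_grad, lr * lam)
```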
Convex Optimization without Projection Steps
For the general problem of minimizing a convex function over a compact convex
domain, we will investigate a simple iterative approximation algorithm based on
the method of Frank & Wolfe (1956), which does not need projection steps in order
to stay inside the optimization domain. Instead of a projection step, the
linearized problem defined by a current subgradient is solved, which gives a
step direction that will naturally stay in the domain. Our framework
generalizes the sparse greedy algorithm of Frank & Wolfe and its primal-dual
analysis by Clarkson 2010 (and the low-rank SDP approach by Hazan 2008) to
arbitrary convex domains. We give a convergence proof guaranteeing
{\epsilon}-small duality gap after O(1/{\epsilon}) iterations.
The method allows us to understand the sparsity of approximate solutions for
any l1-regularized convex optimization problem (and for optimization over the
simplex), expressed as a function of the approximation quality. We obtain
matching upper and lower bounds of {\Theta}(1/{\epsilon}) for the sparsity of
l1-problems. The same bounds apply to low-rank semidefinite optimization with
bounded trace, showing that rank O(1/{\epsilon}) is best possible here as well.
As another application, we obtain sparse matrices of O(1/{\epsilon}) non-zero
entries as {\epsilon}-approximate solutions when optimizing any convex function
over a class of diagonally dominant symmetric matrices.
We show that our proposed first-order method also applies to nuclear norm and
max-norm matrix optimization problems. For nuclear norm regularized
optimization, such as matrix completion and low-rank recovery, we demonstrate
the practical efficiency and scalability of our algorithm for large matrix
problems such as the Netflix dataset. For general convex optimization over
bounded matrix max-norm, our algorithm is, to the best of our knowledge, the
first with a convergence guarantee.
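A minimal sketch of the projection-free idea for the nuclear-norm-ball case discussed above: the linearized subproblem over {X : ||X||_* <= delta} is solved by a single rank-one matrix built from the top singular pair of the gradient, so every iterate is a convex combination of rank-one atoms. The objective interface, starting point and names are illustrative.

```python
import numpy as np

def frank_wolfe_nuclear(grad_f, X0, delta, n_iters=100):
    """Projection-free minimization of a smooth f over {||X||_* <= delta}.

    grad_f(X) returns the gradient of f at X; X0 must lie in the ball
    (the zero matrix is a safe choice).
    """
    X = X0
    for t in range(n_iters):
        G = grad_f(X)
        U, _, Vt = np.linalg.svd(G, full_matrices=False)
        S = -delta * np.outer(U[:, 0], Vt[0])     # linear minimization oracle
        gamma = 2.0 / (t + 2.0)                   # standard step-size schedule
        X = (1 - gamma) * X + gamma * S           # convex combination stays feasible
    return X
```

For large matrices the top singular pair would be obtained with a few power iterations rather than a full SVD, which is what keeps the per-iteration cost low.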
Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent
Low-rank matrix estimation is a canonical problem that finds numerous
applications in signal processing, machine learning and imaging science. A
popular approach in practice is to factorize the matrix into two compact
low-rank factors, and then optimize these factors directly via simple iterative
methods such as gradient descent and alternating minimization. Despite
nonconvexity, recent literature has shown that these simple heuristics in
fact achieve linear convergence when initialized properly for a growing number
of problems of interest. However, upon closer examination, existing approaches
can still be computationally expensive especially for ill-conditioned matrices:
the convergence rate of gradient descent depends linearly on the condition
number of the low-rank matrix, while the per-iteration cost of alternating
minimization is often prohibitive for large matrices. The goal of this paper is
to set forth a competitive algorithmic approach dubbed Scaled Gradient Descent
(ScaledGD) which can be viewed as pre-conditioned or diagonally-scaled gradient
descent, where the pre-conditioners are adaptive and iteration-varying with a
minimal computational overhead. With tailored variants for low-rank matrix
sensing, robust principal component analysis and matrix completion, we
theoretically show that ScaledGD achieves the best of both worlds: it converges
linearly at a rate independent of the condition number of the low-rank matrix,
similar to alternating minimization, while maintaining the low per-iteration
cost of gradient descent. Our analysis is also applicable to general loss
functions that are restricted strongly convex and smooth over low-rank
matrices. To the best of our knowledge, ScaledGD is the first algorithm that
provably has such properties over a wide range of low-rank matrix estimation
tasks.
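A minimal sketch of the preconditioned update in the simplest possible setting, factorizing a fully observed matrix Y ≈ L R^T; the iteration-varying preconditioners (R^T R)^{-1} and (L^T L)^{-1} are what remove the condition-number dependence. The loss, step size and random initialization are illustrative; the paper pairs the update with tailored initialization and covers sensing, robust PCA and completion.

```python
import numpy as np

def scaled_gd(Y, r, lr=0.5, n_iters=200, eps=1e-10, seed=0):
    """ScaledGD-style iterations for min over L, R of 0.5 * ||L @ R.T - Y||_F^2."""
    m, n = Y.shape
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((m, r)) / np.sqrt(r)
    R = rng.standard_normal((n, r)) / np.sqrt(r)
    for _ in range(n_iters):
        E = L @ R.T - Y                                     # residual
        grad_L, grad_R = E @ R, E.T @ L                     # Euclidean gradients
        pre_L = np.linalg.inv(R.T @ R + eps * np.eye(r))    # iteration-varying
        pre_R = np.linalg.inv(L.T @ L + eps * np.eye(r))    # preconditioners
        L, R = L - lr * grad_L @ pre_L, R - lr * grad_R @ pre_R
    return L, R
```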
Symmetry-invariant optimization in deep networks
Recent works have highlighted scale invariance or symmetry that is present in
the weight space of a typical deep network and the adverse effect that it has
on the Euclidean gradient based stochastic gradient descent optimization. In
this work, we show that these and other commonly used deep networks, such as
those which use a max-pooling and sub-sampling layer, possess more complex
forms of symmetry arising from scaling based reparameterization of the network
weights. We then propose two symmetry-invariant gradient based weight updates
for stochastic gradient descent based learning. Our empirical evidence based on
the MNIST dataset shows that these updates improve the test performance without
sacrificing the computational efficiency of the weight updates. We also show
the results of training with one of the proposed weight updates on an image
segmentation problem.
Comment: Submitted to ICLR 2016. arXiv admin note: text overlap with
arXiv:1511.0102
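A small worked example of the scale symmetry the abstract refers to: in a two-layer ReLU network, multiplying one layer's weights by c > 0 and dividing the next layer's weights by c leaves the function unchanged while changing the Euclidean gradients, which is the mismatch a symmetry-invariant update has to account for. The toy network and numbers below are illustrative, not the paper's models.

```python
import numpy as np

def two_layer(x, W1, W2):
    """f(x) = W2 @ relu(W1 @ x), a toy fully connected ReLU network."""
    return W2 @ np.maximum(W1 @ x, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((3, 8))

c = 10.0                       # rescale the first layer's weights by c ...
W1_s, W2_s = c * W1, W2 / c    # ... and undo it in the second layer

# Same function (positive scaling commutes with ReLU), so this prints True,
print(np.allclose(two_layer(x, W1, W2), two_layer(x, W1_s, W2_s)))
# yet the gradients w.r.t. W1_s and W2_s are scaled by 1/c and c relative to
# those of W1 and W2, so plain Euclidean SGD treats the two equivalent
# parameterizations very differently.
```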
Identifying global optimality for dictionary learning
Learning new representations of input observations in machine learning is
often tackled using a factorization of the data. For many such problems,
including sparse coding and matrix completion, learning these factorizations
can be difficult, both in terms of efficiency and in guaranteeing that the
solution is a global minimum. Recently, a general class of objectives, which we
term induced dictionary learning models (DLMs), has been introduced; these have an
induced convex form that enables global optimization. Though attractive
theoretically, this induced form is impractical, particularly for large or
growing datasets. In this work, we investigate the use of practical alternating
minimization algorithms for induced DLMs that ensure convergence to global
optima. We characterize the stationary points of these models, and, using these
insights, highlight practical choices for the objectives. We then provide
theoretical and empirical evidence that alternating minimization, from a random
initialization, converges to global minima for a large subclass of induced
DLMs. In particular, we take advantage of the existence of the (potentially
unknown) convex induced form, to identify when stationary points are global
minima for the dictionary learning objective. We then provide an empirical
investigation into practical optimization choices for using alternating
minimization for induced DLMs, for both batch and stochastic gradient descent.
Comment: Updates to previous version include a small modification to
Proposition 2, to only use normed regularizers, and a modification to the
main theorem (previously Theorem 13) to focus on the overcomplete, full rank
setting and to better characterize non-differentiable induced regularizers.
The theory has been significantly modified since version
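A minimal sketch of the alternating minimization loop in its simplest instance, an l2-regularized (ridge) factorization where both subproblems have closed-form solutions; the induced DLMs studied in the paper allow more general regularizers, so the names and regularizer choice below are illustrative only.

```python
import numpy as np

def alt_min_factorization(X, k, lam=0.1, n_iters=50, seed=0):
    """Alternating ridge solves for min over D, H of
    ||X - D @ H||_F^2 + lam * (||D||_F^2 + ||H||_F^2), with X of shape (d, n)."""
    d, n = X.shape
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((d, k))
    H = np.zeros((k, n))
    for _ in range(n_iters):
        # Fix the dictionary D, solve for the representations H.
        H = np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ X)
        # Fix H, solve for the dictionary D.
        D = np.linalg.solve(H @ H.T + lam * np.eye(k), H @ X.T).T
    return D, H
```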
Low-Rank Modeling and Its Applications in Image Analysis
Low-rank modeling generally refers to a class of methods that solve problems
by representing variables of interest as low-rank matrices. It has achieved
great success in various fields including computer vision, data mining, signal
processing and bioinformatics. Recently, much progress has been made in
theories, algorithms and applications of low-rank modeling, such as exact
low-rank matrix recovery via convex programming and matrix completion applied
to collaborative filtering. These advances have brought more and more
attention to this topic. In this paper, we review the recent advances in
low-rank modeling, the state-of-the-art algorithms, and related applications in
image analysis. We first give an overview of the concept of low-rank modeling
and challenging problems in this area. Then, we summarize the models and
algorithms for low-rank matrix recovery and illustrate their advantages and
limitations with numerical experiments. Next, we introduce a few applications
of low-rank modeling in the context of image analysis. Finally, we conclude
this paper with some discussions.
Comment: To appear in ACM Computing Surveys
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
Substantial progress has been made recently on developing provably accurate
and efficient algorithms for low-rank matrix factorization via nonconvex
optimization. While conventional wisdom often takes a dim view of nonconvex
optimization algorithms due to their susceptibility to spurious local minima,
simple iterative methods such as gradient descent have been remarkably
successful in practice. The theoretical footings, however, had been largely
lacking until recently.
In this tutorial-style overview, we highlight the important role of
statistical models in enabling efficient nonconvex optimization with
performance guarantees. We review two contrasting approaches: (1) two-stage
algorithms, which consist of a tailored initialization step followed by
successive refinement; and (2) global landscape analysis and
initialization-free algorithms. Several canonical matrix factorization problems
are discussed, including but not limited to matrix sensing, phase retrieval,
matrix completion, blind deconvolution, robust principal component analysis,
phase synchronization, and joint alignment. Special care is taken to illustrate
the key technical insights underlying their analyses. This article serves as a
testament that the integrated consideration of optimization and statistics
leads to fruitful research findings.
Comment: Invited overview article
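A minimal sketch of the initialization stage of the two-stage recipe, in the matrix completion setting: rescale the partially observed matrix by the sampling rate and take its top-r SVD as the starting point for the refinement stage. The function and names are illustrative; each problem in the article has its own tailored initializer.

```python
import numpy as np

def spectral_init(Y_obs, mask, r):
    """Spectral initialization for matrix completion.

    Y_obs holds the observed entries (zeros elsewhere) and mask marks the
    observed positions.  Dividing by the empirical sampling rate p makes
    Y_obs / p an unbiased surrogate for the full matrix, whose top-r SVD
    provides the initial low-rank factors.
    """
    p = mask.mean()
    U, s, Vt = np.linalg.svd(Y_obs / p, full_matrices=False)
    L0 = U[:, :r] * np.sqrt(s[:r])
    R0 = Vt[:r].T * np.sqrt(s[:r])
    return L0, R0
```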
Harnessing Structures in Big Data via Guaranteed Low-Rank Matrix Estimation
Low-rank modeling plays a pivotal role in signal processing and machine
learning, with applications ranging from collaborative filtering, video
surveillance, medical imaging, to dimensionality reduction and adaptive
filtering. Many modern high-dimensional data and interactions thereof can be
modeled as lying approximately in a low-dimensional subspace or manifold,
possibly with additional structures, and their proper exploitation leads to
significant reductions in the costs of sensing, computation and storage. In recent
years, there has been a plethora of progress in understanding how to exploit low-rank
structures using computationally efficient procedures in a provable manner,
including both convex and nonconvex approaches. On one side, convex relaxations
such as nuclear norm minimization often lead to statistically optimal
procedures for estimating low-rank matrices, where first-order methods are
developed to address the computational challenges; on the other side, there is
emerging evidence that properly designed nonconvex procedures, such as
projected gradient descent, often provide globally optimal solutions with a
much lower computational cost in many problems. This survey article will
provide a unified overview of these recent advances on low-rank matrix
estimation from incomplete measurements. Attention is paid to rigorous
characterization of the performance of these algorithms, and to problems where
the low-rank matrix has additional structural properties that require new
algorithmic designs and theoretical analysis.
Comment: To appear in IEEE Signal Processing Magazine
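A minimal sketch of the projected gradient descent mentioned above, for a generic smooth loss: take a gradient step on the full matrix and project back onto the set of rank-r matrices via a truncated SVD (Eckart-Young). The loss interface and names are illustrative.

```python
import numpy as np

def project_rank_r(Z, r):
    """Best rank-r approximation of Z (Eckart-Young), via truncated SVD."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def projected_gradient_descent(grad_f, X0, r, lr=0.1, n_iters=100):
    """Minimize a smooth f over rank-r matrices: gradient step, then projection."""
    X = project_rank_r(X0, r)
    for _ in range(n_iters):
        X = project_rank_r(X - lr * grad_f(X), r)
    return X
```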