Scaled stochastic gradient descent for low-rank matrix completion
The paper looks at a scaled variant of the stochastic gradient descent
algorithm for the matrix completion problem. Specifically, we propose a novel
matrix-scaling of the partial derivatives that acts as an efficient
preconditioning for the standard stochastic gradient descent algorithm. This
proposed matrix-scaling provides a trade-off between local and global second
order information. It also resolves the issue of scale invariance that exists
in matrix factorization models. The overall computational complexity is linear
in the number of known entries, so the approach extends to a large-scale setup.
Numerical comparisons show that the proposed algorithm competes favorably with
state-of-the-art algorithms on various benchmarks.
Comment: Accepted to IEEE CDC 201
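A minimal sketch of the kind of scaled update the abstract describes, assuming a factorization X ≈ L R^T and one uniformly sampled observed entry (i, j) per step; the inverse Gram matrices act as the matrix scaling (preconditioner). The names, step size and ridge term eps below are illustrative, not taken from the paper.

```python
import numpy as np

def scaled_sgd_step(L, R, i, j, x_ij, lr=0.1, eps=1e-8):
    """One stochastic step on observed entry (i, j) of X ≈ L @ R.T.

    The plain SGD partial derivative w.r.t. row L[i] is e * R[j], where e is
    the residual on the sampled entry; here it is right-multiplied by the
    inverse Gram matrix (R.T @ R)^{-1}, which acts as a matrix scaling
    (preconditioner), and symmetrically for R[j].
    """
    r = L.shape[1]
    e = L[i] @ R[j] - x_ij                                # residual on (i, j)
    grad_Li, grad_Rj = e * R[j], e * L[i]                 # Euclidean partials
    scale_L = np.linalg.inv(R.T @ R + eps * np.eye(r))    # scaling matrices
    scale_R = np.linalg.inv(L.T @ L + eps * np.eye(r))
    L[i] = L[i] - lr * grad_Li @ scale_L                  # preconditioned updates
    R[j] = R[j] - lr * grad_Rj @ scale_R
    return L, R
```

Recomputing the full Gram matrices at every step is only for clarity; a practical implementation would maintain or approximate them incrementally so that the overall cost stays linear in the number of observed entries.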
A Riemannian gossip approach to decentralized matrix completion
In this paper, we propose novel gossip algorithms for the low-rank
decentralized matrix completion problem. The proposed approach operates on the
Riemannian Grassmann manifold, which allows local matrix completion by different
agents while achieving asymptotic consensus on the global low-rank factors. The
resulting approach is scalable and parallelizable. Our numerical experiments
show the good performance of the proposed algorithms on various benchmarks.
Comment: Under review
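As a rough illustration of the consensus ingredient only (not the paper's Riemannian algorithm): each agent holds an orthonormal column-space factor, and two neighbouring agents mix their factors and retract back onto the manifold with a QR factorization. The paper instead moves along Grassmann geodesics while also fitting the locally observed entries; the alignment, step size and names below are illustrative.

```python
import numpy as np

def gossip_consensus_step(U_a, U_b, step=0.5):
    """Pull two agents' orthonormal factors U_a, U_b (both n x r) together.

    A convex combination of orthonormal matrices is generally not
    orthonormal, so a QR factorization is used as a retraction back onto
    the manifold.  U_b is first rotated to best match U_a (orthogonal
    Procrustes) so that the subspace, not an arbitrary basis, is averaged.
    """
    W, _, Vt = np.linalg.svd(U_b.T @ U_a)
    U_b_aligned = U_b @ (W @ Vt)                  # Procrustes alignment

    Q_a, _ = np.linalg.qr((1 - step) * U_a + step * U_b_aligned)
    Q_b, _ = np.linalg.qr((1 - step) * U_b_aligned + step * U_a)
    return Q_a, Q_b
```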
Stochastic Proximal Gradient Descent for Nuclear Norm Regularization
In this paper, we utilize stochastic optimization to reduce the space
complexity of convex composite optimization with a nuclear norm regularizer,
where the variable is a matrix of size m x n. By constructing a low-rank
estimate of the gradient, we propose an iterative algorithm based on stochastic
proximal gradient descent (SPGD), and take the last iterate of SPGD as the
final solution. The main advantage of the proposed algorithm is that its space
complexity is O(m + n), in contrast to the O(mn) space complexity of most
previous algorithms. Theoretical analysis establishes convergence rates for both
general convex functions and strongly convex functions.
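For context, the proximal operator of the nuclear norm, which every SPGD iteration has to apply, is soft-thresholding of the singular values. The sketch below shows that operator and one generic proximal step; the paper's low-rank gradient estimator, which is what actually delivers the reduced space complexity, is not reproduced here, and all names are illustrative.

```python
import numpy as np

def prox_nuclear(Z, tau):
    """Proximal operator of tau * ||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def spgd_step(X, stoch_grad, lam, lr):
    """One SPGD-style step: gradient step on the smooth part, then the prox.

    stoch_grad is any (ideally low-rank) unbiased estimate of the gradient of
    the smooth loss at X; lam is the nuclear norm weight.
    """
    return prox_nuclear(X - lr * stoch_grad, lr * lam)
```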
Convex Optimization without Projection Steps
For the general problem of minimizing a convex function over a compact convex
domain, we will investigate a simple iterative approximation algorithm based on
the method of Frank & Wolfe (1956), which does not need projection steps in order
to stay inside the optimization domain. Instead of a projection step, the
linearized problem defined by a current subgradient is solved, which gives a
step direction that will naturally stay in the domain. Our framework
generalizes the sparse greedy algorithm of Frank & Wolfe and its primal-dual
analysis by Clarkson 2010 (and the low-rank SDP approach by Hazan 2008) to
arbitrary convex domains. We give a convergence proof guaranteeing
{\epsilon}-small duality gap after O(1/{\epsilon}) iterations.
The method allows us to understand the sparsity of approximate solutions for
any l1-regularized convex optimization problem (and for optimization over the
simplex), expressed as a function of the approximation quality. We obtain
matching upper and lower bounds of {\Theta}(1/{\epsilon}) for the sparsity of
l1-problems. The same bounds apply to low-rank semidefinite optimization with
bounded trace, showing that rank O(1/{\epsilon}) is best possible here as well.
As another application, we obtain sparse matrices of O(1/{\epsilon}) non-zero
entries as {\epsilon}-approximate solutions when optimizing any convex function
over a class of diagonally dominant symmetric matrices.
We show that our proposed first-order method also applies to nuclear norm and
max-norm matrix optimization problems. For nuclear norm regularized
optimization, such as matrix completion and low-rank recovery, we demonstrate
the practical efficiency and scalability of our algorithm for large matrix
problems such as the Netflix dataset. For general convex optimization over
bounded matrix max-norm, our algorithm is, to the best of our knowledge, the
first with a convergence guarantee.
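A minimal sketch of the projection-free idea for the nuclear-norm-ball case discussed above: the linearized subproblem over {X : ||X||_* <= delta} is solved by a single rank-one matrix built from the top singular pair of the gradient, so every iterate is a convex combination of rank-one atoms. The objective interface, starting point and names are illustrative.

```python
import numpy as np

def frank_wolfe_nuclear(grad_f, X0, delta, n_iters=100):
    """Projection-free minimization of a smooth f over {||X||_* <= delta}.

    grad_f(X) returns the gradient of f at X; X0 must lie in the ball
    (the zero matrix is a safe choice).
    """
    X = X0
    for t in range(n_iters):
        G = grad_f(X)
        U, _, Vt = np.linalg.svd(G, full_matrices=False)
        S = -delta * np.outer(U[:, 0], Vt[0])     # linear minimization oracle
        gamma = 2.0 / (t + 2.0)                   # standard step-size schedule
        X = (1 - gamma) * X + gamma * S           # convex combination stays feasible
    return X
```

For large matrices the top singular pair would be obtained with a few power iterations rather than a full SVD, which is what keeps the per-iteration cost low.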
Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent
Low-rank matrix estimation is a canonical problem that finds numerous
applications in signal processing, machine learning and imaging science. A
popular approach in practice is to factorize the matrix into two compact
low-rank factors, and then optimize these factors directly via simple iterative
methods such as gradient descent and alternating minimization. Despite
nonconvexity, recent literature has shown that these simple heuristics in
fact achieve linear convergence when initialized properly for a growing number
of problems of interest. However, upon closer examination, existing approaches
can still be computationally expensive especially for ill-conditioned matrices:
the convergence rate of gradient descent depends linearly on the condition
number of the low-rank matrix, while the per-iteration cost of alternating
minimization is often prohibitive for large matrices. The goal of this paper is
to set forth a competitive algorithmic approach dubbed Scaled Gradient Descent
(ScaledGD) which can be viewed as pre-conditioned or diagonally-scaled gradient
descent, where the pre-conditioners are adaptive and iteration-varying with a
minimal computational overhead. With tailored variants for low-rank matrix
sensing, robust principal component analysis and matrix completion, we
theoretically show that ScaledGD achieves the best of both worlds: it converges
linearly at a rate independent of the condition number of the low-rank matrix,
similar to alternating minimization, while maintaining the low per-iteration
cost of gradient descent. Our analysis is also applicable to general loss
functions that are restricted strongly convex and smooth over low-rank
matrices. To the best of our knowledge, ScaledGD is the first algorithm that
provably has such properties over a wide range of low-rank matrix estimation
tasks.
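A minimal sketch of the preconditioned update in the simplest possible setting, factorizing a fully observed matrix Y ≈ L R^T; the iteration-varying preconditioners (R^T R)^{-1} and (L^T L)^{-1} are what remove the condition-number dependence. The loss, step size and random initialization are illustrative; the paper pairs the update with tailored initialization and covers sensing, robust PCA and completion.

```python
import numpy as np

def scaled_gd(Y, r, lr=0.5, n_iters=200, eps=1e-10, seed=0):
    """ScaledGD-style iterations for min over L, R of 0.5 * ||L @ R.T - Y||_F^2."""
    m, n = Y.shape
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((m, r)) / np.sqrt(r)
    R = rng.standard_normal((n, r)) / np.sqrt(r)
    for _ in range(n_iters):
        E = L @ R.T - Y                                     # residual
        grad_L, grad_R = E @ R, E.T @ L                     # Euclidean gradients
        pre_L = np.linalg.inv(R.T @ R + eps * np.eye(r))    # iteration-varying
        pre_R = np.linalg.inv(L.T @ L + eps * np.eye(r))    # preconditioners
        L, R = L - lr * grad_L @ pre_L, R - lr * grad_R @ pre_R
    return L, R
```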
Symmetry-invariant optimization in deep networks
Recent works have highlighted scale invariance or symmetry that is present in
the weight space of a typical deep network and the adverse effect that it has
on the Euclidean gradient based stochastic gradient descent optimization. In
this work, we show that these and other commonly used deep networks, such as
those which use a max-pooling and sub-sampling layer, possess more complex
forms of symmetry arising from scaling based reparameterization of the network
weights. We then propose two symmetry-invariant gradient based weight updates
for stochastic gradient descent based learning. Our empirical evidence based on
the MNIST dataset shows that these updates improve the test performance without
sacrificing the computational efficiency of the weight updates. We also show
the results of training with one of the proposed weight updates on an image
segmentation problem.
Comment: Submitted to ICLR 2016. arXiv admin note: text overlap with
arXiv:1511.0102
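A small worked example of the scale symmetry the abstract refers to: in a two-layer ReLU network, multiplying one layer's weights by c > 0 and dividing the next layer's weights by c leaves the function unchanged while changing the Euclidean gradients, which is the mismatch a symmetry-invariant update has to account for. The toy network and numbers below are illustrative, not the paper's models.

```python
import numpy as np

def two_layer(x, W1, W2):
    """f(x) = W2 @ relu(W1 @ x), a toy fully connected ReLU network."""
    return W2 @ np.maximum(W1 @ x, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((3, 8))

c = 10.0                       # rescale the first layer's weights by c ...
W1_s, W2_s = c * W1, W2 / c    # ... and undo it in the second layer

# Same function (positive scaling commutes with ReLU), so this prints True,
print(np.allclose(two_layer(x, W1, W2), two_layer(x, W1_s, W2_s)))
# yet the gradients w.r.t. W1_s and W2_s are scaled by 1/c and c relative to
# those of W1 and W2, so plain Euclidean SGD treats the two equivalent
# parameterizations very differently.
```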
Identifying global optimality for dictionary learning
Learning new representations of input observations in machine learning is
often tackled using a factorization of the data. For many such problems,
including sparse coding and matrix completion, learning these factorizations
can be difficult, both in terms of efficiency and in guaranteeing that the
solution is a global minimum. Recently, a general class of objectives, which we
term induced dictionary learning models (DLMs), has been introduced; these have an
induced convex form that enables global optimization. Though attractive
theoretically, this induced form is impractical, particularly for large or
growing datasets. In this work, we investigate the use of practical alternating
minimization algorithms for induced DLMs that ensure convergence to global
optima. We characterize the stationary points of these models, and, using these
insights, highlight practical choices for the objectives. We then provide
theoretical and empirical evidence that alternating minimization, from a random
initialization, converges to global minima for a large subclass of induced
DLMs. In particular, we take advantage of the existence of the (potentially
unknown) convex induced form, to identify when stationary points are global
minima for the dictionary learning objective. We then provide an empirical
investigation into practical optimization choices for using alternating
minimization for induced DLMs, for both batch and stochastic gradient descent.
Comment: Updates to previous version include a small modification to
Proposition 2, to only use normed regularizers, and a modification to the
main theorem (previously Theorem 13) to focus on the overcomplete, full rank
setting and to better characterize non-differentiable induced regularizers.
The theory has been significantly modified since version
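A minimal sketch of the alternating minimization loop in its simplest instance, an l2-regularized (ridge) factorization where both subproblems have closed-form solutions; the induced DLMs studied in the paper allow more general regularizers, so the names and regularizer choice below are illustrative only.

```python
import numpy as np

def alt_min_factorization(X, k, lam=0.1, n_iters=50, seed=0):
    """Alternating ridge solves for min over D, H of
    ||X - D @ H||_F^2 + lam * (||D||_F^2 + ||H||_F^2), with X of shape (d, n)."""
    d, n = X.shape
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((d, k))
    H = np.zeros((k, n))
    for _ in range(n_iters):
        # Fix the dictionary D, solve for the representations H.
        H = np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ X)
        # Fix H, solve for the dictionary D.
        D = np.linalg.solve(H @ H.T + lam * np.eye(k), H @ X.T).T
    return D, H
```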
Low-Rank Modeling and Its Applications in Image Analysis
Low-rank modeling generally refers to a class of methods that solve problems
by representing variables of interest as low-rank matrices. It has achieved
great success in various fields including computer vision, data mining, signal
processing and bioinformatics. Recently, much progress has been made in
theories, algorithms and applications of low-rank modeling, such as exact
low-rank matrix recovery via convex programming and matrix completion applied
to collaborative filtering. These advances have brought more and more
attention to this topic. In this paper, we review the recent advances in
low-rank modeling, the state-of-the-art algorithms, and related applications in
image analysis. We first give an overview of the concept of low-rank modeling
and challenging problems in this area. Then, we summarize the models and
algorithms for low-rank matrix recovery and illustrate their advantages and
limitations with numerical experiments. Next, we introduce a few applications
of low-rank modeling in the context of image analysis. Finally, we conclude
this paper with some discussions.
Comment: To appear in ACM Computing Surveys
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
Substantial progress has been made recently on developing provably accurate
and efficient algorithms for low-rank matrix factorization via nonconvex
optimization. While conventional wisdom often takes a dim view of nonconvex
optimization algorithms due to their susceptibility to spurious local minima,
simple iterative methods such as gradient descent have been remarkably
successful in practice. The theoretical footings, however, had been largely
lacking until recently.
In this tutorial-style overview, we highlight the important role of
statistical models in enabling efficient nonconvex optimization with
performance guarantees. We review two contrasting approaches: (1) two-stage
algorithms, which consist of a tailored initialization step followed by
successive refinement; and (2) global landscape analysis and
initialization-free algorithms. Several canonical matrix factorization problems
are discussed, including but not limited to matrix sensing, phase retrieval,
matrix completion, blind deconvolution, robust principal component analysis,
phase synchronization, and joint alignment. Special care is taken to illustrate
the key technical insights underlying their analyses. This article serves as a
testament that the integrated consideration of optimization and statistics
leads to fruitful research findings.
Comment: Invited overview article
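A minimal sketch of the initialization stage of the two-stage recipe, in the matrix completion setting: rescale the partially observed matrix by the sampling rate and take its top-r SVD as the starting point for the refinement stage. The function and names are illustrative; each problem in the article has its own tailored initializer.

```python
import numpy as np

def spectral_init(Y_obs, mask, r):
    """Spectral initialization for matrix completion.

    Y_obs holds the observed entries (zeros elsewhere) and mask marks the
    observed positions.  Dividing by the empirical sampling rate p makes
    Y_obs / p an unbiased surrogate for the full matrix, whose top-r SVD
    provides the initial low-rank factors.
    """
    p = mask.mean()
    U, s, Vt = np.linalg.svd(Y_obs / p, full_matrices=False)
    L0 = U[:, :r] * np.sqrt(s[:r])
    R0 = Vt[:r].T * np.sqrt(s[:r])
    return L0, R0
```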
Harnessing Structures in Big Data via Guaranteed Low-Rank Matrix Estimation
Low-rank modeling plays a pivotal role in signal processing and machine
learning, with applications ranging from collaborative filtering, video
surveillance, medical imaging, to dimensionality reduction and adaptive
filtering. Many modern high-dimensional data and interactions thereof can be
modeled as lying approximately in a low-dimensional subspace or manifold,
possibly with additional structures, and their proper exploitation leads to
significant reductions in the costs of sensing, computation and storage. In recent
years, there has been a plethora of progress in understanding how to exploit low-rank
structures using computationally efficient procedures in a provable manner,
including both convex and nonconvex approaches. On one side, convex relaxations
such as nuclear norm minimization often lead to statistically optimal
procedures for estimating low-rank matrices, where first-order methods are
developed to address the computational challenges; on the other side, there is
emerging evidence that properly designed nonconvex procedures, such as
projected gradient descent, often provide globally optimal solutions with a
much lower computational cost in many problems. This survey article will
provide a unified overview of these recent advances on low-rank matrix
estimation from incomplete measurements. Attention is paid to rigorous
characterization of the performance of these algorithms, and to problems where
the low-rank matrix has additional structural properties that require new
algorithmic designs and theoretical analysis.
Comment: To appear in IEEE Signal Processing Magazine
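A minimal sketch of the projected gradient descent mentioned above, for a generic smooth loss: take a gradient step on the full matrix and project back onto the set of rank-r matrices via a truncated SVD (Eckart-Young). The loss interface and names are illustrative.

```python
import numpy as np

def project_rank_r(Z, r):
    """Best rank-r approximation of Z (Eckart-Young), via truncated SVD."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def projected_gradient_descent(grad_f, X0, r, lr=0.1, n_iters=100):
    """Minimize a smooth f over rank-r matrices: gradient step, then projection."""
    X = project_rank_r(X0, r)
    for _ in range(n_iters):
        X = project_rank_r(X - lr * grad_f(X), r)
    return X
```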