Complete Dictionary Recovery over the Sphere
We consider the problem of recovering a complete (i.e., square and
invertible) matrix $A_0$, from $Y \in \mathbb{R}^{n \times p}$
with $Y = A_0 X_0$, provided $X_0$ is
sufficiently sparse. This recovery problem is central to the theoretical
understanding of dictionary learning, which seeks a sparse representation for a
collection of input signals, and finds numerous applications in modern signal
processing and machine learning. We give the first efficient algorithm that
provably recovers $A_0$ when $X_0$ has $O(n)$ nonzeros per
column, under a suitable probability model for $X_0$. In contrast, prior
results based on efficient algorithms provide recovery guarantees only when $X_0$ has $O(n^{1-\delta})$ nonzeros per column for any constant $\delta \in (0,1)$.
Our algorithmic pipeline centers around solving a certain nonconvex
optimization problem with a spherical constraint, and hence is naturally
phrased in the language of manifold optimization. To show this apparently hard
problem is tractable, we first provide a geometric characterization of the
high-dimensional objective landscape, which shows that with high probability
there are no "spurious" local minima. This particular geometric structure
allows us to design a Riemannian trust region algorithm over the sphere that
provably converges to one local minimizer with an arbitrary initialization,
despite the presence of saddle points. The geometric approach we develop here
may also shed light on other problems arising from nonconvex recovery of
structured signals.
Comment: 104 pages, 5 figures. Due to the length constraint of publication, this
long paper was subsequently divided into two papers (arXiv:1511.03607 and
arXiv:1511.04777). Further updates will be made only to the two papers.
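The nonconvex formulation described above searches for one sparse row of $X_0$ at a time by looking for a sparse vector in the row space of $Y$ over the sphere. The following is a minimal sketch of that idea, not the paper's exact method: it assumes an orthogonal $A_0$ and a Bernoulli–Gaussian $X_0$, and uses plain Riemannian gradient descent with a smoothed $\ell_1$ surrogate in place of the trust-region machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, theta = 20, 2000, 0.2
A0, _ = np.linalg.qr(rng.standard_normal((n, n)))                # orthogonal dictionary
X0 = rng.standard_normal((n, p)) * (rng.random((n, p)) < theta)  # sparse coefficients
Y = A0 @ X0                                                      # observed signals

def riemannian_step(q, Y, mu=1e-2, lr=0.05):
    """One step for min_{||q||=1} mean(h_mu(q^T Y)), where
    h_mu(z) = mu*log(cosh(z/mu)) smooths |z| (its gradient is tanh(z/mu))."""
    g = Y @ np.tanh(q @ Y / mu) / Y.shape[1]
    g_tan = g - (q @ g) * q           # project onto tangent space of the sphere
    q = q - lr * g_tan
    return q / np.linalg.norm(q)      # retract back to the sphere

q = rng.standard_normal(n)
q /= np.linalg.norm(q)
for _ in range(1000):
    q = riemannian_step(q, Y)

# For orthogonal A0, minimizers sit near signed columns of A0, so q^T Y
# recovers (up to sign and scale) one sparse row of X0.
alignment = np.max(np.abs(A0.T @ q))
```

Because the landscape has no spurious local minima with high probability, the random initialization above typically lands near a signed column of $A_0$; repeating with deflation recovers the rest of the dictionary.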
Complete Dictionary Recovery over the Sphere II: Recovery by Riemannian Trust-region Method
We consider the problem of recovering a complete (i.e., square and
invertible) matrix $A_0$, from $Y \in \mathbb{R}^{n \times p}$
with $Y = A_0 X_0$, provided $X_0$ is
sufficiently sparse. This recovery problem is central to the theoretical
understanding of dictionary learning, which seeks a sparse representation for a
collection of input signals and finds numerous applications in modern signal
processing and machine learning. We give the first efficient algorithm that
provably recovers $A_0$ when $X_0$ has $O(n)$ nonzeros per
column, under a suitable probability model for $X_0$.
Our algorithmic pipeline centers around solving a certain nonconvex
optimization problem with a spherical constraint, and hence is naturally
phrased in the language of manifold optimization. In a companion paper
(arXiv:1511.03607), we have shown that with high probability our nonconvex
formulation has no "spurious" local minimizers and that around any saddle point the
objective function has negative directional curvature. In this paper, we take
advantage of this particular geometric structure and describe a Riemannian
trust-region algorithm that provably converges to a local minimizer from
arbitrary initializations. Such minimizers give excellent approximations to
rows of $X_0$. The rows are then recovered by linear programming
rounding and deflation.
Comment: The second of two papers based on the report arXiv:1504.06785.
Accepted by IEEE Transactions on Information Theory; revised according to the
reviewers' comments.
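The linear programming rounding step can be illustrated with a toy problem. This is a hedged sketch, not the paper's exact procedure or constants: given a unit vector $r$ close to a column $a_i$ of an orthogonal $A_0$, solve $\min_q \|Y^\top q\|_1$ subject to $\langle r, q \rangle = 1$, which snaps $q$ onto the nearby dictionary column. The sizes, noise level, and use of `scipy.optimize.linprog` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p, theta = 10, 200, 0.2
A0, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal dictionary
X0 = rng.standard_normal((n, p)) * (rng.random((n, p)) < theta)
Y = A0 @ X0

# r: noisy approximation of the first dictionary column, as a trust-region
# minimizer would supply.
r = A0[:, 0] + 0.1 * rng.standard_normal(n)
r /= np.linalg.norm(r)

# LP in variables z = [q; t]: min sum(t)  s.t.  -t <= Y^T q <= t,  r^T q = 1
c = np.r_[np.zeros(n), np.ones(p)]
A_ub = np.block([[Y.T, -np.eye(p)], [-Y.T, -np.eye(p)]])
b_ub = np.zeros(2 * p)
A_eq = np.r_[r, np.zeros(p)][None, :]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(None, None)] * n + [(0, None)] * p, method="highs")
q = res.x[:n] / np.linalg.norm(res.x[:n])
alignment = abs(A0[:, 0] @ q)   # close to 1 when rounding succeeds
```

Deflation then removes the recovered row and repeats the procedure on the remaining directions.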
-Regularized Dictionary Learning
Classical dictionary learning methods simply normalize dictionary columns at
each iteration, and the impact of this basic form of regularization on
generalization performance (e.g. compression ratio on new images) is unclear.
Here, we derive a tractable performance measure for dictionaries in compressed
sensing based on the low bound and use it to regularize dictionary
learning problems. We detail numerical experiments on both compression and
inpainting problems and show that this more principled regularization approach
consistently improves reconstruction performance on new images.
On the Global Geometry of Sphere-Constrained Sparse Blind Deconvolution
Blind deconvolution is the problem of recovering a convolutional kernel
$a_0$ and an activation signal $x_0$ from their
convolution $y = a_0 \circledast x_0$. This
problem is ill-posed without further constraints or priors. This paper studies
the situation where the nonzero entries in the activation signal are sparsely
and randomly populated. We normalize the convolution kernel to have unit
Frobenius norm and cast the sparse blind deconvolution problem as a nonconvex
optimization problem over the sphere. With this spherical constraint, every
spurious local minimum turns out to be close to some signed shift truncation of
the ground truth, under certain hypotheses. This benign property motivates an
effective two-stage algorithm that recovers the ground truth from the partial
information offered by a suboptimal local minimum. This geometry-inspired
algorithm recovers the ground truth for certain microscopy problems, and also
exhibits promising performance on the more challenging image deblurring
problem. Our insights into the global geometry and the two-stage algorithm
extend to the convolutional dictionary learning problem, where a superposition
of multiple convolution signals is observed.
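The measurement model and the shift ambiguity behind the "signed shift truncation" property are easy to state concretely. A small sketch using circular convolution via the FFT; all sizes and the sparsity level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
k, m, theta = 8, 256, 0.05
a0 = rng.standard_normal(k)
a0 /= np.linalg.norm(a0)                               # unit-norm kernel
x0 = rng.standard_normal(m) * (rng.random(m) < theta)  # sparse activations

def cconv(a, x):
    """Circular convolution of a (zero-padded) kernel with the activation."""
    return np.real(np.fft.ifft(np.fft.fft(a, len(x)) * np.fft.fft(x)))

y = cconv(a0, x0)

# Intrinsic ambiguity: shifting the padded kernel by s while shifting the
# activation by -s produces exactly the same observation y.
s = 3
a_shift = np.roll(np.r_[a0, np.zeros(m - k)], s)
x_shift = np.roll(x0, -s)
same_y = np.allclose(y, cconv(a_shift, x_shift))  # True
```

This is why spurious local minima cluster near signed shift truncations of the ground truth rather than at arbitrary points.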
Finding a sparse vector in a subspace: Linear sparsity using alternating directions
Is it possible to find the sparsest vector (direction) in a generic subspace
$\mathcal{S} \subseteq \mathbb{R}^p$ with $\dim(\mathcal{S}) = n < p$?
This problem can be considered a homogeneous variant of the sparse recovery
problem, and finds connections to sparse dictionary learning, sparse PCA, and
many other problems in signal processing and machine learning. In this paper,
we focus on a **planted sparse model** for the subspace: the target sparse
vector is embedded in an otherwise random subspace. Simple convex heuristics
for this planted recovery problem provably break down when the fraction of
nonzero entries in the target sparse vector substantially exceeds
$O(1/\sqrt{n})$. In contrast, we exhibit a relatively simple nonconvex approach
based on alternating directions, which provably succeeds even when the fraction
of nonzero entries is $\Omega(1)$. To the best of our knowledge, this is the
first practical algorithm to achieve linear scaling under the planted sparse
model. Empirically, our proposed algorithm also succeeds in more challenging
data models, e.g., sparse dictionary learning.
Comment: Accepted by IEEE Trans. Information Theory. The paper has been
revised according to the reviewers' comments. The proofs have been streamlined.
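The alternating-directions scheme is compact: alternately sparsify the current subspace element and re-fit the coefficient vector. Below is a hedged sketch on the planted sparse model; the dimensions, iteration count, and soft-threshold level $\lambda = 1/\sqrt{p}$ are illustrative choices, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, theta = 1000, 5, 0.1
v0 = rng.standard_normal(p) * (rng.random(p) < theta)
v0 /= np.linalg.norm(v0)                        # planted sparse direction
G = rng.standard_normal((p, n - 1)) / np.sqrt(p)
Y, _ = np.linalg.qr(np.c_[v0, G])               # orthonormal basis of the subspace

def soft(z, lam):
    """Soft-thresholding operator, the prox of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

q = rng.standard_normal(n)
q /= np.linalg.norm(q)
lam = 1.0 / np.sqrt(p)
for _ in range(200):
    x = soft(Y @ q, lam)         # sparsify the current subspace element
    q = Y.T @ x                  # re-fit the coefficient vector ...
    q /= np.linalg.norm(q)       # ... and renormalize

v_hat = Y @ q
corr = abs(v_hat @ v0)           # near 1 when the planted vector is found
```

Each iteration is a single matrix-vector product plus a thresholding, which is what makes the method practical at linear sparsity levels.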
Learning overcomplete, low coherence dictionaries with linear inference
Finding overcomplete latent representations of data has applications in data
analysis, signal processing, machine learning, theoretical neuroscience and
many other fields. In an overcomplete representation, the number of latent
features exceeds the data dimensionality, which is useful when the data is
undersampled by the measurements (compressed sensing, information bottlenecks
in neural systems) or composed from multiple complete sets of linear features,
each spanning the data space. Independent Components Analysis (ICA) is a linear
technique for learning sparse latent representations, which typically has a
lower computational cost than sparse coding, its nonlinear, recurrent
counterpart. While well suited for finding complete representations, we show
that overcompleteness poses a challenge to existing ICA algorithms.
Specifically, the coherence control in existing ICA algorithms, necessary to
prevent the formation of duplicate dictionary features, is ill-suited in the
overcomplete case. We show that in this case several existing ICA algorithms
have undesirable global minima that maximize coherence. Further, by comparing
ICA algorithms on synthetic data and natural images to the computationally more
expensive sparse coding solution, we show that the coherence control biases the
exploration of the data manifold, sometimes yielding suboptimal solutions. We
provide a theoretical explanation of these failures and, based on the theory,
propose improved overcomplete ICA algorithms. All told, this study contributes
new insights into and methods for coherence control for linear ICA, some of
which are applicable to many other, potentially nonlinear, unsupervised
learning methods.
Comment: 27 pages, 11 figures.
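The coherence being controlled here is the standard mutual coherence of a dictionary: the largest absolute inner product between distinct unit-normalized atoms. A minimal helper for measuring it (function and variable names are my own):

```python
import numpy as np

def mutual_coherence(D):
    """Largest |<d_i, d_j>| over distinct unit-normalized columns of D."""
    Dn = D / np.linalg.norm(D, axis=0)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)   # ignore self-correlations
    return G.max()

# A complete orthonormal dictionary has coherence 0; a duplicated atom --
# exactly what coherence control is meant to prevent -- drives it to 1.
eye_coh = mutual_coherence(np.eye(4))             # 0.0
dup = np.c_[np.eye(4), np.eye(4)[:, :1]]          # repeat the first atom
dup_coh = mutual_coherence(dup)                   # 1.0
```

In the overcomplete regime the off-diagonal Gram entries cannot all be zero, so an algorithm that penalizes coherence too bluntly (or, as shown above for some ICA objectives, implicitly rewards it) distorts the learned features.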
Subspace-Sparse Representation
Given an overcomplete dictionary $D$ and a signal $b$ that is a linear
combination of a few linearly independent columns of $D$, classical sparse
recovery theory deals with the problem of recovering the unique sparse
representation $x$ such that $b = Dx$. It is known that under certain
conditions on $D$, $x$ can be recovered by the Basis Pursuit (BP) and the
Orthogonal Matching Pursuit (OMP) algorithms. In this work, we consider the
more general case where $b$ lies in a low-dimensional subspace spanned by some
columns of $D$, which are possibly linearly dependent. In this case, the
sparsest solution $x$ is generally not unique, and we study the problem of whether
the representation identifies the subspace, i.e., whether the nonzero entries of $x$
correspond to dictionary atoms that are in the subspace. Such a representation
is called subspace-sparse. We present sufficient conditions for
guaranteeing subspace-sparse recovery, which have clear geometric
interpretations and explain properties of subspace-sparse recovery. We also
show that the sufficient conditions can be satisfied under a randomized model.
Our results are applicable to the traditional sparse recovery problem and we
get conditions for sparse recovery that are less restrictive than the canonical
mutual coherence condition. We also use the results to analyze the sparse
representation based classification (SRC) method, for which we get conditions
that establish its correctness.
Comment: 15 pages, 3 figures, previous version published in ICML 201
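For concreteness, here is a minimal sketch of Orthogonal Matching Pursuit, one of the two algorithms named above. It is tested on an orthonormal dictionary, where recovery of a sparse $x$ is guaranteed since the correlations $D^\top b$ equal $x$ exactly; the general overcomplete case needs the kinds of conditions discussed in the abstract.

```python
import numpy as np

def omp(D, b, k):
    """Greedy OMP: pick the atom most correlated with the residual, then
    re-fit the coefficients by least squares on the chosen support."""
    residual, support = b.astype(float).copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], b, rcond=None)
        residual = b - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((20, 20)))  # orthonormal dictionary
x_true = np.zeros(20)
x_true[[2, 7, 11]] = [1.5, -2.0, 0.7]
x_hat = omp(D, D @ x_true, k=3)                     # recovers x_true exactly
```

Because the least-squares re-fit makes the residual orthogonal to every chosen atom, no atom is selected twice, which is what distinguishes OMP from plain matching pursuit.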
Subgradient Descent Learns Orthogonal Dictionaries
This paper concerns dictionary learning, i.e., sparse coding, a fundamental
representation learning problem. We show that a subgradient descent algorithm,
with random initialization, can provably recover orthogonal dictionaries on a
natural nonsmooth, nonconvex $\ell_1$ minimization formulation of the problem,
under mild statistical assumptions on the data. This is in contrast to previous
provable methods that require either expensive computation or delicate
initialization schemes. Our analysis develops several tools for characterizing
landscapes of nonsmooth functions, which might be of independent interest for
provable training of deep networks with nonsmooth activations (e.g., ReLU),
among numerous other applications. Preliminary experiments corroborate our
analysis and show that our algorithm works well empirically in recovering
orthogonal dictionaries.
Efficient Dictionary Learning with Gradient Descent
Randomly initialized first-order optimization algorithms are the method of
choice for solving many high-dimensional nonconvex problems in machine
learning, yet general theoretical guarantees cannot rule out convergence to
critical points of poor objective value. For some highly structured nonconvex
problems however, the success of gradient descent can be understood by studying
the geometry of the objective. We study one such problem -- complete orthogonal
dictionary learning -- and provide convergence guarantees for randomly initialized
gradient descent to the neighborhood of a global optimum. The resulting rates
scale as low order polynomials in the dimension even though the objective
possesses an exponential number of saddle points. This efficient convergence
can be viewed as a consequence of negative curvature normal to the stable
manifolds associated with saddle points, and we provide evidence that this
feature is shared by other nonconvex problems of importance as well.
Sparse Approximation, List Decoding, and Uncertainty Principles
We consider list versions of sparse approximation problems, where unlike the
existing results in sparse approximation that consider situations with unique
solutions, we are interested in multiple solutions. We introduce these problems
and present the first combinatorial results on the output list size. These
generalize and enhance some of the existing results on the threshold phenomenon and
uncertainty principles in sparse approximations. Our definitions and results
are inspired by similar results in list decoding. We also present lower bound
examples that bolster our results and show that they are of the appropriate size.