Complete Dictionary Recovery over the Sphere
We consider the problem of recovering a complete (i.e., square and
invertible) matrix $A_0$, from $Y \in \mathbb{R}^{n \times p}$
with $Y = A_0 X_0$, provided $X_0$ is
sufficiently sparse. This recovery problem is central to the theoretical
understanding of dictionary learning, which seeks a sparse representation for a
collection of input signals, and finds numerous applications in modern signal
processing and machine learning. We give the first efficient algorithm that
provably recovers $A_0$ when $X_0$ has $O(n)$ nonzeros per
column, under a suitable probability model for $X_0$. In contrast, prior
results based on efficient algorithms provide recovery guarantees only when $X_0$ has $O(n^{1-\delta})$ nonzeros per column for any constant $\delta \in (0,1)$.
Our algorithmic pipeline centers around solving a certain nonconvex
optimization problem with a spherical constraint, and hence is naturally
phrased in the language of manifold optimization. To show this apparently hard
problem is tractable, we first provide a geometric characterization of the
high-dimensional objective landscape, which shows that with high probability
there are no "spurious" local minima. This particular geometric structure
allows us to design a Riemannian trust region algorithm over the sphere that
provably converges to one local minimizer with an arbitrary initialization,
despite the presence of saddle points. The geometric approach we develop here
may also shed light on other problems arising from nonconvex recovery of
structured signals.
Comment: 104 pages, 5 figures. Due to the length constraint of publication, this
long paper was subsequently divided into two papers (arXiv:1511.03607 and
arXiv:1511.04777). Further updates will be made only to the two papers.
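The nonconvex formulation described above searches for one sparse row of $X_0$ at a time by looking for a sparse vector in the row space of $Y$ over the sphere. The following is a minimal sketch of that idea, not the paper's exact method: it assumes an orthogonal $A_0$ and a Bernoulli–Gaussian $X_0$, and uses plain Riemannian gradient descent with a smoothed $\ell_1$ surrogate in place of the trust-region machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, theta = 20, 2000, 0.2
A0, _ = np.linalg.qr(rng.standard_normal((n, n)))                # orthogonal dictionary
X0 = rng.standard_normal((n, p)) * (rng.random((n, p)) < theta)  # sparse coefficients
Y = A0 @ X0                                                      # observed signals

def riemannian_step(q, Y, mu=1e-2, lr=0.05):
    """One step for min_{||q||=1} mean(h_mu(q^T Y)), where
    h_mu(z) = mu*log(cosh(z/mu)) smooths |z| (its gradient is tanh(z/mu))."""
    g = Y @ np.tanh(q @ Y / mu) / Y.shape[1]
    g_tan = g - (q @ g) * q           # project onto tangent space of the sphere
    q = q - lr * g_tan
    return q / np.linalg.norm(q)      # retract back to the sphere

q = rng.standard_normal(n)
q /= np.linalg.norm(q)
for _ in range(1000):
    q = riemannian_step(q, Y)

# For orthogonal A0, minimizers sit near signed columns of A0, so q^T Y
# recovers (up to sign and scale) one sparse row of X0.
alignment = np.max(np.abs(A0.T @ q))
```

Because the landscape has no spurious local minima with high probability, the random initialization above typically lands near a signed column of $A_0$; repeating with deflation recovers the rest of the dictionary.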
Complete Dictionary Recovery over the Sphere II: Recovery by Riemannian Trust-region Method
We consider the problem of recovering a complete (i.e., square and
invertible) matrix $A_0$, from $Y \in \mathbb{R}^{n \times p}$
with $Y = A_0 X_0$, provided $X_0$ is
sufficiently sparse. This recovery problem is central to the theoretical
understanding of dictionary learning, which seeks a sparse representation for a
collection of input signals and finds numerous applications in modern signal
processing and machine learning. We give the first efficient algorithm that
provably recovers $A_0$ when $X_0$ has $O(n)$ nonzeros per
column, under a suitable probability model for $X_0$.
Our algorithmic pipeline centers around solving a certain nonconvex
optimization problem with a spherical constraint, and hence is naturally
phrased in the language of manifold optimization. In a companion paper
(arXiv:1511.03607), we have shown that with high probability our nonconvex
formulation has no "spurious" local minimizers and that around any saddle point the
objective function has negative directional curvature. In this paper, we take
advantage of this particular geometric structure and describe a Riemannian
trust-region algorithm that provably converges to a local minimizer from
arbitrary initializations. Such minimizers give excellent approximations to
rows of $X_0$. The rows are then recovered by linear programming
rounding and deflation.
Comment: The second of two papers based on the report arXiv:1504.06785.
Accepted by IEEE Transactions on Information Theory; revised according to the
reviewers' comments.
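The linear programming rounding step can be illustrated with a toy problem. This is a hedged sketch, not the paper's exact procedure or constants: given a unit vector $r$ close to a column $a_i$ of an orthogonal $A_0$, solve $\min_q \|Y^\top q\|_1$ subject to $\langle r, q \rangle = 1$, which snaps $q$ onto the nearby dictionary column. The sizes, noise level, and use of `scipy.optimize.linprog` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p, theta = 10, 200, 0.2
A0, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal dictionary
X0 = rng.standard_normal((n, p)) * (rng.random((n, p)) < theta)
Y = A0 @ X0

# r: noisy approximation of the first dictionary column, as a trust-region
# minimizer would supply.
r = A0[:, 0] + 0.1 * rng.standard_normal(n)
r /= np.linalg.norm(r)

# LP in variables z = [q; t]: min sum(t)  s.t.  -t <= Y^T q <= t,  r^T q = 1
c = np.r_[np.zeros(n), np.ones(p)]
A_ub = np.block([[Y.T, -np.eye(p)], [-Y.T, -np.eye(p)]])
b_ub = np.zeros(2 * p)
A_eq = np.r_[r, np.zeros(p)][None, :]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(None, None)] * n + [(0, None)] * p, method="highs")
q = res.x[:n] / np.linalg.norm(res.x[:n])
alignment = abs(A0[:, 0] @ q)   # close to 1 when rounding succeeds
```

Deflation then removes the recovered row and repeats the procedure on the remaining directions.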
-Regularized Dictionary Learning
Classical dictionary learning methods simply normalize dictionary columns at
each iteration, and the impact of this basic form of regularization on
generalization performance (e.g. compression ratio on new images) is unclear.
Here, we derive a tractable performance measure for dictionaries in compressed
sensing based on the low bound and use it to regularize dictionary
learning problems. We detail numerical experiments on both compression and
inpainting problems and show that this more principled regularization approach
consistently improves reconstruction performance on new images.
On the Global Geometry of Sphere-Constrained Sparse Blind Deconvolution
Blind deconvolution is the problem of recovering a convolutional kernel
$a_0$ and an activation signal $x_0$ from their
convolution $y = a_0 \circledast x_0$. This
problem is ill-posed without further constraints or priors. This paper studies
the situation where the nonzero entries in the activation signal are sparsely
and randomly populated. We normalize the convolution kernel to have unit
Frobenius norm and cast the sparse blind deconvolution problem as a nonconvex
optimization problem over the sphere. With this spherical constraint, every
spurious local minimum turns out to be close to some signed shift truncation of
the ground truth, under certain hypotheses. This benign property motivates an
effective two-stage algorithm that recovers the ground truth from the partial
information offered by a suboptimal local minimum. This geometry-inspired
algorithm recovers the ground truth for certain microscopy problems, and also
exhibits promising performance on the more challenging image deblurring
problem. Our insights into the global geometry and the two-stage algorithm
extend to the convolutional dictionary learning problem, where a superposition
of multiple convolution signals is observed.
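The measurement model and the shift ambiguity behind the "signed shift truncation" property are easy to state concretely. A small sketch using circular convolution via the FFT; all sizes and the sparsity level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
k, m, theta = 8, 256, 0.05
a0 = rng.standard_normal(k)
a0 /= np.linalg.norm(a0)                               # unit-norm kernel
x0 = rng.standard_normal(m) * (rng.random(m) < theta)  # sparse activations

def cconv(a, x):
    """Circular convolution of a (zero-padded) kernel with the activation."""
    return np.real(np.fft.ifft(np.fft.fft(a, len(x)) * np.fft.fft(x)))

y = cconv(a0, x0)

# Intrinsic ambiguity: shifting the padded kernel by s while shifting the
# activation by -s produces exactly the same observation y.
s = 3
a_shift = np.roll(np.r_[a0, np.zeros(m - k)], s)
x_shift = np.roll(x0, -s)
same_y = np.allclose(y, cconv(a_shift, x_shift))  # True
```

This is why spurious local minima cluster near signed shift truncations of the ground truth rather than at arbitrary points.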
Finding a sparse vector in a subspace: Linear sparsity using alternating directions
Is it possible to find the sparsest vector (direction) in a generic subspace
$\mathcal{S} \subseteq \mathbb{R}^p$ with $\dim(\mathcal{S}) = n < p$?
This problem can be considered a homogeneous variant of the sparse recovery
problem, and finds connections to sparse dictionary learning, sparse PCA, and
many other problems in signal processing and machine learning. In this paper,
we focus on a **planted sparse model** for the subspace: the target sparse
vector is embedded in an otherwise random subspace. Simple convex heuristics
for this planted recovery problem provably break down when the fraction of
nonzero entries in the target sparse vector substantially exceeds
$O(1/\sqrt{n})$. In contrast, we exhibit a relatively simple nonconvex approach
based on alternating directions, which provably succeeds even when the fraction
of nonzero entries is $\Omega(1)$. To the best of our knowledge, this is the
first practical algorithm to achieve linear scaling under the planted sparse
model. Empirically, our proposed algorithm also succeeds in more challenging
data models, e.g., sparse dictionary learning.
Comment: Accepted by IEEE Trans. Information Theory. The paper has been
revised according to the reviewers' comments. The proofs have been streamlined.
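The alternating-directions scheme is compact: alternately sparsify the current subspace element and re-fit the coefficient vector. Below is a hedged sketch on the planted sparse model; the dimensions, iteration count, and soft-threshold level $\lambda = 1/\sqrt{p}$ are illustrative choices, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, theta = 1000, 5, 0.1
v0 = rng.standard_normal(p) * (rng.random(p) < theta)
v0 /= np.linalg.norm(v0)                        # planted sparse direction
G = rng.standard_normal((p, n - 1)) / np.sqrt(p)
Y, _ = np.linalg.qr(np.c_[v0, G])               # orthonormal basis of the subspace

def soft(z, lam):
    """Soft-thresholding operator, the prox of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

q = rng.standard_normal(n)
q /= np.linalg.norm(q)
lam = 1.0 / np.sqrt(p)
for _ in range(200):
    x = soft(Y @ q, lam)         # sparsify the current subspace element
    q = Y.T @ x                  # re-fit the coefficient vector ...
    q /= np.linalg.norm(q)       # ... and renormalize

v_hat = Y @ q
corr = abs(v_hat @ v0)           # near 1 when the planted vector is found
```

Each iteration is a single matrix-vector product plus a thresholding, which is what makes the method practical at linear sparsity levels.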
Learning overcomplete, low coherence dictionaries with linear inference
Finding overcomplete latent representations of data has applications in data
analysis, signal processing, machine learning, theoretical neuroscience and
many other fields. In an overcomplete representation, the number of latent
features exceeds the data dimensionality, which is useful when the data is
undersampled by the measurements (compressed sensing, information bottlenecks
in neural systems) or composed from multiple complete sets of linear features,
each spanning the data space. Independent Components Analysis (ICA) is a linear
technique for learning sparse latent representations, which typically has a
lower computational cost than sparse coding, its nonlinear, recurrent
counterpart. While well suited for finding complete representations, we show
that overcompleteness poses a challenge to existing ICA algorithms.
Specifically, the coherence control in existing ICA algorithms, necessary to
prevent the formation of duplicate dictionary features, is ill-suited in the
overcomplete case. We show that in this case several existing ICA algorithms
have undesirable global minima that maximize coherence. Further, by comparing
ICA algorithms on synthetic data and natural images to the computationally more
expensive sparse coding solution, we show that the coherence control biases the
exploration of the data manifold, sometimes yielding suboptimal solutions. We
provide a theoretical explanation of these failures and, based on the theory,
propose improved overcomplete ICA algorithms. All told, this study contributes
new insights into and methods for coherence control for linear ICA, some of
which are applicable to many other, potentially nonlinear, unsupervised
learning methods.
Comment: 27 pages, 11 figures.
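The coherence being controlled here is the standard mutual coherence of a dictionary: the largest absolute inner product between distinct unit-normalized atoms. A minimal helper for measuring it (function and variable names are my own):

```python
import numpy as np

def mutual_coherence(D):
    """Largest |<d_i, d_j>| over distinct unit-normalized columns of D."""
    Dn = D / np.linalg.norm(D, axis=0)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)   # ignore self-correlations
    return G.max()

# A complete orthonormal dictionary has coherence 0; a duplicated atom --
# exactly what coherence control is meant to prevent -- drives it to 1.
eye_coh = mutual_coherence(np.eye(4))             # 0.0
dup = np.c_[np.eye(4), np.eye(4)[:, :1]]          # repeat the first atom
dup_coh = mutual_coherence(dup)                   # 1.0
```

In the overcomplete regime the off-diagonal Gram entries cannot all be zero, so an algorithm that penalizes coherence too bluntly (or, as shown above for some ICA objectives, implicitly rewards it) distorts the learned features.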
Subspace-Sparse Representation
Given an overcomplete dictionary $D$ and a signal $b$ that is a linear
combination of a few linearly independent columns of $D$, classical sparse
recovery theory deals with the problem of recovering the unique sparse
representation $x$ such that $b = Dx$. It is known that under certain
conditions on $D$, $x$ can be recovered by the Basis Pursuit (BP) and the
Orthogonal Matching Pursuit (OMP) algorithms. In this work, we consider the
more general case where $b$ lies in a low-dimensional subspace spanned by some
columns of $D$, which are possibly linearly dependent. In this case, the
sparsest solution $x$ is generally not unique, and we study the problem of whether
the representation identifies the subspace, i.e., whether the nonzero entries of $x$
correspond to dictionary atoms that are in the subspace. Such a representation
is called subspace-sparse. We present sufficient conditions for
guaranteeing subspace-sparse recovery, which have clear geometric
interpretations and explain properties of subspace-sparse recovery. We also
show that the sufficient conditions can be satisfied under a randomized model.
Our results are applicable to the traditional sparse recovery problem and we
get conditions for sparse recovery that are less restrictive than the canonical
mutual coherence condition. We also use the results to analyze the sparse
representation based classification (SRC) method, for which we get conditions
that establish its correctness.
Comment: 15 pages, 3 figures, previous version published in ICML 201
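For concreteness, here is a minimal sketch of Orthogonal Matching Pursuit, one of the two algorithms named above. It is tested on an orthonormal dictionary, where recovery of a sparse $x$ is guaranteed since the correlations $D^\top b$ equal $x$ exactly; the general overcomplete case needs the kinds of conditions discussed in the abstract.

```python
import numpy as np

def omp(D, b, k):
    """Greedy OMP: pick the atom most correlated with the residual, then
    re-fit the coefficients by least squares on the chosen support."""
    residual, support = b.astype(float).copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], b, rcond=None)
        residual = b - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((20, 20)))  # orthonormal dictionary
x_true = np.zeros(20)
x_true[[2, 7, 11]] = [1.5, -2.0, 0.7]
x_hat = omp(D, D @ x_true, k=3)                     # recovers x_true exactly
```

Because the least-squares re-fit makes the residual orthogonal to every chosen atom, no atom is selected twice, which is what distinguishes OMP from plain matching pursuit.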
Subgradient Descent Learns Orthogonal Dictionaries
This paper concerns dictionary learning, i.e., sparse coding, a fundamental
representation learning problem. We show that a subgradient descent algorithm,
with random initialization, can provably recover orthogonal dictionaries on a
natural nonsmooth, nonconvex $\ell_1$ minimization formulation of the problem,
under mild statistical assumptions on the data. This is in contrast to previous
provable methods that require either expensive computation or delicate
initialization schemes. Our analysis develops several tools for characterizing
landscapes of nonsmooth functions, which might be of independent interest for
provable training of deep networks with nonsmooth activations (e.g., ReLU),
among numerous other applications. Preliminary experiments corroborate our
analysis and show that our algorithm works well empirically in recovering
orthogonal dictionaries.
Efficient Dictionary Learning with Gradient Descent
Randomly initialized first-order optimization algorithms are the method of
choice for solving many high-dimensional nonconvex problems in machine
learning, yet general theoretical guarantees cannot rule out convergence to
critical points of poor objective value. For some highly structured nonconvex
problems however, the success of gradient descent can be understood by studying
the geometry of the objective. We study one such problem -- complete orthogonal
dictionary learning -- and provide convergence guarantees for randomly initialized
gradient descent to the neighborhood of a global optimum. The resulting rates
scale as low order polynomials in the dimension even though the objective
possesses an exponential number of saddle points. This efficient convergence
can be viewed as a consequence of negative curvature normal to the stable
manifolds associated with saddle points, and we provide evidence that this
feature is shared by other nonconvex problems of importance as well.
Sparse Approximation, List Decoding, and Uncertainty Principles
We consider list versions of sparse approximation problems, where unlike the
existing results in sparse approximation that consider situations with unique
solutions, we are interested in multiple solutions. We introduce these problems
and present the first combinatorial results on the output list size. These
generalize and enhance some of the existing results on the threshold phenomenon and
uncertainty principles in sparse approximations. Our definitions and results
are inspired by similar results in list decoding. We also present lower bound
examples that bolster our results and show that they are of the appropriate size.