
    Complete Dictionary Recovery over the Sphere

    We consider the problem of recovering a complete (i.e., square and invertible) matrix $\mathbf A_0$ from $\mathbf Y \in \mathbb R^{n \times p}$ with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is sufficiently sparse. This recovery problem is central to the theoretical understanding of dictionary learning, which seeks a sparse representation for a collection of input signals and finds numerous applications in modern signal processing and machine learning. We give the first efficient algorithm that provably recovers $\mathbf A_0$ when $\mathbf X_0$ has $O(n)$ nonzeros per column, under a suitable probability model for $\mathbf X_0$. In contrast, prior results based on efficient algorithms provide recovery guarantees when $\mathbf X_0$ has only $O(n^{1-\delta})$ nonzeros per column, for any constant $\delta \in (0, 1)$. Our algorithmic pipeline centers around solving a certain nonconvex optimization problem with a spherical constraint, and hence is naturally phrased in the language of manifold optimization. To show that this apparently hard problem is tractable, we first provide a geometric characterization of the high-dimensional objective landscape, which shows that with high probability there are no "spurious" local minima. This particular geometric structure allows us to design a Riemannian trust-region algorithm over the sphere that provably converges to a local minimizer from an arbitrary initialization, despite the presence of saddle points. The geometric approach we develop here may also shed light on other problems arising from nonconvex recovery of structured signals. Comment: 104 pages, 5 figures. Due to the length constraints of publication, this long paper has been divided into two papers (arXiv:1511.03607 and arXiv:1511.04777); further updates will be made only to those two papers.
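    To make the sphere-constrained formulation concrete, here is a minimal sketch (not the authors' implementation) of the kind of objective involved: a smooth surrogate of $\|q^\top \mathbf Y\|_1$ minimized over the unit sphere. The log-cosh surrogate and the plain Riemannian gradient step below stand in for the trust-region machinery analyzed in the paper; all parameter values and the orthogonal toy dictionary are assumptions.

```python
import numpy as np

def logcosh(x):
    # numerically stable log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2)
    ax = np.abs(x)
    return ax + np.log1p(np.exp(-2.0 * ax)) - np.log(2.0)

def f_and_rgrad(q, Y, mu=0.1):
    """Surrogate objective (1/p) * sum_k mu * logcosh(q^T y_k / mu) and its Riemannian gradient."""
    p = Y.shape[1]
    z = q @ Y                                   # inner products with the p data columns
    f = mu * np.mean(logcosh(z / mu))
    g = (Y @ np.tanh(z / mu)) / p               # Euclidean gradient
    return f, g - (q @ g) * q                   # project onto the tangent space of the sphere

def riemannian_gd(Y, steps=500, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(Y.shape[0])
    q /= np.linalg.norm(q)                      # random initialization on the sphere
    for _ in range(steps):
        _, g = f_and_rgrad(q, Y)
        q = q - lr * g
        q /= np.linalg.norm(q)                  # retract back onto the sphere
    return q

# Toy instance: Y = A0 X0 with a random orthogonal A0 (a simplification) and sparse X0.
rng = np.random.default_rng(1)
n, p = 20, 5000
A0 = np.linalg.qr(rng.standard_normal((n, n)))[0]
X0 = rng.standard_normal((n, p)) * (rng.random((n, p)) < 0.1)
q_hat = riemannian_gd(A0 @ X0)
```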

    Complete Dictionary Recovery over the Sphere II: Recovery by Riemannian Trust-region Method

    We consider the problem of recovering a complete (i.e., square and invertible) matrix $\mathbf A_0$ from $\mathbf Y \in \mathbb{R}^{n \times p}$ with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is sufficiently sparse. This recovery problem is central to the theoretical understanding of dictionary learning, which seeks a sparse representation for a collection of input signals and finds numerous applications in modern signal processing and machine learning. We give the first efficient algorithm that provably recovers $\mathbf A_0$ when $\mathbf X_0$ has $O(n)$ nonzeros per column, under a suitable probability model for $\mathbf X_0$. Our algorithmic pipeline centers around solving a certain nonconvex optimization problem with a spherical constraint, and hence is naturally phrased in the language of manifold optimization. In a companion paper (arXiv:1511.03607), we have shown that with high probability our nonconvex formulation has no "spurious" local minimizers, and that around any saddle point the objective function has negative directional curvature. In this paper, we take advantage of this particular geometric structure and describe a Riemannian trust-region algorithm that provably converges to a local minimizer from arbitrary initializations. Such minimizers give excellent approximations to rows of $\mathbf X_0$; the rows are then recovered by linear programming rounding and deflation. Comment: The second of two papers based on the report arXiv:1504.06785. Accepted by IEEE Transactions on Information Theory; revised according to the reviewers' comments.
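    The linear programming rounding step mentioned above can be sketched as follows: given an approximate minimizer $r$, solve $\min_q \|q^\top \mathbf Y\|_1$ subject to $\langle r, q\rangle = 1$, which becomes a standard LP after introducing slack variables. The use of scipy.optimize.linprog and the variable names below are illustrative choices, not the authors' code.

```python
import numpy as np
from scipy.optimize import linprog

def lp_rounding(Y, r):
    """Solve min_q ||Y^T q||_1 subject to <r, q> = 1 via an LP with slacks t >= |Y^T q|."""
    n, p = Y.shape
    c = np.concatenate([np.zeros(n), np.ones(p)])            # objective: sum of slacks
    A_ub = np.block([[Y.T, -np.eye(p)],                      #  Y^T q - t <= 0
                     [-Y.T, -np.eye(p)]])                    # -Y^T q - t <= 0
    b_ub = np.zeros(2 * p)
    A_eq = np.concatenate([r, np.zeros(p)]).reshape(1, -1)   # <r, q> = 1
    b_eq = np.array([1.0])
    bounds = [(None, None)] * n + [(0, None)] * p            # q free, slacks nonnegative
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n]
```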

    $\mathbf{M^*}$-Regularized Dictionary Learning

    Classical dictionary learning methods simply normalize dictionary columns at each iteration, and the impact of this basic form of regularization on generalization performance (e.g., compression ratio on new images) is unclear. Here, we derive a tractable performance measure for dictionaries in compressed sensing based on the low $M^*$ bound and use it to regularize dictionary learning problems. We detail numerical experiments on both compression and inpainting problems and show that this more principled regularization approach consistently improves reconstruction performance on new images.
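    For reference, here is a minimal sketch of the classical baseline that the abstract contrasts with: alternating sparse coding with a least-squares dictionary update, followed by simple column normalization. This illustrates generic dictionary learning, not the $M^*$-regularized method proposed in the paper; the ISTA coding step and all parameter values are assumptions.

```python
import numpy as np

def soft_threshold(Z, lam):
    return np.sign(Z) * np.maximum(np.abs(Z) - lam, 0.0)

def baseline_dictionary_learning(Y, k, iters=50, lam=0.1, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], k))
    D /= np.linalg.norm(D, axis=0)                          # normalize columns
    X = np.zeros((k, Y.shape[1]))
    for _ in range(iters):
        # sparse coding: one ISTA pass on 0.5*||Y - DX||_F^2 + lam*||X||_1
        step = 1.0 / np.linalg.norm(D, 2) ** 2
        X = soft_threshold(X + step * D.T @ (Y - D @ X), step * lam)
        # dictionary update: least squares, then renormalize columns
        D = Y @ np.linalg.pinv(X)
        D /= np.linalg.norm(D, axis=0) + 1e-12
    return D, X
```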

    On the Global Geometry of Sphere-Constrained Sparse Blind Deconvolution

    Blind deconvolution is the problem of recovering a convolutional kernel $\boldsymbol a_0$ and an activation signal $\boldsymbol x_0$ from their convolution $\boldsymbol y = \boldsymbol a_0 \circledast \boldsymbol x_0$. This problem is ill-posed without further constraints or priors. This paper studies the situation where the nonzero entries in the activation signal are sparsely and randomly populated. We normalize the convolution kernel to have unit Frobenius norm and cast the sparse blind deconvolution problem as a nonconvex optimization problem over the sphere. With this spherical constraint, every spurious local minimum turns out to be close to some signed shift truncation of the ground truth, under certain hypotheses. This benign property motivates an effective two-stage algorithm that recovers the ground truth from the partial information offered by a suboptimal local minimum. This geometry-inspired algorithm recovers the ground truth for certain microscopy problems and also exhibits promising performance on the more challenging image deblurring problem. Our insights into the global geometry and the two-stage algorithm extend to the convolutional dictionary learning problem, where a superposition of multiple convolution signals is observed.
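    A minimal sketch of the sphere-constrained formulation described above, for a 1-D circular convolution model: alternate a soft-thresholding (ISTA) step on the sparse activation with a projected gradient step on the unit-norm kernel. This simple alternating loop is an illustrative stand-in for the paper's two-stage algorithm; the step size and sparsity penalty are assumptions.

```python
import numpy as np

def cconv(a, x):
    """Circular convolution of kernel a (zero-padded to len(x)) with signal x."""
    return np.real(np.fft.ifft(np.fft.fft(a, len(x)) * np.fft.fft(x)))

def ccorr(a, x):
    """Circular correlation; the adjoint of cconv with respect to its second argument."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a, len(x))) * np.fft.fft(x)))

def sparse_blind_deconv(y, k, iters=500, lam=0.1, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(k)
    a /= np.linalg.norm(a)                      # unit-norm kernel (sphere constraint)
    x = np.zeros_like(y)
    for _ in range(iters):
        # ISTA step on x for 0.5*||a (*) x - y||^2 + lam*||x||_1
        r = cconv(a, x) - y
        x = x - lr * ccorr(a, r)
        x = np.sign(x) * np.maximum(np.abs(x) - lr * lam, 0.0)
        # projected gradient step on a, then retract to the unit sphere
        r = cconv(a, x) - y
        a = a - lr * ccorr(x, r)[:k]
        a /= np.linalg.norm(a)
    return a, x
```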

    Finding a sparse vector in a subspace: Linear sparsity using alternating directions

    Is it possible to find the sparsest vector (direction) in a generic subspace $\mathcal{S} \subseteq \mathbb{R}^p$ with $\mathrm{dim}(\mathcal{S}) = n < p$? This problem can be considered a homogeneous variant of the sparse recovery problem, and finds connections to sparse dictionary learning, sparse PCA, and many other problems in signal processing and machine learning. In this paper, we focus on a **planted sparse model** for the subspace: the target sparse vector is embedded in an otherwise random subspace. Simple convex heuristics for this planted recovery problem provably break down when the fraction of nonzero entries in the target sparse vector substantially exceeds $O(1/\sqrt{n})$. In contrast, we exhibit a relatively simple nonconvex approach based on alternating directions, which provably succeeds even when the fraction of nonzero entries is $\Omega(1)$. To the best of our knowledge, this is the first practical algorithm to achieve linear scaling under the planted sparse model. Empirically, our proposed algorithm also succeeds in more challenging data models, e.g., sparse dictionary learning. Comment: Accepted by IEEE Trans. Information Theory. The paper has been revised according to the reviewers' comments, and the proofs have been streamlined.
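    A minimal sketch of the alternating-directions idea described above, under the planted sparse model: with $Q$ an orthonormal basis of the subspace $\mathcal{S} \subseteq \mathbb{R}^p$, alternate a soft-thresholding step in the ambient space with a projection back onto the subspace coefficients. The threshold, iteration count, and single random initialization are assumptions; the paper also employs rounding steps not shown here.

```python
import numpy as np

def soft(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def adm_sparse_vector(Q, lam=0.1, iters=200, seed=0):
    p, n = Q.shape
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    for _ in range(iters):
        x = soft(Q @ q, lam)                # encourage sparsity in the ambient space
        q = Q.T @ x                         # project back onto subspace coefficients
        q /= np.linalg.norm(q) + 1e-12      # stay on the unit sphere
    return Q @ q                            # candidate sparse vector in the subspace
```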

    Learning overcomplete, low coherence dictionaries with linear inference

    Finding overcomplete latent representations of data has applications in data analysis, signal processing, machine learning, theoretical neuroscience and many other fields. In an overcomplete representation, the number of latent features exceeds the data dimensionality, which is useful when the data is undersampled by the measurements (compressed sensing, information bottlenecks in neural systems) or composed from multiple complete sets of linear features, each spanning the data space. Independent Components Analysis (ICA) is a linear technique for learning sparse latent representations, which typically has a lower computational cost than sparse coding, its nonlinear, recurrent counterpart. While well suited for finding complete representations, we show that overcompleteness poses a challenge to existing ICA algorithms. Specifically, the coherence control in existing ICA algorithms, necessary to prevent the formation of duplicate dictionary features, is ill-suited in the overcomplete case. We show that in this case several existing ICA algorithms have undesirable global minima that maximize coherence. Further, by comparing ICA algorithms on synthetic data and natural images to the computationally more expensive sparse coding solution, we show that the coherence control biases the exploration of the data manifold, sometimes yielding suboptimal solutions. We provide a theoretical explanation of these failures and, based on the theory, propose improved overcomplete ICA algorithms. All told, this study contributes new insights into and methods for coherence control for linear ICA, some of which are applicable to many other, potentially nonlinear, unsupervised learning methods. Comment: 27 pages, 11 figures.
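    To pin down the quantity at issue above, here is a small helper computing the mutual coherence of a dictionary, i.e., the largest absolute inner product between distinct unit-normalized columns; coherence-control terms in ICA-style objectives penalize this kind of overlap between learned features. This is the standard textbook definition, not code from the paper.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute inner product between distinct normalized dictionary columns."""
    Dn = D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-12)   # unit-norm columns
    G = np.abs(Dn.T @ Dn)                                         # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                                      # ignore self-products
    return G.max()
```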

    Subspace-Sparse Representation

    Given an overcomplete dictionary $A$ and a signal $b$ that is a linear combination of a few linearly independent columns of $A$, classical sparse recovery theory deals with the problem of recovering the unique sparse representation $x$ such that $b = Ax$. It is known that under certain conditions on $A$, $x$ can be recovered by the Basis Pursuit (BP) and the Orthogonal Matching Pursuit (OMP) algorithms. In this work, we consider the more general case where $b$ lies in a low-dimensional subspace spanned by some columns of $A$, which are possibly linearly dependent. In this case, the sparsest solution $x$ is generally not unique, and we study the problem of finding a representation $x$ that identifies the subspace, i.e., one whose nonzero entries correspond to dictionary atoms that are in the subspace. Such a representation $x$ is called subspace-sparse. We present sufficient conditions for guaranteeing subspace-sparse recovery, which have clear geometric interpretations and explain properties of subspace-sparse recovery. We also show that the sufficient conditions can be satisfied under a randomized model. Our results are applicable to the traditional sparse recovery problem, and we obtain conditions for sparse recovery that are less restrictive than the canonical mutual coherence condition. We also use the results to analyze the sparse representation based classification (SRC) method, for which we obtain conditions guaranteeing its correctness. Comment: 15 pages, 3 figures; a previous version was published in ICML 2015.
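    To make the recovery setting concrete, here is a minimal sketch of Orthogonal Matching Pursuit (OMP), one of the two classical algorithms named above: greedily pick the dictionary atom most correlated with the current residual, then refit by least squares on the selected support. The sparsity level k is an input; this is a textbook version, not tuned for the subspace-sparse setting studied in the paper.

```python
import numpy as np

def omp(A, b, k):
    """Recover a k-sparse x with b ~ A x by Orthogonal Matching Pursuit."""
    n, m = A.shape
    support = []
    x = np.zeros(m)
    residual = b.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))     # atom most correlated with residual
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        x = np.zeros(m)
        x[support] = coef                              # least-squares fit on the support
        residual = b - A @ x
    return x
```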

    Subgradient Descent Learns Orthogonal Dictionaries

    This paper concerns dictionary learning, i.e., sparse coding, a fundamental representation learning problem. We show that a subgradient descent algorithm, with random initialization, can provably recover orthogonal dictionaries on a natural nonsmooth, nonconvex $\ell_1$ minimization formulation of the problem, under mild statistical assumptions on the data. This is in contrast to previous provable methods that require either expensive computation or delicate initialization schemes. Our analysis develops several tools for characterizing landscapes of nonsmooth functions, which might be of independent interest for provable training of deep networks with nonsmooth activations (e.g., ReLU), among numerous other applications. Preliminary experiments corroborate our analysis and show that our algorithm works well empirically in recovering orthogonal dictionaries.
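    A minimal sketch of the nonsmooth formulation described above: minimize $\frac{1}{p}\|q^\top Y\|_1$ over the unit sphere by Riemannian subgradient descent from a random initialization. The diminishing step-size schedule below is an illustrative choice rather than the schedule analyzed in the paper.

```python
import numpy as np

def subgradient_descent_sphere(Y, iters=1000, seed=0):
    """Riemannian subgradient descent on q -> (1/p)*||Y^T q||_1 over the unit sphere."""
    n, p = Y.shape
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)                     # random initialization on the sphere
    for t in range(1, iters + 1):
        g = Y @ np.sign(Y.T @ q) / p           # Euclidean subgradient of the l1 objective
        g -= (q @ g) * q                       # project onto the tangent space at q
        q -= (0.1 / np.sqrt(t)) * g            # diminishing step size
        q /= np.linalg.norm(q)                 # retract back onto the sphere
    return q                                   # approximates one dictionary direction
```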

    Efficient Dictionary Learning with Gradient Descent

    Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of poor objective value. For some highly structured nonconvex problems, however, the success of gradient descent can be understood by studying the geometry of the objective. We study one such problem -- complete orthogonal dictionary learning -- and provide convergence guarantees for randomly initialized gradient descent to the neighborhood of a global optimum. The resulting rates scale as low-order polynomials in the dimension even though the objective possesses an exponential number of saddle points. This efficient convergence can be viewed as a consequence of negative curvature normal to the stable manifolds associated with saddle points, and we provide evidence that this feature is shared by other nonconvex problems of importance as well.

    Sparse Approximation, List Decoding, and Uncertainty Principles

    We consider list versions of sparse approximation problems: unlike existing results in sparse approximation, which consider situations with unique solutions, we are interested in multiple solutions. We introduce these problems and present the first combinatorial results on the output list size. These generalize and enhance some of the existing results on the threshold phenomenon and uncertainty principles in sparse approximation. Our definitions and results are inspired by similar results in list decoding. We also present lower bound examples that bolster our results and show that they are of the appropriate size.