
    The Sample Complexity of Dictionary Learning

    A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of a few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a set of signals to be represented. Can we expect that the representation found by such a dictionary for a previously unseen example from the same source will have an L_2 error of the same magnitude as that for the given examples? We assume signals are generated from a fixed distribution, and study this question from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected L_2 error in representation when the dictionary is used. For the case of l_1 regularized coefficient selection we provide a generalization bound of the order of O(sqrt(np log(m lambda)/m)), where n is the dimension, p is the number of elements in the dictionary, lambda is a bound on the l_1 norm of the coefficient vector and m is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most k dictionary elements, we provide a bound of the order O(sqrt(np log(m k)/m)) under an assumption on the level of orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results further yield fast rates of order 1/m, as opposed to 1/sqrt(m), using localized Rademacher complexity. We provide similar results in a general setting using kernels with weak smoothness requirements.
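
    A minimal sketch of the quantity these bounds concern, assuming synthetic Gaussian signals and scikit-learn's DictionaryLearning and Lasso as stand-ins for the l_1-regularized learning and coding steps (the dimensions, lambda value and data are illustrative, not taken from the paper):

    import numpy as np
    from sklearn.decomposition import DictionaryLearning
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p, m = 20, 30, 500   # signal dimension n, dictionary size p, training samples m
    lam = 0.1               # weight on the l_1 norm of the coefficient vector

    X_train = rng.standard_normal((m, n))   # placeholder signals from a fixed distribution
    X_test = rng.standard_normal((200, n))  # previously unseen signals from the same source

    # Learn a dictionary of p atoms by minimizing the empirical l_1-regularized
    # reconstruction cost over the training signals.
    dico = DictionaryLearning(n_components=p, alpha=lam, max_iter=100, random_state=0)
    D = dico.fit(X_train).components_       # shape (p, n); rows are dictionary atoms

    def mean_l2_error(signals, D, lam):
        # Code each signal with l_1-penalized least squares (scikit-learn's Lasso
        # scaling) and return the average L_2 representation error.
        coder = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
        errors = []
        for x in signals:
            coder.fit(D.T, x)               # columns of D.T are the atoms
            errors.append(np.linalg.norm(x - D.T @ coder.coef_))
        return float(np.mean(errors))

    # The generalization bound controls how far the test error can exceed the train error.
    print("train error:", mean_l2_error(X_train[:200], D, lam))
    print("test error: ", mean_l2_error(X_test, D, lam))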

    On The Sample Complexity of Sparse Dictionary Learning

    In the synthesis model, signals are represented as sparse combinations of atoms from a dictionary. Dictionary learning describes the acquisition process of the underlying dictionary for a given set of training samples. While ideally this would be achieved by optimizing the expected quality of the factors over the underlying distribution of the training data, in practice the necessary information about the distribution is not available. Therefore, in real world applications it is achieved by minimizing an empirical average over the available samples. The main goal of this paper is to provide a sample complexity estimate that controls to what extent the empirical average deviates from the expected cost function. This in turn provides an estimate of the accuracy of the representation achieved by the learned dictionary. The presented approach exemplifies the general results proposed by the authors in Sample Complexity of Dictionary Learning and other Matrix Factorizations, Gribonval et al., and gives more concrete bounds on the sample complexity of dictionary learning. We cover a variety of sparsity measures employed in the learning procedure. Comment: 4 pages, submitted to Statistical Signal Processing Workshop 201
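
    As a rough illustration of the two quantities discussed above (the symbols F_m, F, g and the constraint set used below are assumed notation, not taken from the paper), the cost minimized in practice is an empirical average over the m available samples, while the ideal cost is its expectation over the training-data distribution; the sample complexity estimate controls their uniform deviation:

    \[
      F_m(D) \;=\; \frac{1}{m}\sum_{i=1}^{m} \min_{a}\Bigl(\tfrac{1}{2}\lVert x_i - D a\rVert_2^2 + g(a)\Bigr),
      \qquad
      F(D) \;=\; \mathbb{E}_{x}\Bigl[\min_{a}\,\tfrac{1}{2}\lVert x - D a\rVert_2^2 + g(a)\Bigr],
    \]
    \[
      \sup_{D \in \mathcal{D}} \bigl|F_m(D) - F(D)\bigr| \;\le\; \eta_m \quad \text{with high probability},
    \]

    where g stands for one of the sparsity measures covered in the paper and eta_m is the deviation controlled by the sample complexity estimate.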

    Sample Complexity of Bayesian Optimal Dictionary Learning

    We consider a learning problem of identifying a dictionary matrix D (of dimension M times N) from a sample set of M-dimensional vectors Y = N^{-1/2} DX, where X is a sparse matrix (of dimension N times P) in which the density of non-zero entries is 0 < rho < 1. In particular, we focus on the minimum sample size P_c (sample complexity) necessary for perfectly identifying D under the optimal learning scheme when D and X are independently generated from certain distributions. Using the replica method of statistical mechanics, we show that P_c = O(N) holds as long as alpha = M/N > rho is satisfied in the limit of N to infinity. Our analysis also implies that the posterior distribution given Y is condensed only at the correct dictionary D when the compression rate alpha is greater than a certain critical value alpha_M(rho). This suggests that belief propagation may allow us to learn D with a low computational complexity using O(N) samples. Comment: 5 pages, 5 figures
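
    A minimal numpy sketch of the generative model described above; the Gaussian choices for the dictionary and for the non-zero entries of X, as well as the concrete sizes, are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    M, N, P = 50, 100, 400        # observation dimension, dictionary columns, sample size
    rho = 0.1                     # density of non-zero entries in X, 0 < rho < 1
    alpha = M / N                 # compression rate

    D = rng.standard_normal((M, N))             # dictionary matrix to be identified
    support = rng.random((N, P)) < rho          # Bernoulli(rho) sparsity pattern
    X = support * rng.standard_normal((N, P))   # sparse coefficient matrix
    Y = (D @ X) / np.sqrt(N)                    # observed samples Y = N^{-1/2} D X, one per column

    # The abstract's claim: D can be perfectly identified from P = O(N) samples
    # provided alpha = M/N exceeds rho (here alpha = 0.5 > rho = 0.1).
    print(Y.shape, "alpha =", alpha, "rho =", rho)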

    Sample Complexity of Dictionary Learning and other Matrix Factorizations

    Many modern tools in machine learning and signal processing, such as sparse dictionary learning, principal component analysis (PCA), non-negative matrix factorization (NMF), K-means clustering, etc., rely on the factorization of a matrix obtained by concatenating high-dimensional vectors from a training collection. While the idealized task would be to optimize the expected quality of the factors over the underlying distribution of training vectors, it is achieved in practice by minimizing an empirical average over the considered collection. The focus of this paper is to provide sample complexity estimates to uniformly control how much the empirical average deviates from the expected cost function. Standard arguments imply that the performance of the empirical predictor also exhibits such guarantees. The level of genericity of the approach encompasses several possible constraints on the factors (tensor product structure, shift-invariance, sparsity, ...), thus providing a unified perspective on the sample complexity of several widely used matrix factorization schemes. The derived generalization bounds behave proportionally to sqrt(log(n)/n) with respect to the number of samples n for the considered matrix factorization techniques. Comment: to appear
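
    A small illustration of the shared template described above, assuming synthetic non-negative data and scikit-learn's PCA, NMF and KMeans as off-the-shelf implementations (the sizes and data are placeholders): each method factorizes the same matrix of concatenated training vectors, differing only in the constraints placed on the factors.

    import numpy as np
    from sklearn.decomposition import PCA, NMF
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = np.abs(rng.standard_normal((500, 20)))  # n = 500 training vectors of dimension 20 (non-negative for NMF)
    k = 5                                       # number of factors / atoms / clusters

    # Each method minimizes an empirical average reconstruction cost over the
    # n samples, under a different constraint on the learned factors.
    pca_atoms = PCA(n_components=k).fit(X).components_                      # orthogonal factors
    nmf_atoms = NMF(n_components=k, max_iter=500).fit(X).components_        # non-negative factors
    kmeans_atoms = KMeans(n_clusters=k, n_init=10).fit(X).cluster_centers_  # one-hot coefficients

    print(pca_atoms.shape, nmf_atoms.shape, kmeans_atoms.shape)             # each (k, 20)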