The Sample Complexity of Dictionary Learning
A large set of signals can sometimes be described sparsely using a
dictionary, that is, every element can be represented as a linear combination
of few elements from the dictionary. Algorithms for various signal processing
applications, including classification, denoising and signal separation, learn
a dictionary from a set of signals to be represented. Can we expect that the
representation found by such a dictionary for a previously unseen example from
the same source will have an L_2 error of the same magnitude as the errors for
the given examples? We assume signals are generated from a fixed distribution,
and study this question from a statistical learning theory perspective.
We develop generalization bounds on the quality of the learned dictionary for
two types of constraints on the coefficient selection, as measured by the
expected L_2 error in representation when the dictionary is used. For the case
of l_1 regularized coefficient selection we provide a generalization bound of
order O(sqrt(np log(m lambda)/m)), where n is the dimension, p is the number
of elements in the dictionary, lambda is a bound on the l_1 norm of the
coefficient vector, and m is the number of samples; this complements existing
results. For the case of representing a new signal as a combination of at most
k dictionary elements, we provide a bound of order O(sqrt(np log(m k)/m))
under an assumption on the level of orthogonality of the dictionary (low Babel
function). We further show that this assumption holds for most dictionaries in
high dimensions in a strong probabilistic sense. Our results further yield fast
rates of order 1/m as opposed to 1/sqrt(m) using localized Rademacher
complexity. We provide similar results in a general setting using kernels with
weak smoothness requirements.
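For concreteness, here is a minimal numpy sketch of the l_1 regularized coefficient selection the first bound refers to: it computes the L_2 representation error of a fixed dictionary on a new signal. The ISTA solver, the Gaussian data, and all parameter values are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def l1_representation_error(D, s, lam, n_iter=200):
    """Solve min_x 0.5*||s - D x||_2^2 + lam*||x||_1 with plain ISTA
    and return the L_2 representation error ||s - D x||_2."""
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = x - D.T @ (D @ x - s) / L       # gradient step on the quadratic term
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # soft threshold
    return np.linalg.norm(s - D @ x)

rng = np.random.default_rng(0)
n, p = 20, 50                               # dimension and number of atoms
D = rng.standard_normal((n, p))
D /= np.linalg.norm(D, axis=0)              # unit-norm dictionary atoms
s = rng.standard_normal(n)                  # a previously unseen signal
print(l1_representation_error(D, s, lam=0.1))
```

The generalization question above asks how far this error on a fresh signal can be from the average error over the m training signals the dictionary was learned on.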
On The Sample Complexity of Sparse Dictionary Learning
In the synthesis model, signals are represented as sparse combinations of
atoms from a dictionary. Dictionary learning is the process of acquiring the
underlying dictionary from a given set of training samples. While ideally this
would be achieved by optimizing the expected quality of the factors over the
underlying distribution of the training data, in practice the necessary
information about the distribution is not available. Therefore, in real-world
applications it is achieved by minimizing an empirical average over the
available samples. The main goal of this paper is to provide a sample
complexity estimate that controls how much the empirical average deviates from
the expected cost function. This estimate in turn controls the accuracy of the
representation achieved by the learned dictionary. The presented approach
exemplifies the general results of Gribonval et al., Sample Complexity of
Dictionary Learning and other Matrix Factorizations, and gives more concrete
bounds on the sample complexity of dictionary learning. We cover a variety of
sparsity measures employed in the learning procedure.
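To illustrate the quantity such sample complexity estimates control, here is a small numpy sketch in which the empirical average of a representation cost concentrates around its expectation as the number of samples grows. The toy source distribution, the random dictionary, and the closed-form single-atom (k = 1) cost are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50
D = rng.standard_normal((n, p))
D /= np.linalg.norm(D, axis=0)              # unit-norm atoms

def one_atom_cost(D, s):
    """Squared L_2 error of the best single-atom approximation of s:
    project s onto the most correlated unit-norm atom, return the residual."""
    c = D.T @ s                             # correlation with each atom
    return np.sum(s ** 2) - np.max(np.abs(c)) ** 2

for m in (10, 100, 1000, 10000):
    samples = rng.standard_normal((m, n))   # toy stand-in for the source
    emp = np.mean([one_atom_cost(D, s) for s in samples])
    print(m, emp)                           # fluctuations shrink as m grows
```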
Sample Complexity of Bayesian Optimal Dictionary Learning
We consider the learning problem of identifying a dictionary matrix D (of
dimension M times N) from a sample set of M-dimensional vectors Y = N^{-1/2} DX,
where X is a sparse matrix (of dimension N times P) in which the density of
non-zero entries is 0 < rho < 1. In particular, we focus on the minimum sample
size P_c
(sample complexity) necessary for perfectly identifying D of the optimal
learning scheme when D and X are independently generated from certain
distributions. By using the replica method of statistical mechanics, we show
that P_c = O(N) holds as long as alpha = M/N > rho is satisfied in the limit of N
to infinity. Our analysis also implies that the posterior distribution given Y
is condensed only at the correct dictionary D when the compression rate alpha
is greater than a certain critical value alpha_M(rho). This suggests that
belief propagation may allow us to learn D with a low computational complexity
using O(N) samples.
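Here is a minimal numpy sketch of the generative model in this setting, drawing Y = N^{-1/2} DX with a Bernoulli(rho) support pattern; the Gaussian choices for D and for the non-zero entries of X are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, P, rho = 32, 64, 512, 0.1        # sizes and non-zero density of the setup

D = rng.standard_normal((M, N))        # planted dictionary to be identified
support = rng.random((N, P)) < rho     # each entry non-zero with probability rho
X = support * rng.standard_normal((N, P))  # sparse coefficient matrix
Y = D @ X / np.sqrt(N)                 # observed samples, Y = N^{-1/2} D X

print(Y.shape, (X != 0).mean())        # (M, P); empirical density close to rho
```

The question studied above is the smallest P (relative to N) for which D is recoverable from Y alone.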
Sample Complexity of Dictionary Learning and other Matrix Factorizations
Many modern tools in machine learning and signal processing, such as sparse
dictionary learning, principal component analysis (PCA), non-negative matrix
factorization (NMF), K-means clustering, etc., rely on the factorization of a
matrix obtained by concatenating high-dimensional vectors from a training
collection. While the idealized task would be to optimize the expected quality
of the factors over the underlying distribution of training vectors, it is
achieved in practice by minimizing an empirical average over the considered
collection. The focus of this paper is to provide sample complexity estimates
to uniformly control how much the empirical average deviates from the expected
cost function. Standard arguments imply that the performance of the empirical
predictor also exhibits such guarantees. The level of genericity of the approach
encompasses several possible constraints on the factors (tensor product
structure, shift-invariance, sparsity, ...), thus providing a unified
perspective on the sample complexity of several widely used matrix
factorization schemes. The derived generalization bounds behave proportionally
to sqrt(log(m)/m) with respect to the number of samples m for the considered
matrix factorization techniques.
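As one instance of this unified view, K-means clustering is the factorization min ||Y - D X||_F^2 in which every column of X is constrained to be one-hot. The sketch below minimizes the empirical average cost with Lloyd iterations; the data, sizes, and initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 10, 200, 4                       # dimension, samples, factors
Y = rng.standard_normal((n, m))            # columns are training vectors

D = Y[:, rng.choice(m, p, replace=False)]  # initialize centroids from the data
for _ in range(20):                        # Lloyd iterations
    d2 = ((Y[:, None, :] - D[:, :, None]) ** 2).sum(axis=0)  # p x m distances
    assign = d2.argmin(axis=0)             # one-hot columns of X, as labels
    for j in range(p):                     # centroid update = least-squares D
        if np.any(assign == j):
            D[:, j] = Y[:, assign == j].mean(axis=1)

print(((Y - D[:, assign]) ** 2).sum() / m) # empirical average cost
```

The sample complexity estimates above control how far this empirical average can be from the expected cost under the distribution that generated the columns of Y.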