Apprentissage de dictionnaire pour les représentations parcimonieuses (Dictionary Learning for Sparse Representations)
This is an abstract of the full preprint available at http://hal.inria.fr/hal-00918142/. A popular approach within the signal processing and machine learning communities consists in modelling high-dimensional data as sparse linear combinations of atoms selected from a dictionary. Given the importance of the choice of the dictionary for the operational deployment of these tools, a growing interest in learned dictionaries has emerged. The most popular dictionary learning techniques, which are expressed as large-scale matrix factorization through the optimization of a non-convex cost function, have been widely disseminated thanks to extensive empirical evidence of their success and steady algorithmic progress. Yet, until recently they remained essentially heuristic. We will present recent work on statistical aspects of sparse dictionary learning, contributing to the characterization of the excess risk as a function of the number of training samples. The results cover not only sparse dictionary learning but also a much larger class of constrained matrix factorization problems.
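The dictionary model described above can be made concrete with a small sketch: data generated as sparse combinations of atoms, and a dictionary learned by alternating a coding step with a least-squares update. All sizes and the hard-thresholding coder are illustrative assumptions, not the specific algorithms analyzed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k, s = 8, 200, 12, 3              # signal dim, samples, atoms, sparsity

# Synthetic training collection: sparse combinations of a ground-truth dictionary.
D_true = rng.normal(size=(d, k))
D_true /= np.linalg.norm(D_true, axis=0)
codes = np.zeros((k, n))
for j in range(n):
    idx = rng.choice(k, size=s, replace=False)
    codes[idx, j] = rng.normal(size=s)
X = D_true @ codes

# Alternate a hard-thresholded coding step with a least-squares dictionary update.
D = rng.normal(size=(d, k))
D /= np.linalg.norm(D, axis=0)
for _ in range(30):
    A = D.T @ X                                  # correlations as crude codes
    kth = np.sort(np.abs(A), axis=0)[-s]         # s-th largest magnitude per column
    A[np.abs(A) < kth] = 0.0                     # keep only the top-s entries
    D = X @ np.linalg.pinv(A)                    # least-squares dictionary update
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)

# Final coding pass and relative reconstruction error.
A = D.T @ X
kth = np.sort(np.abs(A), axis=0)[-s]
A[np.abs(A) < kth] = 0.0
rel_err = np.linalg.norm(X - D @ A) / np.linalg.norm(X)
```

The non-convexity mentioned in the abstract is visible here: the result depends on the random initialization of D, and nothing guarantees recovery of D_true.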
Sample Complexity of Dictionary Learning and other Matrix Factorizations
Many modern tools in machine learning and signal processing, such as sparse
dictionary learning, principal component analysis (PCA), non-negative matrix
factorization (NMF), k-means clustering, etc., rely on the factorization of a
matrix obtained by concatenating high-dimensional vectors from a training
collection. While the idealized task would be to optimize the expected quality
of the factors over the underlying distribution of training vectors, it is
achieved in practice by minimizing an empirical average over the considered
collection. The focus of this paper is to provide sample complexity estimates
to uniformly control how much the empirical average deviates from the expected
cost function. Standard arguments imply that the performance of the empirical
predictor also exhibits such guarantees. The level of genericity of the approach
encompasses several possible constraints on the factors (tensor product
structure, shift-invariance, sparsity, ...), thus providing a unified
perspective on the sample complexity of several widely used matrix
factorization schemes. The derived generalization bounds behave proportionally to
$\sqrt{\log(n)/n}$ w.r.t. the number of samples $n$ for the considered matrix
factorization techniques.
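The gap between the empirical average and the expected cost, which the paper controls uniformly, can be illustrated numerically. This is an assumption-laden Monte Carlo sketch (not the paper's proofs), using the residual cost of a fixed rank-one subspace, a PCA-style instance of matrix factorization:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
u = np.ones(d) / np.sqrt(d)                       # fixed candidate factor

def cost(X, u):
    # average squared residual after projecting each sample onto span(u)
    proj = np.outer(u, u) @ X
    return np.mean(np.sum((X - proj) ** 2, axis=0))

# For isotropic Gaussian data the expected cost equals d - 1; approximate it
# with a very large reference sample.
c_star = cost(rng.normal(size=(d, 200_000)), u)

# Average deviation |empirical - expected| over repeated draws, for two
# sample sizes: the deviation shrinks as the number of samples grows.
mean_dev = {}
for n in (100, 10_000):
    devs = [abs(cost(rng.normal(size=(d, n)), u) - c_star) for _ in range(50)]
    mean_dev[n] = float(np.mean(devs))
print(mean_dev)   # the deviation for n = 10_000 is markedly smaller
```

The paper's contribution is to control such deviations uniformly over all admissible factors at once, not just for one fixed u as here.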
Adaptive sparse coding and dictionary selection
Grant no. D000246/1.
Sparse coding is the approximation/representation of signals with the minimum number of
coefficients using an overcomplete set of elementary functions. Such approximations/representations
have found numerous applications in source separation, denoising, coding and
compressed sensing. The adaptation of the sparse approximation framework to the coding
problem of signals is investigated in this thesis. Open problems include the selection of appropriate
models and their orders, coefficient quantization, and the choice of sparse approximation method. Some of
these questions are addressed in this thesis and novel methods are developed. Because almost all
recent communication and storage systems are digital, an easy method to compute quantized
sparse approximations is introduced in the first part.
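One easy way to compute a quantized sparse approximation can be sketched as follows: greedy matching pursuit over an overcomplete dictionary, followed by uniform scalar quantization of the retained coefficients. The dictionary, sparsity level, and quantization step are illustrative assumptions, not the thesis's specific method.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, s = 16, 64, 4
D = rng.normal(size=(d, k))
D /= np.linalg.norm(D, axis=0)                     # overcomplete dictionary
x = D[:, :s] @ rng.normal(size=s)                  # an s-sparse test signal

# Matching pursuit: repeatedly subtract the best-correlated atom.
r, coeffs = x.copy(), np.zeros(k)
for _ in range(s):
    i = int(np.argmax(np.abs(D.T @ r)))
    c = D[:, i] @ r
    coeffs[i] += c
    r -= c * D[:, i]

# Uniform scalar quantization of the coefficients (step size is an assumption).
step = 0.25
q = np.round(coeffs / step) * step

err_sparse = np.linalg.norm(x - D @ coeffs)        # approximation error
err_quant = np.linalg.norm(x - D @ q)              # error after quantization
```

With unit-norm atoms, quantizing each of the (at most s) coefficients by at most step/2 perturbs the reconstruction by at most s * step / 2 in norm.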
The model selection problem is investigated next. The linear model can be adapted to better
fit a given signal class. It can also be designed based on some a priori information about the
model. Two novel dictionary selection methods are separately presented in the second part
of the thesis. The proposed model adaptation algorithm, called Dictionary Learning with the
Majorization Method (DLMM), is much more general than current methods. This generality
allows it to be used with different constraints on the model. In particular, two important cases
have been considered in this thesis for the first time, Parsimonious Dictionary Learning (PDL)
and Compressible Dictionary Learning (CDL). When the generative model order is not given,
PDL not only adapts the dictionary to the given class of signals, but also reduces the model
order redundancies. When a fast dictionary is needed, the CDL framework helps us to find a
dictionary which is adapted to the given signal class without increasing the computation cost
so much.
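The majorization idea behind MM-style dictionary updates can be sketched: majorize the fit term ||X - DA||_F^2 by a quadratic surrogate with curvature c = ||A A^T||_2, which yields a monotone gradient-like step, then project atoms back to unit norm. This is a hedged illustration; the thesis's DLMM algorithm and its constraint sets may differ, and all sizes here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, n = 10, 20, 100
X = rng.normal(size=(d, n))
A = rng.normal(size=(k, n)) * (rng.random((k, n)) < 0.2)   # fixed sparse codes
D = rng.normal(size=(d, k))
D /= np.linalg.norm(D, axis=0)

c = np.linalg.norm(A @ A.T, 2)          # curvature of the quadratic surrogate
err_before = np.linalg.norm(X - D @ A)
for _ in range(100):
    D = D + (X - D @ A) @ A.T / c       # minimize the surrogate: monotone step
err_after = np.linalg.norm(X - D @ A)

# Projection onto the unit-norm atom constraint set.
D_unit = D / np.maximum(np.linalg.norm(D, axis=0), 1e-12)
```

The surrogate step is a gradient step with step size 1/L for the gradient's Lipschitz constant L = 2c, so each iteration is guaranteed not to increase the fit term.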
Sometimes a priori information about the linear generative model is given in the form of a parametric
function. Parametric Dictionary Design (PDD) generates a suitable dictionary for sparse
coding using the parametric function. Essentially, PDD finds a parametric dictionary with minimal
coherence, a property which has been shown to be suitable for sparse approximation and
exact sparse recovery.
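The coherence objective mentioned above is easy to state in code: the mutual coherence of a dictionary is the largest absolute inner product between distinct unit-norm atoms. The example dictionaries below are illustrative, not from the thesis.

```python
import numpy as np

def mutual_coherence(D):
    # normalize atoms, form the Gram matrix, and take the largest
    # off-diagonal absolute entry
    D = D / np.linalg.norm(D, axis=0)
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)
    return G.max()

# An orthonormal basis has coherence 0; duplicating an atom forces coherence 1.
I = np.eye(4)
print(mutual_coherence(I))                 # 0.0
D_dup = np.hstack([I, I[:, :1]])
print(mutual_coherence(D_dup))             # 1.0
```

Lower coherence loosens the conditions under which greedy and convex sparse recovery methods are provably exact, which is why PDD minimizes it.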
Theoretical analyses are accompanied by experiments that validate them. This research
primarily targets audio applications, as audio signals can be shown to have sparse structure.
Therefore, most of the experiments are done using audio signals.
Learning overcomplete dictionaries with ℓ0-sparse Non-negative Matrix Factorisation
Non-negative Matrix Factorisation (NMF) is a popular
tool in which a "parts-based" representation of a non-negative
matrix is sought. NMF tends to produce sparse decompositions.
This sparsity is a desirable property in many applications, and
Sparse NMF (S-NMF) methods have been proposed to enhance
this feature. Typically these enforce sparsity through use of
a penalty term, and an ℓ1 norm penalty term is often used.
However, an ℓ1 penalty term may not be appropriate in a
non-negative framework. In this paper the use of an ℓ0 norm
penalty for NMF is proposed, approximated using backwards
elimination from an initial NNLS decomposition. Dictionary
recovery experiments using overcomplete dictionaries show that
this method outperforms both NMF and a state of the art S-NMF
method, in particular when the dictionary to be learnt is dense.
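The backwards-elimination idea can be sketched for a single signal: start from a dense non-negative least-squares (NNLS) code, then repeatedly delete the active atom whose removal hurts the residual least, refitting on the reduced support, until at most s atoms remain. A simple projected-gradient solver stands in for NNLS here; the sizes, target sparsity s, and solver are assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(4)
d, k, s = 10, 15, 3
D = np.abs(rng.normal(size=(d, k)))                # non-negative dictionary
x = D[:, :s] @ np.abs(rng.normal(size=s))          # non-negative sparse signal

def nnls_pg(M, y, iters=2000):
    # projected gradient descent for min ||y - M a||^2 subject to a >= 0
    L = np.linalg.norm(M.T @ M, 2)
    a = np.zeros(M.shape[1])
    for _ in range(iters):
        a = np.maximum(a - (M.T @ (M @ a - y)) / L, 0.0)
    return a, np.linalg.norm(y - M @ a)

a, _ = nnls_pg(D, x)                               # initial dense NNLS code
support = list(np.flatnonzero(a > 1e-10))
while len(support) > s:
    # residual obtained if atom j were dropped and the rest refitted
    trials = [(j, nnls_pg(D[:, [i for i in support if i != j]], x)[1])
              for j in support]
    j_best = min(trials, key=lambda t: t[1])[0]
    support.remove(j_best)                         # eliminate the cheapest atom

sol, err = nnls_pg(D[:, support], x)
a_s = np.zeros(k)
a_s[support] = sol                                 # final ℓ0-sparse code
```

The ℓ0 constraint is enforced exactly by the support size rather than approximated by a penalty, which is the contrast with ℓ1-penalized S-NMF drawn in the abstract.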
New perspectives in statistical mechanics and high-dimensional inference
The main purpose of this thesis is to go beyond two usual assumptions that accompany theoretical analysis in spin-glasses and inference: the i.i.d. (independently and identically distributed) hypothesis on the noise elements, and the finite-rank regime. The first has been present since the early days of spin-glasses. The second instead concerns the inference viewpoint. Disordered systems and Bayesian inference have a well-established relation, evidenced by their continuous cross-fertilization. The thesis makes use of techniques coming both from the rigorous mathematical machinery of spin-glasses, such as the interpolation scheme, and from statistical physics, such as the replica method. The first chapter contains an introduction to the Sherrington-Kirkpatrick and spiked Wigner models. The first is a mean-field spin-glass where the couplings are i.i.d. Gaussian random variables. The second instead amounts to establishing the information-theoretic limits for the reconstruction of a fixed low-rank matrix, the "spike", blurred by additive Gaussian noise. In chapters 2 and 3 the i.i.d. hypothesis on the noise is broken by assuming a noise with an inhomogeneous variance profile. In spin-glasses this leads to multi-species models. The inferential counterpart is called spatial coupling. All the previous models are usually studied in the Bayes-optimal setting, where everything is known about the generating process of the data. In chapter 4, instead, we study the spiked Wigner model where the prior on the signal to reconstruct is ignored. In chapter 5 we analyze the statistical limits of a spiked Wigner model where the noise is no longer Gaussian but drawn from a random matrix ensemble, which makes its elements dependent. The thesis ends with chapter 6, where the challenging problem of high-rank probabilistic matrix factorization is tackled. Here we introduce a new procedure called "decimation" and we show that it is theoretically possible to perform matrix factorization through it.
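The spiked Wigner model discussed above is easy to simulate: a rank-one "spike" hidden in symmetric Gaussian noise. For signal-to-noise ratio lam > 1, the top eigenvalue detaches from the semicircle bulk (whose edge is at 2) and the top eigenvector correlates with the spike; this is the well-known BBP transition. The sizes and lam below are assumptions chosen for a quick demonstration, not values from the thesis.

```python
import numpy as np

rng = np.random.default_rng(5)
N, lam = 400, 2.0
x = rng.choice([-1.0, 1.0], size=N)                # Rademacher spike

# Symmetric Gaussian (Wigner) noise normalized so the bulk edge sits near 2.
G = rng.normal(size=(N, N))
W = (G + G.T) / np.sqrt(2 * N)
Y = (lam / N) * np.outer(x, x) + W                 # spiked Wigner observation

vals, vecs = np.linalg.eigh(Y)
top_val, top_vec = vals[-1], vecs[:, -1]
overlap = abs(top_vec @ x) / np.sqrt(N)            # correlation with the spike
print(top_val, overlap)   # asymptotically top_val ≈ lam + 1/lam, overlap ≈ sqrt(1 - 1/lam**2)
```

Chapter 5's setting replaces the Gaussian W with a more general random matrix ensemble, making the noise entries dependent, and chapter 6's decimation targets the high-rank analogue of this reconstruction problem.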