Convex and Network Flow Optimization for Structured Sparsity
We consider a class of learning problems regularized by a structured
sparsity-inducing norm defined as the sum of l_2- or l_infinity-norms over
groups of variables. Whereas much effort has been put in developing fast
optimization techniques when the groups are disjoint or embedded in a
hierarchy, we address here the case of general overlapping groups. To this end,
we present two different strategies: On the one hand, we show that the proximal
operator associated with a sum of l_infinity-norms can be computed exactly in
polynomial time by solving a quadratic min-cost flow problem, allowing the use
of accelerated proximal gradient methods. On the other hand, we use proximal
splitting techniques, and address an equivalent formulation with
non-overlapping groups, but in higher dimension and with additional
constraints. We propose efficient and scalable algorithms exploiting these two
strategies, which are significantly faster than alternative approaches. We
illustrate these methods with several problems such as CUR matrix
factorization, multi-task learning of tree-structured dictionaries, background
subtraction in video sequences, image denoising with wavelets, and topographic
dictionary learning of natural image patches.
Comment: to appear in the Journal of Machine Learning Research (JMLR)
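The proximal approach above is easiest to see in the simpler non-overlapping case. Below is a minimal NumPy sketch of a plain (non-accelerated) proximal gradient loop using the closed-form proximal operator of a sum of l_2-norms over disjoint groups (group soft-thresholding); the overlapping-group and l_infinity cases treated in the paper require the quadratic min-cost flow machinery and are not reproduced here. Function names and parameter choices are illustrative.

```python
import numpy as np

def prox_group_l2(v, groups, thresh):
    """Proximal operator of thresh * sum of l_2-norms over *disjoint* groups
    (group soft-thresholding). The overlapping-group / l_infinity case from
    the paper needs the quadratic min-cost flow solver instead."""
    out = v.copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        out[g] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * v[g]
    return out

def proximal_gradient(X, y, groups, lam, n_iter=200):
    """Proximal gradient for min_w 0.5*||Xw - y||^2 + lam * sum_g ||w_g||_2
    with disjoint groups; an accelerated (FISTA-style) variant would only
    add a momentum step."""
    w = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = prox_group_l2(w - step * grad, groups, step * lam)
    return w

# toy usage: three disjoint groups of two variables each
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
y = X @ np.array([1.0, -1.0, 0.0, 0.0, 0.5, 0.5]) + 0.01 * rng.standard_normal(50)
w_hat = proximal_gradient(X, y, [[0, 1], [2, 3], [4, 5]], lam=0.1)
```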
Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization
We consider the problem of sparse coding, where each sample consists of a
sparse linear combination of a set of dictionary atoms, and the task is to
learn both the dictionary elements and the mixing coefficients. Alternating
minimization is a popular heuristic for sparse coding, where the dictionary and
the coefficients are estimated in alternate steps, keeping the other fixed.
Typically, the coefficients are estimated via l_1 minimization, keeping
the dictionary fixed, and the dictionary is estimated through least squares,
keeping the coefficients fixed. In this paper, we establish local linear
convergence for this variant of alternating minimization and show that the
basin of attraction for the global optimum (corresponding to the true
dictionary and the coefficients) is O(1/s^2), where s is the sparsity
level in each sample and the dictionary satisfies RIP. Combined with the recent
results of approximate dictionary estimation, this yields provable guarantees
for exact recovery of both the dictionary elements and the coefficients, when
the dictionary elements are incoherent.
Comment: Local linear convergence now holds under RIP and also under a more
general restricted eigenvalue condition.
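For concreteness, here is a minimal NumPy sketch of the generic alternating scheme the abstract refers to: an l_1-regularized (lasso) update of the coefficients with the dictionary held fixed, followed by a least-squares update of the dictionary with the coefficients held fixed. It is a textbook variant with illustrative parameter choices, not the exact estimator, initialization, or guarantees analyzed in the paper.

```python
import numpy as np

def lasso_ista(D, y, lam, n_iter=100):
    """Coefficient update: l_1-penalized least squares solved by ISTA,
    with the dictionary D held fixed."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = x - D.T @ (D @ x - y) / L      # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # soft-thresholding
    return x

def alternating_minimization(Y, n_atoms, lam=0.1, n_outer=20, seed=0):
    """Alternate the lasso step over all samples with a least-squares
    dictionary update; columns are re-normalized after each update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    X = np.zeros((n_atoms, Y.shape[1]))
    for _ in range(n_outer):
        X = np.column_stack([lasso_ista(D, Y[:, i], lam) for i in range(Y.shape[1])])
        D = Y @ np.linalg.pinv(X)          # least-squares dictionary update
        norms = np.linalg.norm(D, axis=0)
        D /= np.where(norms > 0, norms, 1.0)
    return D, X
```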
Sparse and spurious: dictionary learning with noise and outliers
A popular approach within the signal processing and machine learning
communities consists in modelling signals as sparse linear combinations of
atoms selected from a learned dictionary. While this paradigm has led to
numerous empirical successes in various fields ranging from image to audio
processing, there have been only a few theoretical arguments supporting this
empirical evidence. In particular, sparse coding, or sparse dictionary learning, relies
on a non-convex procedure whose local minima have not been fully analyzed yet.
In this paper, we consider a probabilistic model of sparse signals, and show
that, with high probability, sparse coding admits a local minimum around the
reference dictionary generating the signals. Our study takes into account the
case of over-complete dictionaries, noisy signals, and possible outliers, thus
extending previous work limited to noiseless settings and/or under-complete
dictionaries. The analysis we conduct is non-asymptotic and makes it possible
to understand how the key quantities of the problem, such as the coherence or
the level of noise, can scale with respect to the dimension of the signals, the
number of atoms, the sparsity and the number of observations.
Comment: This is a substantially revised version of a first draft that
appeared as a preprint titled "Local stability and robustness of sparse
dictionary learning in the presence of noise",
http://hal.inria.fr/hal-00737152, IEEE Transactions on Information Theory,
Institute of Electrical and Electronics Engineers (IEEE), 2015, pp.2
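In symbols (notation mine, assuming the standard l_1-penalized formulation of sparse coding), the empirical cost whose local minima are studied is

```latex
F_n(D) \;=\; \frac{1}{n} \sum_{i=1}^{n} \min_{x}
\Bigl[ \tfrac{1}{2} \lVert y_i - D x \rVert_2^2 + \lambda \lVert x \rVert_1 \Bigr],
```

and the result states that, with high probability over the draw of the signals y_i, F_n admits a local minimum within a small neighbourhood of the reference dictionary that generated them, even with an overcomplete D, noise, and a fraction of outliers.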
Local stability and robustness of sparse dictionary learning in the presence of noise
A popular approach within the signal processing and machine learning
communities consists in modelling signals as sparse linear combinations of
atoms selected from a learned dictionary. While this paradigm has led to
numerous empirical successes in various fields ranging from image to audio
processing, there have been only a few theoretical arguments supporting this
empirical evidence. In particular, sparse coding, or sparse dictionary learning, relies
on a non-convex procedure whose local minima have not been fully analyzed yet.
In this paper, we consider a probabilistic model of sparse signals, and show
that, with high probability, sparse coding admits a local minimum around the
reference dictionary generating the signals. Our study takes into account the
case of over-complete dictionaries and noisy signals, thus extending previous
work limited to noiseless settings and/or under-complete dictionaries. The
analysis we conduct is non-asymptotic and makes it possible to understand how
the key quantities of the problem, such as the coherence or the level of noise,
can scale with respect to the dimension of the signals, the number of atoms,
the sparsity and the number of observations.
Sample Complexity of Dictionary Learning and other Matrix Factorizations
Many modern tools in machine learning and signal processing, such as sparse
dictionary learning, principal component analysis (PCA), non-negative matrix
factorization (NMF), K-means clustering, etc., rely on the factorization of a
matrix obtained by concatenating high-dimensional vectors from a training
collection. While the idealized task would be to optimize the expected quality
of the factors over the underlying distribution of training vectors, it is
achieved in practice by minimizing an empirical average over the considered
collection. The focus of this paper is to provide sample complexity estimates
to uniformly control how much the empirical average deviates from the expected
cost function. Standard arguments imply that the performance of the empirical
predictor also exhibits such guarantees. The level of genericity of the approach
encompasses several possible constraints on the factors (tensor product
structure, shift-invariance, sparsity, ...), thus providing a unified
perspective on the sample complexity of several widely used matrix
factorization schemes. The derived generalization bounds behave
proportionally to sqrt(log(n)/n) w.r.t. the number of samples n for the
considered matrix factorization techniques.
Comment: to appear
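Schematically (notation mine, with l(y, D) a generic factorization loss such as the penalized sparse-coding fit of a sample y by the factor D), the quantity these sample complexity estimates control uniformly over the constraint class is

```latex
\sup_{D \in \mathcal{D}} \;\Bigl|\, \frac{1}{n} \sum_{i=1}^{n} \ell(y_i, D)
\;-\; \mathbb{E}_{y} \,\ell(y, D) \,\Bigr| ,
```

so any minimizer of the empirical average is, up to this deviation, nearly optimal for the expected cost as well.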
Constrained Overcomplete Analysis Operator Learning for Cosparse Signal Modelling
We consider the problem of learning a low-dimensional signal model from a
collection of training samples. The mainstream approach would be to learn an
overcomplete dictionary to provide good approximations of the training samples
using sparse synthesis coefficients. This famous sparse model has a less well
known counterpart, in analysis form, called the cosparse analysis model. In
this new model, signals are characterised by their parsimony in a transformed
domain using an overcomplete (linear) analysis operator. We propose to learn an
analysis operator from a training corpus using a constrained optimisation
framework based on L1 optimisation. The reason for introducing a constraint in
the optimisation framework is to exclude trivial solutions. Although there is
no definitive answer as to which constraint is the most relevant, we
investigate some conventional constraints in the model adaptation field and use
the uniformly normalised tight frame (UNTF) for this purpose. We then derive a
practical learning algorithm, based on projected subgradients and the
Douglas-Rachford splitting technique, and demonstrate its ability to robustly
recover a ground-truth analysis operator when provided with a clean training
set of sufficient size. We also find an analysis operator for images, using
some noisy cosparse signals, which is indeed a more realistic experiment. As
the derived optimisation problem is not a convex program, we often find a local
minimum using such variational methods. Some local optimality conditions are
derived for two different settings, providing preliminary theoretical support
for the well-posedness of the learning problem under appropriate conditions.Comment: 29 pages, 13 figures, accepted to be published in TS