Convex and Network Flow Optimization for Structured Sparsity
We consider a class of learning problems regularized by a structured
sparsity-inducing norm defined as the sum of l_2- or l_infinity-norms over
groups of variables. Whereas much effort has been put into developing fast
optimization techniques when the groups are disjoint or embedded in a
hierarchy, we address here the case of general overlapping groups. To this end,
we present two different strategies: On the one hand, we show that the proximal
operator associated with a sum of l_infinity-norms can be computed exactly in
polynomial time by solving a quadratic min-cost flow problem, allowing the use
of accelerated proximal gradient methods. On the other hand, we use proximal
splitting techniques, and address an equivalent formulation with
non-overlapping groups, but in higher dimension and with additional
constraints. We propose efficient and scalable algorithms exploiting these two
strategies, which are significantly faster than alternative approaches. We
illustrate these methods with several problems such as CUR matrix
factorization, multi-task learning of tree-structured dictionaries, background
subtraction in video sequences, image denoising with wavelets, and topographic
dictionary learning of natural image patches.
Comment: to appear in the Journal of Machine Learning Research (JMLR).
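The building block behind these accelerated proximal-gradient schemes is the proximal operator of the group norm. The sketch below covers only the simple disjoint-group l_2 case (block soft-thresholding); the paper's actual contribution, computing the operator exactly for overlapping l_infinity groups via a quadratic min-cost flow problem, is not reproduced here, and the function name is illustrative:

```python
import numpy as np

def prox_group_l2(v, groups, lam):
    """Proximal operator of lam * sum of l2 norms over *disjoint* groups
    (block soft-thresholding). The overlapping-group case treated in the
    paper requires the min-cost-flow or splitting machinery instead."""
    out = v.copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        # shrink the whole group toward zero; zero it out if its norm <= lam
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out[g] = scale * v[g]
    return out

v = np.array([3.0, 4.0, 0.5, 0.5])
# group [0, 1] has norm 5 -> shrunk by 1 - 1/5; group [2, 3] has norm < 1 -> zeroed
print(prox_group_l2(v, [[0, 1], [2, 3]], lam=1.0))  # -> [2.4, 3.2, 0.0, 0.0]
```

Each proximal-gradient iteration alternates a gradient step on the smooth loss with one call to this operator, which is why its exact, polynomial-time computation matters.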
Collaborative Filtering via Group-Structured Dictionary Learning
Structured sparse coding and the related structured dictionary learning
problems are novel research areas in machine learning. In this paper we present
a new application of structured dictionary learning for collaborative filtering
based recommender systems. Our extensive numerical experiments demonstrate that
the presented technique outperforms its state-of-the-art competitors and has
several advantages over approaches that do not put structured constraints on
the dictionary elements.
Comment: a compressed version of the paper has been accepted for publication
at the 10th International Conference on Latent Variable Analysis and Source
Separation (LVA/ICA 2012).
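For readers unfamiliar with the underlying machinery, dictionary learning alternates between sparse coding and a dictionary update. The sketch below uses a plain l_1 penalty for brevity; the paper instead imposes group-structured constraints on the dictionary elements, which this generic sketch does not implement, and all names are illustrative:

```python
import numpy as np

def dict_learn(X, k, lam=0.1, iters=20, seed=0):
    """Generic alternating dictionary learning: ISTA for the sparse codes A,
    least squares for the dictionary D. Illustrative only -- the paper uses
    *group-structured* sparsity, not the plain l1 penalty used here."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    D = rng.standard_normal((n, k))
    D /= np.linalg.norm(D, axis=0)          # unit-norm dictionary columns
    A = np.zeros((k, m))
    for _ in range(iters):
        # sparse coding: ISTA steps on 0.5*||X - D A||^2 + lam*||A||_1
        L = max(np.linalg.norm(D, 2) ** 2, 1e-8)   # Lipschitz constant
        for _ in range(10):
            G = D.T @ (D @ A - X)
            A = np.sign(A - G / L) * np.maximum(np.abs(A - G / L) - lam / L, 0.0)
        # dictionary update: D = X A^T (A A^T)^{-1}, then renormalize columns
        D = np.linalg.solve(A @ A.T + 1e-8 * np.eye(k), A @ X.T).T
        nrm = np.maximum(np.linalg.norm(D, axis=0), 1e-8)
        D /= nrm
        A *= nrm[:, None]                   # keep the product D A unchanged
    return D, A
```

In a recommender-system setting, the columns of X would be (partially observed) user rating vectors, and the structured penalty couples related dictionary elements.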
Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization
We consider the problem of optimizing the sum of a smooth convex function and
a non-smooth convex function using proximal-gradient methods, where an error is
present in the calculation of the gradient of the smooth term or in the
proximity operator with respect to the non-smooth term. We show that both the
basic proximal-gradient method and the accelerated proximal-gradient method
achieve the same convergence rate as in the error-free case, provided that the
errors decrease at appropriate rates. Using these rates, our methods perform
as well as or better than a carefully chosen fixed error level on a set of
structured sparsity problems.
Comment: Neural Information Processing Systems (2011).
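The setting can be simulated on a small lasso instance: a basic proximal-gradient loop in which an artificial gradient error decays as 1/k^2, i.e. summably, which is the kind of decrease the analysis requires to recover the error-free rate. This is a sketch of the regime studied, not the authors' code:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def inexact_prox_grad(A, b, lam, iters=200, err0=1.0, seed=0):
    """Basic proximal-gradient on the lasso 0.5*||Ax-b||^2 + lam*||x||_1,
    with a simulated gradient error of magnitude err0 / k^2 at iteration k.
    With such summably decreasing errors, the error-free O(1/k) rate of the
    basic method is preserved (the paper's result)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for k in range(1, iters + 1):
        g = A.T @ (A @ x - b)                # exact gradient of the smooth term
        e = rng.standard_normal(x.shape)
        e *= err0 / (k ** 2 * np.linalg.norm(e))   # error norm decays as 1/k^2
        x = soft_threshold(x - (g + e) / L, lam / L)
    return x
```

Replacing the 1/k^2 schedule with a constant error level is the "fixed error level" baseline the abstract compares against.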
Learning Hierarchical and Topographic Dictionaries with Structured Sparsity
Recent work in signal processing and statistics has focused on defining new
regularization functions, which not only induce sparsity of the solution, but
also take into account the structure of the problem. We present in this paper a
class of convex penalties introduced in the machine learning community, which
take the form of a sum of l_2 and l_infinity-norms over groups of variables.
They extend the classical group-sparsity regularization in the sense that the
groups possibly overlap, allowing more flexibility in the group design. We
review efficient optimization methods to deal with the corresponding inverse
problems, and their application to the problem of learning dictionaries of
natural image patches: On the one hand, dictionary learning has indeed proven
effective for various signal processing tasks. On the other hand, structured
sparsity provides a natural framework for modeling dependencies between
dictionary elements. We thus consider a structured sparse regularization to
learn dictionaries embedded in a particular structure, for instance a tree or a
two-dimensional grid. In the latter case, the dictionaries we obtain are
similar to those produced by topographic independent component analysis.
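For the tree-structured case specifically, a known result from the hierarchical sparse coding literature is that the proximal operator of a sum of l_2 norms over tree-structured groups is the composition of the single-group shrinkages, applied from the leaves up to the root. A minimal sketch, assuming the caller supplies the groups already in that order (each group before any group that contains it):

```python
import numpy as np

def prox_tree_l2(v, ordered_groups, weights):
    """Prox of sum_g w_g * ||x_g||_2 for tree-structured groups.
    Assumes ordered_groups is sorted from leaves to root, i.e. every group
    appears before any strict superset of it; under that ordering, composing
    the one-group shrinkages gives the exact proximal operator (a result
    from the hierarchical sparse coding literature, assumed here)."""
    x = v.copy()
    for g, w in zip(ordered_groups, weights):
        nrm = np.linalg.norm(x[g])
        # standard single-group l2 shrinkage applied in place
        x[g] *= max(0.0, 1.0 - w / nrm) if nrm > 0 else 0.0
    return x

# leaf group [2] is shrunk first, then the root group [0, 1, 2]
print(prox_tree_l2(np.array([1.0, 2.0, 3.0]), [[2], [0, 1, 2]], [1.0, 1.0]))
```

Because every group containing a variable is shrunk no earlier than its subgroups, a variable zeroed at a leaf stays zero at the root, which yields the hierarchical ("zero out whole subtrees") sparsity patterns these dictionaries exhibit.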