Consistency of the group Lasso and multiple kernel learning
We consider the least-squares regression problem with regularization by a
block 1-norm, i.e., a sum of Euclidean norms over spaces of dimensions larger
than one. This problem, referred to as the group Lasso, extends the usual
regularization by the 1-norm, where all spaces have dimension one and which is
commonly referred to as the Lasso. In this paper, we study the asymptotic model
consistency of the group Lasso. We derive necessary and sufficient conditions
for the consistency of the group Lasso under practical assumptions, such as model
misspecification. When the linear predictors and Euclidean norms are replaced
by functions and reproducing kernel Hilbert norms, the problem is usually
referred to as multiple kernel learning and is commonly used for learning from
heterogeneous data sources and for nonlinear variable selection. Using tools
from functional analysis, and in particular covariance operators, we extend the
consistency results to this infinite-dimensional case and also propose an
adaptive scheme to obtain a consistent model estimate, even when the necessary
condition required for the non-adaptive scheme is not satisfied.
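To make the block 1-norm concrete, here is a minimal sketch (not taken from the paper) of the penalty and of its proximal operator, block soft-thresholding, which shrinks each group's sub-vector and zeroes out whole groups at once. The function names and the example vector are hypothetical. With singleton groups the penalty reduces to the ordinary 1-norm used by the Lasso.

```python
import math


def group_norm(w, groups):
    """Block 1-norm: sum over groups of the Euclidean norm of each sub-vector."""
    return sum(math.sqrt(sum(w[j] ** 2 for j in g)) for g in groups)


def block_soft_threshold(w, groups, lam):
    """Proximal operator of lam * group_norm.

    Each group's sub-vector is shrunk toward zero by lam in Euclidean norm;
    a group whose norm is at most lam is set entirely to zero.
    """
    out = list(w)
    for g in groups:
        norm = math.sqrt(sum(w[j] ** 2 for j in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for j in g:
            out[j] = scale * w[j]
    return out


w = [3.0, 4.0, 0.5]
groups = [[0, 1], [2]]
print(group_norm(w, groups))                 # 5.0 + 0.5 = 5.5
print(block_soft_threshold(w, groups, 1.0))  # approximately [2.4, 3.2, 0.0]
# Singleton groups recover the Lasso penalty, i.e. the plain 1-norm:
print(group_norm(w, [[0], [1], [2]]))        # 3.0 + 4.0 + 0.5 = 7.5
```

The first group (norm 5) is scaled by 1 - 1/5, while the second group (norm 0.5, below the threshold) is removed entirely, which is the group-level sparsity the group Lasso is designed to produce.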
Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
For supervised and unsupervised learning, positive definite kernels make it possible to
use large and potentially infinite-dimensional feature spaces with a
computational cost that only depends on the number of observations. This is
usually done through the penalization of predictor functions by Euclidean or
Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing
norms such as the l1-norm or the block l1-norm. We assume that the kernel
decomposes into a large sum of individual basis kernels which can be embedded
in a directed acyclic graph; we show that it is then possible to perform kernel
selection through a hierarchical multiple kernel learning framework, in
polynomial time in the number of selected kernels. This framework is naturally
applied to nonlinear variable selection; our extensive simulations on
synthetic datasets and datasets from the UCI repository show that efficiently
exploring the large feature space through sparsity-inducing norms leads to
state-of-the-art predictive performance.
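One well-known decomposition of this kind, assumed here purely for illustration (the abstract does not fix the kernel), is the product over input variables of (1 + base kernel). It expands into a sum of 2^p basis kernels, one per subset of the p variables, and ordering subsets by inclusion gives a directed acyclic graph. The sketch below, with a hypothetical 1-D Gaussian base kernel, checks that the exponentially large sum can be evaluated in O(p) time through the product form.

```python
import itertools
import math


def base_kernel(a, b, sigma=1.0):
    """Hypothetical 1-D Gaussian base kernel k_i acting on a single variable."""
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))


def all_subsets_kernel_bruteforce(x, y):
    """Sum over all 2^p subsets V of prod_{i in V} k_i(x_i, y_i).

    Each subset indexes one basis kernel; the empty subset contributes 1.
    """
    p = len(x)
    total = 0.0
    for r in range(p + 1):
        for subset in itertools.combinations(range(p), r):
            total += math.prod(base_kernel(x[i], y[i]) for i in subset)
    return total


def all_subsets_kernel_fast(x, y):
    """Same sum in O(p) time via the factorization prod_i (1 + k_i(x_i, y_i))."""
    return math.prod(1.0 + base_kernel(a, b) for a, b in zip(x, y))


x = [0.1, -0.5, 2.0, 0.7]
y = [0.3, 0.4, 1.5, -1.0]
print(all_subsets_kernel_bruteforce(x, y))  # 16 terms summed explicitly
print(all_subsets_kernel_fast(x, y))        # agrees up to floating-point rounding
```

The brute-force version is only there as a check; the point is that the full kernel stays cheap to evaluate even though the number of basis kernels grows exponentially with p, which is what makes searching a hierarchy of them feasible.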
Structured, sparse regression with application to HIV drug resistance
We introduce a new version of forward stepwise regression. Our modification
finds solutions to regression problems where the selected predictors appear in
a structured pattern, with respect to a predefined distance measure over the
candidate predictors. Our method is motivated by the problem of predicting
HIV-1 drug resistance from protein sequences. We find that our method improves
the interpretability of drug resistance models while achieving predictive
accuracy comparable to that of standard methods. We also demonstrate our method
in a simulation study and present some theoretical results and connections.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS428 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
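For reference, the sketch below implements plain forward stepwise least squares (no intercept, centred data assumed) in pure Python, and adds a hypothetical structure rule through the `positions` and `radius` parameters: after the first pick, only predictors within a fixed distance of an already-selected one are eligible. This rule is an illustrative assumption that mimics "a structured pattern with respect to a distance measure"; it is not the modification proposed in the paper, and all names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def forward_stepwise(columns, y, n_steps, positions=None, radius=None):
    """Greedy forward stepwise least squares.

    columns: list of predictor vectors (each the same length as y).
    positions/radius: hypothetical structure rule (not the paper's method);
    after the first pick, a candidate is eligible only if it lies within
    `radius` of some already-selected predictor.
    Returns (selected indices in order, residual sum of squares).
    """
    selected, basis = [], []  # chosen indices; orthonormal basis of their span
    resid = list(y)           # residual of the current least-squares fit
    for _ in range(n_steps):
        best, best_gain, best_z = None, 0.0, None
        for j, x in enumerate(columns):
            if j in selected:
                continue
            if positions is not None and radius is not None and selected:
                if min(abs(positions[j] - positions[s]) for s in selected) > radius:
                    continue
            # Component of x_j not already explained by the selected predictors.
            z = list(x)
            for q in basis:
                c = dot(q, z)
                z = [zi - c * qi for zi, qi in zip(z, q)]
            zz = dot(z, z)
            if zz < 1e-12:
                continue  # x_j is (numerically) collinear with the selected set
            gain = dot(z, resid) ** 2 / zz  # drop in residual sum of squares
            if gain > best_gain:
                best, best_gain, best_z = j, gain, z
        if best is None:
            break  # no eligible predictor reduces the residual
        norm = dot(best_z, best_z) ** 0.5
        q = [zi / norm for zi in best_z]
        basis.append(q)
        c = dot(q, resid)
        resid = [ri - c * qi for ri, qi in zip(resid, q)]
        selected.append(best)
    return selected, dot(resid, resid)


cols = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
y = [1.0, 0.0, 2.0, 0.5]
print(forward_stepwise(cols, y, 2))                                   # ([2, 0], 0.25)
print(forward_stepwise(cols, y, 2, positions=[0, 1, 2, 3], radius=1)) # ([2, 3], 1.0)
```

In the example the unconstrained search picks predictors 2 and 0, while the distance rule forces the second pick to be a neighbour of predictor 2, trading some fit for a more contiguous selected set.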
Forecasting and Granger Modelling with Non-linear Dynamical Dependencies
Traditional linear methods for forecasting multivariate time series are not
able to satisfactorily model the non-linear dependencies that may exist in
non-Gaussian series. We build on the theory of learning vector-valued functions
in the reproducing kernel Hilbert space and develop a method for learning
prediction functions that accommodate such non-linearities. The method not only
learns the predictive function but also the matrix-valued kernel underlying the
function search space directly from the data. Our approach is based on learning
multiple matrix-valued kernels, each composed of a set of input
kernels and a set of output kernels learned in the cone of positive
semi-definite matrices. In addition to superior predictive performance in the
presence of strong non-linearities, our method also recovers the hidden dynamic
relationships between the series and thus offers a new alternative to existing
graphical Granger techniques.
Comment: Accepted for ECML-PKDD 201
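A minimal sketch of how a prediction is evaluated with such kernels, assuming each matrix-valued kernel is a sum of separable terms k_s(x, x') B_s, i.e. a scalar input kernel times a positive semi-definite output matrix. The abstract does not specify this exact form; the Gaussian input kernel, the function names and the fixed coefficients below are hypothetical, and learning the kernels and coefficients, which is the paper's contribution, is not shown.

```python
import math


def gaussian(x, z, gamma):
    """Hypothetical scalar input kernel on (lagged) input vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def matvec(B, c):
    """Multiply matrix B (list of rows) by vector c."""
    return [sum(bij * cj for bij, cj in zip(row, c)) for row in B]


def predict(x, train_inputs, coefs, components):
    """Evaluate f(x) = sum_i sum_s k_s(x, x_i) B_s c_i.

    components: list of (gamma_s, B_s) pairs; each k_s(x, x') * B_s is a
    separable matrix-valued kernel, assuming B_s is positive semi-definite.
    coefs: one output-space vector c_i per training input x_i.
    """
    out = [0.0] * len(coefs[0])
    for xi, ci in zip(train_inputs, coefs):
        for gamma, B in components:
            k = gaussian(x, xi, gamma)
            Bc = matvec(B, ci)
            out = [o + k * v for o, v in zip(out, Bc)]
    return out


# Two series, one lag each: inputs are [series1[t-1], series2[t-1]].
train_inputs = [[0.0, 1.0], [1.0, 0.0]]
coefs = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(predict([0.0, 1.0], train_inputs, coefs, [(0.5, identity)]))  # [1.0, exp(-1)]
```

In this sketch the output matrices B_s decide how the coefficient vectors are mixed across series: a zero row in every B_s forces the corresponding output to zero, and off-diagonal entries let one series' prediction draw on another's coefficients.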
Optimization with Sparsity-Inducing Penalties
Sparse estimation methods aim at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present, from a
general perspective, optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted l2-penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view.
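Of the methods listed, proximal methods are the simplest to sketch. Below is plain ISTA (proximal gradient descent) for the Lasso objective 0.5 * ||y - Xw||^2 + lam * ||w||_1, written in pure Python for readability rather than speed. The function names, the fixed iteration count and the conservative step size (based on the squared Frobenius norm of X, an upper bound on the Lipschitz constant of the gradient) are illustrative assumptions, not the paper's experimental setup.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def ista_lasso(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for min_w 0.5*||y - X w||^2 + lam*||w||_1.

    X is a list of rows. The step size 1/L uses L = squared Frobenius norm
    of X, a valid but conservative bound on the gradient's Lipschitz constant.
    """
    n, p = len(X), len(X[0])
    L = sum(xij ** 2 for row in X for xij in row)
    w = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the smooth least-squares term: X^T (X w - y).
        resid = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step on the smooth part, then the prox of the l1 term.
        w = soft_threshold([w[j] - grad[j] / L for j in range(p)], lam / L)
    return w


print(ista_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], 1.0))  # approaches [2.0, 0.0]
```

For an orthonormal design such as the identity, the Lasso solution is simply the soft-thresholded response, so the iterates should approach [2.0, 0.0]; the second coefficient is exactly zero at every iteration, illustrating how the proximal step produces sparsity.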