Taking Advantage of Sparsity in Multi-Task Learning
We study the problem of estimating multiple linear regression equations for
the purpose of both prediction and variable selection. Following recent work on
multi-task learning (Argyriou et al. [2008]), we assume that the regression
vectors share the same sparsity pattern. This means that the set of relevant
predictor variables is the same across the different equations. This assumption
leads us to consider the Group Lasso as a candidate estimation method. We show
that this estimator enjoys nice sparsity oracle inequalities and variable
selection properties. The results hold under a certain restricted eigenvalue
condition and a coherence condition on the design matrix, which naturally
extend recent work in Bickel et al. [2007], Lounici [2008]. In particular, in
the multi-task learning scenario, in which the number of tasks can grow, we are
able to completely remove the effect of the number of predictor variables in
the bounds. Finally, we show how our results can be extended to more general
noise distributions, of which we require only that the variance be finite.
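As a concrete illustration, the shared-sparsity-pattern assumption corresponds to a row-wise $\ell_2/\ell_1$ ("group") penalty on the coefficient matrix, so a predictor is dropped from all equations simultaneously. Below is a minimal sketch on synthetic data using scikit-learn's MultiTaskLasso (our choice of implementation, not the authors' code; the value of alpha is illustrative):

```python
# Minimal sketch: Group Lasso across tasks on synthetic data. The row-wise
# l2/l1 penalty zeroes a predictor out of *all* regression equations at once,
# matching the shared-sparsity-pattern assumption.
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
n, p, T = 100, 30, 5                       # samples, predictors, tasks
X = rng.standard_normal((n, p))
W_true = np.zeros((p, T))
W_true[:4] = rng.standard_normal((4, T))   # same 4 relevant predictors in every task
Y = X @ W_true + 0.1 * rng.standard_normal((n, T))

model = MultiTaskLasso(alpha=0.1).fit(X, Y)
# model.coef_ has shape (T, p); a predictor is selected iff its coefficients
# across all tasks (one column here) are jointly nonzero.
support = np.flatnonzero(np.linalg.norm(model.coef_, axis=0) > 1e-8)
print("estimated shared support:", support)
```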
Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping
We consider the problem of estimating a sparse multi-response regression
function, with an application to expression quantitative trait locus (eQTL)
mapping, where the goal is to discover genetic variations that influence
gene-expression levels. In particular, we investigate a shrinkage technique
capable of capturing a given hierarchical structure over the responses, such as
a hierarchical clustering tree with leaf nodes for responses and internal nodes
for clusters of related responses at multiple granularities, and we seek to
leverage this structure to recover covariates relevant to each
hierarchically-defined cluster of responses. We propose a tree-guided group
lasso, or tree lasso, for estimating such structured sparsity under
multi-response regression by employing a novel penalty function constructed
from the tree. We describe a systematic weighting scheme for the overlapping
groups in the tree-penalty such that each regression coefficient is penalized
in a balanced manner despite the inhomogeneous multiplicity of group
memberships of the regression coefficients due to overlaps among groups. For
efficient optimization, we employ a smoothing proximal gradient method that was
originally developed for a general class of structured-sparsity-inducing
penalties. Using simulated and yeast data sets, we demonstrate that our method
shows a superior performance in terms of both prediction errors and recovery of
true sparsity patterns, compared to other methods for learning a
multivariate-response regression.
Comment: Published at http://dx.doi.org/10.1214/12-AOAS549 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
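To make the tree penalty concrete, here is a minimal sketch of how such an overlapping-group penalty can be evaluated: each node of the hierarchical clustering tree contributes a weighted $\ell_2$ norm over the coefficients of its group of responses. Names, toy tree, and weights are hypothetical; the paper's systematic weighting scheme determines the actual weights:

```python
# Minimal sketch (placeholder weights, not the authors' implementation) of the
# tree-guided group lasso penalty over overlapping response groups.
import numpy as np

def tree_lasso_penalty(B, tree_groups, weights, lam=1.0):
    """B: (p, K) coefficients for p covariates and K responses.
    tree_groups: one index list of responses per tree node (leaves + internal).
    weights: per-node weights, chosen so that coefficients appearing in many
    overlapping groups are still penalized in a balanced way."""
    total = 0.0
    for group, w in zip(tree_groups, weights):
        # l2 norm of each covariate's coefficients within the group, summed
        # over covariates -> node-level group sparsity.
        total += w * np.linalg.norm(B[:, group], axis=1).sum()
    return lam * total

# Toy tree over 4 responses: leaves, two internal clusters, and the root.
groups  = [[0], [1], [2], [3], [0, 1], [2, 3], [0, 1, 2, 3]]
weights = [1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.25]   # placeholder weights
B = np.random.default_rng(1).standard_normal((6, 4))
print(tree_lasso_penalty(B, groups, weights))
```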
Oracle Inequalities and Optimal Inference under Group Sparsity
We consider the problem of estimating a sparse linear regression vector
$\beta^*$ under a Gaussian noise model, for the purpose of both prediction and
model selection. We assume that prior knowledge is available on the sparsity
pattern, namely that the set of variables is partitioned into prescribed
groups, only a few of which are relevant in the estimation process. This group
sparsity assumption leads us to consider the Group Lasso method as a means to
estimate $\beta^*$. We establish oracle inequalities for the prediction and
$\ell_2$ estimation errors of this estimator. These bounds hold under a
restricted eigenvalue condition on the design matrix. Under a stronger
coherence condition, we derive bounds for the estimation error for mixed
$\ell_{2,p}$-norms with $1 \le p \le \infty$. When $p = \infty$, this result
implies that a thresholded version of the Group Lasso estimator selects the
sparsity pattern of $\beta^*$ with high probability. Next, we prove that the
rate of convergence of our upper bounds is optimal in a minimax sense, up to a
logarithmic factor, for all estimators over a class of group sparse vectors.
Furthermore, we establish lower bounds for the prediction and $\ell_2$
estimation errors of the usual Lasso estimator. Using this result, we
demonstrate that the Group Lasso can achieve an improvement in the prediction
and estimation properties as compared to the Lasso.
Comment: 37 pages.
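For reference, a standard formulation consistent with this abstract (the paper may use group-dependent weights, e.g. proportional to group sizes) defines the Group Lasso estimator over a prescribed partition $J_1, \dots, J_G$ of $\{1, \dots, p\}$, together with the mixed norm used in the estimation-error bounds:

```latex
\hat{\beta} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
  \left\{ \frac{1}{n} \lVert y - X\beta \rVert_2^2
        + 2\lambda \sum_{g=1}^{G} \lVert \beta_{J_g} \rVert_2 \right\},
\qquad
\lVert \beta \rVert_{2,p}
  = \Big( \sum_{g=1}^{G} \lVert \beta_{J_g} \rVert_2^p \Big)^{1/p} .
```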
Knowledge-aware Complementary Product Representation Learning
Learning product representations that reflect complementary relationships
plays a central role in e-commerce recommender systems. In the absence of a
product relationship graph, on which existing methods rely, complementary
relationships must be detected directly from noisy and sparse customer
purchase activity. Furthermore, unlike simple relationships such as
similarity, complementariness is asymmetric and non-transitive. Standard
representation learning uses only one set of embeddings, which is
problematic for modelling these properties of complementariness. We propose
using knowledge-aware learning with dual product embedding to solve the above
challenges. We encode contextual knowledge into product representation by
multi-task learning to alleviate the sparsity issue. By explicitly modelling
user bias terms, we separate the noise of customer-specific preferences from
the complementariness signal. Furthermore, we adopt the dual embedding framework
to capture the intrinsic properties of complementariness and provide geometric
interpretation motivated by the classic separating hyperplane theory. Finally,
we propose a Bayesian network structure that unifies all the components and
subsumes several popular models as special cases. The proposed method
compares favourably to state-of-the-art methods in downstream classification
and recommendation tasks. We also develop an implementation that scales
efficiently to a dataset with millions of items and customers.
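A minimal sketch of the dual-embedding idea (hypothetical names and random data, not the paper's model): giving each product separate "source" and "target" vectors makes the complementariness score asymmetric, and it need not be transitive, unlike similarity scores in a single shared embedding space:

```python
# Minimal sketch: dual product embeddings for an asymmetric complement score.
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 1000, 32
U = rng.standard_normal((n_items, dim)) * 0.1   # "buy this first" role
V = rng.standard_normal((n_items, dim)) * 0.1   # "buy this next" role
user_bias = 0.0                                 # per-user term absorbing
                                                # customer-specific preference noise

def complement_score(i, j, bias=user_bias):
    # Asymmetric by construction: swapping i and j uses different vectors
    # on each side, so score(i -> j) != score(j -> i) in general.
    return U[i] @ V[j] + bias

print(complement_score(3, 7), complement_score(7, 3))  # generally differ
```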