Taking Advantage of Sparsity in Multi-Task Learning
We study the problem of estimating multiple linear regression equations for
the purpose of both prediction and variable selection. Following recent work on
multi-task learning (Argyriou et al. [2008]), we assume that the regression
vectors share the same sparsity pattern. This means that the set of relevant
predictor variables is the same across the different equations. This assumption
leads us to consider the Group Lasso as a candidate estimation method. We show
that this estimator enjoys nice sparsity oracle inequalities and variable
selection properties. The results hold under a certain restricted eigenvalue
condition and a coherence condition on the design matrix, which naturally
extend recent work in Bickel et al. [2007], Lounici [2008]. In particular, in
the multi-task learning scenario, in which the number of tasks can grow, we are
able to completely remove the effect of the number of predictor variables in
the bounds. Finally, we show how our results can be extended to more general
noise distributions, of which we require only that the variance be finite.
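As a concrete illustration, the shared-sparsity-pattern assumption corresponds to a row-wise $\ell_2/\ell_1$ ("group") penalty on the coefficient matrix, so a predictor is dropped from all equations simultaneously. Below is a minimal sketch on synthetic data using scikit-learn's MultiTaskLasso (our choice of implementation, not the authors' code; the value of alpha is illustrative):

```python
# Minimal sketch: Group Lasso across tasks on synthetic data. The row-wise
# l2/l1 penalty zeroes a predictor out of *all* regression equations at once,
# matching the shared-sparsity-pattern assumption.
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
n, p, T = 100, 30, 5                       # samples, predictors, tasks
X = rng.standard_normal((n, p))
W_true = np.zeros((p, T))
W_true[:4] = rng.standard_normal((4, T))   # same 4 relevant predictors in every task
Y = X @ W_true + 0.1 * rng.standard_normal((n, T))

model = MultiTaskLasso(alpha=0.1).fit(X, Y)
# model.coef_ has shape (T, p); a predictor is selected iff its coefficients
# across all tasks (one column here) are jointly nonzero.
support = np.flatnonzero(np.linalg.norm(model.coef_, axis=0) > 1e-8)
print("estimated shared support:", support)
```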
Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping
We consider the problem of estimating a sparse multi-response regression
function, with an application to expression quantitative trait locus (eQTL)
mapping, where the goal is to discover genetic variations that influence
gene-expression levels. In particular, we investigate a shrinkage technique
capable of capturing a given hierarchical structure over the responses, such as
a hierarchical clustering tree with leaf nodes for responses and internal nodes
for clusters of related responses at multiple granularities, and we seek to
leverage this structure to recover covariates relevant to each
hierarchically-defined cluster of responses. We propose a tree-guided group
lasso, or tree lasso, for estimating such structured sparsity under
multi-response regression by employing a novel penalty function constructed
from the tree. We describe a systematic weighting scheme for the overlapping
groups in the tree-penalty such that each regression coefficient is penalized
in a balanced manner despite the inhomogeneous multiplicity of group
memberships of the regression coefficients due to overlaps among groups. For
efficient optimization, we employ a smoothing proximal gradient method that was
originally developed for a general class of structured-sparsity-inducing
penalties. Using simulated and yeast data sets, we demonstrate that our method
shows a superior performance in terms of both prediction errors and recovery of
true sparsity patterns, compared to other methods for learning a
multivariate-response regression.
Comment: Published at http://dx.doi.org/10.1214/12-AOAS549 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
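To make the tree penalty concrete, here is a minimal sketch of how such an overlapping-group penalty can be evaluated: each node of the hierarchical clustering tree contributes a weighted $\ell_2$ norm over the coefficients of its group of responses. Names, toy tree, and weights are hypothetical; the paper's systematic weighting scheme determines the actual weights:

```python
# Minimal sketch (placeholder weights, not the authors' implementation) of the
# tree-guided group lasso penalty over overlapping response groups.
import numpy as np

def tree_lasso_penalty(B, tree_groups, weights, lam=1.0):
    """B: (p, K) coefficients for p covariates and K responses.
    tree_groups: one index list of responses per tree node (leaves + internal).
    weights: per-node weights, chosen so that coefficients appearing in many
    overlapping groups are still penalized in a balanced way."""
    total = 0.0
    for group, w in zip(tree_groups, weights):
        # l2 norm of each covariate's coefficients within the group, summed
        # over covariates -> node-level group sparsity.
        total += w * np.linalg.norm(B[:, group], axis=1).sum()
    return lam * total

# Toy tree over 4 responses: leaves, two internal clusters, and the root.
groups  = [[0], [1], [2], [3], [0, 1], [2, 3], [0, 1, 2, 3]]
weights = [1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.25]   # placeholder weights
B = np.random.default_rng(1).standard_normal((6, 4))
print(tree_lasso_penalty(B, groups, weights))
```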
Oracle Inequalities and Optimal Inference under Group Sparsity
We consider the problem of estimating a sparse linear regression vector
$\beta^*$ under a Gaussian noise model, for the purpose of both prediction and
model selection. We assume that prior knowledge is available on the sparsity
pattern, namely that the set of variables is partitioned into prescribed
groups, only a few of which are relevant in the estimation process. This group
sparsity assumption leads us to consider the Group Lasso method as a means to
estimate $\beta^*$. We establish oracle inequalities for the prediction and
$\ell_2$ estimation errors of this estimator. These bounds hold under a
restricted eigenvalue condition on the design matrix. Under a stronger
coherence condition, we derive bounds for the estimation error for mixed
$\ell_{2,p}$-norms with $1 \le p \le \infty$. When $p = \infty$, this result
implies that a thresholded version of the Group Lasso estimator selects the
sparsity pattern of $\beta^*$ with high probability. Next, we prove that the
rate of convergence of our upper bounds is optimal in a minimax sense, up to a
logarithmic factor, for all estimators over a class of group sparse vectors.
Furthermore, we establish lower bounds for the prediction and $\ell_2$
estimation errors of the usual Lasso estimator. Using this result, we
demonstrate that the Group Lasso can achieve an improvement in the prediction
and estimation properties as compared to the Lasso.
Comment: 37 pages.
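For reference, a standard formulation consistent with this abstract (the paper may use group-dependent weights, e.g. proportional to group sizes) defines the Group Lasso estimator over a prescribed partition $J_1, \dots, J_G$ of $\{1, \dots, p\}$, together with the mixed norm used in the estimation-error bounds:

```latex
\hat{\beta} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
  \left\{ \frac{1}{n} \lVert y - X\beta \rVert_2^2
        + 2\lambda \sum_{g=1}^{G} \lVert \beta_{J_g} \rVert_2 \right\},
\qquad
\lVert \beta \rVert_{2,p}
  = \Big( \sum_{g=1}^{G} \lVert \beta_{J_g} \rVert_2^p \Big)^{1/p} .
```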
Knowledge-aware Complementary Product Representation Learning
Learning product representations that reflect complementary relationships
plays a central role in e-commerce recommender systems. In the absence of a
product relationship graph, on which existing methods rely, complementary
relationships must be detected directly from noisy and sparse customer
purchase activity. Furthermore, unlike simple relationships such as
similarity, complementariness is asymmetric and non-transitive. Standard
representation learning uses only one set of embeddings, which is
problematic for modelling these properties of complementariness. We propose
using knowledge-aware learning with dual product embedding to solve the above
challenges. We encode contextual knowledge into product representation by
multi-task learning to alleviate the sparsity issue. By explicitly modelling
user bias terms, we separate the noise of customer-specific preferences from
the complementariness signal. Furthermore, we adopt the dual embedding framework
to capture the intrinsic properties of complementariness and provide geometric
interpretation motivated by the classic separating hyperplane theory. Finally,
we propose a Bayesian network structure that unifies all the components and
subsumes several popular models as special cases. The proposed method
compares favourably to state-of-the-art methods in downstream classification
and recommendation tasks. We also develop an implementation that scales
efficiently to a dataset with millions of items and customers.
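A minimal sketch of the dual-embedding idea (hypothetical names and random data, not the paper's model): giving each product separate "source" and "target" vectors makes the complementariness score asymmetric, and it need not be transitive, unlike similarity scores in a single shared embedding space:

```python
# Minimal sketch: dual product embeddings for an asymmetric complement score.
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 1000, 32
U = rng.standard_normal((n_items, dim)) * 0.1   # "buy this first" role
V = rng.standard_normal((n_items, dim)) * 0.1   # "buy this next" role
user_bias = 0.0                                 # per-user term absorbing
                                                # customer-specific preference noise

def complement_score(i, j, bias=user_bias):
    # Asymmetric by construction: swapping i and j uses different vectors
    # on each side, so score(i -> j) != score(j -> i) in general.
    return U[i] @ V[j] + bias

print(complement_score(3, 7), complement_score(7, 3))  # generally differ
```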