On sparse representations and new meta-learning paradigms for representation learning
Given the "right" representation, learning is easy. This thesis studies representation learning and meta-learning, with a special focus on sparse representations. Meta-learning is fundamental to machine learning: it is, in short, learning to learn. The presentation unfolds in two parts. In the first part, we establish learning-theoretic results for learning sparse representations. The second part introduces new multi-task and meta-learning paradigms for representation learning.
On the sparse representations front, our main pursuits are generalization error bounds to support a supervised dictionary learning model for Lasso-style sparse coding. Such predictive sparse coding algorithms have been applied with much success in the literature; even more common have been applications of unsupervised sparse coding followed by supervised linear hypothesis learning. We present two generalization error bounds for predictive sparse coding, handling the overcomplete setting (more original dimensions than learned features) and the infinite-dimensional setting. Our analysis led to a fundamental stability result for the Lasso that shows the stability of the solution vector to design matrix perturbations. We also introduce and analyze new multi-task models for (unsupervised) sparse coding and predictive sparse coding, allowing for one dictionary per task but with sharing between the tasks' dictionaries.
The second part introduces new meta-learning paradigms that realize unprecedented types of learning guarantees for meta-learning. Specifically sought are guarantees on a meta-learner's performance on new tasks encountered in an environment of tasks. Nearly all previous work produced bounds on the expected risk, whereas we produce tail bounds on the risk, thereby providing performance guarantees on the risk for a single new task drawn from the environment. The new paradigms include minimax multi-task learning (minimax MTL) and sample variance penalized meta-learning (SVP-ML). Regarding minimax MTL, we provide a high-probability learning guarantee on its performance on individual tasks encountered in the future, the first of its kind. We also present two continua of meta-learning formulations, each interpolating between classical multi-task learning and minimax multi-task learning. The idea of SVP-ML is to minimize the average of the training tasks' empirical risks plus a penalty on their sample variance. Controlling this sample variance can potentially yield a faster rate of decrease for upper bounds on the expected risk of new tasks, while also yielding high-probability guarantees on the meta-learner's average performance over a draw of new test tasks. An algorithm is presented for SVP-ML with feature selection representations, as well as a quite natural convex relaxation of the SVP-ML objective.
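The SVP-ML objective described above (the average of the training tasks' empirical risks plus a penalty on their sample variance) can be sketched in a few lines. This is an illustrative sketch only; the function name `svp_ml_objective` and the weight `lam` are hypothetical, not from the thesis:

```python
import numpy as np

def svp_ml_objective(task_risks, lam=1.0):
    """Sample-variance-penalized objective (sketch): the average of the
    training tasks' empirical risks plus lam times their sample variance."""
    risks = np.asarray(task_risks, dtype=float)
    return risks.mean() + lam * risks.var(ddof=1)

# Two risk profiles with equal mean: the uniform one pays no variance
# penalty, while the spread-out one does.
print(svp_ml_objective([0.2, 0.2, 0.2]))
print(svp_ml_objective([0.0, 0.2, 0.4]))
```

Minimizing this objective favors meta-learners whose performance is not only good on average but also uniform across training tasks, which is the mechanism behind the tail-bound guarantees sought above.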
Taking Advantage of Sparsity in Multi-Task Learning
We study the problem of estimating multiple linear regression equations for
the purpose of both prediction and variable selection. Following recent work on
multi-task learning by Argyriou et al. [2008], we assume that the regression
vectors share the same sparsity pattern. This means that the set of relevant
predictor variables is the same across the different equations. This assumption
leads us to consider the Group Lasso as a candidate estimation method. We show
that this estimator enjoys nice sparsity oracle inequalities and variable
selection properties. The results hold under a certain restricted eigenvalue
condition and a coherence condition on the design matrix, which naturally
extend recent work in Bickel et al. [2007], Lounici [2008]. In particular, in
the multi-task learning scenario, in which the number of tasks can grow, we are
able to remove completely the effect of the number of predictor variables in
the bounds. Finally, we show how our results can be extended to more general
noise distributions, for which we only require the variance to be finite.
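As a concrete illustration of the shared-sparsity-pattern assumption, here is a minimal proximal-gradient (ISTA) sketch of a Group Lasso estimator for multi-task regression with a common design matrix. This is a toy implementation under simplifying assumptions (shared design, fixed step size), not the authors' estimator or code:

```python
import numpy as np

def multitask_group_lasso(X, Y, lam=0.05, n_iter=500):
    """Proximal-gradient (ISTA) sketch of the Group Lasso for multi-task
    regression with a shared design X: the l2/l1 penalty zeroes whole rows
    of the coefficient matrix B, so all tasks share one sparsity pattern."""
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    lr = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        B -= lr * (X.T @ (X @ B - Y)) / n   # gradient step on the squared loss
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
        B *= shrink                          # row-wise soft-thresholding (prox step)
    return B

# Two tasks whose sets of relevant predictors (rows 0 and 2 of B) coincide.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
B_true = np.zeros((5, 2))
B_true[0] = [1.0, -1.0]
B_true[2] = [0.5, 0.8]
Y = X @ B_true + 0.01 * rng.standard_normal((100, 2))
B_hat = multitask_group_lasso(X, Y)
print(np.linalg.norm(B_hat, axis=1).round(2))  # rows 1, 3, 4 driven to zero jointly
```

The row-wise prox step is what enforces the variable-selection behavior analyzed in the abstract: a predictor is either kept for all tasks or discarded for all tasks.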
Invariant Causal Prediction for Block MDPs
Generalization across environments is critical to the successful application
of reinforcement learning algorithms to real-world challenges. In this paper,
we consider the problem of learning abstractions that generalize in block MDPs,
families of environments with a shared latent state space and dynamics
structure over that latent space, but varying observations. We leverage tools
from causal inference to propose a method of invariant prediction to learn
model-irrelevance state abstractions (MISA) that generalize to novel
observations in the multi-environment setting. We prove that for certain
classes of environments, this approach outputs with high probability a state
abstraction corresponding to the causal feature set with respect to the return.
We further provide more general bounds on model error and generalization error
in the multi-environment setting, in the process showing a connection between
causal variable selection and the state abstraction framework for MDPs. We give
empirical evidence that our methods work in both linear and nonlinear settings,
attaining improved generalization over single- and multi-task baselines.

Comment: Accepted to ICML 2020. 16 pages, 8 figures.
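A toy sketch of the invariance principle the method builds on (not the MISA algorithm itself): keep only the feature subsets whose least-squares coefficients agree across environments. All names, the tolerance, and the brute-force subset search are illustrative:

```python
import numpy as np
from itertools import chain, combinations

def invariant_subsets(envs, tol=0.1):
    """Keep feature subsets whose per-environment least-squares coefficients
    agree (up to tol), i.e. subsets whose predictive relationship to the
    target is invariant across environments."""
    p = envs[0][0].shape[1]
    accepted = []
    for S in chain.from_iterable(combinations(range(p), k) for k in range(1, p + 1)):
        coefs = [np.linalg.lstsq(X[:, list(S)], y, rcond=None)[0] for X, y in envs]
        if max(np.linalg.norm(c - coefs[0]) for c in coefs) < tol:
            accepted.append(S)
    return accepted

# Two environments: x0 causes y through a fixed mechanism, while x1 is an
# anticausal, environment-dependent descendant of y.
rng = np.random.default_rng(1)
envs = []
for shift in (0.0, 2.0):
    x0 = rng.standard_normal(200)
    y = 1.5 * x0 + 0.05 * rng.standard_normal(200)
    x1 = y + shift * rng.standard_normal(200)
    envs.append((np.column_stack([x0, x1]), y))
print(invariant_subsets(envs))  # only the causal feature subset survives
```

Subsets involving the anticausal feature x1 are rejected because its regression coefficient shifts between environments, mirroring the paper's use of invariance to identify the causal feature set.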
Estimation of high-dimensional low-rank matrices
Suppose that we observe entries or, more generally, linear combinations of
entries of an unknown m x T matrix A corrupted by noise. We are
particularly interested in the high-dimensional setting where the number mT
of unknown entries can be much larger than the sample size N. Motivated by
several applications, we consider estimation of matrix A under the assumption
that it has small rank. This can be viewed as a dimension reduction or sparsity
assumption. In order to shrink toward a low-rank representation, we investigate
penalized least squares estimators with a Schatten-p quasi-norm penalty term,
0 < p <= 1. We study these estimators under two possible assumptions: a modified
version of the restricted isometry condition and a uniform bound on the ratio
"empirical norm induced by the sampling operator / Frobenius norm." The main
results are stated as nonasymptotic upper bounds on the prediction risk and on
the Schatten-q risk of the estimators, where q in [p, 2]. The rates that we
obtain for the prediction risk are of the form rm/N (for m = T), up to
logarithmic factors, where r is the rank of A. The particular examples of
multi-task learning and matrix completion are worked out in detail. The proofs
are based on tools from the theory of empirical processes. As a by-product, we
derive bounds for the kth entropy numbers of the quasi-convex Schatten class
embeddings S_p^M -> S_2^M, 0 < p < 1, which are of independent interest.

Comment: Published at http://dx.doi.org/10.1214/10-AOS860 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
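At the convex endpoint p = 1 of the Schatten penalties above (the nuclear norm), and in the simplest case where the whole matrix is observed with noise rather than sampled through a general operator, the penalized least squares estimator has a closed form: soft-thresholding of singular values. A minimal sketch under those simplifying assumptions:

```python
import numpy as np

def svt(A, tau):
    """Singular-value soft-thresholding: the proximal operator of the
    nuclear norm, i.e. the closed-form minimizer of
    0.5 * ||Y - A||_F^2 + tau * ||A||_* when Y is fully observed."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Denoising a rank-2 matrix: thresholding kills the noise-level singular
# values and keeps (shrunken versions of) the signal directions.
rng = np.random.default_rng(2)
A0 = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 40))  # rank 2
Y = A0 + 0.1 * rng.standard_normal((30, 40))
A_hat = svt(Y, tau=2.0)
print(np.linalg.matrix_rank(A_hat))  # low rank recovered
```

This is the mechanism by which the penalty "shrinks toward a low-rank representation": singular values below the threshold are set exactly to zero.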
The Benefit of Multitask Representation Learning
We discuss a general method to learn data representations from multiple
tasks. We provide a justification for this method in both settings of multitask
learning and learning-to-learn. The method is illustrated in detail in the
special case of linear feature learning. Conditions on the theoretical
advantage offered by multitask representation learning over independent task
learning are established. In particular, focusing on the important example of
half-space learning, we derive the regime in which multitask representation
learning is beneficial over independent task learning, as a function of the
sample size, the number of tasks and the intrinsic data dimensionality. Other
potential applications of our results include multitask feature learning in
reproducing kernel Hilbert spaces and multilayer, deep networks.

Comment: To appear in Journal of Machine Learning Research (JMLR). 31 pages.
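In the linear feature learning case highlighted above, a shared representation can be read off from the tasks' weight vectors. The following toy sketch (a hypothetical procedure for illustration, not the paper's method) recovers a common subspace by SVD of the stacked weights:

```python
import numpy as np

def shared_linear_features(Ws, k):
    """Recover a k-dimensional shared linear representation from per-task
    weight vectors via SVD of the stacked weight matrix."""
    W = np.column_stack(Ws)   # p x T: one column per task
    U = np.linalg.svd(W, full_matrices=False)[0]
    return U[:, :k]           # orthonormal basis of the shared feature subspace

# Twenty tasks whose weight vectors all lie in one 2-dim subspace of R^10.
rng = np.random.default_rng(3)
basis = np.linalg.qr(rng.standard_normal((10, 2)))[0]
Ws = [basis @ rng.standard_normal(2) for _ in range(20)]
D = shared_linear_features(Ws, k=2)
# The learned projector matches the true subspace projector.
print(np.linalg.norm(D @ D.T - basis @ basis.T))
```

The benefit quantified in the abstract comes from exactly this pooling: many tasks jointly pin down a low-dimensional subspace that no single task could identify from its own samples alone.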