On sparse representations and new meta-learning paradigms for representation learning
Given the "right" representation, learning is easy. This thesis studies representation learning and meta-learning, with a special focus on sparse representations. Meta-learning is fundamental to machine learning: it is, in short, learning to learn. The presentation unfolds in two parts. In the first part, we establish learning-theoretic results for learning sparse representations. The second part introduces new multi-task and meta-learning paradigms for representation learning.
On the sparse representations front, our main pursuits are generalization error bounds to support a supervised dictionary learning model for Lasso-style sparse coding. Such predictive sparse coding algorithms have been applied with much success in the literature; even more common have been applications of unsupervised sparse coding followed by supervised linear hypothesis learning. We present two generalization error bounds for predictive sparse coding, handling the overcomplete setting (more original dimensions than learned features) and the infinite-dimensional setting. Our analysis led to a fundamental stability result for the Lasso that shows the stability of the solution vector to design matrix perturbations. We also introduce and analyze new multi-task models for (unsupervised) sparse coding and predictive sparse coding, allowing for one dictionary per task but with sharing between the tasks' dictionaries.
The second part introduces new meta-learning paradigms that realize unprecedented types of learning guarantees for meta-learning. Specifically sought are guarantees on a meta-learner's performance on new tasks encountered in an environment of tasks. Nearly all previous work produced bounds on the expected risk, whereas we produce tail bounds on the risk, thereby providing performance guarantees on the risk for a single new task drawn from the environment. The new paradigms include minimax multi-task learning (minimax MTL) and sample variance penalized meta-learning (SVP-ML). Regarding minimax MTL, we provide a high-probability learning guarantee on its performance on individual tasks encountered in the future, the first of its kind. We also present two continua of meta-learning formulations, each interpolating between classical multi-task learning and minimax multi-task learning. The idea of SVP-ML is to minimize the average of the training tasks' empirical risks plus a penalty on their sample variance. Controlling this sample variance can potentially yield a faster rate of decrease for upper bounds on the expected risk of new tasks, while also yielding high-probability guarantees on the meta-learner's average performance over a draw of new test tasks. An algorithm is presented for SVP-ML with feature selection representations, as well as a quite natural convex relaxation of the SVP-ML objective.
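The SVP-ML objective described above (the average of the training tasks' empirical risks plus a penalty on their sample variance) can be sketched in a few lines. This is an illustrative sketch only; the function name `svp_ml_objective` and the weight `lam` are hypothetical, not from the thesis:

```python
import numpy as np

def svp_ml_objective(task_risks, lam=1.0):
    """Sample-variance-penalized objective (sketch): the average of the
    training tasks' empirical risks plus lam times their sample variance."""
    risks = np.asarray(task_risks, dtype=float)
    return risks.mean() + lam * risks.var(ddof=1)

# Two risk profiles with equal mean: the uniform one pays no variance
# penalty, while the spread-out one does.
print(svp_ml_objective([0.2, 0.2, 0.2]))
print(svp_ml_objective([0.0, 0.2, 0.4]))
```

Minimizing this objective favors meta-learners whose performance is not only good on average but also uniform across training tasks, which is the mechanism behind the tail-bound guarantees sought above.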
Taking Advantage of Sparsity in Multi-Task Learning
We study the problem of estimating multiple linear regression equations for
the purpose of both prediction and variable selection. Following recent work on
multi-task learning by Argyriou et al. [2008], we assume that the regression
vectors share the same sparsity pattern. This means that the set of relevant
predictor variables is the same across the different equations. This assumption
leads us to consider the Group Lasso as a candidate estimation method. We show
that this estimator enjoys nice sparsity oracle inequalities and variable
selection properties. The results hold under a certain restricted eigenvalue
condition and a coherence condition on the design matrix, which naturally
extend recent work in Bickel et al. [2007], Lounici [2008]. In particular, in
the multi-task learning scenario, in which the number of tasks can grow, we are
able to remove completely the effect of the number of predictor variables in
the bounds. Finally, we show how our results can be extended to more general
noise distributions, for which we only require the variance to be finite.
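As a concrete illustration of the shared-sparsity-pattern assumption, here is a minimal proximal-gradient (ISTA) sketch of a Group Lasso estimator for multi-task regression with a common design matrix. This is a toy implementation under simplifying assumptions (shared design, fixed step size), not the authors' estimator or code:

```python
import numpy as np

def multitask_group_lasso(X, Y, lam=0.05, n_iter=500):
    """Proximal-gradient (ISTA) sketch of the Group Lasso for multi-task
    regression with a shared design X: the l2/l1 penalty zeroes whole rows
    of the coefficient matrix B, so all tasks share one sparsity pattern."""
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    lr = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        B -= lr * (X.T @ (X @ B - Y)) / n   # gradient step on the squared loss
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
        B *= shrink                          # row-wise soft-thresholding (prox step)
    return B

# Two tasks whose sets of relevant predictors (rows 0 and 2 of B) coincide.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
B_true = np.zeros((5, 2))
B_true[0] = [1.0, -1.0]
B_true[2] = [0.5, 0.8]
Y = X @ B_true + 0.01 * rng.standard_normal((100, 2))
B_hat = multitask_group_lasso(X, Y)
print(np.linalg.norm(B_hat, axis=1).round(2))  # rows 1, 3, 4 driven to zero jointly
```

The row-wise prox step is what enforces the variable-selection behavior analyzed in the abstract: a predictor is either kept for all tasks or discarded for all tasks.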
Invariant Causal Prediction for Block MDPs
Generalization across environments is critical to the successful application
of reinforcement learning algorithms to real-world challenges. In this paper,
we consider the problem of learning abstractions that generalize in block MDPs,
families of environments with a shared latent state space and dynamics
structure over that latent space, but varying observations. We leverage tools
from causal inference to propose a method of invariant prediction to learn
model-irrelevance state abstractions (MISA) that generalize to novel
observations in the multi-environment setting. We prove that for certain
classes of environments, this approach outputs with high probability a state
abstraction corresponding to the causal feature set with respect to the return.
We further provide more general bounds on model error and generalization error
in the multi-environment setting, in the process showing a connection between
causal variable selection and the state abstraction framework for MDPs. We give
empirical evidence that our methods work in both linear and nonlinear settings,
attaining improved generalization over single- and multi-task baselines.

Comment: Accepted to ICML 2020. 16 pages, 8 figures.
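A toy sketch of the invariance principle the method builds on (not the MISA algorithm itself): keep only the feature subsets whose least-squares coefficients agree across environments. All names, the tolerance, and the brute-force subset search are illustrative:

```python
import numpy as np
from itertools import chain, combinations

def invariant_subsets(envs, tol=0.1):
    """Keep feature subsets whose per-environment least-squares coefficients
    agree (up to tol), i.e. subsets whose predictive relationship to the
    target is invariant across environments."""
    p = envs[0][0].shape[1]
    accepted = []
    for S in chain.from_iterable(combinations(range(p), k) for k in range(1, p + 1)):
        coefs = [np.linalg.lstsq(X[:, list(S)], y, rcond=None)[0] for X, y in envs]
        if max(np.linalg.norm(c - coefs[0]) for c in coefs) < tol:
            accepted.append(S)
    return accepted

# Two environments: x0 causes y through a fixed mechanism, while x1 is an
# anticausal, environment-dependent descendant of y.
rng = np.random.default_rng(1)
envs = []
for shift in (0.0, 2.0):
    x0 = rng.standard_normal(200)
    y = 1.5 * x0 + 0.05 * rng.standard_normal(200)
    x1 = y + shift * rng.standard_normal(200)
    envs.append((np.column_stack([x0, x1]), y))
print(invariant_subsets(envs))  # only the causal feature subset survives
```

Subsets involving the anticausal feature x1 are rejected because its regression coefficient shifts between environments, mirroring the paper's use of invariance to identify the causal feature set.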
Estimation of high-dimensional low-rank matrices
Suppose that we observe entries or, more generally, linear combinations of
entries of an unknown m x T matrix A corrupted by noise. We are
particularly interested in the high-dimensional setting where the number mT
of unknown entries can be much larger than the sample size N. Motivated by
several applications, we consider estimation of matrix A under the assumption
that it has small rank. This can be viewed as a dimension reduction or sparsity
assumption. In order to shrink toward a low-rank representation, we investigate
penalized least squares estimators with a Schatten-p quasi-norm penalty term,
0 < p <= 1. We study these estimators under two possible assumptions: a modified
version of the restricted isometry condition and a uniform bound on the ratio
"empirical norm induced by the sampling operator / Frobenius norm." The main
results are stated as nonasymptotic upper bounds on the prediction risk and on
the Schatten-q risk of the estimators, where q in [p, 2]. The rates that we
obtain for the prediction risk are of the form rm/N (for m = T), up to
logarithmic factors, where r is the rank of A. The particular examples of
multi-task learning and matrix completion are worked out in detail. The proofs
are based on tools from the theory of empirical processes. As a by-product, we
derive bounds for the kth entropy numbers of the quasi-convex Schatten class
embeddings S_p^M -> S_2^M, 0 < p < 1, which are of independent interest.

Comment: Published at http://dx.doi.org/10.1214/10-AOS860 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
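At the convex endpoint p = 1 of the Schatten penalties above (the nuclear norm), and in the simplest case where the whole matrix is observed with noise rather than sampled through a general operator, the penalized least squares estimator has a closed form: soft-thresholding of singular values. A minimal sketch under those simplifying assumptions:

```python
import numpy as np

def svt(A, tau):
    """Singular-value soft-thresholding: the proximal operator of the
    nuclear norm, i.e. the closed-form minimizer of
    0.5 * ||Y - A||_F^2 + tau * ||A||_* when Y is fully observed."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Denoising a rank-2 matrix: thresholding kills the noise-level singular
# values and keeps (shrunken versions of) the signal directions.
rng = np.random.default_rng(2)
A0 = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 40))  # rank 2
Y = A0 + 0.1 * rng.standard_normal((30, 40))
A_hat = svt(Y, tau=2.0)
print(np.linalg.matrix_rank(A_hat))  # low rank recovered
```

This is the mechanism by which the penalty "shrinks toward a low-rank representation": singular values below the threshold are set exactly to zero.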
The Benefit of Multitask Representation Learning
We discuss a general method to learn data representations from multiple
tasks. We provide a justification for this method in both settings of multitask
learning and learning-to-learn. The method is illustrated in detail in the
special case of linear feature learning. Conditions on the theoretical
advantage offered by multitask representation learning over independent task
learning are established. In particular, focusing on the important example of
half-space learning, we derive the regime in which multitask representation
learning is beneficial over independent task learning, as a function of the
sample size, the number of tasks and the intrinsic data dimensionality. Other
potential applications of our results include multitask feature learning in
reproducing kernel Hilbert spaces and multilayer, deep networks.

Comment: To appear in Journal of Machine Learning Research (JMLR). 31 pages.
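In the linear feature learning case highlighted above, a shared representation can be read off from the tasks' weight vectors. The following toy sketch (a hypothetical procedure for illustration, not the paper's method) recovers a common subspace by SVD of the stacked weights:

```python
import numpy as np

def shared_linear_features(Ws, k):
    """Recover a k-dimensional shared linear representation from per-task
    weight vectors via SVD of the stacked weight matrix."""
    W = np.column_stack(Ws)   # p x T: one column per task
    U = np.linalg.svd(W, full_matrices=False)[0]
    return U[:, :k]           # orthonormal basis of the shared feature subspace

# Twenty tasks whose weight vectors all lie in one 2-dim subspace of R^10.
rng = np.random.default_rng(3)
basis = np.linalg.qr(rng.standard_normal((10, 2)))[0]
Ws = [basis @ rng.standard_normal(2) for _ in range(20)]
D = shared_linear_features(Ws, k=2)
# The learned projector matches the true subspace projector.
print(np.linalg.norm(D @ D.T - basis @ basis.T))
```

The benefit quantified in the abstract comes from exactly this pooling: many tasks jointly pin down a low-dimensional subspace that no single task could identify from its own samples alone.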