Frameworks for Learning from Multiple Tasks
In this thesis we study different machine learning frameworks for learning multiple
tasks together (multi-task learning, MTL). Depending on the motivations and goals of
each framework, we investigate its computational and statistical properties from both
a theoretical and an experimental standpoint.
The first problem we tackle is low-rank matrix learning, a popular modeling
assumption in MTL. Trace norm regularization is a widely used approach for
learning such models. A standard optimization strategy formulates the problem
as a low-rank matrix factorization, which, however, leads to a
non-convex problem. We show that it is possible to characterize the critical points of
the non-convex problem. This allows us to provide an efficient criterion to determine
whether a critical point is also a global minimizer. We extend this analysis to the case
in which the objective is nonsmooth.
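The factorization strategy and the global-optimality question can be illustrated on the simplest case. The sketch below assumes a squared loss L(W) = 0.5*||W - Y||_F^2, for which the trace-norm-regularized problem has a closed-form solution (singular value soft-thresholding). It runs plain gradient descent on the non-convex factorized objective and then applies a standard check: a critical point (U, V) yields a global minimizer W = U V^T exactly when the spectral norm of the loss gradient at W is at most lambda. This is a well-known condition for this problem class and is not necessarily the criterion derived in the thesis. The function names, step size and rank r are hypothetical choices.

```python
import numpy as np


def svt(Y, lam):
    """Singular value soft-thresholding: minimizer of 0.5*||W - Y||_F^2 + lam*||W||_*."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt


def trace_norm(W):
    """Sum of the singular values of W."""
    return np.linalg.svd(W, compute_uv=False).sum()


def factorized_gd(Y, lam, r, steps=5000, lr=0.05, seed=0):
    """Gradient descent on the non-convex factorized objective
    f(U, V) = 0.5*||U V^T - Y||_F^2 + (lam/2)*(||U||_F^2 + ||V||_F^2)."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    U = 0.1 * rng.standard_normal((m, r))
    V = 0.1 * rng.standard_normal((n, r))
    for _ in range(steps):
        R = U @ V.T - Y  # gradient of the squared loss at W = U V^T
        # simultaneous update: the right-hand side uses the old U and V
        U, V = U - lr * (R @ V + lam * U), V - lr * (R.T @ U + lam * V)
    return U, V


def is_global(U, V, Y, lam, tol=1e-6):
    """For a critical point (U, V): W = U V^T solves the convex problem
    iff the spectral norm of the loss gradient at W is at most lam."""
    return np.linalg.norm(U @ V.T - Y, 2) <= lam + tol


# Synthetic target with known singular values 5, 3, 1, 0.5, 0.2.
rng = np.random.default_rng(0)
Q1, _ = np.linalg.qr(rng.standard_normal((6, 5)))
Q2, _ = np.linalg.qr(rng.standard_normal((5, 5)))
Y = Q1 @ np.diag([5.0, 3.0, 1.0, 0.5, 0.2]) @ Q2.T
U, V = factorized_gd(Y, lam=1.5, r=3)
```

Here the convex solution has rank 2, so a factorization of rank 3 is large enough. At a balanced critical point, 0.5*(||U||_F^2 + ||V||_F^2) also coincides with the trace norm of U V^T, which is the variational identity that makes the factorized formulation equivalent to trace norm regularization.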
The second problem we address is learning-to-learn, whose aim is to infer a learning
algorithm that works well on a class of tasks sampled from an unknown meta-distribution.
As an extension of MTL, the goal here is to train on a set of tasks and perform well
on future, unseen tasks. We consider a scenario in which the tasks are presented
sequentially, without keeping any of their information in memory. We study the
statistical properties of the proposed algorithm and prove non-asymptotic bounds
on its excess transfer risk.
Lastly, a common practice in machine learning is to concatenate many different datasets
and apply a learning algorithm to the combined dataset. However, training on a collection of
heterogeneous datasets can cause problems due to the presence of bias. In this thesis we
derive an MTL framework that can jointly learn subcategories within a dataset and
undo the inherent bias existing within each of them.
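One common way to model this kind of dataset bias, used here purely as an illustration and not necessarily the thesis' formulation, is to give each dataset k its own weight vector w_k = w0 + v_k, where w0 is shared by all datasets and v_k absorbs that dataset's bias. The sketch below fits this jointly with ridge penalties by solving the normal equations of an augmented linear model, and compares it with ordinary ridge on the concatenated data. It assumes dataset memberships are known and does not discover subcategories; fit_shared_plus_bias, pooled_ridge and the penalty weights lam0, lam1 are hypothetical.

```python
import numpy as np


def fit_shared_plus_bias(datasets, lam0=1e-2, lam1=1.0):
    """Jointly fit w_k = w0 + v_k on a list of (X_k, y_k) pairs by minimizing
    sum_k ||X_k (w0 + v_k) - y_k||^2 + lam0*||w0||^2 + lam1*sum_k ||v_k||^2."""
    K = len(datasets)
    d = datasets[0][0].shape[1]
    blocks, targets = [], []
    for k, (X, y) in enumerate(datasets):
        Z = np.zeros((X.shape[0], d * (K + 1)))
        Z[:, :d] = X                       # columns for the shared w0
        Z[:, d * (k + 1):d * (k + 2)] = X  # columns for this dataset's bias v_k
        blocks.append(Z)
        targets.append(y)
    Z, y = np.vstack(blocks), np.concatenate(targets)
    penalty = np.diag([lam0] * d + [lam1] * (K * d))
    theta = np.linalg.solve(Z.T @ Z + penalty, Z.T @ y)
    return theta[:d], theta[d:].reshape(K, d)  # w0 and the stacked biases v_k


def pooled_ridge(datasets, lam=1e-2):
    """Baseline: ordinary ridge regression on the concatenated datasets."""
    X = np.vstack([X for X, _ in datasets])
    y = np.concatenate([y for _, y in datasets])
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


# Synthetic example: three datasets sharing w0_true but each with its own bias.
rng = np.random.default_rng(0)
d, K, n = 5, 3, 200
w0_true = rng.standard_normal(d)
datasets = []
for k in range(K):
    w_k = w0_true + rng.standard_normal(d)
    X = rng.standard_normal((n, d))
    datasets.append((X, X @ w_k + 0.1 * rng.standard_normal(n)))

w0, V = fit_shared_plus_bias(datasets)
w_pool = pooled_ridge(datasets)
joint_mse = np.mean([np.mean((X @ (w0 + V[k]) - y) ** 2) for k, (X, y) in enumerate(datasets)])
pooled_mse = np.mean([np.mean((X @ w_pool - y) ** 2) for X, y in datasets])
```

The shared vector w0 is the part one would carry over to data from an unseen source, while the v_k capture what is specific to each dataset; the pooled model has to average the biases away instead.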
Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction
We study the interplay between surrogate methods for structured prediction
and techniques from multitask learning designed to leverage relationships
between surrogate outputs. We propose an efficient algorithm based on trace
norm regularization which, unlike previous methods, does not require
explicit knowledge of the coding/decoding functions of the surrogate framework.
As a result, our algorithm can be applied to the broad class of problems in
which the surrogate space is large or even infinite-dimensional. We study
excess risk bounds for trace norm regularized structured prediction, which imply
consistency and learning rates for our estimator. We also identify relevant
regimes in which our approach can enjoy better generalization performance than
previous methods. Numerical experiments on ranking problems indicate that
enforcing low-rank relations among surrogate outputs may indeed provide a
significant advantage in practice.
Comment: 42 pages, 1 table
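To make trace norm regularization over surrogate outputs concrete, here is a minimal proximal-gradient sketch for multi-output least squares. It assumes a linear model W and a matrix Y whose rows are explicit surrogate encodings of the training labels; the algorithm described above specifically avoids needing such explicit encodings, so this only illustrates the regularizer. The decoding step back to structured outputs is omitted, and all names are hypothetical.

```python
import numpy as np


def svt(A, tau):
    """Proximal operator of tau*||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def objective(W, X, Y, lam):
    """(1/2n)*||XW - Y||_F^2 + lam*||W||_*."""
    n = X.shape[0]
    fit = 0.5 / n * np.linalg.norm(X @ W - Y) ** 2
    return fit + lam * np.linalg.svd(W, compute_uv=False).sum()


def trace_norm_regression(X, Y, lam, iters=500):
    """Proximal gradient descent with constant step 1/L."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the smooth part's gradient
    W = np.zeros((d, Y.shape[1]))
    for _ in range(iters):
        grad = X.T @ (X @ W - Y) / n
        W = svt(W - grad / L, lam / L)
    return W


# Synthetic surrogate outputs with rank-2 relations among the 6 output coordinates.
rng = np.random.default_rng(0)
n, d, k = 100, 8, 6
W_true = rng.standard_normal((d, 2)) @ rng.standard_normal((2, k))
X = rng.standard_normal((n, d))
Y = X @ W_true + 0.1 * rng.standard_normal((n, k))
W_hat = trace_norm_regression(X, Y, lam=0.1)
```

The soft-thresholding step is what couples the output coordinates: it shrinks all singular values of W by the same amount and zeroes the small ones, so the surrogate outputs are encouraged to share a low-dimensional structure.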
Online Parameter-Free Learning of Multiple Low Variance Tasks
We propose a method to learn a common bias vector for a growing sequence of
low-variance tasks. Unlike state-of-the-art approaches, our method does not
require tuning any hyper-parameter. Our approach is presented in the
non-statistical setting and comes in two variants: the "aggressive" one
updates the bias after each datapoint, while the "lazy" one updates the bias only at
the end of each task. We derive an across-tasks regret bound for the method.
Compared with state-of-the-art approaches, the aggressive variant achieves
faster rates, and the lazy one recovers standard rates without the need to tune
hyper-parameters. We then adapt the methods to the statistical setting: the
aggressive variant becomes a multi-task learning method, the lazy one a
meta-learning method. Experiments confirm the effectiveness of our methods in
practice.
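The parameter-free idea can be illustrated with Krichevsky-Trofimov (KT) coin betting, a standard parameter-free online learner, used here as an assumed stand-in rather than the method's exact algorithm. In this toy version, loosely mirroring the lazy variant, each task is summarized by a hypothetical target vector w_j, and the bias h is updated once per task on the 1-Lipschitz meta-loss ||h - w_j||. No step size is tuned; the initial wealth eps is a fixed constant. kt_bias_learner and the synthetic tasks are hypothetical.

```python
import numpy as np


def kt_bias_learner(task_targets, eps=1.0):
    """Vector KT coin betting over a sequence of tasks.

    Before task t the bias is h_t = (sum of past negative gradients / t) * wealth.
    The meta-loss ||h - w_j|| has (sub)gradients of norm at most 1, which keeps
    the betting fraction below 1 and the wealth positive.
    """
    d = task_targets.shape[1]
    wealth = eps
    neg_grad_sum = np.zeros(d)
    history = []
    for t, w_task in enumerate(task_targets, start=1):
        h = neg_grad_sum / t * wealth  # bet a fraction of the current wealth
        history.append(h)
        diff = h - w_task
        dist = np.linalg.norm(diff)
        g = diff / dist if dist > 0 else np.zeros(d)  # subgradient of ||h - w_task||
        wealth -= g @ h  # wealth += <-g, h>
        neg_grad_sum -= g
    return np.array(history)


# Low-variance tasks: every target lies close to a common (hypothetical) center.
rng = np.random.default_rng(0)
center = np.array([1.0, -2.0, 0.5])
targets = center + 0.05 * rng.standard_normal((2000, 3))
biases = kt_bias_learner(targets)
h_avg = biases.mean(axis=0)
```

The magnitude of the bias is never set by hand: it grows with the accumulated wealth while the bets pay off and shrinks once the gradients change sign, which is what removes the usual step-size tuning.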
Incremental Learning-to-Learn with Statistical Guarantees
In learning-to-learn the goal is to infer a learning algorithm that works
well on a class of tasks sampled from an unknown meta distribution. In contrast
to previous work on batch learning-to-learn, we consider a scenario where tasks
are presented sequentially and the algorithm needs to adapt incrementally to
improve its performance on future tasks. Key to this setting is for the
algorithm to rapidly incorporate new observations into the model as they
arrive, without keeping them in memory. We focus on the case where the
underlying algorithm is ridge regression parameterized by a positive
semidefinite matrix. We propose to learn this matrix by applying a stochastic
strategy to minimize the empirical error incurred by ridge regression on future
tasks sampled from the meta distribution. We study the statistical properties
of the proposed algorithm and prove non-asymptotic bounds on its excess
transfer risk, that is, the generalization performance on new tasks from the
same meta distribution. We compare our online learning-to-learn approach with a
state-of-the-art batch method, both theoretically and empirically.
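A minimal sketch of the incremental scheme, assuming the parameterization w_D = D X^T (X D X^T + n*lam*I)^{-1} y, which is ridge regression with the regularizer lam*<w, D^{-1} w> when D is invertible. Each incoming task is split into a training part (X, y) and a validation part (Z, t); the validation error of w_D serves as a stochastic estimate of the error on future tasks, and D takes one projected gradient step per task before the task is discarded. The train/validation split, the projection onto the positive semidefinite cone, the step size, the synthetic meta-distribution and every function name are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np


def ridge_D(D, X, y, lam):
    """w_D = D X^T (X D X^T + n*lam*I)^{-1} y; equals the minimizer of
    (1/n)||Xw - y||^2 + lam*<w, D^{-1} w> when D is invertible."""
    n = X.shape[0]
    M = X @ D @ X.T + n * lam * np.eye(n)
    a = np.linalg.solve(M, y)
    return D @ X.T @ a, M, a


def meta_loss_and_grad(D, X, y, Z, t, lam):
    """Validation error of w_D on (Z, t) and its gradient w.r.t. symmetric D."""
    w, M, a = ridge_D(D, X, y, lam)
    r = Z @ w - t
    loss = r @ r / len(t)
    g = 2 * Z.T @ r / len(t)                       # d loss / d w
    b = X.T @ a
    c = g - X.T @ np.linalg.solve(M, X @ (D @ g))  # chain rule through M^{-1}
    G = np.outer(c, b)
    return loss, 0.5 * (G + G.T)                   # symmetrized gradient


def project_psd(D):
    """Euclidean projection onto the PSD cone: clip negative eigenvalues."""
    evals, evecs = np.linalg.eigh(0.5 * (D + D.T))
    return (evecs * np.maximum(evals, 0.0)) @ evecs.T


def sample_task(rng, basis, n_train=4, n_val=10, noise=0.1):
    """Hypothetical meta-distribution: task vectors lie in span(basis)."""
    d, k = basis.shape
    w = basis @ rng.standard_normal(k)
    X = rng.standard_normal((n_train + n_val, d))
    y = X @ w + noise * rng.standard_normal(n_train + n_val)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]


def incremental_ltl(rng, basis, n_tasks=100, lam=0.1, step=0.05):
    """One projected stochastic gradient step on D per task; tasks are not stored."""
    D = np.eye(basis.shape[0])
    for _ in range(n_tasks):
        X, y, Z, t = sample_task(rng, basis)
        _, G = meta_loss_and_grad(D, X, y, Z, t, lam)
        D = project_psd(D - step * G)
    return D


rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.standard_normal((5, 2)))[0]  # 2-dim task subspace in R^5
D_learned = incremental_ltl(rng, basis)
```

Only the current task is ever held in memory, which is the property the incremental setting asks for; the gradient formula follows from differentiating w_D through the inverse of X D X^T + n*lam*I.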
A meta-learning BCI for estimating decision confidence
Objective: We investigated whether a recently introduced transfer-learning technique based on meta-learning could improve the performance of Brain-Computer Interfaces (BCIs) for decision-confidence prediction compared with more traditional machine learning methods.
Approach: We adapted the meta-learning by biased regularisation algorithm to the problem of predicting decision confidence from EEG and EOG data on a decision-by-decision basis in a difficult target discrimination task based on video feeds. The method exploits previous participants’ data to produce a prediction algorithm that is then quickly tuned to new participants. We compared it with the traditional single-subject training almost universally adopted in BCIs, a state-of-the-art transfer-learning technique called Domain Adversarial Neural Networks (DANN), a transfer-learning adaptation of a zero-training method we used recently for a similar task, and a simple baseline algorithm.
Main results: The meta-learning approach was significantly better than other approaches in most conditions, and much better in situations where limited data from a new participant are available for training/tuning. Meta-learning by biased regularisation allowed our BCI to seamlessly integrate information from past participants with data from a specific user to produce high-performance predictors. Its robustness in the presence of small training sets is a real plus in BCI applications, as new users need to train the BCI for a much shorter period.
Significance: Due to the variability and noise of EEG/EOG data, BCIs normally need to be trained with data from a specific participant. This work shows that even better performance can be obtained using our version of meta-learning by biased regularisation.
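Biased regularisation replaces the usual ridge penalty lam*||w||^2 with lam*||w - h||^2, where the bias h is learned from previous participants, so a new user's predictor starts from h instead of from zero. A minimal sketch, assuming linear predictors on generic feature vectors (not the EEG/EOG pipeline) and, as a simplification, estimating h as the average of ridge solutions fitted on past participants; the function names and the synthetic participants are hypothetical.

```python
import numpy as np


def biased_ridge(X, y, h, lam):
    """argmin_w (1/n)||Xw - y||^2 + lam*||w - h||^2 (h = 0 gives plain ridge)."""
    n, d = X.shape
    A = X.T @ X + n * lam * np.eye(d)
    return h + np.linalg.solve(A, X.T @ (y - X @ h))


def participant(rng, mean, n, spread=0.1, noise=0.1):
    """Hypothetical participant whose weights lie close to a shared mean vector."""
    w = mean + spread * rng.standard_normal(mean.size)
    X = rng.standard_normal((n, mean.size))
    return X, X @ w + noise * rng.standard_normal(n), w


rng = np.random.default_rng(0)
d, lam = 10, 0.1
mean = 2.0 * rng.standard_normal(d)

# Meta-training: average the ridge solutions of 20 past participants (50 samples each).
past = [participant(rng, mean, n=50) for _ in range(20)]
h = np.mean([biased_ridge(X, y, np.zeros(d), lam) for X, y, _ in past], axis=0)

# New participants with only 3 samples each: biased vs plain ridge.
err_biased, err_plain = [], []
for _ in range(50):
    X, y, w = participant(rng, mean, n=3)
    err_biased.append(np.sum((biased_ridge(X, y, h, lam) - w) ** 2))
    err_plain.append(np.sum((biased_ridge(X, y, np.zeros(d), lam) - w) ** 2))
```

With only a few samples per new participant, plain ridge is pulled towards zero, whereas the biased version only has to learn the small deviation of the participant from h, which mirrors the small-training-set robustness reported above.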