
    Frameworks for Learning from Multiple Tasks

    In this thesis we study different machine learning frameworks for learning multiple tasks together. Depending on the motivations and goals of each learning framework, we investigate its computational and statistical properties from both a theoretical and an experimental standpoint. The first problem we tackle is low-rank matrix learning, a popular model assumption in multi-task learning (MTL). Trace norm regularization is a widely used approach for learning such models. A standard optimization strategy formulates the problem as one of low-rank matrix factorization, which, however, leads to a non-convex problem. We show that it is possible to characterize the critical points of the non-convex problem, and this allows us to provide an efficient criterion for determining whether a critical point is also a global minimizer. We extend this analysis to the case in which the objective is nonsmooth. The goal of the second problem is to infer a learning algorithm that works well on a class of tasks sampled from an unknown meta-distribution. As an extension of MTL, the aim here is to train on a set of tasks and perform well on future, unseen tasks. We consider a scenario in which the tasks are presented sequentially, without keeping any of their information in memory. We study the statistical properties of the proposed algorithm and prove non-asymptotic bounds on the excess transfer risk. Lastly, a common practice in machine learning is to concatenate many different datasets and apply a learning algorithm to the resulting dataset. However, training on a collection of heterogeneous datasets can cause issues due to the presence of bias. In this thesis we derive an MTL framework that jointly learns subcategories within a dataset and undoes the inherent bias existing within each of them.
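
    As a concrete illustration of the convex baseline mentioned above, the following is a minimal sketch of trace-norm regularized multi-task least squares solved by proximal gradient descent, where the proximal step is singular value soft-thresholding. It is not the thesis's factorized (non-convex) algorithm; the function names, the fixed step-size choice, and the squared loss are illustrative assumptions.

        import numpy as np

        def svt(W, tau):
            # Proximal operator of the trace norm: soft-threshold the singular values.
            U, s, Vt = np.linalg.svd(W, full_matrices=False)
            return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

        def trace_norm_mtl(Xs, ys, lam=0.1, iters=500):
            # Xs[t], ys[t]: design matrix and targets of task t; column t of W holds its weights.
            d, T = Xs[0].shape[1], len(Xs)
            step = 1.0 / max(np.linalg.norm(X, 2) ** 2 for X in Xs)  # 1 / Lipschitz constant
            W = np.zeros((d, T))
            for _ in range(iters):
                G = np.column_stack([X.T @ (X @ W[:, t] - y)
                                     for t, (X, y) in enumerate(zip(Xs, ys))])
                W = svt(W - step * G, step * lam)  # gradient step, then trace-norm prox
            return W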

    Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction

    We study the interplay between surrogate methods for structured prediction and techniques from multi-task learning designed to leverage relationships between surrogate outputs. We propose an efficient algorithm based on trace norm regularization which, differently from previous methods, does not require explicit knowledge of the coding/decoding functions of the surrogate framework. As a result, our algorithm can be applied to the broad class of problems in which the surrogate space is large or even infinite dimensional. We study excess risk bounds for trace norm regularized structured prediction, which imply consistency and learning rates for our estimator. We also identify relevant regimes in which our approach can enjoy better generalization performance than previous methods. Numerical experiments on ranking problems indicate that enforcing low-rank relations among surrogate outputs may indeed provide a significant advantage in practice.
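
    To make the surrogate pipeline concrete, here is a toy sketch, under assumed notation, of a low-rank multi-output least-squares step followed by nearest-code decoding. Note that it relies on an explicit codebook purely for illustration, whereas the algorithm proposed above is specifically designed to avoid requiring the coding/decoding functions.

        import numpy as np

        def low_rank_surrogate_fit(X, Y, lam=1e-2, rank=5):
            # Ridge-regularized multi-output regression onto surrogate codes Y,
            # truncated to its leading singular directions to enforce low rank.
            d = X.shape[1]
            W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
            U, s, Vt = np.linalg.svd(W, full_matrices=False)
            return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

        def decode(scores, codebook):
            # Map each predicted surrogate vector to the closest structured output's code.
            dists = ((scores[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
            return dists.argmin(axis=1)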

    Online Parameter-Free Learning of Multiple Low Variance Tasks

    We propose a method to learn a common bias vector for a growing sequence of low-variance tasks. Unlike state-of-the-art approaches, our method does not require tuning any hyper-parameter. Our approach is presented in the non-statistical setting and comes in two variants: the "aggressive" variant updates the bias after each datapoint, while the "lazy" one updates the bias only at the end of each task. We derive an across-tasks regret bound for the method. Compared to state-of-the-art approaches, the aggressive variant achieves faster rates, while the lazy one recovers standard rates but without the need to tune hyper-parameters. We then adapt the methods to the statistical setting: the aggressive variant becomes a multi-task learning method, the lazy one a meta-learning method. Experiments confirm the effectiveness of our methods in practice.
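
    A minimal sketch of the two update schedules follows, using plain online gradient descent on the squared loss with fixed step sizes; the actual method is parameter-free, so the step sizes and the specific bias update below are illustrative assumptions rather than the proposed algorithm.

        import numpy as np

        def run_tasks(tasks, variant="aggressive", eta_w=0.1, eta_h=0.01):
            # tasks: list of (X, y) data streams; returns the learned shared bias vector h.
            d = tasks[0][0].shape[1]
            h = np.zeros(d)
            for X, y in tasks:
                w = h.copy()                      # start the within-task weights at the bias
                for x, target in zip(X, y):
                    g = (w @ x - target) * x      # squared-loss gradient on one datapoint
                    w -= eta_w * g
                    if variant == "aggressive":
                        h -= eta_h * (h - w)      # nudge the bias toward the current weights
                if variant == "lazy":
                    h -= eta_h * (h - w)          # update the bias once, at the end of the task
            return h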

    Incremental Learning-to-Learn with Statistical Guarantees

    In learning-to-learn the goal is to infer a learning algorithm that works well on a class of tasks sampled from an unknown meta-distribution. In contrast to previous work on batch learning-to-learn, we consider a scenario where tasks are presented sequentially and the algorithm needs to adapt incrementally to improve its performance on future tasks. Key to this setting is for the algorithm to rapidly incorporate new observations into the model as they arrive, without keeping them in memory. We focus on the case where the underlying algorithm is ridge regression parameterized by a positive semidefinite matrix. We propose to learn this matrix by applying a stochastic strategy to minimize the empirical error incurred by ridge regression on future tasks sampled from the meta-distribution. We study the statistical properties of the proposed algorithm and prove non-asymptotic bounds on its excess transfer risk, that is, the generalization performance on new tasks from the same meta-distribution. We compare our online learning-to-learn approach with a state-of-the-art batch method, both theoretically and empirically.
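
    The sketch below illustrates the flavour of this approach, assuming a squared loss and a train/validation split within each incoming task: ridge regression whose solution is shaped by a positive semidefinite matrix M, and one projected stochastic-gradient step on M per task. The gradient expression, the split, and the step size are assumptions for illustration, not the paper's exact update.

        import numpy as np

        def ridge_with_M(M, X, y, lam):
            # Ridge regression parameterized by a PSD matrix M:
            # w = M X^T (X M X^T + lam * n * I)^{-1} y.
            n = X.shape[0]
            A = X @ M @ X.T + lam * n * np.eye(n)
            alpha = np.linalg.solve(A, y)
            return M @ (X.T @ alpha), A, alpha

        def meta_step(M, train, val, lam=0.1, eta=0.01):
            # One projected stochastic-gradient step on M from a single new task.
            (X, y), (Xv, yv) = train, val
            w, A, alpha = ridge_with_M(M, X, y, lam)
            b = X.T @ alpha
            r = 2.0 * Xv.T @ (Xv @ w - yv)                    # d(validation loss)/dw
            G = np.outer(r, b) - X.T @ np.linalg.solve(A, X @ (M @ np.outer(r, b)))
            G = 0.5 * (G + G.T)                               # keep the meta-parameter symmetric
            vals, vecs = np.linalg.eigh(M - eta * G)          # project back onto the PSD cone
            return vecs @ np.diag(np.clip(vals, 0.0, None)) @ vecs.T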

    A meta-learning BCI for estimating decision confidence

    Objective: We investigated whether a recently introduced transfer-learning technique based on meta-learning could improve the performance of Brain-Computer Interfaces (BCIs) for decision-confidence prediction with respect to more traditional machine learning methods. Approach: We adapted the meta-learning by biased regularisation algorithm to the problem of predicting decision confidence from EEG and EOG data on a decision-by-decision basis in a difficult target discrimination task based on video feeds. The method exploits previous participants' data to produce a prediction algorithm that is then quickly tuned to new participants. We compared it with the traditional single-subject training almost universally adopted in BCIs, a state-of-the-art transfer learning technique called Domain Adversarial Neural Networks (DANN), a transfer-learning adaptation of a zero-training method we used recently for a similar task, and with a simple baseline algorithm. Main results: The meta-learning approach was significantly better than the other approaches in most conditions, and much better in situations where limited data from a new participant are available for training/tuning. Meta-learning by biased regularisation allowed our BCI to seamlessly integrate information from past participants with data from a specific user to produce high-performance predictors. Its robustness in the presence of small training sets is a real plus in BCI applications, as new users need to train the BCI for a much shorter period. Significance: Due to the variability and noise of EEG/EOG data, BCIs normally need to be trained with data from a specific participant. This work shows that even better performance can be obtained using our version of meta-learning by biased regularisation.
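
    For illustration, a minimal sketch of ridge regression with biased regularisation: a new participant's weights are shrunk toward a bias vector h estimated from previous participants' data. The variable names, the simple averaging choice for h, and the final prediction helper are assumptions for the sketch, not the exact meta-learning update used in the study.

        import numpy as np

        def biased_ridge(X, y, h, lam):
            # Closed-form solution of  min_w ||X w - y||^2 + lam * ||w - h||^2.
            d = X.shape[1]
            return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * h)

        def learn_bias(past_subjects, lam=1.0):
            # A simple bias: average the per-subject ridge solutions from past participants.
            ws = [biased_ridge(X, y, np.zeros(X.shape[1]), lam) for X, y in past_subjects]
            return np.mean(ws, axis=0)

        def predict_confidence(X_new, y_new, X_test, h, lam=1.0):
            # Tune quickly on a small amount of the new participant's data, then predict.
            w = biased_ridge(X_new, y_new, h, lam)
            return X_test @ w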