The Benefit of Multitask Representation Learning
We discuss a general method to learn data representations from multiple
tasks. We provide a justification for this method in both settings of multitask
learning and learning-to-learn. The method is illustrated in detail in the
special case of linear feature learning. Conditions on the theoretical
advantage offered by multitask representation learning over independent task
learning are established. In particular, focusing on the important example of
half-space learning, we derive the regime in which multitask representation
learning is beneficial over independent task learning, as a function of the
sample size, the number of tasks and the intrinsic data dimensionality. Other
potential applications of our results include multitask feature learning in
reproducing kernel Hilbert spaces and multilayer, deep networks.
Comment: To appear in Journal of Machine Learning Research (JMLR). 31 pages.
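To make the linear feature learning setting concrete, the sketch below shows one common instantiation of multitask representation learning: all tasks share a low-dimensional linear feature map, and per-task predictors are fit on top of it by alternating minimization. The shapes, regularization parameter, and variable names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def multitask_linear_features(Xs, ys, k=5, n_iters=50, lam=0.1):
    """Alternating minimization for shared linear features across T tasks.

    Xs, ys: lists of per-task design matrices (n_t x d) and targets (n_t,).
    Learns a shared feature map B (d x k) and per-task weights w_t (k,),
    so task t predicts X_t @ B @ w_t.  A minimal sketch, not the paper's
    exact algorithm.
    """
    d = Xs[0].shape[1]
    rng = np.random.default_rng(0)
    B = rng.standard_normal((d, k)) / np.sqrt(d)    # shared representation
    Ws = [np.zeros(k) for _ in Xs]

    for _ in range(n_iters):
        # Step 1: with B fixed, each task reduces to an independent ridge regression.
        for t, (X, y) in enumerate(zip(Xs, ys)):
            Z = X @ B                                # task data in the shared feature space
            Ws[t] = np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ y)

        # Step 2: with the w_t fixed, refit the shared map B on the pooled data.
        # Each sample contributes the row (w_t^T kron x^T) against vec(B).
        A = np.vstack([np.kron(Ws[t][None, :], X) for t, X in enumerate(Xs)])
        b = np.concatenate(ys)
        vecB = np.linalg.solve(A.T @ A + lam * np.eye(d * k), A.T @ b)
        B = vecB.reshape((d, k), order="F")          # column-stacked vec(B)
    return B, Ws
```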
Local Rademacher Complexity-based Learning Guarantees for Multi-Task Learning
We show a Talagrand-type concentration inequality for Multi-Task Learning
(MTL), using which we establish sharp excess risk bounds for MTL in terms of
distribution- and data-dependent versions of the Local Rademacher Complexity
(LRC). We also give a new bound on the LRC for norm regularized as well as
strongly convex hypothesis classes, which applies not only to MTL but also to
the standard i.i.d. setting. Combining both results, one can now easily derive
fast-rate bounds on the excess risk for many prominent MTL methods,
including---as we demonstrate---Schatten-norm, group-norm, and
graph-regularized MTL. The derived bounds reflect a relationship akin to a
conservation law of asymptotic convergence rates. This very relationship allows
for trading off slower rates w.r.t. the number of tasks for faster rates with
respect to the number of available samples per task, when compared to the rates
obtained via a traditional, global Rademacher analysis.
Comment: In this version, some arguments and results (of the previous version) have been corrected or modified.
Frameworks for Learning from Multiple Tasks
In this thesis we study different machine learning frameworks for learning multiple
tasks together. Depending on the motivations and goals of each learning framework, we
investigate its computational and statistical properties from both a theoretical
and experimental standpoint.
The first problem we tackle is low rank matrix learning which is a popular model
assumption used in MTL. Trace norm regularization is a widely used approach for
learning such models. A standard optimization strategy is based on formulating
the problem as one of low rank matrix factorization which, however, leads to a
non-convex problem. We show that it is possible to characterize the critical points of
the non-convex problem. This allows us to provide an efficient criterion to determine
whether a critical point is also a global minimizer. We extend this analysis to the case
in which the objective is nonsmooth.
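For context, the factorization approach referred to above exploits the well-known variational form of the trace norm; the display below states it together with the resulting non-convex factored objective. The loss and regularization parameter are generic placeholders, not the thesis's exact formulation.

```latex
% Variational form of the trace norm of W \in \mathbb{R}^{d \times T}:
\|W\|_* \;=\; \min_{W = U V^\top} \tfrac{1}{2}\left( \|U\|_F^2 + \|V\|_F^2 \right),
\qquad U \in \mathbb{R}^{d \times r},\; V \in \mathbb{R}^{T \times r}.

% Substituting W = U V^\top, the convex trace-norm-regularized problem
\min_{W} \; \mathcal{L}(W) + \lambda \|W\|_*
\quad\text{becomes the non-convex factored problem}\quad
\min_{U, V} \; \mathcal{L}(U V^\top) + \tfrac{\lambda}{2}\left( \|U\|_F^2 + \|V\|_F^2 \right).
```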
The goal of the second problem we worked on is to infer a learning algorithm that
works well on a class of tasks sampled from an unknown meta-distribution. As
an extension of MTL, our goal here is to train on a set of tasks and perform well
on future, unseen tasks. We consider a scenario in which the tasks are presented
sequentially, without keeping any of their information in memory. We study the
statistical properties of the proposed algorithm and prove non-asymptotic bounds
for the excess transfer risk.
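The following sketch illustrates one standard way such a sequential, memory-free learning-to-learn procedure can be organized: each incoming task is solved by ridge regression biased towards a running meta-parameter, which is then updated by a simple averaging step before the task data are discarded. The bias-regularized update and hyperparameters are assumptions for illustration, not the algorithm analyzed in the thesis.

```python
import numpy as np

def biased_ridge(X, y, h, lam):
    """Ridge regression biased towards the meta-parameter h:
    argmin_w ||X w - y||^2 / n + lam * ||w - h||^2."""
    n, d = X.shape
    A = X.T @ X / n + lam * np.eye(d)
    b = X.T @ y / n + lam * h
    return np.linalg.solve(A, b)

def sequential_meta_learning(task_stream, d, lam=1.0):
    """Process tasks one at a time, keeping only the meta-parameter h in memory."""
    h = np.zeros(d)
    for t, (X, y) in enumerate(task_stream, start=1):
        w_t = biased_ridge(X, y, h, lam)     # within-task learning, biased towards h
        h += (w_t - h) / t                   # running average of the task solutions
        # (X, y) can now be discarded: no task data is stored across tasks.
    return h
```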
Lastly, a common practice in machine learning is to concatenate many different datasets and apply a learning algorithm to the combined dataset. However, training on a collection of heterogeneous datasets can cause issues due to the presence of bias. In this thesis we derive an MTL framework that can jointly learn subcategories within a dataset and undo the inherent bias existing within each of them.
Improved Multi-Task Learning Based on Local Rademacher Analysis
Considering a single prediction task at a time is the most common paradigm in machine learning practice. This methodology, however, ignores potentially relevant information that might be available in other related tasks in the same domain. This becomes even more critical when the lack of a sufficient amount of data for an individual prediction task leads to deteriorated generalization performance. In such cases, learning multiple related tasks together might offer better performance by allowing tasks to leverage information from each other. Multi-Task Learning (MTL) is a machine learning framework that learns multiple related tasks simultaneously to overcome the data scarcity limitations of Single Task Learning (STL) and thereby improve performance.

Although MTL has been actively investigated by the machine learning community, there are only a few studies examining the theoretical justification of this learning framework. The focus of previous studies is on providing learning guarantees in the form of generalization error bounds. The study of generalization bounds is considered an important problem in machine learning and, more specifically, in statistical learning theory. This importance is twofold: (1) generalization bounds provide an upper-tail confidence interval for the true risk of a learning algorithm, which cannot be computed exactly because it depends on the unknown distribution P from which the data are drawn; (2) such bounds can also be employed as model selection tools, which lead to identifying more accurate learning models.

Generalization error bounds are typically expressed in terms of the empirical risk of the learning hypothesis along with a complexity measure of the hypothesis space. Although different complexity measures can be used in deriving error bounds, Rademacher complexity has received considerable attention in recent years because it can potentially lead to tighter error bounds than other complexity measures. However, one shortcoming of the global notion of Rademacher complexity is that it measures the complexity of the entire hypothesis space; it does not account for the fact that learning algorithms, by design, select functions from a more favorable subset of this space and therefore yield better-performing models than the worst case. To overcome this limitation, a more nuanced notion, the so-called local Rademacher complexity, has been considered, which leads to sharper learning bounds and, compared to its global counterpart, guarantees faster convergence rates in terms of the number of samples. Moreover, since locally derived bounds are expected to be tighter than globally derived ones, they can motivate better (more accurate) model selection algorithms.

While previous MTL studies provide generalization bounds based on other complexity measures, in this dissertation we prove excess risk bounds for several popular kernel-based MTL hypothesis spaces based on the Local Rademacher Complexity (LRC) of those hypothesis spaces. We show that these local bounds have faster convergence rates than the previous Global Rademacher Complexity (GRC)-based bounds.
We then use our LRC-based MTL bounds to design a new kernel-based MTL model, which enjoys strong learning guarantees. Moreover, we develop an optimization algorithm to solve our new MTL formulation. Finally, we run simulations on experimental data that compare our MTL model to some classical Multi-Task Multiple Kernel Learning (MT-MKL) models designed based on the GRC. Since the local Rademacher complexities are expected to be tighter than the global ones, our new model is also expected to exhibit better performance compared to the GRC-based models.
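As a concrete reference point for the complexity measure discussed in this abstract, the snippet below estimates the (global) empirical Rademacher complexity of a norm-bounded linear class on a given sample by Monte Carlo, using the closed form of the inner supremum. It is a single-task illustration of the definition only, not the multi-task or local quantities analyzed in the dissertation.

```python
import numpy as np

def empirical_rademacher_linear(X, B=1.0, n_draws=1000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    the class {x -> <w, x> : ||w||_2 <= B} on the sample X (n x d).

    For fixed Rademacher signs sigma, the inner supremum has the closed form
    sup_{||w|| <= B} (1/n) sum_i sigma_i <w, x_i> = (B/n) * ||sum_i sigma_i x_i||_2.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sups = []
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)      # i.i.d. Rademacher signs
        sups.append(B / n * np.linalg.norm(sigma @ X))
    return float(np.mean(sups))
```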
Conic Multi-Task Classification
Traditionally, Multi-task Learning (MTL) models optimize the average of
task-related objective functions, which is an intuitive approach and which we
will be referring to as Average MTL. However, a more general framework,
referred to as Conic MTL, can be formulated by considering conic combinations
of the objective functions instead; in this framework, Average MTL arises as a
special case, when all combination coefficients equal 1. Although the advantage
of Conic MTL over Average MTL has been shown experimentally in previous works,
no theoretical justification has been provided to date. In this paper, we
derive a generalization bound for the Conic MTL method, and demonstrate that
the tightest bound is not necessarily achieved, when all combination
coefficients equal 1; hence, Average MTL may not always be the optimal choice,
and it is important to consider Conic MTL. As a byproduct, the generalization
bound also theoretically explains the good experimental results of previous
relevant works. Finally, we propose a new Conic MTL model, whose conic
combination coefficients minimize the generalization bound, instead of choosing
them heuristically as has been done in previous methods. The rationale and
advantage of our model are demonstrated and verified via a series of experiments
by comparing with several other methods.
Comment: Accepted by European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD)-201
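To fix notation, the display below contrasts the two objectives in a generic form: Average MTL combines every task-related objective with coefficient 1, while Conic MTL allows arbitrary nonnegative combination coefficients. The symbols (per-task objectives J_t, coefficients lambda_t) are generic placeholders rather than the paper's exact formulation.

```latex
% Average MTL: all combination coefficients equal 1
\min_{w_1,\dots,w_T} \; \sum_{t=1}^{T} J_t(w_t)

% Conic MTL: nonnegative, task-specific combination coefficients
\min_{w_1,\dots,w_T} \; \sum_{t=1}^{T} \lambda_t \, J_t(w_t),
\qquad \lambda_t \ge 0 \;\; \text{for all } t,
% with \lambda_t = 1 for every task recovering Average MTL as a special case.
```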
Algorithm-dependent generalization bounds for multi-task learning
Often, tasks are collected for multi-task learning (MTL) because they share
similar feature structures. Based on this observation, in this paper, we present
novel algorithm-dependent generalization bounds for MTL by exploiting the notion
of algorithmic stability. We focus on the performance of one particular task
and the average performance over multiple tasks by analyzing the generalization
ability of a common parameter that is shared in MTL. When focusing on one
particular task, with the help of a mild assumption on the feature structures, we
interpret the function of the other tasks as a regularizer that produces a specific
inductive bias. The algorithm for learning the common parameter, as well as the
predictor, is thereby uniformly stable with respect to the domain of the particular
task and has a generalization bound with a fast convergence rate of order O(1/n),
where n is the sample size of the particular task. When focusing on the average
performance over multiple tasks, we prove that a similar inductive bias exists under
certain conditions on the feature structures. Thus, the corresponding algorithm
for learning the common parameter is also uniformly stable with respect to the domains
of the multiple tasks, and its generalization bound is of the order O(1/T),
where T is the number of tasks. These theoretical analyses naturally show that
the similarity of feature structures in MTL will lead to specific regularizations for
predicting, which enables the learning algorithms to generalize fast and correctly
from a few examples.
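A common way to formalize the "common parameter plus task-specific part" setup described above, consistent with the regularizer interpretation in this abstract, is the generic decomposition below. The quadratic penalty and notation are illustrative assumptions, not the paper's exact model.

```latex
% Task t combines a shared parameter w_0 with a task-specific deviation v_t:
w_t = w_0 + v_t, \qquad t = 1, \dots, T,

\min_{w_0,\, v_1, \dots, v_T} \;
\sum_{t=1}^{T} \frac{1}{n}\sum_{i=1}^{n}
\ell\!\left(\langle w_0 + v_t,\, x_{t,i}\rangle,\; y_{t,i}\right)
\;+\; \frac{\mu}{T}\sum_{t=1}^{T} \|v_t\|^2 .
% For a fixed task t, the terms contributed by the other tasks act as a
% data-dependent regularizer on the shared parameter w_0.
```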
Multitask and transfer learning for multi-aspect data
Supervised learning aims to learn functional relationships between inputs and outputs. Multitask learning tackles supervised learning tasks by performing them simultaneously to exploit commonalities between them. In this thesis, we focus on the problem of eliminating negative transfer in order to achieve better performance in multitask learning. We start by considering a general scenario in which the relationship between tasks is unknown. We then narrow our analysis to the case where data are characterised by a combination of underlying aspects, e.g., a dataset of images of faces, where each face is determined by a person's facial structure, the emotion being expressed, and the lighting conditions. In machine learning there have been numerous efforts based on multilinear models to decouple these aspects, but these have primarily used techniques from the field of unsupervised learning. In this thesis we take inspiration from these approaches and hypothesize that supervised learning methods can also benefit from exploiting these aspects.

The contributions of this thesis are as follows:
1. A multitask learning and transfer learning method that avoids negative transfer when there is no prescribed information about the relationships between tasks.
2. A multitask learning approach that takes advantage of a lack of overlapping features between known groups of tasks associated with different aspects.
3. A framework which extends multitask learning using multilinear algebra, with the aim of learning tasks associated with a combination of elements from different aspects.
4. A novel convex relaxation approach that can be applied both to the suggested framework and more generally to any tensor recovery problem.

Through theoretical validation and experiments on both synthetic and real-world datasets, we show that the proposed approaches allow fast and reliable inferences. Furthermore, when performing learning tasks on an aspect of interest, accounting for secondary aspects leads to significantly more accurate results than using traditional approaches.
Tensor Regression
Regression analysis is a key area of interest in the field of data analysis
and machine learning which is devoted to exploring the dependencies between
variables, often using vectors. The emergence of high dimensional data in
technologies such as neuroimaging, computer vision, climatology and social
networks, has brought challenges to traditional data representation methods.
Tensors, as high dimensional extensions of vectors, are considered as natural
representations of high dimensional data. In this book, the authors provide a
systematic study and analysis of tensor-based regression models and their
applications in recent years. It groups and illustrates the existing
tensor-based regression methods and covers the basics, core ideas, and
theoretical characteristics of most tensor-based regression methods. In
addition, readers can learn how to use existing tensor-based regression methods
to solve specific regression tasks with multiway data, what datasets can be
selected, and what software packages are available to start related work as
soon as possible. Tensor Regression is the first thorough overview of the
fundamentals, motivations, popular algorithms, strategies for efficient
implementation, related applications, available datasets, and software
resources for tensor-based regression analysis. It is essential reading for all
students, researchers and practitioners working on high dimensional data.
Comment: 187 pages, 32 figures, 10 tables.
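As a small illustration of the kind of model covered by the book, the display below shows one widely used tensor regression formulation, in which the coefficient tensor is constrained to a rank-R CP decomposition so that the number of parameters grows linearly rather than multiplicatively in the mode dimensions. The notation is generic and not tied to any particular chapter.

```latex
% Scalar response y from a tensor covariate \mathcal{X} \in \mathbb{R}^{p_1 \times \dots \times p_D}:
y \;\approx\; \langle \mathcal{B}, \mathcal{X} \rangle
\;=\; \sum_{i_1, \dots, i_D} \mathcal{B}_{i_1 \dots i_D} \, \mathcal{X}_{i_1 \dots i_D},

% with the coefficient tensor restricted to a rank-R CP decomposition:
\mathcal{B} \;=\; \sum_{r=1}^{R} \beta_r^{(1)} \circ \beta_r^{(2)} \circ \cdots \circ \beta_r^{(D)},
\qquad \beta_r^{(d)} \in \mathbb{R}^{p_d},

% reducing the parameter count from \prod_d p_d to R \sum_d p_d.
```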