A Survey on Multi-Task Learning
Multi-Task Learning (MTL) is a learning paradigm in machine learning whose aim
is to leverage useful information contained in multiple related tasks to help
improve the generalization performance of all the tasks. In this paper, we give
a survey of MTL. First, we classify different MTL algorithms into several
categories, including feature learning approach, low-rank approach, task
clustering approach, task relation learning approach, and decomposition
approach, and then discuss the characteristics of each approach. In order to
improve the performance of learning tasks further, MTL can be combined with
other learning paradigms including semi-supervised learning, active learning,
unsupervised learning, reinforcement learning, multi-view learning and
graphical models. When the number of tasks is large or the data dimensionality
is high, batch MTL models struggle to handle this situation; online, parallel,
and distributed MTL models, as well as dimensionality reduction and feature
hashing, are reviewed to reveal their computational and storage
advantages. Many real-world applications use MTL to boost their performance and
we review representative works. Finally, we present theoretical analyses and
discuss several future directions for MTL.
Learning Multiple Tasks with Multilinear Relationship Networks
Deep networks trained on large-scale data can learn transferable features to
promote learning multiple tasks. Since deep features eventually transition from
general to specific along deep networks, a fundamental problem of multi-task
learning is how to exploit the task relatedness underlying parameter tensors
and improve feature transferability in the multiple task-specific layers. This
paper presents Multilinear Relationship Networks (MRN) that discover the task
relationships based on novel tensor normal priors over parameter tensors of
multiple task-specific layers in deep convolutional networks. By jointly
learning transferable features and multilinear relationships of tasks and
features, MRN is able to alleviate the dilemma of negative-transfer in the
feature layers and under-transfer in the classifier layer. Experiments show
that MRN yields state-of-the-art results on three multi-task learning datasets.
Comment: NIPS 201
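To make the relationship-learning idea concrete, here is a minimal numpy sketch of a single-mode (matrix rather than tensor) special case: a task covariance Omega is learned jointly with task-specific weights under a matrix-normal prior, which MRN generalizes to tensor normal priors over features, classes, and tasks in deep layers. The linear-regression setting, step sizes, and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def update_task_covariance(W, eps=1e-6):
    """Closed-form covariance update used in matrix-normal MTL (a single-mode
    simplification of MRN's tensor normal prior): Omega proportional to
    (W^T W)^(1/2), normalized to unit trace."""
    M = W.T @ W
    vals, vecs = np.linalg.eigh(M)
    S = vecs @ np.diag(np.sqrt(np.maximum(vals, eps))) @ vecs.T
    return S / np.trace(S)

def relationship_step(W, Xs, ys, Omega, eta=1e-2, rho=1e-1):
    """One gradient step on task weights W (features x tasks): squared loss
    per task plus the relationship penalty rho * tr(W Omega^{-1} W^T)."""
    G = np.stack([X.T @ (X @ W[:, t] - y)
                  for t, (X, y) in enumerate(zip(Xs, ys))], axis=1)
    G += rho * W @ np.linalg.inv(Omega)
    return W - eta * G
```

Alternating these two updates (a few weight steps, then a covariance refresh) is the usual fitting loop for this family of models.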
Robust Online Multi-Task Learning with Correlative and Personalized Structures
Multi-Task Learning (MTL) can enhance a classifier's generalization
performance by learning multiple related tasks simultaneously. Conventional MTL
works under the offline or batch setting, and suffers from expensive training
cost and poor scalability. To address such inefficiency issues, online learning
techniques have been applied to solve MTL problems. However, most existing
algorithms of online MTL constrain task relatedness into a presumed structure
via a single weight matrix, which is a strict restriction that does not always
hold in practice. In this paper, we propose a robust online MTL framework that
overcomes this restriction by decomposing the weight matrix into two
components: the first one captures the low-rank common structure among tasks
via a nuclear norm and the second one identifies the personalized patterns of
outlier tasks via a group lasso. Theoretical analysis shows that the proposed
algorithm can achieve a sub-linear regret with respect to the best linear model
in hindsight. Even though the above framework achieves good performance, the
nuclear norm that simply adds all nonzero singular values together may not be a
good low-rank approximation. To improve the results, we use a log-determinant
function as a non-convex rank approximation. A gradient scheme is applied to
optimize the log-determinant function and obtains a closed-form solution for
this refined problem. Experimental results on a number of real-world
applications verify the efficacy of our method.
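A minimal sketch of the decomposition idea, assuming squared loss and a layout with one row per task; the paper's exact losses, step sizes, and regret analysis differ, and the log-determinant refinement is omitted. The two proximal operators (singular value thresholding for the nuclear norm, row-wise shrinkage for the group lasso) are the standard ones.

```python
import numpy as np

def svt(U, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    P, s, Qt = np.linalg.svd(U, full_matrices=False)
    return P @ np.diag(np.maximum(s - tau, 0.0)) @ Qt

def group_soft_threshold(V, tau):
    """Row-wise shrinkage: proximal operator of tau * l2,1 group lasso,
    with one row per task, so whole outlier tasks can be zeroed out."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def online_round(U, V, x, y, task, eta, lam1, lam2):
    """One online update on example (x, y) for a given task, with W = U + V:
    U captures the low-rank shared structure, V the personalized outliers."""
    w = U[task] + V[task]
    g = (w @ x - y) * x            # gradient of 0.5*(w.x - y)^2 w.r.t. w
    U, V = U.copy(), V.copy()
    U[task] -= eta * g
    V[task] -= eta * g
    return svt(U, eta * lam1), group_soft_threshold(V, eta * lam2)
```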
Decentralized Multi-Task Learning Based on Extreme Learning Machines
In multi-task learning (MTL), related tasks learn jointly to improve
generalization performance. To exploit the high learning speed of extreme
learning machines (ELMs), we apply the ELM framework to the MTL problem, where
the output weights of ELMs for all the tasks are learned collaboratively. We
first present the ELM based MTL problem in the centralized setting, which is
solved by the proposed MTL-ELM algorithm. Because the data sets of different
tasks are often geo-distributed, we then study decentralized machine learning.
We formulate the decentralized MTL problem based on ELM as a majorized
multi-block optimization with coupled bi-convex objective functions. To solve
the problem, we propose the DMTL-ELM algorithm, which is a hybrid Jacobian and
Gauss-Seidel Proximal multi-block alternating direction method of multipliers
(ADMM). Further, to reduce the computation load of DMTL-ELM, DMTL-ELM with
first-order approximation (FO-DMTL-ELM) is presented. Theoretical analysis
shows that the convergence to the stationary point of DMTL-ELM and FO-DMTL-ELM
can be guaranteed conditionally. Through simulations, we demonstrate the
convergence of proposed MTL-ELM, DMTL-ELM, and FO-DMTL-ELM algorithms, and also
show that they can outperform existing MTL methods. Moreover, adjusting the
dimension of the hidden feature space reveals a trade-off between
communication load and learning accuracy for DMTL-ELM.
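Below is a toy centralized variant in numpy, meant only to show the ELM mechanics: the hidden layer is random and fixed, and each task's output weights are ridge solutions coupled through a mean-attraction penalty. The decentralized ADMM machinery (DMTL-ELM, FO-DMTL-ELM) and the majorized bi-convex formulation are not reproduced; lam, mu, and the coupling form are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_features(X, W_in, b):
    """Random hidden layer of an ELM: fixed weights, sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W_in + b)))

def mtl_elm(datasets, n_hidden=64, lam=1e-2, mu=1e-1, iters=20):
    """Toy centralized MTL-ELM: each task fits ELM output weights by ridge
    regression, with an extra penalty mu*||beta_t - beta_bar||^2 coupling the
    tasks. `datasets` is a list of (X_t, y_t) pairs with a shared input dim."""
    d = datasets[0][0].shape[1]
    W_in = rng.normal(size=(d, n_hidden))
    b = rng.normal(size=n_hidden)
    H = [elm_features(X, W_in, b) for X, _ in datasets]
    betas = [np.zeros(n_hidden) for _ in datasets]
    for _ in range(iters):                      # block-coordinate updates
        beta_bar = np.mean(betas, axis=0)
        for t, (X, y) in enumerate(datasets):
            A = H[t].T @ H[t] + (lam + mu) * np.eye(n_hidden)
            betas[t] = np.linalg.solve(A, H[t].T @ y + mu * beta_bar)
    return W_in, b, betas
```

Each inner solve is the exact minimizer of ||H b - y||^2 + lam*||b||^2 + mu*||b - beta_bar||^2, which is what keeps the per-task updates cheap.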
Learning to Multitask
Multitask learning has shown promising performance in many applications and
many multitask models have been proposed. In order to identify an effective
multitask model for a given multitask problem, we propose a learning framework
called learning to multitask (L2MT). To achieve the goal, L2MT exploits
historical multitask experience, which is organized as a training set consisting
of several tuples, each of which contains a multitask problem with multiple
tasks, a multitask model, and the relative test error. Based on such a training
set, L2MT first uses a proposed layerwise graph neural network to learn task
embeddings for all the tasks in a multitask problem and then learns an
estimation function to estimate the relative test error from the task
embeddings and the representation of the multitask model via a unified
formulation. Given a new multitask problem, the estimation function is used to
identify a suitable multitask model. Experiments on benchmark datasets show the
effectiveness of the proposed L2MT framework.
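A hedged sketch of the selection step, assuming problem embeddings and model representations are already available as vectors; the paper learns the former with a layerwise graph neural network and uses a unified formulation rather than the off-the-shelf regressor below.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_estimator(problem_embeddings, model_reprs, rel_errors):
    """Fit an L2MT-style estimation function on historical experience: each
    row pairs a multitask-problem embedding with a model representation, and
    the target is the relative test error observed for that pair."""
    X = np.hstack([problem_embeddings, model_reprs])
    return GradientBoostingRegressor().fit(X, rel_errors)

def select_model(est, problem_embedding, candidate_model_reprs):
    """Score each candidate multitask model on a new problem and return the
    index of the one with the smallest predicted relative test error."""
    X = np.hstack([np.tile(problem_embedding, (len(candidate_model_reprs), 1)),
                   candidate_model_reprs])
    return int(np.argmin(est.predict(X)))
```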
Simultaneous Parameter Learning and Bi-Clustering for Multi-Response Models
We consider multi-response and multitask regression models, where the
parameter matrix to be estimated is expected to have an unknown grouping
structure. The groupings can be along tasks, or features, or both, the last one
indicating a bi-cluster or "checkerboard" structure. Discovering this grouping
structure along with parameter inference makes sense in several applications,
such as multi-response Genome-Wide Association Studies. This additional
structure can not only be leveraged for more accurate parameter estimation,
but it also provides valuable information on the underlying data mechanisms
(e.g. relationships among genotypes and phenotypes in GWAS). In this paper, we
propose two formulations to simultaneously learn the parameter matrix and its
group structures, based on convex regularization penalties. We present
optimization approaches to solve the resulting problems and provide numerical
convergence guarantees. Our approaches are validated on extensive simulations
and real datasets concerning phenotypes and genotypes of plant varieties.
Comment: 15 pages, 15 figures.
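As a rough stand-in for the joint convex formulation, one can first estimate the parameter matrix and then co-cluster it; the sketch below uses ridge regression plus scikit-learn's SpectralCoclustering, which recovers a checkerboard structure but, unlike the paper's approach, does not couple estimation and grouping.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

def estimate_and_bicluster(X, Y, lam=1.0, n_clusters=3):
    """Two-stage heuristic: a ridge estimate of the feature-by-response
    parameter matrix, then spectral co-clustering of its magnitudes to expose
    a checkerboard grouping of features (rows) and tasks (columns)."""
    d = X.shape[1]
    B = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)  # d x q parameters
    model = SpectralCoclustering(n_clusters=n_clusters, random_state=0)
    model.fit(np.abs(B) + 1e-8)       # co-clustering requires nonnegative input
    return B, model.row_labels_, model.column_labels_
```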
Multi-stage Multi-task feature learning via adaptive threshold
Multi-task feature learning aims to identify the shared features among tasks
to improve generalization. It has been shown that by minimizing non-convex
learning models, a better solution than the convex alternatives can be
obtained. Therefore, a non-convex model based on the
capped-$\ell_{1},\ell_{1}$ regularization was proposed in \cite{Gong2013}, and
a corresponding efficient multi-stage multi-task feature learning algorithm
(MSMTFL) was presented. However, this algorithm harnesses a prescribed fixed
threshold in the definition of the capped-$\ell_{1},\ell_{1}$ regularization,
and the lack of adaptivity might result in suboptimal performance. In this
paper we propose to employ an adaptive threshold in the
capped-$\ell_{1},\ell_{1}$ regularized
formulation, where the corresponding variant of MSMTFL will incorporate an
additional component to adaptively determine the threshold value. This variant
is expected to achieve a better feature selection performance over the original
MSMTFL algorithm. In particular, the embedded adaptive threshold component
comes from our previously proposed iterative support detection (ISD) method
\cite{Wang2010}. Empirical studies on both synthetic and real-world data sets
demonstrate the effectiveness of this new variant over the original MSMTFL.
Comment: 13 pages, 12 figures. arXiv admin note: text overlap with
arXiv:1210.5806 by other authors.
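The following sketch illustrates one plausible reading of the adaptive rule: place the threshold at the first significant jump in the sorted magnitudes, in the spirit of iterative support detection \cite{Wang2010}. The `factor` heuristic and the median-gap reference are assumptions for illustration, not the authors' exact criterion.

```python
import numpy as np

def adaptive_threshold(w, factor=2.0):
    """ISD-flavoured threshold rule (a simplification): sort the magnitudes
    and place the threshold at the first 'significant jump', i.e. the first
    gap exceeding `factor` times the median gap."""
    mags = np.sort(np.abs(np.asarray(w)).ravel())
    if len(mags) < 2:
        return 0.0
    gaps = np.diff(mags)
    ref = np.median(gaps) + 1e-12
    jumps = np.nonzero(gaps > factor * ref)[0]
    return mags[jumps[0]] if len(jumps) else mags[-1]
```

Plugging such a rule into each MSMTFL stage replaces the prescribed fixed threshold with one that tracks the current estimate's magnitude profile.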
Multitask Learning using Task Clustering with Applications to Predictive Modeling and GWAS of Plant Varieties
Inferring predictive maps between multiple input and multiple output
variables or tasks has innumerable applications in data science. Multi-task
learning attempts to learn the maps to several output tasks simultaneously with
information sharing between them. We propose a novel multi-task learning
framework for sparse linear regression, where a full task hierarchy is
automatically inferred from the data, with the assumption that the task
parameters follow a hierarchical tree structure. The leaves of the tree are the
parameters for individual tasks, and the root is the global model that
approximates all the tasks. We apply the proposed approach to develop and
evaluate: (a) predictive models of plant traits using large-scale and automated
remote sensing data, and (b) GWAS methodologies mapping such derived phenotypes
in lieu of hand-measured traits. We demonstrate the superior performance of our
approach compared to other methods, as well as the usefulness of discovering
hierarchical groupings between tasks. Our results suggest that richer genetic
mapping can indeed be obtained from the remote sensing data. In addition, our
discovered groupings reveal interesting insights from a plant science
perspective.
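A collapsed two-level sketch of the hierarchy (root plus leaves, ridge instead of sparse regression) to show how a global model and per-task deviations can be fit by block coordinate descent; the paper infers a full tree from data, which this sketch does not attempt.

```python
import numpy as np

def two_level_tree_mtl(datasets, lam_root=1e-2, lam_leaf=1.0, iters=50):
    """Two-level tree: every task's weights are w_t = w0 + v_t, where the
    root w0 approximates all tasks and the leaf deviations v_t are shrunk
    toward zero. `datasets` is a list of (X_t, y_t) pairs."""
    d = datasets[0][0].shape[1]
    w0 = np.zeros(d)
    V = [np.zeros(d) for _ in datasets]
    for _ in range(iters):                       # block coordinate descent
        # update each leaf deviation with w0 fixed (ridge on the residual)
        for t, (X, y) in enumerate(datasets):
            r = y - X @ w0
            V[t] = np.linalg.solve(X.T @ X + lam_leaf * np.eye(d), X.T @ r)
        # update the root with all deviations fixed
        A = sum(X.T @ X for X, _ in datasets) + lam_root * np.eye(d)
        b = sum(X.T @ (y - X @ V[t]) for t, (X, y) in enumerate(datasets))
        w0 = np.linalg.solve(A, b)
    return w0, V
```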
Multi-Stage Multi-Task Feature Learning
Multi-task sparse feature learning aims to improve the generalization
performance by exploiting the shared features among tasks. It has been
successfully applied to many applications including computer vision and
biomedical informatics. Most of the existing multi-task sparse feature learning
algorithms are formulated as a convex sparse regularization problem, which is
usually suboptimal due to its looseness in approximating an $\ell_0$-type
regularizer. In this paper, we propose a non-convex formulation for multi-task
sparse feature learning based on a novel non-convex regularizer. To solve the
non-convex optimization problem, we propose a Multi-Stage Multi-Task Feature
Learning (MSMTFL) algorithm; we also provide intuitive interpretations,
detailed convergence and reproducibility analysis for the proposed algorithm.
Moreover, we present a detailed theoretical analysis showing that MSMTFL
achieves a better parameter estimation error bound than the convex formulation.
Empirical studies on both synthetic and real-world data sets demonstrate the
effectiveness of MSMTFL in comparison with state-of-the-art multi-task
sparse feature learning algorithms.
Comment: The short version appears in NIPS 201
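A compact sketch of the multi-stage scheme for the capped-$\ell_1$ idea: each stage solves a weighted $\ell_1$ problem, then removes the penalty from feature rows whose $\ell_1$ norm has cleared the threshold, so strong features escape shrinkage in later stages. ISTA, the squared loss, and the step-size choice are illustrative simplifications of the paper's algorithm.

```python
import numpy as np

def msmtfl(Xs, ys, lam=0.1, theta=0.05, stages=5, ista_iters=200):
    """Multi-stage sketch: W is features x tasks; `pen` holds the per-
    feature-row l1 penalty, reweighted after each stage."""
    d, m = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, m))
    pen = lam * np.ones(d)
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2 for X in Xs)
    for _ in range(stages):
        for _ in range(ista_iters):            # ISTA on the weighted problem
            G = np.stack([X.T @ (X @ W[:, t] - y)
                          for t, (X, y) in enumerate(zip(Xs, ys))], axis=1)
            Z = W - step * G
            W = np.sign(Z) * np.maximum(np.abs(Z) - step * pen[:, None], 0.0)
        pen = lam * (np.abs(W).sum(axis=1) <= theta)   # capped-l1 reweighting
    return W
```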
Graph-Structured Multi-task Regression and an Efficient Optimization Method for General Fused Lasso
We consider the problem of learning a structured multi-task regression, where
the output consists of multiple responses that are related by a graph and the
correlated response variables are dependent on the common inputs in a sparse
but synergistic manner. Previous methods such as $\ell_1/\ell_2$-regularized multi-task
regression assume that all of the output variables are equally related to the
inputs, although in many real-world problems, outputs are related in a complex
manner. In this paper, we propose graph-guided fused lasso (GFlasso) for
structured multi-task regression that exploits the graph structure over the
output variables. We introduce a novel penalty function based on fusion penalty
to encourage highly correlated outputs to share a common set of relevant
inputs. In addition, we propose a simple yet efficient proximal-gradient method
for optimizing GFlasso that can also be applied to any optimization problem
with a convex smooth loss and the general class of fusion penalties defined on
arbitrary graph structures. By exploiting the structure of the non-smooth
"fusion penalty", our method achieves a faster convergence rate than the
standard first-order method (the sub-gradient method), and is significantly
more scalable than the widely adopted second-order cone-programming and
quadratic-programming formulations. In addition, we provide an analysis of the
consistency property of the GFlasso model. Experimental results not only
demonstrate the superiority of GFlasso over the standard lasso but also show
the efficiency and scalability of our proximal-gradient method.
Comment: 21 pages, 7 figures.
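A sketch of a smoothing proximal-gradient loop for a GFlasso-style objective, assuming the outputs are related by weighted edges (m, l, r_ml): the non-smooth fusion term is replaced by a Huber-style smoothed surrogate whose gradient is available in closed form, while the plain $\ell_1$ term keeps its exact prox. The step size, the smoothing parameter mu, and the ISTA loop are illustrative; the paper's method and its rate analysis are more refined.

```python
import numpy as np

def gflasso_spg(X, Y, edges, lam=0.1, gamma=0.1, mu=1e-3, iters=500):
    """Smoothed proximal gradient for a GFlasso-style objective. `edges` is a
    list of (m, l, r_ml) triples over output variables; the fusion term
    gamma*|r_ml|*sum_j |B[j,m] - sign(r_ml)*B[j,l]| is smoothed so that its
    gradient exists, and the l1 term is handled by soft-thresholding."""
    d, q = X.shape[1], Y.shape[1]
    B = np.zeros((d, q))
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 +
                  gamma * sum(abs(r) for *_, r in edges) / mu)
    for _ in range(iters):
        G = X.T @ (X @ B - Y)                 # gradient of the squared loss
        for m, l, r in edges:                 # gradient of the smoothed fusion
            z = B[:, m] - np.sign(r) * B[:, l]
            a = gamma * abs(r) * np.clip(z / mu, -1.0, 1.0)
            G[:, m] += a
            G[:, l] -= np.sign(r) * a
        Z = B - step * G
        B = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)
    return B
```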