A Survey on Multi-Task Learning
Multi-Task Learning (MTL) is a learning paradigm in machine learning whose aim
is to leverage useful information contained in multiple related tasks to help
improve the generalization performance of all the tasks. In this paper, we give
a survey of MTL. First, we classify different MTL algorithms into several
categories, including feature learning approach, low-rank approach, task
clustering approach, task relation learning approach, and decomposition
approach, and then discuss the characteristics of each approach. In order to
improve the performance of learning tasks further, MTL can be combined with
other learning paradigms including semi-supervised learning, active learning,
unsupervised learning, reinforcement learning, multi-view learning and
graphical models. When the number of tasks is large or the data dimensionality
is high, batch MTL models struggle to handle this situation; online, parallel,
and distributed MTL models, as well as dimensionality reduction and feature
hashing, are reviewed to reveal their computational and storage
advantages. Many real-world applications use MTL to boost their performance and
we review representative works. Finally, we present theoretical analyses and
discuss several future directions for MTL.
Learning Multiple Tasks with Multilinear Relationship Networks
Deep networks trained on large-scale data can learn transferable features to
promote learning multiple tasks. Since deep features eventually transition from
general to specific along deep networks, a fundamental problem of multi-task
learning is how to exploit the task relatedness underlying parameter tensors
and improve feature transferability in the multiple task-specific layers. This
paper presents Multilinear Relationship Networks (MRN) that discover the task
relationships based on novel tensor normal priors over parameter tensors of
multiple task-specific layers in deep convolutional networks. By jointly
learning transferable features and multilinear relationships of tasks and
features, MRN is able to alleviate the dilemma of negative-transfer in the
feature layers and under-transfer in the classifier layer. Experiments show
that MRN yields state-of-the-art results on three multi-task learning datasets.
Comment: NIPS 201
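To make the relationship-learning idea concrete, here is a minimal numpy sketch of a single-mode (matrix rather than tensor) special case: a task covariance Omega is learned jointly with task-specific weights under a matrix-normal prior, which MRN generalizes to tensor normal priors over features, classes, and tasks in deep layers. The linear-regression setting, step sizes, and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def update_task_covariance(W, eps=1e-6):
    """Closed-form covariance update used in matrix-normal MTL (a single-mode
    simplification of MRN's tensor normal prior): Omega proportional to
    (W^T W)^(1/2), normalized to unit trace."""
    M = W.T @ W
    vals, vecs = np.linalg.eigh(M)
    S = vecs @ np.diag(np.sqrt(np.maximum(vals, eps))) @ vecs.T
    return S / np.trace(S)

def relationship_step(W, Xs, ys, Omega, eta=1e-2, rho=1e-1):
    """One gradient step on task weights W (features x tasks): squared loss
    per task plus the relationship penalty rho * tr(W Omega^{-1} W^T)."""
    G = np.stack([X.T @ (X @ W[:, t] - y)
                  for t, (X, y) in enumerate(zip(Xs, ys))], axis=1)
    G += rho * W @ np.linalg.inv(Omega)
    return W - eta * G
```

Alternating these two updates (a few weight steps, then a covariance refresh) is the usual fitting loop for this family of models.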
Robust Online Multi-Task Learning with Correlative and Personalized Structures
Multi-Task Learning (MTL) can enhance a classifier's generalization
performance by learning multiple related tasks simultaneously. Conventional MTL
works under the offline or batch setting, and suffers from expensive training
cost and poor scalability. To address such inefficiency issues, online learning
techniques have been applied to solve MTL problems. However, most existing
algorithms of online MTL constrain task relatedness into a presumed structure
via a single weight matrix, which is a strict restriction that does not always
hold in practice. In this paper, we propose a robust online MTL framework that
overcomes this restriction by decomposing the weight matrix into two
components: the first one captures the low-rank common structure among tasks
via a nuclear norm and the second one identifies the personalized patterns of
outlier tasks via a group lasso. Theoretical analysis shows that the proposed
algorithm can achieve a sub-linear regret with respect to the best linear model
in hindsight. Even though the above framework achieves good performance, the
nuclear norm that simply adds all nonzero singular values together may not be a
good low-rank approximation. To improve the results, we use a log-determinant
function as a non-convex rank approximation. A gradient scheme is applied to
optimize the log-determinant function and obtains a closed-form solution for
this refined problem. Experimental results on a number of real-world
applications verify the efficacy of our method.
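A minimal sketch of the decomposition idea, assuming squared loss and a layout with one row per task; the paper's exact losses, step sizes, and regret analysis differ, and the log-determinant refinement is omitted. The two proximal operators (singular value thresholding for the nuclear norm, row-wise shrinkage for the group lasso) are the standard ones.

```python
import numpy as np

def svt(U, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    P, s, Qt = np.linalg.svd(U, full_matrices=False)
    return P @ np.diag(np.maximum(s - tau, 0.0)) @ Qt

def group_soft_threshold(V, tau):
    """Row-wise shrinkage: proximal operator of tau * l2,1 group lasso,
    with one row per task, so whole outlier tasks can be zeroed out."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def online_round(U, V, x, y, task, eta, lam1, lam2):
    """One online update on example (x, y) for a given task, with W = U + V:
    U captures the low-rank shared structure, V the personalized outliers."""
    w = U[task] + V[task]
    g = (w @ x - y) * x            # gradient of 0.5*(w.x - y)^2 w.r.t. w
    U, V = U.copy(), V.copy()
    U[task] -= eta * g
    V[task] -= eta * g
    return svt(U, eta * lam1), group_soft_threshold(V, eta * lam2)
```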
Decentralized Multi-Task Learning Based on Extreme Learning Machines
In multi-task learning (MTL), related tasks learn jointly to improve
generalization performance. To exploit the high learning speed of extreme
learning machines (ELMs), we apply the ELM framework to the MTL problem, where
the output weights of ELMs for all the tasks are learned collaboratively. We
first present the ELM based MTL problem in the centralized setting, which is
solved by the proposed MTL-ELM algorithm. Because the data sets of different
tasks are often geo-distributed, we then study decentralized machine learning.
We formulate the decentralized MTL problem based on ELM as a majorized
multi-block optimization with coupled bi-convex objective functions. To solve
the problem, we propose the DMTL-ELM algorithm, which is a hybrid Jacobian and
Gauss-Seidel Proximal multi-block alternating direction method of multipliers
(ADMM). Further, to reduce the computation load of DMTL-ELM, DMTL-ELM with
first-order approximation (FO-DMTL-ELM) is presented. Theoretical analysis
shows that the convergence to the stationary point of DMTL-ELM and FO-DMTL-ELM
can be guaranteed conditionally. Through simulations, we demonstrate the
convergence of proposed MTL-ELM, DMTL-ELM, and FO-DMTL-ELM algorithms, and also
show that they can outperform existing MTL methods. Moreover, adjusting the
dimension of the hidden feature space reveals a trade-off between
communication load and learning accuracy for DMTL-ELM.
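Below is a toy centralized variant in numpy, meant only to show the ELM mechanics: the hidden layer is random and fixed, and each task's output weights are ridge solutions coupled through a mean-attraction penalty. The decentralized ADMM machinery (DMTL-ELM, FO-DMTL-ELM) and the majorized bi-convex formulation are not reproduced; lam, mu, and the coupling form are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_features(X, W_in, b):
    """Random hidden layer of an ELM: fixed weights, sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W_in + b)))

def mtl_elm(datasets, n_hidden=64, lam=1e-2, mu=1e-1, iters=20):
    """Toy centralized MTL-ELM: each task fits ELM output weights by ridge
    regression, with an extra penalty mu*||beta_t - beta_bar||^2 coupling the
    tasks. `datasets` is a list of (X_t, y_t) pairs with a shared input dim."""
    d = datasets[0][0].shape[1]
    W_in = rng.normal(size=(d, n_hidden))
    b = rng.normal(size=n_hidden)
    H = [elm_features(X, W_in, b) for X, _ in datasets]
    betas = [np.zeros(n_hidden) for _ in datasets]
    for _ in range(iters):                      # block-coordinate updates
        beta_bar = np.mean(betas, axis=0)
        for t, (X, y) in enumerate(datasets):
            A = H[t].T @ H[t] + (lam + mu) * np.eye(n_hidden)
            betas[t] = np.linalg.solve(A, H[t].T @ y + mu * beta_bar)
    return W_in, b, betas
```

Each inner solve is the exact minimizer of ||H b - y||^2 + lam*||b||^2 + mu*||b - beta_bar||^2, which is what keeps the per-task updates cheap.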
Learning to Multitask
Multitask learning has shown promising performance in many applications and
many multitask models have been proposed. In order to identify an effective
multitask model for a given multitask problem, we propose a learning framework
called learning to multitask (L2MT). To achieve the goal, L2MT exploits
historical multitask experience, which is organized as a training set consisting
of several tuples, each of which contains a multitask problem with multiple
tasks, a multitask model, and the relative test error. Based on such a training
set, L2MT first uses a proposed layerwise graph neural network to learn task
embeddings for all the tasks in a multitask problem and then learns an
estimation function to estimate the relative test error from the task
embeddings and the representation of the multitask model via a unified
formulation. Given a new multitask problem, the estimation function is used to
identify a suitable multitask model. Experiments on benchmark datasets show the
effectiveness of the proposed L2MT framework.
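A hedged sketch of the selection step, assuming problem embeddings and model representations are already available as vectors; the paper learns the former with a layerwise graph neural network and uses a unified formulation rather than the off-the-shelf regressor below.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_estimator(problem_embeddings, model_reprs, rel_errors):
    """Fit an L2MT-style estimation function on historical experience: each
    row pairs a multitask-problem embedding with a model representation, and
    the target is the relative test error observed for that pair."""
    X = np.hstack([problem_embeddings, model_reprs])
    return GradientBoostingRegressor().fit(X, rel_errors)

def select_model(est, problem_embedding, candidate_model_reprs):
    """Score each candidate multitask model on a new problem and return the
    index of the one with the smallest predicted relative test error."""
    X = np.hstack([np.tile(problem_embedding, (len(candidate_model_reprs), 1)),
                   candidate_model_reprs])
    return int(np.argmin(est.predict(X)))
```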
Simultaneous Parameter Learning and Bi-Clustering for Multi-Response Models
We consider multi-response and multitask regression models, where the
parameter matrix to be estimated is expected to have an unknown grouping
structure. The groupings can be along tasks, or features, or both, the last one
indicating a bi-cluster or "checkerboard" structure. Discovering this grouping
structure along with parameter inference makes sense in several applications,
such as multi-response Genome-Wide Association Studies. This additional
structure can not only be leveraged for more accurate parameter estimation,
but it also provides valuable information on the underlying data mechanisms
(e.g. relationships among genotypes and phenotypes in GWAS). In this paper, we
propose two formulations to simultaneously learn the parameter matrix and its
group structures, based on convex regularization penalties. We present
optimization approaches to solve the resulting problems and provide numerical
convergence guarantees. Our approaches are validated on extensive simulations
and real datasets concerning phenotypes and genotypes of plant varieties.
Comment: 15 pages, 15 figures.
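As a rough stand-in for the joint convex formulation, one can first estimate the parameter matrix and then co-cluster it; the sketch below uses ridge regression plus scikit-learn's SpectralCoclustering, which recovers a checkerboard structure but, unlike the paper's approach, does not couple estimation and grouping.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

def estimate_and_bicluster(X, Y, lam=1.0, n_clusters=3):
    """Two-stage heuristic: a ridge estimate of the feature-by-response
    parameter matrix, then spectral co-clustering of its magnitudes to expose
    a checkerboard grouping of features (rows) and tasks (columns)."""
    d = X.shape[1]
    B = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)  # d x q parameters
    model = SpectralCoclustering(n_clusters=n_clusters, random_state=0)
    model.fit(np.abs(B) + 1e-8)       # co-clustering requires nonnegative input
    return B, model.row_labels_, model.column_labels_
```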
Multi-stage Multi-task feature learning via adaptive threshold
Multi-task feature learning aims to identify the shared features among tasks
to improve generalization. It has been shown that by minimizing non-convex
learning models, a better solution than the convex alternatives can be
obtained. Therefore, a non-convex model based on the
capped-$\ell_{1},\ell_{1}$ regularization was proposed in \cite{Gong2013}, and
a corresponding efficient multi-stage multi-task feature learning algorithm
(MSMTFL) was presented. However, this algorithm harnesses a prescribed fixed
threshold in the definition of the capped-$\ell_{1},\ell_{1}$ regularization,
and the lack of adaptivity might result in suboptimal performance. In this
paper we propose to employ an adaptive threshold in the
capped-$\ell_{1},\ell_{1}$ regularized
formulation, where the corresponding variant of MSMTFL will incorporate an
additional component to adaptively determine the threshold value. This variant
is expected to achieve a better feature selection performance over the original
MSMTFL algorithm. In particular, the embedded adaptive threshold component
comes from our previously proposed iterative support detection (ISD) method
\cite{Wang2010}. Empirical studies on both synthetic and real-world data sets
demonstrate the effectiveness of this new variant over the original MSMTFL.
Comment: 13 pages, 12 figures. arXiv admin note: text overlap with
arXiv:1210.5806 by other authors.
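The following sketch illustrates one plausible reading of the adaptive rule: place the threshold at the first significant jump in the sorted magnitudes, in the spirit of iterative support detection \cite{Wang2010}. The `factor` heuristic and the median-gap reference are assumptions for illustration, not the authors' exact criterion.

```python
import numpy as np

def adaptive_threshold(w, factor=2.0):
    """ISD-flavoured threshold rule (a simplification): sort the magnitudes
    and place the threshold at the first 'significant jump', i.e. the first
    gap exceeding `factor` times the median gap."""
    mags = np.sort(np.abs(np.asarray(w)).ravel())
    if len(mags) < 2:
        return 0.0
    gaps = np.diff(mags)
    ref = np.median(gaps) + 1e-12
    jumps = np.nonzero(gaps > factor * ref)[0]
    return mags[jumps[0]] if len(jumps) else mags[-1]
```

Plugging such a rule into each MSMTFL stage replaces the prescribed fixed threshold with one that tracks the current estimate's magnitude profile.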
Multitask Learning using Task Clustering with Applications to Predictive Modeling and GWAS of Plant Varieties
Inferring predictive maps between multiple input and multiple output
variables or tasks has innumerable applications in data science. Multi-task
learning attempts to learn the maps to several output tasks simultaneously with
information sharing between them. We propose a novel multi-task learning
framework for sparse linear regression, where a full task hierarchy is
automatically inferred from the data, with the assumption that the task
parameters follow a hierarchical tree structure. The leaves of the tree are the
parameters for individual tasks, and the root is the global model that
approximates all the tasks. We apply the proposed approach to develop and
evaluate: (a) predictive models of plant traits using large-scale and automated
remote sensing data, and (b) GWAS methodologies mapping such derived phenotypes
in lieu of hand-measured traits. We demonstrate the superior performance of our
approach compared to other methods, as well as the usefulness of discovering
hierarchical groupings between tasks. Our results suggest that richer genetic
mapping can indeed be obtained from the remote sensing data. In addition, our
discovered groupings reveal interesting insights from a plant science
perspective.
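A collapsed two-level sketch of the hierarchy (root plus leaves, ridge instead of sparse regression) to show how a global model and per-task deviations can be fit by block coordinate descent; the paper infers a full tree from data, which this sketch does not attempt.

```python
import numpy as np

def two_level_tree_mtl(datasets, lam_root=1e-2, lam_leaf=1.0, iters=50):
    """Two-level tree: every task's weights are w_t = w0 + v_t, where the
    root w0 approximates all tasks and the leaf deviations v_t are shrunk
    toward zero. `datasets` is a list of (X_t, y_t) pairs."""
    d = datasets[0][0].shape[1]
    w0 = np.zeros(d)
    V = [np.zeros(d) for _ in datasets]
    for _ in range(iters):                       # block coordinate descent
        # update each leaf deviation with w0 fixed (ridge on the residual)
        for t, (X, y) in enumerate(datasets):
            r = y - X @ w0
            V[t] = np.linalg.solve(X.T @ X + lam_leaf * np.eye(d), X.T @ r)
        # update the root with all deviations fixed
        A = sum(X.T @ X for X, _ in datasets) + lam_root * np.eye(d)
        b = sum(X.T @ (y - X @ V[t]) for t, (X, y) in enumerate(datasets))
        w0 = np.linalg.solve(A, b)
    return w0, V
```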
Multi-Stage Multi-Task Feature Learning
Multi-task sparse feature learning aims to improve the generalization
performance by exploiting the shared features among tasks. It has been
successfully applied to many applications including computer vision and
biomedical informatics. Most of the existing multi-task sparse feature learning
algorithms are formulated as a convex sparse regularization problem, which is
usually suboptimal due to its looseness in approximating an $\ell_0$-type
regularizer. In this paper, we propose a non-convex formulation for multi-task
sparse feature learning based on a novel non-convex regularizer. To solve the
non-convex optimization problem, we propose a Multi-Stage Multi-Task Feature
Learning (MSMTFL) algorithm; we also provide intuitive interpretations,
detailed convergence and reproducibility analysis for the proposed algorithm.
Moreover, we present a detailed theoretical analysis showing that MSMTFL
achieves a better parameter estimation error bound than the convex formulation.
Empirical studies on both synthetic and real-world data sets demonstrate the
effectiveness of MSMTFL in comparison with state-of-the-art multi-task
sparse feature learning algorithms.
Comment: The short version appears in NIPS 201
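A compact sketch of the multi-stage scheme for the capped-$\ell_1$ idea: each stage solves a weighted $\ell_1$ problem, then removes the penalty from feature rows whose $\ell_1$ norm has cleared the threshold, so strong features escape shrinkage in later stages. ISTA, the squared loss, and the step-size choice are illustrative simplifications of the paper's algorithm.

```python
import numpy as np

def msmtfl(Xs, ys, lam=0.1, theta=0.05, stages=5, ista_iters=200):
    """Multi-stage sketch: W is features x tasks; `pen` holds the per-
    feature-row l1 penalty, reweighted after each stage."""
    d, m = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, m))
    pen = lam * np.ones(d)
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2 for X in Xs)
    for _ in range(stages):
        for _ in range(ista_iters):            # ISTA on the weighted problem
            G = np.stack([X.T @ (X @ W[:, t] - y)
                          for t, (X, y) in enumerate(zip(Xs, ys))], axis=1)
            Z = W - step * G
            W = np.sign(Z) * np.maximum(np.abs(Z) - step * pen[:, None], 0.0)
        pen = lam * (np.abs(W).sum(axis=1) <= theta)   # capped-l1 reweighting
    return W
```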
Graph-Structured Multi-task Regression and an Efficient Optimization Method for General Fused Lasso
We consider the problem of learning a structured multi-task regression, where
the output consists of multiple responses that are related by a graph and the
correlated response variables are dependent on the common inputs in a sparse
but synergistic manner. Previous methods such as $\ell_1/\ell_2$-regularized multi-task
regression assume that all of the output variables are equally related to the
inputs, although in many real-world problems, outputs are related in a complex
manner. In this paper, we propose graph-guided fused lasso (GFlasso) for
structured multi-task regression that exploits the graph structure over the
output variables. We introduce a novel penalty function based on fusion penalty
to encourage highly correlated outputs to share a common set of relevant
inputs. In addition, we propose a simple yet efficient proximal-gradient method
for optimizing GFlasso that can also be applied to any optimization problem
with a convex smooth loss and the general class of fusion penalties defined on
arbitrary graph structures. By exploiting the structure of the non-smooth
"fusion penalty", our method achieves a faster convergence rate than the
standard first-order method (the sub-gradient method), and is significantly
more scalable than the widely adopted second-order cone-programming and
quadratic-programming formulations. In addition, we provide an analysis of the
consistency property of the GFlasso model. Experimental results not only
demonstrate the superiority of GFlasso over the standard lasso but also show
the efficiency and scalability of our proximal-gradient method.
Comment: 21 pages, 7 figures.
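A sketch of a smoothing proximal-gradient loop for a GFlasso-style objective, assuming the outputs are related by weighted edges (m, l, r_ml): the non-smooth fusion term is replaced by a Huber-style smoothed surrogate whose gradient is available in closed form, while the plain $\ell_1$ term keeps its exact prox. The step size, the smoothing parameter mu, and the ISTA loop are illustrative; the paper's method and its rate analysis are more refined.

```python
import numpy as np

def gflasso_spg(X, Y, edges, lam=0.1, gamma=0.1, mu=1e-3, iters=500):
    """Smoothed proximal gradient for a GFlasso-style objective. `edges` is a
    list of (m, l, r_ml) triples over output variables; the fusion term
    gamma*|r_ml|*sum_j |B[j,m] - sign(r_ml)*B[j,l]| is smoothed so that its
    gradient exists, and the l1 term is handled by soft-thresholding."""
    d, q = X.shape[1], Y.shape[1]
    B = np.zeros((d, q))
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 +
                  gamma * sum(abs(r) for *_, r in edges) / mu)
    for _ in range(iters):
        G = X.T @ (X @ B - Y)                 # gradient of the squared loss
        for m, l, r in edges:                 # gradient of the smoothed fusion
            z = B[:, m] - np.sign(r) * B[:, l]
            a = gamma * abs(r) * np.clip(z / mu, -1.0, 1.0)
            G[:, m] += a
            G[:, l] -= np.sign(r) * a
        Z = B - step * G
        B = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)
    return B
```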