9 research outputs found

    Heterogeneous feature space based task selection machine for unsupervised transfer learning

    Full text link
    © 2015 IEEE. Transfer learning techniques attempt to transfer knowledge from previous tasks to a new target task using less training data, or less training effort, than traditional machine learning techniques. Because transfer learning is concerned with the relatedness between tasks and their domains, it is well suited to handling massive unlabeled data while overcoming gaps in distribution and in feature space. In this paper, we propose a new task selection algorithm for the unsupervised transfer learning setting, called the Task Selection Machine (TSM). It addresses a key technical problem, namely feature mapping between heterogeneous feature spaces, by applying an extended-feature method to the feature-mapping algorithm. The TSM training algorithm, which is the main contribution of this paper, builds on this feature mapping. The proposed TSM meets the requirements of unsupervised transfer learning and, in turn, addresses the unsupervised multi-task transfer learning problem.
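    As a rough sketch of the extended-feature idea (a generic zero-padding embedding; not necessarily TSM's exact mapping), heterogeneous source and target feature spaces can be placed into one common space so that instances from both domains become directly comparable:

        import numpy as np

        def extend_features(X_src, X_tgt):
            # Embed heterogeneous source/target features into a shared space
            # by zero-padding: source instances occupy the first d_s
            # coordinates, target instances the last d_t.
            d_s, d_t = X_src.shape[1], X_tgt.shape[1]
            Z_src = np.hstack([X_src, np.zeros((X_src.shape[0], d_t))])
            Z_tgt = np.hstack([np.zeros((X_tgt.shape[0], d_s)), X_tgt])
            return Z_src, Z_tgt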

    Conic Multi-Task Classification

    Full text link
    Traditionally, Multi-task Learning (MTL) models optimize the average of task-related objective functions, an intuitive approach that we will refer to as Average MTL. However, a more general framework, referred to as Conic MTL, can be formulated by considering conic combinations of the objective functions instead; in this framework, Average MTL arises as a special case, when all combination coefficients equal 1. Although the advantage of Conic MTL over Average MTL has been shown experimentally in previous works, no theoretical justification has been provided to date. In this paper, we derive a generalization bound for the Conic MTL method and demonstrate that the tightest bound is not necessarily achieved when all combination coefficients equal 1; hence, Average MTL may not always be the optimal choice, and it is important to consider Conic MTL. As a byproduct, the generalization bound also theoretically explains the good experimental results of previous relevant works. Finally, we propose a new Conic MTL model whose conic combination coefficients minimize the generalization bound, instead of being chosen heuristically as in previous methods. The rationale and advantage of our model are demonstrated and verified via a series of experiments comparing it with several other methods. Comment: Accepted by European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD)-201
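    In symbols (notation ours, not taken from the paper), with per-task objectives J_t, Average MTL and Conic MTL minimize, respectively,

        \[
        \min_{f_1,\dots,f_T} \sum_{t=1}^{T} J_t(f_t)
        \qquad \text{and} \qquad
        \min_{f_1,\dots,f_T} \sum_{t=1}^{T} \lambda_t J_t(f_t), \quad \lambda_t \ge 0,
        \]

    so Average MTL is recovered as the special case \lambda_1 = \dots = \lambda_T = 1.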

    Improved Multi-Task Learning Based on Local Rademacher Analysis

    Get PDF
    Considering a single prediction task at a time is the most common paradigm in machine learning practice. This methodology, however, ignores the potentially relevant information that might be available in other related tasks in the same domain. This becomes even more critical when an individual prediction task lacks sufficient data, which may lead to deteriorated generalization performance. In such cases, learning multiple related tasks together might offer better performance by allowing tasks to leverage information from each other. Multi-Task Learning (MTL) is a machine learning framework which learns multiple related tasks simultaneously to overcome the data scarcity limitations of Single Task Learning (STL), and therefore results in improved performance. Although MTL has been actively investigated by the machine learning community, only a few studies have examined the theoretical justification of this learning framework. The focus of previous studies is on providing learning guarantees in the form of generalization error bounds. The study of generalization bounds is considered an important problem in machine learning and, more specifically, in statistical learning theory. This importance is twofold: (1) generalization bounds provide an upper-tail confidence interval for the true risk of a learning algorithm, which cannot be computed exactly due to its dependence on the unknown distribution P from which the data are drawn; (2) such bounds can also be employed as model selection tools, leading to more accurate learning models. Generalization error bounds are typically expressed in terms of the empirical risk of the learning hypothesis along with a complexity measure of that hypothesis. Although different complexity measures can be used in deriving error bounds, Rademacher complexity has received considerable attention in recent years due to its superiority over other complexity measures: it can potentially lead to tighter error bounds than those obtained with other measures. However, one shortcoming of the general notion of Rademacher complexity is that it provides a global complexity estimate of the learning hypothesis space, ignoring the fact that learning algorithms, by design, select functions from a more favorable subset of this space and therefore yield better-performing models than the worst case. To overcome this limitation of global Rademacher complexity, a more nuanced notion, the so-called local Rademacher complexity, has been considered; it leads to sharper learning bounds and, compared to its global counterpart, guarantees faster convergence rates in terms of the number of samples. Moreover, since locally derived bounds are expected to be tighter than globally derived ones, they can motivate more accurate model selection algorithms. While previous MTL studies provide generalization bounds based on other complexity measures, in this dissertation we prove excess risk bounds for several popular kernel-based MTL hypothesis spaces based on the Local Rademacher Complexity (LRC) of those hypotheses. We show that these local bounds have faster convergence rates than the previous Global Rademacher Complexity (GRC)-based bounds.
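    For reference, the two complexity notions contrasted above can be written as follows (standard definitions; notation ours). For a hypothesis class \mathcal{F}, a sample x_1, \dots, x_n and independent Rademacher variables \sigma_i, the global complexity takes a supremum over the entire class, while the local complexity restricts it to hypotheses of small variance:

        \[
        R_n(\mathcal{F}) = \mathbb{E}_{\sigma}\!\left[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i)\right],
        \qquad
        R_n(\mathcal{F}; r) = \mathbb{E}_{\sigma}\!\left[\sup_{\substack{f \in \mathcal{F} \\ \mathbb{E} f^2 \le r}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i)\right].
        \]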
We then use our LRC-based MTL bounds to design a new kernel-based MTL model which enjoys strong learning guarantees. Moreover, we develop an optimization algorithm to solve the new MTL formulation. Finally, we run simulations on experimental data comparing our MTL model to several classical Multi-Task Multiple Kernel Learning (MT-MKL) models designed based on GRCs. Since local Rademacher complexities are expected to be tighter than global ones, our new model is also expected to exhibit better performance than the GRC-based models.

    On Kernel-based Multi-Task Learning

    Get PDF
    Multi-Task Learning (MTL) has been an active research area in machine learning for two decades. By training multiple relevant tasks simultaneously with information shared across tasks, it is possible to improve the generalization performance of each task, compared to training each individual task independently. During the past decade, most MTL research has been based on the Regularization-Loss framework, due to its flexibility in specifying various types of information sharing strategies, the opportunity it offers to yield kernel-based methods, and its capability in promoting sparse feature representations. However, certain limitations exist in both the theoretical and practical aspects of Regularization-Loss-based MTL. Theoretically, previous research on generalization bounds for MTL Hypothesis Spaces (HSs), in which the data of all tasks are pre-processed by a (partially) common operator, has been limited in two respects. First, all previous works assumed linearity of the operator, thereby completely excluding kernel-based MTL HSs, for which the operator is potentially non-linear. Second, all previous works, rather unnecessarily, assumed that all task weights are constrained within norm-balls of equal radii. The requirement of equal radii leads to significant inflexibility of the relevant HSs, which may cause the generalization performance of the corresponding MTL models to deteriorate. Practically, a variety of algorithms have been developed for kernel-based MTL models, owing to the differing characteristics of their formulations. Most of these algorithms are burdensome to develop and end up quite sophisticated, so practitioners may find them hard to interpret and implement, especially when multiple models are involved. This is even more so when Multi-Task Multiple Kernel Learning (MT-MKL) models are considered. This research largely resolves the above limitations. Theoretically, a pair of new kernel-based HSs is proposed: one for single-kernel MTL and one for MT-MKL. Unlike previous works, we allow each task weight to be constrained within a norm-ball whose radius is learned during training. By deriving and analyzing the generalization bounds of these two HSs, we show that such flexibility indeed leads to much tighter generalization bounds, which often results in significantly better generalization performance. Based on this observation, a pair of new models is developed, one for each case: single-kernel MTL and MT-MKL. From a practical perspective, we propose a general MT-MKL framework that covers most of the prominent MT-MKL approaches, including our new MT-MKL formulation. A general-purpose algorithm is then developed to solve this framework, which can also be employed to train all other models subsumed by it. A series of experiments is conducted to assess the merits of the proposed models when trained by the new algorithm. Certain properties of our HSs and formulations are demonstrated, and the advantage of our models in terms of classification accuracy is shown via these experiments.
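    A minimal sketch of the Regularization-Loss MTL template discussed above, with per-task norm-balls whose radii need not be equal (notation ours; the dissertation's hypothesis spaces are more general), is

        \[
        \min_{w_1,\dots,w_T} \; \sum_{t=1}^{T} \sum_{i=1}^{n_t} L\big(y_i^t, \langle w_t, \phi(x_i^t) \rangle\big)
        \quad \text{s.t.} \quad \|w_t\| \le r_t, \quad t = 1, \dots, T,
        \]

    where \phi is a (possibly shared) kernel-induced feature map. Fixing r_1 = \dots = r_T recovers the equal-radius assumption criticized above, while learning the r_t during training yields the more flexible spaces proposed here.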

    Investigation of Multi-dimensional Tensor Multi-task Learning for Modeling Alzheimer's Disease Progression

    Get PDF
    Machine learning (ML) techniques for predicting Alzheimer's disease (AD) progression can significantly assist clinicians and researchers in constructing effective AD prevention and treatment strategies. The main constraints on the performance of current ML approaches are prediction accuracy and stability problems in small medical dataset scenarios, monotonic data formats (loss of the data's multi-dimensional knowledge and of the correlation knowledge between biomarkers), and limited biomarker interpretability. This thesis investigates how multi-dimensional information and knowledge from biomarker data can be integrated with multi-task learning approaches to predict AD progression. Firstly, a novel similarity-based quantification approach is proposed with two components: multi-dimensional knowledge vector construction, and amalgamated magnitude-direction quantification of brain structural variation. This approach considers both the magnitude and directional correlations of structural variation between brain biomarkers, and encodes the quantified data as a third-order tensor to address the problem of monotonic data form. Secondly, multi-task learning regression algorithms were designed and constructed that integrate multi-dimensional tensor data and mine MRI data for spatio-temporal structural-variation information and knowledge, improving the accuracy, stability and interpretability of AD progression prediction in small medical dataset scenarios. The algorithm consists of three components: supervised symmetric tensor decomposition for extracting biomarker latent factors, tensor multi-task learning regression, and algorithmic regularisation terms. It extracts a set of first-order latent factors from the raw data, each represented along its first-biomarker, second-biomarker and patient-sample dimensions, to elucidate potential factors affecting the variability of the data in an interpretable manner. These latent factors serve as predictor variables for training the prediction model, which regards the prediction of each patient as a task, with each task sharing the set of biomarker latent factors obtained from the tensor decomposition. Knowledge sharing between tasks improves the generalisation ability of the model and addresses the problem of sparse medical data. The experimental results demonstrate that the proposed approach achieves superior accuracy and stability in predicting various cognitive scores of AD progression compared to single-task learning, benchmarks and state-of-the-art multi-task regression methods. The proposed approach identifies brain structural variations in patients, and the important brain-biomarker correlations revealed by the experiments can be utilised as potential indicators for early AD identification.
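    As a rough illustration of the decomposition step (the standard rank-R CP form; the thesis's supervised symmetric variant adds constraints not reproduced here), the third-order tensor \mathcal{X} of quantified structural variation is factorized as

        \[
        \mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r,
        \]

    where a_r and b_r range over the two biomarker dimensions, c_r over the patient-sample dimension, and \circ denotes the outer product; each patient's loadings on the latent factors then serve as predictor variables for the per-patient regression tasks.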