58,549 research outputs found
Inferring latent task structure for Multitask Learning by Multiple Kernel Learning
<p>Abstract</p> <p>Background</p> <p>The lack of sufficient training data is the limiting factor for many Machine Learning applications in Computational Biology. If data is available for several different but related problem domains, Multitask Learning algorithms can be used to learn a model based on all available information. In Bioinformatics, many problems can be cast into the Multitask Learning scenario by incorporating data from several organisms. However, combining information from several tasks requires careful consideration of the degree of similarity between tasks. Our proposed method simultaneously learns or refines the similarity between tasks along with the Multitask Learning classifier. This is done by formulating the Multitask Learning problem as Multiple Kernel Learning, using the recently published <it>q</it>-Norm MKL algorithm.</p> <p>Results</p> <p>We demonstrate the performance of our method on two problems from Computational Biology. First, we show that our method is able to improve performance on a splice site dataset with given hierarchical task structure by refining the task relationships. Second, we consider an MHC-I dataset, for which we assume no knowledge about the degree of task relatedness. Here, we are able to learn the task similarities<it> ab initio</it> along with the Multitask classifiers. In both cases, we outperform baseline methods that we compare against.</p> <p>Conclusions</p> <p>We present a novel approach to Multitask Learning that is capable of learning task similarity along with the classifiers. The framework is very general as it allows to incorporate prior knowledge about tasks relationships if available, but is also able to identify task similarities in absence of such prior information. Both variants show promising results in applications from Computational Biology.</p
Pareto-Path Multi-Task Multiple Kernel Learning
A traditional and intuitively appealing Multi-Task Multiple Kernel Learning
(MT-MKL) method is to optimize the sum (thus, the average) of objective
functions with (partially) shared kernel function, which allows information
sharing amongst tasks. We point out that the obtained solution corresponds to a
single point on the Pareto Front (PF) of a Multi-Objective Optimization (MOO)
problem, which considers the concurrent optimization of all task objectives
involved in the Multi-Task Learning (MTL) problem. Motivated by this last
observation and arguing that the former approach is heuristic, we propose a
novel Support Vector Machine (SVM) MT-MKL framework, that considers an
implicitly-defined set of conic combinations of task objectives. We show that
solving our framework produces solutions along a path on the aforementioned PF
and that it subsumes the optimization of the average of objective functions as
a special case. Using algorithms we derived, we demonstrate through a series of
experimental results that the framework is capable of achieving better
classification performance, when compared to other similar MTL approaches.Comment: Accepted by IEEE Transactions on Neural Networks and Learning System
A Unifying View of Multiple Kernel Learning
Recent research on multiple kernel learning has lead to a number of
approaches for combining kernels in regularized risk minimization. The proposed
approaches include different formulations of objectives and varying
regularization strategies. In this paper we present a unifying general
optimization criterion for multiple kernel learning and show how existing
formulations are subsumed as special cases. We also derive the criterion's dual
representation, which is suitable for general smooth optimization algorithms.
Finally, we evaluate multiple kernel learning in this framework analytically
using a Rademacher complexity bound on the generalization error and empirically
in a set of experiments
Forecasting and Granger Modelling with Non-linear Dynamical Dependencies
Traditional linear methods for forecasting multivariate time series are not
able to satisfactorily model the non-linear dependencies that may exist in
non-Gaussian series. We build on the theory of learning vector-valued functions
in the reproducing kernel Hilbert space and develop a method for learning
prediction functions that accommodate such non-linearities. The method not only
learns the predictive function but also the matrix-valued kernel underlying the
function search space directly from the data. Our approach is based on learning
multiple matrix-valued kernels, each of those composed of a set of input
kernels and a set of output kernels learned in the cone of positive
semi-definite matrices. In addition to superior predictive performance in the
presence of strong non-linearities, our method also recovers the hidden dynamic
relationships between the series and thus is a new alternative to existing
graphical Granger techniques.Comment: Accepted for ECML-PKDD 201
Conic Multi-Task Classification
Traditionally, Multi-task Learning (MTL) models optimize the average of
task-related objective functions, which is an intuitive approach and which we
will be referring to as Average MTL. However, a more general framework,
referred to as Conic MTL, can be formulated by considering conic combinations
of the objective functions instead; in this framework, Average MTL arises as a
special case, when all combination coefficients equal 1. Although the advantage
of Conic MTL over Average MTL has been shown experimentally in previous works,
no theoretical justification has been provided to date. In this paper, we
derive a generalization bound for the Conic MTL method, and demonstrate that
the tightest bound is not necessarily achieved, when all combination
coefficients equal 1; hence, Average MTL may not always be the optimal choice,
and it is important to consider Conic MTL. As a byproduct of the generalization
bound, it also theoretically explains the good experimental results of previous
relevant works. Finally, we propose a new Conic MTL model, whose conic
combination coefficients minimize the generalization bound, instead of choosing
them heuristically as has been done in previous methods. The rationale and
advantage of our model is demonstrated and verified via a series of experiments
by comparing with several other methods.Comment: Accepted by European Conference on Machine Learning and Principles
and Practice of Knowledge Discovery in Databases (ECMLPKDD)-201
- …