5,942 research outputs found
Pareto-Path Multi-Task Multiple Kernel Learning
A traditional and intuitively appealing Multi-Task Multiple Kernel Learning
(MT-MKL) method is to optimize the sum (thus, the average) of objective
functions with (partially) shared kernel function, which allows information
sharing amongst tasks. We point out that the obtained solution corresponds to a
single point on the Pareto Front (PF) of a Multi-Objective Optimization (MOO)
problem, which considers the concurrent optimization of all task objectives
involved in the Multi-Task Learning (MTL) problem. Motivated by this last
observation and arguing that the former approach is heuristic, we propose a
novel Support Vector Machine (SVM) MT-MKL framework, that considers an
implicitly-defined set of conic combinations of task objectives. We show that
solving our framework produces solutions along a path on the aforementioned PF
and that it subsumes the optimization of the average of objective functions as
a special case. Using algorithms we derived, we demonstrate through a series of
experimental results that the framework is capable of achieving better
classification performance, when compared to other similar MTL approaches.Comment: Accepted by IEEE Transactions on Neural Networks and Learning System
Local Rademacher Complexity-based Learning Guarantees for Multi-Task Learning
We show a Talagrand-type concentration inequality for Multi-Task Learning
(MTL), using which we establish sharp excess risk bounds for MTL in terms of
distribution- and data-dependent versions of the Local Rademacher Complexity
(LRC). We also give a new bound on the LRC for norm regularized as well as
strongly convex hypothesis classes, which applies not only to MTL but also to
the standard i.i.d. setting. Combining both results, one can now easily derive
fast-rate bounds on the excess risk for many prominent MTL methods,
including---as we demonstrate---Schatten-norm, group-norm, and
graph-regularized MTL. The derived bounds reflect a relationship akeen to a
conservation law of asymptotic convergence rates. This very relationship allows
for trading off slower rates w.r.t. the number of tasks for faster rates with
respect to the number of available samples per task, when compared to the rates
obtained via a traditional, global Rademacher analysis.Comment: In this version, some arguments and results (of the previous version)
have been corrected, or modifie
- …