Learning Output Kernels for Multi-Task Problems
Simultaneously solving multiple related learning tasks is beneficial under a
variety of circumstances, but the prior knowledge necessary to correctly model
task relationships is rarely available in practice. In this paper, we develop a
novel kernel-based multi-task learning technique that automatically reveals
structural inter-task relationships. Building on the framework of output
kernel learning (OKL), we introduce a method that jointly learns multiple
functions and a low-rank multi-task kernel by solving a non-convex
regularization problem. Optimization is carried out via a block coordinate
descent strategy, where each subproblem is solved using suitable conjugate
gradient (CG) type iterative methods for linear operator equations. The
effectiveness of the proposed approach is demonstrated on pharmacological and
collaborative filtering data.
Efficient Output Kernel Learning for Multiple Tasks
The paradigm of multi-task learning is that one can achieve better
generalization by learning tasks jointly and thus exploiting the similarity
between the tasks rather than learning them independently of each other. While
previously the relationship between tasks had to be user-defined in the form of
an output kernel, recent approaches jointly learn the tasks and the output
kernel. As the output kernel is a positive semidefinite matrix, the resulting
optimization problems are not scalable in the number of tasks as an
eigendecomposition is required in each step. Using the theory of
positive semidefinite kernels we show in this paper that for a certain class of
regularizers on the output kernel, the constraint of being positive
semidefinite can be dropped as it is automatically satisfied for the relaxed
problem. This leads to an unconstrained dual problem which can be solved
efficiently. Experiments on several multi-task and multi-class data sets
illustrate the efficacy of our approach in terms of computational efficiency as
well as generalization performance.
Multi-view Metric Learning in Vector-valued Kernel Spaces
We consider the problem of metric learning for multi-view data and present a
novel method for learning within-view as well as between-view metrics in
vector-valued kernel spaces, as a way to capture multi-modal structure of the
data. We formulate two convex optimization problems to jointly learn the metric
and the classifier or regressor in kernel feature spaces. An iterative
three-step multi-view metric learning algorithm is derived from the
optimization problems. In order to scale the computation to large training
sets, a block-wise Nyström approximation of the multi-view kernel matrix is
introduced. We justify our approach theoretically and experimentally, and show
its performance on real-world datasets against relevant state-of-the-art
methods.
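For orientation, the standard (non-block-wise) Nyström approximation that the abstract builds on can be sketched as follows. This is a minimal illustration, not the paper's block-wise multi-view variant: the Gaussian kernel, its `gamma` value, the uniform landmark sampling, and the function names `rbf_kernel` and `nystrom_factors` are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.05):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def nystrom_factors(X, landmark_idx, kernel=rbf_kernel, rcond=1e-10):
    """Return (C, W_pinv) so that the full kernel is approximated by C @ W_pinv @ C.T."""
    C = kernel(X, X[landmark_idx])           # n x m block: all points vs. landmarks
    W = C[landmark_idx]                      # m x m block: landmarks vs. landmarks
    W_pinv = np.linalg.pinv(W, rcond=rcond)  # pseudo-inverse guards against a singular W
    return C, W_pinv

# Toy data (assumed): 200 points in 5 dimensions, 40 uniformly sampled landmarks.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
idx = rng.choice(200, size=40, replace=False)

C, W_pinv = nystrom_factors(X, idx)
K_approx = C @ W_pinv @ C.T
K_exact = rbf_kernel(X, X)
rel_err = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
print(f"relative Frobenius error: {rel_err:.3e}")
```

Only the n x m block `C` and the m x m block `W` need to be computed and stored, which is what makes the approach attractive for large training sets; the full matrix is formed here only to measure the error.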
A Convex Feature Learning Formulation for Latent Task Structure Discovery
This paper considers the multi-task learning problem in the setting where
some relevant features could be shared across few related tasks. Most of the
existing methods assume the extent to which the given tasks are related or
share a common feature space to be known a priori. In real-world applications,
however, it is desirable to automatically discover the groups of related tasks
that share a feature space. In this paper we aim at searching the exponentially
large space of all possible groups of tasks that may share a feature space. The
main contribution is a convex formulation that employs a graph-based
regularizer and simultaneously discovers few groups of related tasks, having
close-by task parameters, as well as the feature space shared within each
group. The regularizer encodes an important structure among the groups of tasks
leading to an efficient algorithm for solving it: if there is no feature space
under which a group of tasks has close-by task parameters, then there does not
exist such a feature space for any of its supersets. An efficient active set
algorithm that exploits this simplification and performs a clever search in the
exponentially large space is presented. The algorithm is guaranteed to solve
the proposed formulation (within some precision) in a time polynomial in the
number of groups of related tasks discovered. Empirical results on benchmark
datasets show that the proposed formulation achieves good generalization and
outperforms state-of-the-art multi-task learning algorithms in some cases.
Comment: ICML201
Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-tuning
Deep neural networks require a large amount of labeled training data during
supervised learning. However, collecting and labeling so much data might be
infeasible in many cases. In this paper, we introduce a source-target selective
joint fine-tuning scheme for improving the performance of deep learning tasks
with insufficient training data. In this scheme, a target learning task with
insufficient training data is carried out simultaneously with another source
learning task with abundant training data. However, the source learning task
does not use all existing training data. Our core idea is to identify and use a
subset of training images from the original source learning task whose
low-level characteristics are similar to those from the target learning task,
and jointly fine-tune shared convolutional layers for both tasks. Specifically,
we compute descriptors from linear or nonlinear filter bank responses on
training images from both tasks, and use such descriptors to search for a
desired subset of training samples for the source learning task.
Experiments demonstrate that our selective joint fine-tuning scheme achieves
state-of-the-art performance on multiple visual classification tasks with
insufficient training data for deep learning. Such tasks include Caltech 256,
MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. In comparison to
fine-tuning without a source domain, the proposed method can improve the
classification accuracy by 2% - 10% using a single model.
Comment: To appear in 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2017)
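The subset-selection step described in this abstract can be illustrated with a simple nearest-neighbour search over descriptors. This is a rough sketch under stated assumptions, not the paper's procedure: the Euclidean distance, the fixed neighbour count `k`, and the function name `select_source_subset` are hypothetical choices, and the descriptors are taken as already computed.

```python
import numpy as np

def select_source_subset(target_desc, source_desc, k=3):
    """Indices of source samples that are among the k nearest neighbours
    (Euclidean distance, an assumption) of at least one target sample."""
    # Pairwise squared distances, shape (n_target, n_source).
    d2 = (
        np.sum(target_desc**2, axis=1)[:, None]
        + np.sum(source_desc**2, axis=1)[None, :]
        - 2.0 * target_desc @ source_desc.T
    )
    # argpartition places the k smallest entries of each row in the first k columns.
    nearest = np.argpartition(d2, kth=k - 1, axis=1)[:, :k]
    # Union over all target samples, returned as a sorted array of unique indices.
    return np.unique(nearest)

# Toy descriptors (assumed): one target sample, four source samples.
target = np.array([[0.0, 0.0]])
source = np.array([[0.1, 0.0], [5.0, 5.0], [0.0, 0.2], [9.0, 9.0]])
selected = select_source_subset(target, source, k=2)
print(selected)
```

The selected indices would then define the reduced source training set that is fine-tuned jointly with the target task.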
A Comparative Study of Pairwise Learning Methods based on Kernel Ridge Regression
Many machine learning problems can be formulated as predicting labels for a
pair of objects. Problems of that kind are often referred to as pairwise
learning, dyadic prediction or network inference problems. During the last
decade kernel methods have played a dominant role in pairwise learning. They
still obtain a state-of-the-art predictive performance, but a theoretical
analysis of their behavior has been underexplored in the machine learning
literature.
In this work we review and unify existing kernel-based algorithms that are
commonly used in different pairwise learning settings, ranging from matrix
filtering to zero-shot learning. To this end, we focus on closed-form efficient
instantiations of Kronecker kernel ridge regression. We show that independent
task kernel ridge regression, two-step kernel ridge regression and a linear
matrix filter arise naturally as special cases of Kronecker kernel ridge
regression, implying that all these methods implicitly minimize a squared loss.
In addition, we analyze universality, consistency and spectral filtering
properties. Our theoretical results provide valuable insights for assessing the
advantages and limitations of existing pairwise learning methods.
Comment: arXiv admin note: text overlap with arXiv:1606.0427
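As an illustration of the closed-form instantiations mentioned in this abstract, Kronecker kernel ridge regression on complete data (a label for every pair) can be solved with two small eigendecompositions instead of one large linear system. The sketch below assumes symmetric positive semidefinite kernel matrices `K` (n x n, first object type) and `G` (m x m, second object type) and a label matrix `Y` (n x m); the function names are hypothetical.

```python
import numpy as np

def kronecker_krr_fit(K, G, Y, lam):
    """Dual coefficients A (n x m) solving (G kron K + lam * I) vec(A) = vec(Y),
    where vec stacks columns."""
    sigma, U = np.linalg.eigh(K)   # K = U diag(sigma) U^T
    s, V = np.linalg.eigh(G)       # G = V diag(s) V^T
    Y_tilde = U.T @ Y @ V          # labels expressed in the two eigenbases
    # The eigenvalues of G kron K are the pairwise products sigma_i * s_j.
    A_tilde = Y_tilde / (np.outer(sigma, s) + lam)
    return U @ A_tilde @ V.T

def kronecker_krr_predict(K_test, G_test, A):
    """Predictions for all pairs of test objects; K_test and G_test hold
    kernel values between test objects (rows) and training objects (columns)."""
    return K_test @ A @ G_test.T

# Toy problem (assumed): linear kernels on random features.
rng = np.random.default_rng(0)
n, m, lam = 6, 4, 0.1
Xu = rng.normal(size=(n, 3))
Xv = rng.normal(size=(m, 2))
K, G = Xu @ Xu.T, Xv @ Xv.T
Y = rng.normal(size=(n, m))

A = kronecker_krr_fit(K, G, Y, lam)

# Reference solution: solve the full (n*m) x (n*m) system directly.
A_direct = np.linalg.solve(
    np.kron(G, K) + lam * np.eye(n * m), Y.flatten(order="F")
).reshape((n, m), order="F")
print(np.allclose(A, A_direct))
```

The eigendecomposition route costs on the order of n^3 + m^3 operations, whereas the direct solve scales with (nm)^3, which is why the closed form matters once the number of objects grows.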
