117 research outputs found
A Consistent Regularization Approach for Structured Prediction
We propose and analyze a regularization approach for structured prediction
problems. We characterize a large class of loss functions that allows to
naturally embed structured outputs in a linear space. We exploit this fact to
design learning algorithms using a surrogate loss approach and regularization
techniques. We prove universal consistency and finite sample bounds
characterizing the generalization properties of the proposed methods.
Experimental results are provided to demonstrate the practical usefulness of
the proposed approach.Comment: 39 pages, 2 Tables, 1 Figur
Learning Multiple Visual Tasks while Discovering their Structure
Multi-task learning is a natural approach for computer vision applications
that require the simultaneous solution of several distinct but related
problems, e.g. object detection, classification, tracking of multiple agents,
or denoising, to name a few. The key idea is that exploring task relatedness
(structure) can lead to improved performances.
In this paper, we propose and study a novel sparse, non-parametric approach
exploiting the theory of Reproducing Kernel Hilbert Spaces for vector-valued
functions. We develop a suitable regularization framework which can be
formulated as a convex optimization problem, and is provably solvable using an
alternating minimization approach. Empirical tests show that the proposed
method compares favorably to state of the art techniques and further allows to
recover interpretable structures, a problem of interest in its own right.Comment: 19 pages, 3 figures, 3 table
Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm
We present a novel algorithm to estimate the barycenter of arbitrary
probability distributions with respect to the Sinkhorn divergence. Based on a
Frank-Wolfe optimization strategy, our approach proceeds by populating the
support of the barycenter incrementally, without requiring any pre-allocation.
We consider discrete as well as continuous distributions, proving convergence
rates of the proposed algorithm in both settings. Key elements of our analysis
are a new result showing that the Sinkhorn divergence on compact domains has
Lipschitz continuous gradient with respect to the Total Variation and a
characterization of the sample complexity of Sinkhorn potentials. Experiments
validate the effectiveness of our method in practice.Comment: 46 pages, 8 figure
Learning-to-Learn Stochastic Gradient Descent with Biased Regularization
We study the problem of learning-to-learn: inferring a learning algorithm
that works well on tasks sampled from an unknown distribution. As class of
algorithms we consider Stochastic Gradient Descent on the true risk regularized
by the square euclidean distance to a bias vector. We present an average excess
risk bound for such a learning algorithm. This result quantifies the potential
benefit of using a bias vector with respect to the unbiased case. We then
address the problem of estimating the bias from a sequence of tasks. We propose
a meta-algorithm which incrementally updates the bias, as new tasks are
observed. The low space and time complexity of this approach makes it appealing
in practice. We provide guarantees on the learning ability of the
meta-algorithm. A key feature of our results is that, when the number of tasks
grows and their variance is relatively small, our learning-to-learn approach
has a significant advantage over learning each task in isolation by Stochastic
Gradient Descent without a bias term. We report on numerical experiments which
demonstrate the effectiveness of our approach.Comment: 37 pages, 8 figure
Consistent Multitask Learning with Nonlinear Output Relations
Key to multitask learning is exploiting relationships between different tasks
to improve prediction performance. If the relations are linear, regularization
approaches can be used successfully. However, in practice assuming the tasks to
be linearly related might be restrictive, and allowing for nonlinear structures
is a challenge. In this paper, we tackle this issue by casting the problem
within the framework of structured prediction. Our main contribution is a novel
algorithm for learning multiple tasks which are related by a system of
nonlinear equations that their joint outputs need to satisfy. We show that the
algorithm is consistent and can be efficiently implemented. Experimental
results show the potential of the proposed method.Comment: 25 pages, 1 figure, 2 table
Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction
We study the interplay between surrogate methods for structured prediction
and techniques from multitask learning designed to leverage relationships
between surrogate outputs. We propose an efficient algorithm based on trace
norm regularization which, differently from previous methods, does not require
explicit knowledge of the coding/decoding functions of the surrogate framework.
As a result, our algorithm can be applied to the broad class of problems in
which the surrogate space is large or even infinite dimensional. We study
excess risk bounds for trace norm regularized structured prediction, implying
the consistency and learning rates for our estimator. We also identify relevant
regimes in which our approach can enjoy better generalization performance than
previous methods. Numerical experiments on ranking problems indicate that
enforcing low-rank relations among surrogate outputs may indeed provide a
significant advantage in practice.Comment: 42 pages, 1 tabl
Convex Learning of Multiple Tasks and their Structure
Reducing the amount of human supervision is a key problem in machine learning
and a natural approach is that of exploiting the relations (structure) among
different tasks. This is the idea at the core of multi-task learning. In this
context a fundamental question is how to incorporate the tasks structure in the
learning problem.We tackle this question by studying a general computational
framework that allows to encode a-priori knowledge of the tasks structure in
the form of a convex penalty; in this setting a variety of previously proposed
methods can be recovered as special cases, including linear and non-linear
approaches. Within this framework, we show that tasks and their structure can
be efficiently learned considering a convex optimization problem that can be
approached by means of block coordinate methods such as alternating
minimization and for which we prove convergence to the global minimum.Comment: 26 pages, 1 figure, 2 table
- …