Generalization Properties of Doubly Stochastic Learning Algorithms
Doubly stochastic learning algorithms are scalable kernel methods that
perform very well in practice. However, their generalization properties are not
well understood and their analysis is challenging since the corresponding
learning sequence may not be in the hypothesis space induced by the kernel. In
this paper, we provide an in-depth theoretical analysis for different variants
of doubly stochastic learning algorithms within the setting of nonparametric
regression in a reproducing kernel Hilbert space and considering the square
loss. Particularly, we derive convergence results on the generalization error
for the studied algorithms either with or without an explicit penalty term. To
the best of our knowledge, the derived results for the unregularized variants
are the first of this kind, while the results for the regularized variants
improve those in the literature. The novelties in our proof are a sample error
bound that requires controlling the trace norm of a cumulative operator, and a
refined analysis of the initial error bound.
Comment: 24 pages. To appear in Journal of Complexity
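As an illustration of the class of methods analyzed here, the following is a minimal sketch of a doubly stochastic gradient iteration for kernel least squares: each step draws both a random training point and random Fourier features approximating a Gaussian kernel. The feature map, step-size schedule, and all names are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    def doubly_stochastic_sgd(X, y, n_features=200, sigma=1.0, n_iter=1000,
                              step=0.5, reg=0.0, seed=0):
        """Sketch of doubly stochastic gradients for kernel least squares:
        two sources of randomness per iteration, a random sample and random
        Fourier features approximating a Gaussian kernel (assumed setup)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        W = rng.normal(scale=1.0 / sigma, size=(d, n_features))   # random frequencies
        b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)        # random phases
        phi = lambda Z: np.sqrt(2.0 / n_features) * np.cos(Z @ W + b)

        w = np.zeros(n_features)
        for t in range(1, n_iter + 1):
            i = rng.integers(n)                      # first source of randomness
            f = phi(X[i:i + 1])[0]                   # second source of randomness
            grad = (f @ w - y[i]) * f + reg * w      # square loss (+ optional penalty)
            w -= (step / np.sqrt(t)) * grad          # decaying step-size
        return lambda Z: phi(Z) @ w                  # approximate kernel predictor

Setting reg=0.0 gives an unregularized variant of the kind discussed in the abstract, while reg>0 adds an explicit penalty term.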
Learning Probability Measures with respect to Optimal Transport Metrics
We study the problem of estimating, in the sense of optimal transport
metrics, a measure which is assumed supported on a manifold embedded in a
Hilbert space. By establishing a precise connection between optimal transport
metrics, optimal quantization, and learning theory, we derive new probabilistic
bounds for the performance of a classic algorithm in unsupervised learning
(k-means), when used to produce a probability measure derived from the data. In
the course of the analysis, we arrive at new lower bounds, as well as
probabilistic upper bounds on the convergence rate of the empirical law of
large numbers, which, unlike existing bounds, are applicable to a wide class of
measures.
Comment: 13 pages, 2 figures. Advances in Neural Information Processing Systems, NIPS 2012
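For concreteness, the sketch below turns a sample into a discrete probability measure via k-means (Lloyd's algorithm), with each center weighted by the fraction of points assigned to it; the initialization and iteration count are illustrative assumptions.

    import numpy as np

    def quantized_measure(X, k=10, n_iter=50, seed=0):
        """Sketch: k-means centers plus empirical cluster frequencies define a
        discrete measure supported on k points (assumed Lloyd's iterations)."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            assign = d2.argmin(axis=1)               # nearest-center assignment
            for j in range(k):
                pts = X[assign == j]
                if len(pts):
                    centers[j] = pts.mean(axis=0)    # recenter on cluster mean
        weights = np.bincount(assign, minlength=k) / len(X)
        return centers, weights                      # support points and masses

The pair (centers, weights) is the kind of data-derived probability measure whose performance in optimal transport metrics the abstract refers to.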
Less is More: Nystr\"om Computational Regularization
We study Nystr\"om type subsampling approaches to large scale kernel methods,
and prove learning bounds in the statistical learning setting, where random
sampling and high probability estimates are considered. In particular, we prove
that these approaches can achieve optimal learning bounds, provided the
subsampling level is suitably chosen. These results suggest a simple
incremental variant of Nystr\"om Kernel Regularized Least Squares, where the
subsampling level implements a form of computational regularization, in the
sense that it controls at the same time regularization and computations.
Extensive experimental analysis shows that the considered approach achieves
state-of-the-art performance on benchmark large-scale datasets.
Comment: updated version of NIPS 2015 (oral)
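A minimal sketch of Nyström kernel regularized least squares with a Gaussian kernel is given below; the subsampling level m is the quantity that, per the abstract, acts simultaneously as a computational budget and a regularization parameter. The kernel choice, uniform subsampling, and names are assumptions for illustration.

    import numpy as np

    def nystrom_krls(X, y, m=100, lam=1e-3, sigma=1.0, seed=0):
        """Sketch of Nystrom kernel regularized least squares: restrict the
        estimator to the span of m uniformly subsampled points (assumed
        plain uniform subsampling and Gaussian kernel)."""
        rng = np.random.default_rng(seed)
        n = len(X)
        idx = rng.choice(n, size=min(m, n), replace=False)
        Xm = X[idx]                                   # Nystrom landmarks

        def gauss(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
            return np.exp(-d2 / (2.0 * sigma ** 2))

        Knm = gauss(X, Xm)                            # n x m cross-kernel
        Kmm = gauss(Xm, Xm)                           # m x m landmark kernel
        # Solve (Knm^T Knm + lam * n * Kmm) alpha = Knm^T y -- only m x m systems
        alpha = np.linalg.solve(Knm.T @ Knm + lam * n * Kmm, Knm.T @ y)
        return lambda Z: gauss(Z, Xm) @ alpha

Increasing m enlarges the hypothesis space but also the cost of the m x m solve, which is the computational-regularization trade-off the abstract describes.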
A Consistent Regularization Approach for Structured Prediction
We propose and analyze a regularization approach for structured prediction
problems. We characterize a large class of loss functions that allows structured
outputs to be naturally embedded in a linear space. We exploit this fact to
design learning algorithms using a surrogate loss approach and regularization
techniques. We prove universal consistency and finite sample bounds
characterizing the generalization properties of the proposed methods.
Experimental results are provided to demonstrate the practical usefulness of
the proposed approach.
Comment: 39 pages, 2 tables, 1 figure
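The following sketch illustrates the surrogate-plus-decoding pattern described above in a simplified kernel ridge form: ridge regression produces per-training-point scores, and prediction minimizes a score-weighted sum of the structured loss over a finite candidate set. The Gaussian kernel, finite candidate set, and names are assumptions for illustration, not the paper's exact estimator.

    import numpy as np

    def fit_structured(X, Y_train, loss, lam=1e-2, sigma=1.0):
        """Sketch of a surrogate + decoding estimator for structured prediction:
        kernel ridge scores over training points, then loss-weighted decoding
        over a candidate output set (assumed setup)."""
        def gauss(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
            return np.exp(-d2 / (2.0 * sigma ** 2))

        n = len(X)
        K = gauss(X, X)
        W = np.linalg.solve(K + lam * n * np.eye(n), np.eye(n))   # ridge weights

        def predict(x, candidates):
            scores = gauss(x[None, :], X) @ W                     # shape (1, n)
            vals = [sum(scores[0, i] * loss(c, Y_train[i]) for i in range(n))
                    for c in candidates]
            return candidates[int(np.argmin(vals))]               # decoding step

        return predict

Here loss(c, y_i) is the structured loss on outputs; in practice the decoding argmin is taken over whatever output space the task defines, a finite candidate list being the simplest case.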
Generalization Properties and Implicit Regularization for Multiple Passes SGM
We study the generalization properties of stochastic gradient methods for
learning with convex loss functions and linearly parameterized functions. We
show that, in the absence of penalizations or constraints, the stability and
approximation properties of the algorithm can be controlled by tuning either
the step-size or the number of passes over the data. In this view, these
parameters can be seen to control a form of implicit regularization. Numerical
results complement the theoretical findings.
Comment: 26 pages, 4 figures. To appear in ICML 2016
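A minimal sketch of the procedure studied here: multiple passes of stochastic gradient over the data for least squares with a linearly parameterized model and no explicit penalty. The constant step-size and random reshuffling are illustrative assumptions.

    import numpy as np

    def multipass_sgd(X, y, n_passes=5, step=0.01, seed=0):
        """Sketch of multiple-pass SGM for least squares: no penalty or
        constraint; step-size and number of passes act as the (implicit)
        regularization parameters."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_passes):
            for i in rng.permutation(n):             # one pass over the data
                grad = (X[i] @ w - y[i]) * X[i]      # pointwise square-loss gradient
                w -= step * grad
        return w

Early stopping (few passes) or a small step-size keeps the iterate close to its initialization, which is the implicit-regularization effect the abstract refers to.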
Learning Multiple Visual Tasks while Discovering their Structure
Multi-task learning is a natural approach for computer vision applications
that require the simultaneous solution of several distinct but related
problems, e.g. object detection, classification, tracking of multiple agents,
or denoising, to name a few. The key idea is that exploring task relatedness
(structure) can lead to improved performance.
In this paper, we propose and study a novel sparse, non-parametric approach
exploiting the theory of Reproducing Kernel Hilbert Spaces for vector-valued
functions. We develop a suitable regularization framework which can be
formulated as a convex optimization problem, and is provably solvable using an
alternating minimization approach. Empirical tests show that the proposed
method compares favorably to state-of-the-art techniques and further allows the
recovery of interpretable structures, a problem of interest in its own right.
Comment: 19 pages, 3 figures, 3 tables
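To make the alternating-minimization idea concrete, the sketch below solves a simplified linear multi-task problem with a learned task-relatedness matrix Omega, alternating a ridge-type step for the task weights with a closed-form update of Omega. This is a standard linear stand-in under an assumed objective, not the paper's sparse vector-valued RKHS formulation, and all names are illustrative.

    import numpy as np

    def sqrtm_psd(A):
        """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
        vals, vecs = np.linalg.eigh(A)
        return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

    def multitask_alternating(X, Y, lam=1e-2, n_iter=10, eps=1e-6):
        """Sketch of alternating minimization for the assumed objective
            min_{W, Omega} ||Y - X W||_F^2 + lam * tr(W Omega^{-1} W^T),
        with Omega positive semidefinite and trace-normalized."""
        n, d = X.shape
        T = Y.shape[1]
        Omega = np.eye(T) / T                        # start from unrelated tasks
        XtX, XtY = X.T @ X, X.T @ Y
        for _ in range(n_iter):
            # W-step: vectorized normal equations for fixed Omega
            Oinv = np.linalg.inv(Omega + eps * np.eye(T))
            M = np.kron(np.eye(T), XtX) + lam * np.kron(Oinv, np.eye(d))
            w = np.linalg.solve(M, XtY.reshape(-1, order="F"))
            W = w.reshape(d, T, order="F")
            # Omega-step: closed form (W^T W)^{1/2}, normalized to unit trace
            S = sqrtm_psd(W.T @ W + eps * np.eye(T))
            Omega = S / np.trace(S)
        return W, Omega                              # task weights and task structure

The recovered Omega plays the role of the interpretable task structure: large off-diagonal entries indicate strongly related tasks.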