Kernel Conjugate Gradient Methods with Random Projections
We propose and study kernel conjugate gradient methods (KCGM) with random
projections for least-squares regression over a separable Hilbert space.
Considering two types of random projections generated by randomized sketches
and Nyström subsampling, we prove optimal statistical results with respect
to variants of norms for the algorithms under a suitable stopping rule.
Particularly, our results show that if the projection dimension is proportional
to the effective dimension of the problem, KCGM with randomized sketches can
generalize optimally, while achieving a computational advantage. As a
corollary, we derive optimal rates for classic KCGM in the case that the target
function may not be in the hypothesis space, filling a theoretical gap.
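As a rough illustration of the kind of method studied here (not the authors' algorithm, which is stated over a general separable Hilbert space), the sketch below runs plain conjugate gradient on a Nyström-projected kernel least-squares problem and uses the iteration count as the stopping rule. The Gaussian kernel, the landmark count m, and the stopping index t_max are illustrative choices, not values from the paper.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between the rows of A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def kcgm_nystrom(X, y, m=50, t_max=20, sigma=1.0, seed=None):
    """Kernel least squares with Nystrom subsampling, solved by conjugate
    gradient on the projected normal equations; the iteration count t_max
    acts as the (early-stopping) regularization parameter."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=min(m, n), replace=False)  # Nystrom landmarks
    Xm = X[idx]
    Knm = gaussian_kernel(X, Xm, sigma)                  # n x m projected features
    # Normal equations of the projected problem: (Knm^T Knm) beta = Knm^T y.
    A = Knm.T @ Knm
    b = Knm.T @ y
    beta = np.zeros(A.shape[0])
    r = b - A @ beta
    p = r.copy()
    for _ in range(t_max):                               # plain CG iterations
        Ap = A @ p
        step = (r @ r) / (p @ Ap)
        beta += step * p
        r_new = r - step * Ap
        if np.linalg.norm(r_new) < 1e-10:
            break
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return lambda Xtest: gaussian_kernel(Xtest, Xm, sigma) @ beta
```

In this sketch the projection dimension m plays the role of the effective dimension discussed in the abstract: a larger m gives a more faithful approximation of the full kernel problem at a higher computational cost.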
Generalization Properties of Doubly Stochastic Learning Algorithms
Doubly stochastic learning algorithms are scalable kernel methods that
perform very well in practice. However, their generalization properties are not
well understood and their analysis is challenging since the corresponding
learning sequence may not be in the hypothesis space induced by the kernel. In
this paper, we provide an in-depth theoretical analysis for different variants
of doubly stochastic learning algorithms within the setting of nonparametric
regression in a reproducing kernel Hilbert space and considering the square
loss. Particularly, we derive convergence results on the generalization error
for the studied algorithms either with or without an explicit penalty term. To
the best of our knowledge, the derived results for the unregularized variants
are the first of this kind, while the results for the regularized variants
improve those in the literature. The novelties in our proof are a sample error
bound that requires controlling the trace norm of a cumulative operator, and a
refined analysis for bounding the initial error.
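For orientation, a doubly stochastic iteration for the square loss might look as follows: each step samples both a random training point and a fresh random Fourier feature of a Gaussian kernel. This is a minimal sketch under those assumptions, not the paper's exact algorithm; lam = 0 corresponds to the unregularized variant, lam > 0 to the variant with an explicit penalty term.

```python
import numpy as np

def doubly_stochastic_sgd(X, y, T=2000, step=1.0, lam=0.0, sigma=1.0, seed=0):
    """Doubly stochastic gradient iteration for the square loss: each step
    draws a random training point AND a fresh random Fourier feature of the
    Gaussian kernel; lam = 0 gives the unregularized variant."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    omegas, phases, coefs = [], [], []

    def phi(w, b, x):
        # Random Fourier feature approximating the Gaussian kernel.
        return np.sqrt(2.0) * np.cos(x @ w + b)

    def predict(x):
        # Naive O(t) evaluation over all features drawn so far, kept simple.
        return sum(a * phi(w, b, x) for a, w, b in zip(coefs, omegas, phases))

    for t in range(1, T + 1):
        i = rng.integers(n)                        # random data point
        w = rng.normal(scale=1.0 / sigma, size=d)  # random feature frequency
        b = rng.uniform(0.0, 2.0 * np.pi)
        gamma = step / np.sqrt(t)                  # decaying step-size
        err = predict(X[i]) - y[i]                 # square-loss residual
        coefs = [a * (1.0 - gamma * lam) for a in coefs]  # shrink if penalized
        omegas.append(w); phases.append(b)
        coefs.append(-gamma * err * phi(w, b, X[i]))
    return predict
```

Note that the learning sequence built this way lives in the span of the sampled random features rather than in the reproducing kernel Hilbert space itself, which is the source of the analytical difficulty mentioned above.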
Generalization Properties and Implicit Regularization for Multiple Passes SGM
We study the generalization properties of stochastic gradient methods for
learning with convex loss functions and linearly parameterized functions. We
show that, in the absence of penalizations or constraints, the stability and
approximation properties of the algorithm can be controlled by tuning either
the step-size or the number of passes over the data. In this view, these
parameters can be seen to control a form of implicit regularization. Numerical
results complement the theoretical findings.
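A minimal sketch of the setting, assuming linear least squares trained by plain SGD with no penalty or constraint: the step-size and the number of passes are the only tuning knobs, and they play the role of the regularization parameter.

```python
import numpy as np

def multipass_sgm(X, y, passes=5, step=0.1, seed=0):
    """Multiple-pass stochastic gradient for linear least squares with no
    penalty: the step-size and the number of passes over the data act as
    the (implicit) regularization parameters."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(passes):
        for i in rng.permutation(n):           # one pass = one random shuffle
            grad = (X[i] @ w - y[i]) * X[i]    # square-loss gradient at (x_i, y_i)
            w -= step * grad
    return w
```

Smaller step-sizes or fewer passes correspond to stronger implicit regularization; larger values let the iterate fit the training data more closely.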
Iterative Regularization for Learning with Convex Loss Functions
We consider the problem of supervised learning with convex loss functions and
propose a new form of iterative regularization based on the subgradient method.
Unlike other regularization approaches, in iterative regularization no
constraint or penalization is considered, and generalization is achieved by
(early) stopping an empirical iteration. We consider a nonparametric setting,
in the framework of reproducing kernel Hilbert spaces, and prove finite sample
bounds on the excess risk under general regularity conditions. Our study
provides a new class of efficient regularized learning algorithms and gives
insights on the interplay between statistics and optimization in machine
learning.
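To make the idea concrete, here is a minimal sketch (not the paper's general construction) of iterative regularization for a kernel predictor: a subgradient step on the empirical risk with an illustrative convex, non-smooth loss (the absolute loss), where the only regularization is stopping after t_stop iterations.

```python
import numpy as np

def iterative_reg_subgradient(K, y, t_stop=100, step=0.1):
    """Iterative regularization by early stopping of a kernel subgradient
    method for the absolute loss (an illustrative convex loss); no penalty
    or constraint is added, t_stop is the regularization parameter."""
    n = K.shape[0]
    alpha = np.zeros(n)
    for _ in range(t_stop):
        residual = K @ alpha - y               # f_t(x_i) - y_i for all i
        # Subgradient of (1/n) sum_i |f(x_i) - y_i| in the RKHS is
        # (1/n) sum_i sign(f(x_i) - y_i) K(., x_i); in coefficients:
        alpha -= step * np.sign(residual) / n
    return alpha  # predictor: x -> sum_i alpha_i K(x, x_i)
```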
Optimal Rates for Spectral Algorithms with Least-Squares Regression over Hilbert Spaces
In this paper, we study regression problems over a separable Hilbert space
with the square loss, covering non-parametric regression over a reproducing
kernel Hilbert space. We investigate a class of spectral-regularized
algorithms, including ridge regression, principal component regression, and
gradient methods. We prove optimal, high-probability convergence results in
terms of variants of norms for the studied algorithms, considering a capacity
assumption on the hypothesis space and a general source condition on the target
function. Consequently, we obtain almost sure convergence results with optimal
rates. Our results improve and generalize previous results, filling a
theoretical gap for the non-attainable cases.
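The common structure of these algorithms can be sketched as spectral filtering of the kernel matrix: each method applies a different filter function to the eigenvalues. The filters below (Tikhonov/ridge, spectral cut-off for principal components, and Landweber, i.e. gradient descent) are standard choices used here for illustration; the normalization by n and the assumption that the kernel is bounded by one are mine, not taken from the paper.

```python
import numpy as np

def spectral_filter_estimator(K, y, method="ridge", lam=1e-2, t=50):
    """Spectral-regularized kernel least squares: alpha = g_lam(K/n) y / n,
    where g_lam is a filter applied to the eigenvalues of the normalized
    kernel matrix (assumed here to lie in [0, 1], e.g. a Gaussian kernel)."""
    n = K.shape[0]
    evals, evecs = np.linalg.eigh(K / n)       # eigen-decomposition of K/n
    evals = np.clip(evals, 0.0, None)
    safe = np.maximum(evals, 1e-15)            # avoid division by zero
    if method == "ridge":
        g = 1.0 / (evals + lam)                                  # Tikhonov
    elif method == "cutoff":
        g = np.where(evals >= lam, 1.0 / safe, 0.0)              # principal components
    elif method == "landweber":
        # t gradient-descent steps with unit step-size: g(s) = (1 - (1 - s)^t) / s.
        g = np.where(evals > 1e-15, (1.0 - (1.0 - evals) ** t) / safe, float(t))
    else:
        raise ValueError("unknown filter")
    alpha = evecs @ (g * (evecs.T @ y)) / n
    return alpha  # predictor: x -> sum_i alpha_i K(x, x_i)
```

For the ridge filter this reduces to the usual kernel ridge regression coefficients (K + n*lam*I)^{-1} y; the regularization parameter is lam for the first two filters and the iteration number t for the gradient method.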