Learning with incremental iterative regularization
Within a statistical learning setting, we propose and study an iterative regularization
algorithm for least squares defined by an incremental gradient method. In
particular, we show that, if all other parameters are fixed a priori, the number of
passes over the data (epochs) acts as a regularization parameter, and prove strong
universal consistency, i.e. almost sure convergence of the risk, as well as sharp
finite sample bounds for the iterates. Our results are a step towards understanding
the effect of multiple epochs in stochastic gradient techniques in machine learning
and rely on integrating statistical and optimization results.
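By way of illustration only, here is a minimal sketch of the kind of incremental gradient scheme the abstract describes: cyclic single-sample updates for least squares, where the number of passes (epochs) is the only tuning knob. The function name, step size, and synthetic data are assumptions, not the paper's exact construction.

import numpy as np

def incremental_least_squares(X, y, step_size=0.01, n_epochs=5):
    """Cyclic single-sample gradient steps; the epoch count acts as the regularizer."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        for i in range(n):                      # incremental: one example per update
            residual = X[i] @ w - y[i]
            w -= step_size * residual * X[i]
    return w

# Illustrative use on synthetic data; with too many epochs the iterate starts fitting noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ rng.standard_normal(5) + 0.5 * rng.standard_normal(200)
w_hat = incremental_least_squares(X, y, step_size=0.01, n_epochs=5)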
Learning with SGD and Random Features
Sketching and stochastic gradient methods are arguably the most common
techniques to derive efficient large scale learning algorithms. In this paper,
we investigate their application in the context of nonparametric statistical
learning. More precisely, we study the estimator defined by stochastic gradient
with mini-batches and random features. The latter can be seen as a form of
nonlinear sketching and used to define approximate kernel methods. The
considered estimator is not explicitly penalized/constrained and regularization
is implicit. Indeed, our study highlights how different parameters, such as
number of features, iterations, step-size and mini-batch size control the
learning properties of the solutions. We do this by deriving optimal finite
sample bounds, under standard assumptions. The obtained results are
corroborated and illustrated by numerical experiments.
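A hedged sketch of the kind of estimator described above: mini-batch SGD run on random Fourier features, one common way to realize a nonlinear sketch and approximate a Gaussian kernel. The feature map, step size, batch size, and iteration count are illustrative assumptions, not the paper's exact setup.

import numpy as np

def random_fourier_features(X, n_features=100, gamma=1.0, seed=0):
    """Random Fourier features approximating a Gaussian kernel (a nonlinear sketch)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def sgd_random_features(X, y, n_features=100, step_size=0.1,
                        batch_size=16, n_iter=500, seed=0):
    """Mini-batch SGD on the sketched features; no explicit penalty or constraint."""
    Phi = random_fourier_features(X, n_features, seed=seed)
    rng = np.random.default_rng(seed)
    w = np.zeros(n_features)
    for _ in range(n_iter):
        idx = rng.integers(0, len(y), size=batch_size)        # sample a mini-batch
        grad = Phi[idx].T @ (Phi[idx] @ w - y[idx]) / batch_size
        w -= step_size * grad                                 # regularization is implicit
    return w, Phi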
Optimal Rates for Spectral Algorithms with Least-Squares Regression over Hilbert Spaces
In this paper, we study regression problems over a separable Hilbert space
with the square loss, covering non-parametric regression over a reproducing
kernel Hilbert space. We investigate a class of spectral-regularized
algorithms, including ridge regression, principal component analysis, and
gradient methods. We prove optimal, high-probability convergence results in
terms of variants of norms for the studied algorithms, considering a capacity
assumption on the hypothesis space and a general source condition on the target
function. Consequently, we obtain almost sure convergence results with optimal
rates. Our results improve and generalize previous results, filling a
theoretical gap for the non-attainable cases.
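As a sketch under simplifying assumptions (the finite-dimensional linear case rather than a general Hilbert space), two members of the spectral family named above, Tikhonov (ridge) regression and spectral cut-off onto principal components, can be written as filters applied to the eigenvalues of the empirical covariance. The function name and the regularization scale are assumptions.

import numpy as np

def spectral_regression(X, y, reg=1e-2, method="ridge"):
    """Apply a spectral filter to the empirical covariance: Tikhonov or spectral cut-off."""
    n = X.shape[0]
    C = X.T @ X / n                              # empirical covariance
    b = X.T @ y / n
    eigvals, eigvecs = np.linalg.eigh(C)
    filt = np.zeros_like(eigvals)
    if method == "ridge":                        # Tikhonov filter: 1 / (sigma + reg)
        filt = 1.0 / (eigvals + reg)
    elif method == "cutoff":                     # keep only components above the threshold
        keep = eigvals > reg
        filt[keep] = 1.0 / eigvals[keep]
    else:
        raise ValueError(f"unknown filter: {method}")
    return eigvecs @ (filt * (eigvecs.T @ b))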
Generalization Properties and Implicit Regularization for Multiple Passes SGM
We study the generalization properties of stochastic gradient methods for
learning with convex loss functions and linearly parameterized functions. We
show that, in the absence of penalizations or constraints, the stability and
approximation properties of the algorithm can be controlled by tuning either
the step-size or the number of passes over the data. In this view, these
parameters can be seen to control a form of implicit regularization. Numerical
results complement the theoretical findings.
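One way to make the statement concrete, as an assumed minimal sketch rather than the paper's procedure: unpenalized SGD on the square loss with a linearly parameterized model, where the step-size is selected on held-out data while the number of passes stays fixed. The grid and helper names are hypothetical.

import numpy as np

def sgd_square_loss(X, y, step_size, n_passes=5, seed=0):
    """Plain SGD on the square loss, linearly parameterized, no penalty or constraint."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_passes):
        for i in rng.permutation(n):
            w -= step_size * (X[i] @ w - y[i]) * X[i]
    return w

def tune_step_size(X_tr, y_tr, X_val, y_val, grid=(0.001, 0.01, 0.1)):
    """Pick the step-size by held-out error; it plays the role of a regularization parameter."""
    errors = {s: np.mean((X_val @ sgd_square_loss(X_tr, y_tr, s) - y_val) ** 2)
              for s in grid}
    return min(errors, key=errors.get)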
Optimal Learning for Multi-pass Stochastic Gradient Methods
We analyze the learning properties of the stochastic gradient method when multiple
passes over the data and mini-batches are allowed. In particular, we consider
the square loss and show that for a universal step-size choice, the number of
passes acts as a regularization parameter, and optimal finite sample bounds can be
achieved by early-stopping. Moreover, we show that larger step-sizes are allowed
when considering mini-batches. Our analysis is based on a unifying approach,
encompassing both batch and stochastic gradient methods as special cases.
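A hedged sketch of the early-stopping rule suggested above, with the stopping pass chosen on a held-out split; the paper's analysis is theoretical and does not prescribe this exact procedure, and the step size, batch size, and split are assumptions.

import numpy as np

def multipass_sgd_early_stopping(X_tr, y_tr, X_val, y_val, step_size=0.05,
                                 max_passes=50, batch_size=8, seed=0):
    """Mini-batch SGD on the square loss; stop at the pass with the best held-out error."""
    rng = np.random.default_rng(seed)
    n, d = X_tr.shape
    w = np.zeros(d)
    best_w, best_err = w.copy(), np.inf
    n_batches = max(n // batch_size, 1)
    for _ in range(max_passes):
        for idx in np.array_split(rng.permutation(n), n_batches):
            grad = X_tr[idx].T @ (X_tr[idx] @ w - y_tr[idx]) / len(idx)
            w -= step_size * grad
        val_err = np.mean((X_val @ w - y_val) ** 2)
        if val_err < best_err:                   # the pass count acts as the regularizer
            best_err, best_w = val_err, w.copy()
    return best_w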