Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles
In this paper, we provide a novel framework for the analysis of
generalization error of first-order optimization algorithms for statistical
learning when the gradient can only be accessed through partial observations
given by an oracle. Our analysis relies on the regularity of the gradient
w.r.t. the data samples, and allows us to derive nearly matching upper and lower
bounds for the generalization error of multiple learning problems, including
supervised learning, transfer learning, robust learning, distributed learning
and communication efficient learning using gradient quantization. These results
hold for smooth and strongly-convex optimization problems, as well as smooth
non-convex optimization problems verifying a Polyak-Lojasiewicz assumption. In
particular, our upper and lower bounds depend on a novel quantity that extends
the notion of conditional standard deviation, and is a measure of the extent to
which the gradient can be approximated by having access to the oracle. As a
consequence, our analysis provides a precise meaning to the intuition that
optimization of the statistical learning objective is as hard as the estimation
of its gradient. Finally, we show that, in the case of standard supervised
learning, mini-batch gradient descent with increasing batch sizes and a warm
start can reach a generalization error that is optimal up to a multiplicative
factor, thus motivating the use of this optimization scheme in practical
applications.
Comment: 18 pages, 0 figures
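The scheme named at the end of this abstract, mini-batch gradient descent with increasing batch sizes and a warm start, can be sketched on a toy problem. The following is only an illustration under stated assumptions, not the paper's algorithm or its schedule: the 1-D least-squares objective, the geometric growth factor, the step size, and the names `make_data` and `minibatch_gd_increasing` are all hypothetical choices made for this example.

```python
import random


def make_data(n, true_w=3.0, noise=0.1, seed=1):
    """Synthetic 1-D regression data y = true_w * x + Gaussian noise (assumed setup)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, true_w * x + rng.gauss(0.0, noise)))
    return data


def minibatch_gd_increasing(samples, w0, step=1.5, first_batch=2, growth=2):
    """Mini-batch gradient descent on 0.5 * (w * x - y)^2.

    Each step consumes a fresh, larger batch of samples, so later gradient
    estimates are less noisy. The doubling schedule is an assumption.
    """
    w = w0  # warm start: an initial guess assumed to be reasonably close
    batch = first_batch
    used = 0
    while used + batch <= len(samples):
        chunk = samples[used:used + batch]
        # Average gradient of the squared loss over the current mini-batch.
        grad = sum((w * x - y) * x for x, y in chunk) / len(chunk)
        w -= step * grad
        used += batch
        batch *= growth
    return w


data = make_data(4000)
w_hat = minibatch_gd_increasing(data, w0=2.5)
print(f"estimate: {w_hat:.3f} (true value 3.0)")
```

Growing the batch means early steps are cheap and noisy while late steps are accurate, which is the intuition behind pairing this schedule with a warm start.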
A Lower Bound for the Optimization of Finite Sums
This paper presents a lower bound for optimizing a finite sum of $n$
functions, where each function is $L$-smooth and the sum is $\mu$-strongly
convex. We show that no algorithm can reach an error $\epsilon$ in minimizing
all functions from this class in fewer than
$\Omega(n + \sqrt{n(\kappa-1)}\log(1/\epsilon))$ iterations, where
$\kappa = L/\mu$ is a
surrogate condition number. We then compare this lower bound to upper bounds
for recently developed methods specializing to this setting. When the functions
involved in this sum are not arbitrary, but based on i.i.d. random data, then
we further contrast these complexity results with those for optimal first-order
methods to directly optimize the sum. We conclude that considerable
caution is necessary for an accurate comparison, and we identify machine learning
scenarios where the new methods help computationally.
Comment: Added an erratum; we are currently working on extending the result to
randomized algorithms
Highly-Smooth Zero-th Order Online Optimization Vianney Perchet
The minimization of convex functions which are only available through partial
and noisy information is a key methodological problem in many disciplines. In
this paper we consider convex optimization with noisy zero-th order
information, that is noisy function evaluations at any desired point. We focus
on problems with high degrees of smoothness, such as logistic regression. We
show that, as opposed to gradient-based algorithms, high-order smoothness may be
used to improve estimation rates, with a precise dependence of our upper bounds
on the degree of smoothness. In particular, we show that for infinitely
differentiable functions, we recover the same dependence on sample size as
gradient-based algorithms, with an extra dimension-dependent factor. This is
done for both convex and strongly-convex functions, with finite horizon and
anytime algorithms. Finally, we also recover similar results in the online
optimization setting.
Comment: Conference on Learning Theory (COLT), Jun 2016, New York, United
States. 201
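To make "noisy zero-th order information" concrete, the sketch below builds a gradient estimate from noisy function evaluations only. It uses the classical two-point random-direction estimator, which is a simpler stand-in and not the smoothing construction this paper analyzes. The quadratic test function, the noise level, the perturbation size `delta`, and all function names are assumptions made for illustration.

```python
import math
import random


def noisy_value(f, x, rng, sigma):
    """Return f(x) corrupted by Gaussian noise: the only access we assume."""
    return f(x) + rng.gauss(0.0, sigma)


def two_point_gradient(f, x, delta, rng, sigma=0.0):
    """Estimate the gradient of f at x from two noisy evaluations."""
    d = len(x)
    # Draw a direction uniformly on the unit sphere.
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in u))
    u = [v / norm for v in u]
    plus = noisy_value(f, [xi + delta * ui for xi, ui in zip(x, u)], rng, sigma)
    minus = noisy_value(f, [xi - delta * ui for xi, ui in zip(x, u)], rng, sigma)
    # Finite difference along u, rescaled by d so that the expectation
    # matches the gradient (exactly so for quadratic f).
    scale = d * (plus - minus) / (2.0 * delta)
    return [scale * ui for ui in u]


def average_estimate(f, x, n_samples, delta=0.1, sigma=0.01, seed=0):
    """Average many independent estimates to reduce their variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        g = two_point_gradient(f, x, delta, rng, sigma)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]


def quadratic(x):
    """Test function 0.5 * ||x||^2, whose true gradient at x is x itself."""
    return 0.5 * sum(v * v for v in x)


point = [1.0, -2.0, 0.5]
estimate = average_estimate(quadratic, point, n_samples=20000)
print([round(g, 2) for g in estimate])
```

The factor `d` in the estimator is one way to see where a dimension-dependent cost can enter when only function values are available.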