
Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles

Authors: Even Mathieu; Massoulié Laurent; Scaman Kevin
Publication date: 11/07/2023

In this paper, we provide a novel framework for the analysis of the generalization error of first-order optimization algorithms for statistical learning when the gradient can only be accessed through partial observations given by an oracle. Our analysis relies on the regularity of the gradient w.r.t. the data samples, and allows us to derive near-matching upper and lower bounds for the generalization error of multiple learning problems, including supervised learning, transfer learning, robust learning, distributed learning and communication-efficient learning using gradient quantization. These results hold for smooth and strongly convex optimization problems, as well as smooth non-convex optimization problems satisfying a Polyak-Łojasiewicz assumption. In particular, our upper and lower bounds depend on a novel quantity that extends the notion of conditional standard deviation, and is a measure of the extent to which the gradient can be approximated by having access to the oracle. As a consequence, our analysis provides a precise meaning to the intuition that optimization of the statistical learning objective is as hard as the estimation of its gradient. Finally, we show that, in the case of standard supervised learning, mini-batch gradient descent with increasing batch sizes and a warm start can reach a generalization error that is optimal up to a multiplicative factor, thus motivating the use of this optimization scheme in practical applications.

Comment: 18 pages, 0 figures
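As a rough illustration of the scheme named at the end of this abstract, the sketch below runs mini-batch gradient descent with a geometrically growing batch size on a toy one-dimensional least-squares problem. Everything in it is an assumption made for illustration and not taken from the paper: the function name `minibatch_gd_increasing_batches`, the doubling schedule, the step size, the synthetic data model, and the use of `w0` as a stand-in for a warm start.

```python
import random


def minibatch_gd_increasing_batches(sample, grad, w0, step, n_iters, b0=1, growth=2):
    """Mini-batch gradient descent whose batch size grows geometrically.

    sample() draws one fresh data point, grad(w, z) is the per-sample gradient,
    and w0 is the starting point (a warm start, if one is available).
    """
    w = w0
    batch = b0
    for _ in range(n_iters):
        data = [sample() for _ in range(batch)]
        g = sum(grad(w, z) for z in data) / batch  # averaged mini-batch gradient
        w = w - step * g
        batch *= growth  # hypothetical schedule: multiply the batch size each step
    return w


# Toy problem (assumed): y = W_TRUE * x + noise, squared loss 0.5 * (w * x - y) ** 2.
rng = random.Random(0)
W_TRUE = 3.0


def sample():
    x = rng.gauss(0.0, 1.0)
    y = W_TRUE * x + rng.gauss(0.0, 0.1)
    return x, y


def grad(w, z):
    x, y = z
    return (w * x - y) * x


w_hat = minibatch_gd_increasing_batches(sample, grad, w0=0.0, step=0.5, n_iters=12)
print(w_hat)
```

Later iterations average over many more samples, so the gradient noise shrinks as the iterate approaches the minimizer; the paper's actual schedule and constants may differ from this sketch.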

arXiv.org e-Print Archive

A Lower Bound for the Optimization of Finite Sums

Authors: Agarwal Alekh; Bottou Leon
Publication date: 03/10/2015

This paper presents a lower bound for optimizing a finite sum of $n$ functions, where each function is $L$-smooth and the sum is $\mu$-strongly convex. We show that no algorithm can reach an error $\epsilon$ in minimizing all functions from this class in fewer than $\Omega(n + \sqrt{n(\kappa-1)}\log(1/\epsilon))$ iterations, where $\kappa=L/\mu$ is a surrogate condition number. We then compare this lower bound to upper bounds for recently developed methods specializing to this setting. When the functions involved in this sum are not arbitrary, but based on i.i.d. random data, then we further contrast these complexity results with those for optimal first-order methods to directly optimize the sum. The conclusion we draw is that a lot of caution is necessary for an accurate comparison, and identify machine learning scenarios where the new methods help computationally.

Comment: Added an erratum, we are currently working on extending the result to randomized algorithm
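To get a feel for how the two terms of this bound trade off, the sketch below evaluates the expression inside the $\Omega(\cdot)$, namely $n + \sqrt{n(\kappa-1)}\log(1/\epsilon)$. The function name `finite_sum_lower_bound_expr` is hypothetical, and because the $\Omega$ notation hides a constant factor, the returned number only indicates scaling, not an actual iteration count.

```python
import math


def finite_sum_lower_bound_expr(n, kappa, eps):
    """Evaluate n + sqrt(n * (kappa - 1)) * log(1 / eps).

    This is only the expression inside the Omega(.) of the abstract; the
    hidden constant is not known here, so treat the value as a scaling guide.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if kappa < 1:
        raise ValueError("kappa = L / mu must be at least 1")
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    return n + math.sqrt(n * (kappa - 1)) * math.log(1 / eps)


# With kappa = 1 the second term vanishes and only the n term remains.
well_conditioned = finite_sum_lower_bound_expr(100, 1.0, 0.5)

# With n = 100, kappa = 5 and eps = 1/e: sqrt(100 * 4) * 1 = 20, plus n = 100.
ill_conditioned = finite_sum_lower_bound_expr(100, 5.0, math.exp(-1))
print(well_conditioned, ill_conditioned)
```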

arXiv.org e-Print Archive

CiteSeerX

Highly-Smooth Zero-th Order Online Optimization

Authors: Bach Francis; Perchet Vianney
Publication date: 26/05/2016

The minimization of convex functions which are only available through partial and noisy information is a key methodological problem in many disciplines. In this paper we consider convex optimization with noisy zero-th order information, that is, noisy function evaluations at any desired point. We focus on problems with high degrees of smoothness, such as logistic regression. We show that, as opposed to gradient-based algorithms, high-order smoothness may be used to improve estimation rates, with a precise dependence of our upper bounds on the degree of smoothness. In particular, we show that for infinitely differentiable functions, we recover the same dependence on sample size as gradient-based algorithms, with an extra dimension-dependent factor. This is done for both convex and strongly convex functions, with finite horizon and anytime algorithms. Finally, we also recover similar results in the online optimization setting.

Comment: Conference on Learning Theory (COLT), Jun 2016, New York, United States. 201
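The setting described here, optimization from noisy function evaluations only, can be illustrated with a standard two-point random-direction gradient estimator. This is a generic textbook technique swapped in for illustration and is not the smoothness-exploiting estimator analyzed in the paper; the names `two_point_gradient_estimate` and `f_noisy`, the quadratic test function, and all numeric parameters are assumptions.

```python
import math
import random


def two_point_gradient_estimate(f_noisy, x, delta, rng):
    """Estimate the gradient at x from two noisy function evaluations.

    Draws a direction u uniformly on the unit sphere and returns
    d * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u.
    """
    d = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in u))
    u = [v / norm for v in u]
    f_plus = f_noisy([xi + delta * ui for xi, ui in zip(x, u)])
    f_minus = f_noisy([xi - delta * ui for xi, ui in zip(x, u)])
    scale = d * (f_plus - f_minus) / (2.0 * delta)
    return [scale * ui for ui in u]


rng = random.Random(0)
CENTER = [1.0, -2.0]  # minimizer of the assumed test function


def f_noisy(x):
    # Assumed objective: 0.5 * ||x - CENTER||^2 observed with Gaussian noise.
    value = 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, CENTER))
    return value + rng.gauss(0.0, 0.01)


x = [0.0, 0.0]
for _ in range(200):
    g = two_point_gradient_estimate(f_noisy, x, delta=0.5, rng=rng)
    x = [xi - 0.25 * gi for xi, gi in zip(x, g)]  # fixed step size, chosen by hand

dist = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, CENTER)))
print(x, dist)
```

For a quadratic objective the symmetric difference is an exact directional derivative up to noise, which is why a fairly large `delta` is harmless in this toy case; for general functions the choice of `delta` trades bias against noise amplification.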

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Polytechnique