Curvature and complexity: Better lower bounds for geodesically convex optimization
We study the query complexity of geodesically convex (g-convex) optimization
on a manifold. To isolate the effect of that manifold's curvature, we primarily
focus on hyperbolic spaces. In a variety of settings (smooth or not; strongly
g-convex or not; high- or low-dimensional), known upper bounds worsen with
curvature. It is natural to ask whether this is warranted, or an artifact.
For many such settings, we propose a first set of lower bounds which indeed
confirm that (negative) curvature is detrimental to complexity. To do so, we
build on recent lower bounds (Hamilton and Moitra, 2021; Criscitiello and
Boumal, 2022) for the particular case of smooth, strongly g-convex
optimization. Using a number of techniques, we also secure lower bounds which
capture dependence on condition number and optimality gap, which was not
previously the case.
We suspect these bounds are not optimal. We conjecture optimal ones, and
support them with a matching lower bound for a class of algorithms which
includes subgradient descent, and a lower bound for a related game. Lastly, to
pinpoint the difficulty of proving lower bounds, we study how negative
curvature influences (and sometimes obstructs) interpolation with g-convex
functions.

Comment: v1 to v2: Renamed the method of Rusciano 2019 from "center-of-gravity method" to "centerpoint method".
Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles
In this paper, we provide a novel framework for the analysis of
generalization error of first-order optimization algorithms for statistical
learning when the gradient can only be accessed through partial observations
given by an oracle. Our analysis relies on the regularity of the gradient
w.r.t. the data samples and allows us to derive near-matching upper and lower
bounds on the generalization error of multiple learning problems, including
supervised learning, transfer learning, robust learning, distributed learning,
and communication-efficient learning using gradient quantization. These results
hold for smooth, strongly convex optimization problems, as well as smooth
non-convex optimization problems satisfying a Polyak-Lojasiewicz assumption. In
particular, our upper and lower bounds depend on a novel quantity that extends
the notion of conditional standard deviation, and is a measure of the extent to
which the gradient can be approximated by having access to the oracle. As a
consequence, our analysis provides a precise meaning to the intuition that
optimization of the statistical learning objective is as hard as the estimation
of its gradient. Finally, we show that, in the case of standard supervised
learning, mini-batch gradient descent with increasing batch sizes and a warm
start can reach a generalization error that is optimal up to a multiplicative
factor, thus motivating the use of this optimization scheme in practical
applications.

Comment: 18 pages, 0 figures
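As a concrete, purely illustrative reading of the final claim, the sketch below runs mini-batch gradient descent with geometrically increasing batch sizes, where each round is warm-started from the iterate of the previous round, on a simple least-squares problem. The doubling schedule, step size, and model are assumptions for illustration, not the paper's exact procedure or rates.

```python
# Minimal sketch, assuming a least-squares model and a batch-doubling schedule:
# mini-batch gradient descent with increasing batch sizes and a warm start.
import numpy as np

def minibatch_gd_increasing_batches(X, y, lr=0.1, rounds=8, batch0=8, seed=0):
    """Fit least squares by mini-batch GD with doubling batch sizes.

    Each round starts ("warm start") from the iterate of the previous round,
    and the batch size doubles, so later rounds use lower-variance gradient
    estimates while early rounds remain cheap.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)                          # warm start for the first round
    batch = batch0
    for _ in range(rounds):
        for _ in range(n // batch):          # one pass at the current batch size
            idx = rng.choice(n, size=min(batch, n), replace=False)
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= lr * grad
        batch = min(2 * batch, n)            # double the batch size each round
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(2048, 10))
    w_true = rng.normal(size=10)
    y = X @ w_true + 0.1 * rng.normal(size=2048)
    w_hat = minibatch_gd_increasing_batches(X, y)
    print("parameter error:", np.linalg.norm(w_hat - w_true))
```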