ReSQueing Parallel and Private Stochastic Convex Optimization
We introduce a new tool for stochastic convex optimization (SCO): a
Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function
convolved with a (Gaussian) probability density. Combining ReSQue with recent
advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop
algorithms achieving state-of-the-art complexities for SCO in parallel and
private settings. For a SCO objective constrained to the unit ball in
$\mathbb{R}^d$, we obtain the following results (up to polylogarithmic
factors). We give a parallel algorithm obtaining optimization error
$\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient
oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$
gradient queries in total, assuming access to a bounded-variance stochastic
gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our
algorithm matches the state-of-the-art oracle depth of [BJLLS19] while
maintaining the optimal total work of stochastic gradient descent. Given $n$
samples of Lipschitz loss functions, prior works [BFTT19, BFGT20, AFKT21,
KLL21] established that if $n \gtrsim d\epsilon_{\text{dp}}^{-2}$,
$(\epsilon_{\text{dp}}, \delta)$-differential privacy is attained at no
asymptotic cost to the SCO utility. However, these prior works all required a
superlinear number of gradient queries. We close this gap for sufficiently
large $n$ by using ReSQue to design an algorithm with near-linear gradient
query complexity in this regime.
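
To make the estimator concrete, here is a minimal Python sketch of a
reweighted stochastic query: gradient queries are drawn around a shared
reference point and importance-reweighted by a Gaussian density ratio,
giving an unbiased estimate of the gradient of the Gaussian-convolved
function at a nearby point. The function name, sampling scheme, and
parameters are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def resque_gradient(grad_f, center, x, rho, n_samples=1000, rng=None):
    """Sketch of a ReSQue-style estimator: estimate the gradient at x of
    f convolved with a Gaussian of standard deviation rho, reusing
    stochastic gradient queries drawn around a shared `center` point."""
    rng = np.random.default_rng() if rng is None else rng
    est = np.zeros_like(center, dtype=float)
    for _ in range(n_samples):
        z = center + rho * rng.standard_normal(center.shape)  # query near the shared center
        # Importance weight gamma_rho(z - x) / gamma_rho(z - center), in log space.
        log_w = (np.sum((z - center) ** 2) - np.sum((z - x) ** 2)) / (2.0 * rho ** 2)
        est += np.exp(log_w) * grad_f(z)  # reweighted stochastic gradient query
    return est / n_samples

# Example: smoothed (sub)gradient of the non-smooth f(x) = ||x||_1 near 0,
# where grad_f = np.sign is a pointwise subgradient of f.
g = resque_gradient(grad_f=np.sign, center=np.zeros(3), x=0.05 * np.ones(3), rho=0.1)
```

Because the samples depend only on the shared center, the same queries can be
reweighted for every point in a small ball around it, which is the kind of
query reuse the ball oracle acceleration framework exploits.
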
From Averaging to Acceleration, There is Only a Step-size
We show that accelerated gradient descent, averaged gradient descent and the
heavy-ball method for non-strongly-convex problems may be reformulated as
constant parameter second-order difference equation algorithms, where stability
of the system is equivalent to convergence at rate $O(1/n^2)$, where $n$ is the
number of iterations. We provide a detailed analysis of the eigenvalues of the
corresponding linear dynamical system, showing various oscillatory and
non-oscillatory behaviors, together with a sharp stability result with explicit
constants. We also consider the situation where noisy gradients are available,
where we extend our general convergence result, which suggests an alternative
algorithm (i.e., with different step sizes) that exhibits the good aspects of
both averaging and acceleration.
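
As a concrete illustration of the reformulation (a minimal sketch with
illustrative parameter values, not the paper's general analysis): on the
one-dimensional quadratic $f(x) = hx^2/2$, the heavy-ball method becomes the
constant-coefficient second-order difference equation
$x_{n+1} = (1 + \beta - \gamma h)\,x_n - \beta\,x_{n-1}$, and convergence is
governed by the eigenvalues of its $2 \times 2$ companion matrix.

```python
import numpy as np

def companion_matrix(h, step, momentum):
    """Companion matrix of the constant-coefficient recursion
    x_{n+1} = (1 + momentum - step*h) x_n - momentum x_{n-1},
    obtained by running heavy-ball on the quadratic f(x) = h x^2 / 2."""
    return np.array([[1.0 + momentum - step * h, -momentum],
                     [1.0, 0.0]])

# For this quadratic, the iterates converge iff the spectral radius is
# below 1; complex eigenvalues mark the oscillatory regime, real
# eigenvalues the non-oscillatory one.
for momentum in (0.0, 0.5, 0.9):
    eigs = np.linalg.eigvals(companion_matrix(h=1.0, step=0.5, momentum=momentum))
    print(f"momentum={momentum}: spectral radius={np.abs(eigs).max():.3f}, "
          f"oscillatory={bool(np.iscomplex(eigs).any())}")
```
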
- …