Accelerated Gradient Methods for Networked Optimization
We develop multi-step gradient methods for network-constrained optimization
of strongly convex functions with Lipschitz-continuous gradients. Given the
topology of the underlying network and bounds on the Hessian of the objective
function, we determine the algorithm parameters that guarantee the fastest
convergence and characterize situations when significant speed-ups can be
obtained over the standard gradient method. Furthermore, we quantify how the
performance of the gradient method and its accelerated counterpart is affected
by uncertainty in the problem data, and conclude that in most cases our
proposed method outperforms gradient descent. Finally, we apply the proposed
technique to three engineering problems: resource allocation under network-wide
budget constraints, distributed averaging, and Internet congestion control. In
all cases, we demonstrate that our algorithm converges more rapidly than
alternative algorithms reported in the literature.
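As a rough illustration of the multi-step idea, the sketch below implements the classical two-step (heavy-ball) gradient iteration with its textbook parameter choice derived from Hessian bounds mu and L; the paper's network-constrained setting and its specific parameter tuning are not reproduced here, and all names below are illustrative.

```python
import numpy as np

def heavy_ball(grad, x0, mu, L, iters=200):
    """Two-step (heavy-ball) gradient iteration.

    Classical parameter choice from Hessian bounds
    mu <= eigenvalues of the Hessian <= L (strong convexity,
    Lipschitz-continuous gradient). Illustrative sketch only.
    """
    alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2                          # step size
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2  # momentum
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x_next = x - alpha * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Example: quadratic objective f(x) = 0.5 x^T A x with Hessian eigenvalues in [1, 10]
A = np.diag([1.0, 10.0])
x_min = heavy_ball(lambda x: A @ x, np.array([5.0, 5.0]), mu=1.0, L=10.0)
```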
AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel Stochastic Gradient Methods
We study a new aggregation operator for gradients coming from a mini-batch
for stochastic gradient (SG) methods that allows a significant speed-up in the
case of sparse optimization problems. We call this method AdaBatch; it requires
only a few lines of code change compared to regular mini-batch SGD algorithms.
We provide theoretical insight into how this new class of algorithms performs
and show that it is equivalent to an implicit per-coordinate rescaling of the
gradients, similar to what Adagrad methods do. In theory and in practice, this
new aggregation lets us keep the sample efficiency of SG methods while
increasing the batch size.
Experimentally, we also show that in the case of smooth convex optimization,
our procedure can even obtain a better loss when increasing the batch size for
a fixed number of samples. We then apply this new algorithm to obtain a
parallelizable stochastic gradient method that is synchronous but allows
speed-up on par with Hogwild! methods, since convergence does not deteriorate
as the batch size increases. The same approach can be used to make
mini-batching provably efficient for variance-reduced SG methods such as SVRG.
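A minimal sketch of a per-coordinate aggregation rule in the spirit described above, assuming sparse per-sample gradients: each coordinate's summed gradient is divided by the number of samples active in that coordinate rather than by the batch size. The function name and the exact rule are illustrative choices here, not necessarily the paper's precise operator.

```python
import numpy as np

def per_coordinate_aggregate(per_sample_grads):
    """Aggregate a mini-batch of sparse gradients per coordinate.

    Plain mini-batch SGD divides the summed gradient by the batch size;
    here each coordinate is divided by the number of samples whose
    gradient is nonzero in that coordinate. Illustrative sketch only.
    """
    G = np.asarray(per_sample_grads)        # shape (batch_size, dim)
    counts = np.count_nonzero(G, axis=0)    # active samples per coordinate
    summed = G.sum(axis=0)
    return summed / np.maximum(counts, 1)   # avoid division by zero

# Example: a batch of 4 sparse gradients in 3 dimensions
grads = [[0.2, 0.0, 0.0],
         [0.0, 0.4, 0.0],
         [0.1, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
print(per_coordinate_aggregate(grads))  # [0.15, 0.4, 0.0]; plain mean is [0.075, 0.1, 0.0]
```

Dividing by the per-coordinate count keeps the effective step size on rarely active coordinates from shrinking as the batch grows, which is the per-coordinate rescaling effect the abstract compares to Adagrad.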
Kernel Conjugate Gradient Methods with Random Projections
We propose and study kernel conjugate gradient methods (KCGM) with random
projections for least-squares regression over a separable Hilbert space.
Considering two types of random projections generated by randomized sketches
and Nyström subsampling, we prove optimal statistical results with respect
to variants of norms for the algorithms under a suitable stopping rule.
Particularly, our results show that if the projection dimension is proportional
to the effective dimension of the problem, KCGM with randomized sketches can
generalize optimally, while achieving a computational advantage. As a
corollary, we derive optimal rates for classic KCGM in the case that the target
function may not be in the hypothesis space, filling a theoretical gap.
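A minimal sketch of the projected setting under stated assumptions: a Gaussian kernel, a uniformly subsampled set of m centers as the Nyström-style random projection, and early-stopped conjugate gradient on the resulting least-squares system standing in for the stopping rule analysed in the paper. Function names, kernel choice, and defaults below are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def kcgm_nystrom(X, y, m, gamma=1.0, iters=20, seed=None):
    """Kernel least squares via conjugate gradient on a Nystrom-subsampled basis.

    Early-stopped CG on the normal equations acts as the regularizer
    (iteration count plays the role of the stopping rule). Illustrative sketch.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)]
    # n x m Gaussian kernel matrix between data and sampled centers
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    Knm = np.exp(-gamma * sq)

    # Conjugate gradient on the normal equations Knm^T Knm c = Knm^T y
    A = Knm.T @ Knm
    b = Knm.T @ y
    c = np.zeros(m)
    r = b - A @ c
    p = r.copy()
    for _ in range(iters):                  # early stopping = regularization
        Ap = A @ p
        step = (r @ r) / (p @ Ap)
        c += step * p
        r_new = r - step * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return centers, c                       # predictor: k(x, centers) @ c

# Example: noisy 1-D regression with m = 20 sampled centers
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
centers, c = kcgm_nystrom(X, y, m=20, gamma=0.5, iters=30, seed=0)
```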
