Coordinate Descent with Bandit Sampling
Coordinate descent methods usually minimize a cost function by updating a
random decision variable (corresponding to one coordinate) at a time. Ideally,
we would update the decision variable that yields the largest decrease in the
cost function. However, finding this coordinate would require checking all of
them, which would effectively negate the improvement in computational
tractability that coordinate descent is intended to afford. To address this, we
propose a new adaptive method for selecting a coordinate. First, we find a
lower bound on the amount the cost function decreases when a coordinate is
updated. We then use a multi-armed bandit algorithm to learn which coordinates
yield the largest lower bound, interleaving this learning with conventional
coordinate descent updates in which the coordinate is selected with probability
proportional to its expected decrease. We show that our approach improves the
convergence of coordinate descent methods both theoretically and
experimentally.
Comment: appearing at NeurIPS 201
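A minimal Python sketch of the idea, under simplifying assumptions that are not from the paper: the cost is a positive-definite quadratic f(x) = 0.5 x^T A x - b^T x, each update exactly minimizes along one coordinate (so the decrease is exactly g_i^2 / (2 A_ii) rather than a lower bound), and the "bandit" is a simple mixture of uniform exploration with sampling proportional to each coordinate's most recently observed decrease. The names `bandit_cd`, `partial_grad` and `eps` are hypothetical.

```python
import random


def partial_grad(A, b, x, i):
    # i-th partial derivative of f(x) = 0.5 * x^T A x - b^T x (A symmetric).
    return sum(A[i][j] * x[j] for j in range(len(x))) - b[i]


def bandit_cd(A, b, iters=2000, eps=0.2, seed=0):
    """Exact coordinate minimization on a quadratic. Coordinates are drawn
    uniformly with probability eps (exploration), otherwise proportionally to
    the last observed decrease of each coordinate (hypothetical heuristic)."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    # Initial reward estimates: the decrease each coordinate would give at x = 0.
    est = [partial_grad(A, b, x, i) ** 2 / (2 * A[i][i]) for i in range(n)]
    for _ in range(iters):
        if rng.random() < eps or sum(est) <= 0.0:
            i = rng.randrange(n)  # explore
        else:
            i = rng.choices(range(n), weights=est)[0]  # exploit
        g = partial_grad(A, b, x, i)
        x[i] -= g / A[i][i]  # exact minimization along coordinate i
        # For a quadratic this step decreases f by exactly g^2 / (2 A_ii);
        # it is the bandit "reward", observed only for the pulled coordinate.
        est[i] = g * g / (2 * A[i][i])
    return x


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = bandit_cd(A, b)
```

Because every step is an exact coordinate minimization, the objective never increases, and the exploration component keeps every coordinate's selection probability at least `eps / n`, which is what guarantees convergence in this sketch.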
Online Variance Reduction for Stochastic Optimization
Modern stochastic optimization methods often rely on uniform sampling which
is agnostic to the underlying characteristics of the data. This might degrade
the convergence by yielding estimates that suffer from a high variance. A
possible remedy is to employ non-uniform importance sampling techniques, which
take the structure of the dataset into account. In this work, we investigate a
recently proposed setting which poses variance reduction as an online
optimization problem with bandit feedback. We devise a novel and efficient
algorithm for this setting that finds a sequence of importance sampling
distributions competitive with the best fixed distribution in hindsight, the
first result of this kind. While we present our method for sampling datapoints,
it naturally extends to selecting coordinates or even blocks thereof.
Empirical validations underline the benefits of our method in several settings.
Comment: COLT 201
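The sketch below illustrates the quantities involved, not the paper's algorithm. Sampling index I with probability p_I and returning g_I / (n p_I) gives an unbiased estimate of the mean of g, with variance sum_i g_i^2 / (n^2 p_i) minus the squared mean. The learner only sees the sampled g_I (bandit feedback); here it uses a hypothetical heuristic that samples proportionally to the last observed |g_i|, mixed with uniform so every p_i stays positive. Scalar "gradients" stand in for vectors, and all names are illustrative.

```python
import random


def is_variance(g, p):
    """Variance of the importance-sampling estimate g_I / (n p_I) of mean(g)."""
    n = len(g)
    mean = sum(g) / n
    return sum(gi * gi / (n * n * pi) for gi, pi in zip(g, p)) - mean * mean


def mixed_distribution(est, mix):
    """(1 - mix) * (est normalized) + mix * uniform; falls back to uniform."""
    n = len(est)
    total = sum(est)
    if total <= 0.0:
        return [1.0 / n] * n
    return [(1 - mix) * e / total + mix / n for e in est]


def learn_sampler(g, rounds=200, mix=0.2, seed=0):
    rng = random.Random(seed)
    n = len(g)
    est = [1.0] * n  # initial guess of each |g_i|
    for _ in range(rounds):
        p = mixed_distribution(est, mix)
        i = rng.choices(range(n), weights=p)[0]
        # Bandit feedback: only the sampled datapoint's gradient is revealed.
        est[i] = abs(g[i])
    return mixed_distribution(est, mix)


g = [0.1, 0.1, 0.1, 5.0]
uniform = [0.25] * 4
learned = learn_sampler(g)
print(is_variance(g, uniform), is_variance(g, learned))
```

With one dominant datapoint, uniform sampling yields a high-variance estimate, while a distribution that concentrates on the large |g_i| nearly removes it; the online problem is to get there without ever seeing the unsampled gradients.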
Enhanced Federated Optimization: Adaptive Unbiased Sampling with Reduced Variance
Federated Learning (FL) is a distributed learning paradigm for training a global
model across multiple devices without collecting their local data. In FL, a server
typically selects a subset of clients for each training round to optimize
resource usage. Central to this process is the technique of unbiased client
sampling, which ensures a representative selection of clients. Current methods
primarily utilize a random sampling procedure which, despite its effectiveness,
achieves suboptimal efficiency owing to the loose upper bound caused by the
sampling variance. In this work, by adopting an independent sampling procedure,
we propose a federated optimization framework focused on adaptive unbiased
client sampling, improving the convergence rate via an online variance
reduction strategy. In particular, we present the first adaptive client
sampler, K-Vib, which employs an independent sampling procedure. K-Vib achieves a
linear speed-up on the regret bound within a fixed communication budget.
Empirical studies indicate that K-Vib doubles the speed compared to baseline
algorithms, demonstrating significant potential in federated optimization.
Comment: Under review
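In independent (Poisson) sampling, each client i joins a round independently with probability p_i. Reweighting each selected update by 1/p_i keeps the aggregate unbiased, while sum(p_i) = K fixes the expected number of participating clients. The sketch assumes scalar updates, positive per-client scores (for example, estimated update norms) and probabilities proportional to the scores and capped at 1. This is a common choice used only for illustration, not K-Vib's online procedure, and all function names are hypothetical.

```python
import random


def inclusion_probabilities(scores, k):
    """p_i = min(1, c * score_i) with sum(p) == k. Assumes positive scores and
    k <= number of clients; clients that would exceed 1 are capped iteratively."""
    n = len(scores)
    capped = set()
    while True:
        free = [i for i in range(n) if i not in capped]
        budget = k - len(capped)
        total = sum(scores[i] for i in free)
        p = [1.0 if i in capped else budget * scores[i] / total for i in range(n)]
        newly = [i for i in free if p[i] > 1.0]
        if not newly:
            return p
        capped.update(newly)


def sample_clients(p, rng):
    # Each client is included independently with its own probability.
    return [i for i, pi in enumerate(p) if rng.random() < pi]


def unbiased_aggregate(updates, weights, p, selected):
    # Horvitz-Thompson reweighting: E[result] = sum_i weights[i] * updates[i].
    return sum(weights[i] * updates[i] / p[i] for i in selected)


p = inclusion_probabilities([5.0, 1.0, 1.0, 1.0, 1.0, 1.0], k=3)
# The high-score client is always selected; the rest share the remaining budget.
```

Capping at 1 matters: without it, a dominant client would get an "inclusion probability" above 1 and the expected budget would no longer equal K.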
Multi-Resolution Hashing for Fast Pairwise Summations
A basic computational primitive in the analysis of massive datasets is
summing simple functions over a large number of objects. Modern applications
pose an additional challenge in that such functions often depend on a parameter
vector (query) that is unknown a priori. Given a set of points and a pairwise function, we study the problem of designing a data structure
that, for any query, enables sublinear-time approximation of the sum of the pairwise function evaluated between the query and every point in the set.
By combining ideas from Harmonic Analysis (partitions of unity
and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis
FOCS'17], we provide a general framework for designing such data structures
through hashing that reaches far beyond what previous techniques allowed.
A key design principle is a collection of hashing schemes whose
collision probabilities are chosen to satisfy a specific condition. This leads to a data structure
that approximates the summation using a sub-linear number of samples from each
hash family. Using this new framework along with Distance Sensitive Hashing
[Aumuller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection
can be constructed and evaluated efficiently for any log-convex function
of the inner product on the unit sphere.
Our method leads to data structures with sub-linear query time that
significantly improve upon random sampling and can be used for Kernel Density
or Partition Function Estimation. We provide extensions of our result beyond the
sphere and from scalar functions to vector functions.
Comment: 39 pages, 3 figures
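The hashing-based estimator from the cited Charikar and Siminelakis work, which this framework builds on, can be sketched with a single locality-sensitive hash family. Assumptions: random-hyperplane (SimHash) hashing of unit vectors, whose collision probability is 1 - angle(x, y) / pi; the pairwise function w(x, y) = exp(<x, y> - 1); and the target (1/n) sum_x w(x, y). For each hash table, the query picks a random point from its own bucket and returns |bucket| / n * w(x, y) / p(x, y), which is unbiased. This is not the paper's multi-resolution construction, which combines several hash families; class and function names are hypothetical.

```python
import math
import random


def random_unit_vector(d, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def simhash_collision_prob(x, y):
    # Random hyperplane through the origin: Pr[same side] = 1 - angle / pi.
    cos = max(-1.0, min(1.0, dot(x, y)))
    return 1.0 - math.acos(cos) / math.pi


class SimHashHBE:
    def __init__(self, points, num_tables, rng):
        self.points = points
        self.tables = []
        for _ in range(num_tables):
            r = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
            buckets = {True: [], False: []}  # one bit: side of the hyperplane
            for idx, x in enumerate(points):
                buckets[dot(r, x) >= 0].append(idx)
            self.tables.append((r, buckets))

    def query(self, y, w, rng):
        """Average of one unbiased estimate of (1/n) sum_x w(x, y) per table."""
        n = len(self.points)
        total = 0.0
        for r, buckets in self.tables:
            bucket = buckets[dot(r, y) >= 0]
            if bucket:  # an empty bucket contributes an estimate of 0
                x = self.points[rng.choice(bucket)]
                total += len(bucket) / n * w(x, y) / simhash_collision_prob(x, y)
        return total / len(self.tables)
```

Each table's estimate is unbiased because point x lands in the query's bucket with probability exactly p(x, y), which the importance weight 1 / p(x, y) cancels. The framework above instead chooses collision probabilities so that a sub-linear number of samples suffices.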
Stochastic gradient Markov chain Monte Carlo
Markov chain Monte Carlo (MCMC) algorithms are generally regarded as the gold standard technique for Bayesian inference. They are theoretically well understood and conceptually simple to apply in practice. The drawback of MCMC is that performing exact inference generally requires all of the data to be processed at each iteration of the algorithm. For large data sets, the computational cost of MCMC can be prohibitive, which has led to recent developments in scalable Monte Carlo algorithms that have a significantly lower computational cost than standard MCMC. In this paper, we focus on a particular class of scalable Monte Carlo algorithms, stochastic gradient Markov chain Monte Carlo (SGMCMC), which utilises data subsampling techniques to reduce the per-iteration cost of MCMC. We provide an introduction to some popular SGMCMC algorithms, review the supporting theoretical results, and compare the efficiency of SGMCMC algorithms against MCMC on benchmark examples. The supporting R code is available online at https://github.com/chris-nemeth/sgmcmc-review-paper
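Stochastic gradient Langevin dynamics (SGLD), the canonical SGMCMC algorithm, can be sketched on a toy conjugate model so the answer is checkable: x_i ~ N(theta, 1) with prior theta ~ N(0, 10), whose posterior mean is sum(x) / (N + 0.1). Each step replaces the full-data gradient with a minibatch estimate rescaled by N / n and adds Gaussian noise with variance equal to the step size. This is a Python sketch with illustrative hyperparameters, not the review's R code; the function name is hypothetical.

```python
import math
import random
import statistics


def sgld_gaussian_mean(data, prior_var=10.0, step=1e-4, batch=32,
                       iters=20000, burn_in=1000, seed=0):
    """SGLD for theta in the model x_i ~ N(theta, 1), theta ~ N(0, prior_var)."""
    rng = random.Random(seed)
    n_data = len(data)
    theta = 0.0
    samples = []
    for t in range(iters):
        minibatch = rng.sample(data, batch)
        # Unbiased estimate of the log-posterior gradient: prior term plus the
        # minibatch log-likelihood gradient rescaled by N / n.
        grad = -theta / prior_var + (n_data / batch) * sum(x - theta for x in minibatch)
        # Langevin step: half a step along the gradient plus N(0, step) noise.
        theta += 0.5 * step * grad + rng.gauss(0.0, math.sqrt(step))
        if t >= burn_in:
            samples.append(theta)
    return samples


rng_data = random.Random(1)
data = [2.0 + rng_data.gauss(0.0, 1.0) for _ in range(1000)]
samples = sgld_gaussian_mean(data)
post_mean = sum(data) / (len(data) + 1.0 / 10.0)
print(statistics.mean(samples), post_mean)
```

With a fixed step size SGLD is approximate: minibatch noise inflates the spread of the samples relative to the exact posterior, although in this linear-Gaussian toy model the stationary mean is unaffected. Each iteration touches only `batch` data points instead of all N.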