
    Coordinate Descent with Bandit Sampling

    Coordinate descent methods usually minimize a cost function by updating a random decision variable (corresponding to one coordinate) at a time. Ideally, we would update the decision variable that yields the largest decrease in the cost function. However, finding this coordinate would require checking all of them, which would effectively negate the improvement in computational tractability that coordinate descent is intended to afford. To address this, we propose a new adaptive method for selecting a coordinate. First, we find a lower bound on the amount the cost function decreases when a coordinate is updated. We then use a multi-armed bandit algorithm to learn which coordinates result in the largest lower bound, by interleaving this learning with conventional coordinate descent updates, except that the coordinate is selected proportionately to the expected decrease. We show that our approach improves the convergence of coordinate descent methods both theoretically and experimentally.
    Comment: appearing at NeurIPS 201
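    A minimal sketch of the selection scheme the abstract describes, under illustrative assumptions (a quadratic objective, per-coordinate smoothness constants taken from the Hessian diagonal, and a g_i^2/(2 L_i) lower bound on the decrease); this is a sketch of the idea, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((d, d))
A = A @ A.T + d * np.eye(d)            # positive-definite Hessian
b = rng.standard_normal(d)
x = np.zeros(d)
L = np.diag(A).copy()                  # per-coordinate smoothness constants

# Running estimates of the per-coordinate decrease lower bound g_i^2 / (2 L_i);
# only the played coordinate's estimate is refreshed (bandit feedback).
est = np.ones(d)

for t in range(500):
    p = est / est.sum()                # sample proportionally to the estimates
    i = rng.choice(d, p=p)
    g = A[i] @ x - b[i]                # partial derivative along coordinate i
    x[i] -= g / L[i]                   # exact minimization along coordinate i
    est[i] = g**2 / (2 * L[i]) + 1e-8  # keep the estimate strictly positive

print("final objective:", 0.5 * x @ A @ x - b @ x)
```

    Coordinates that have not been played recently keep their stale (optimistic at initialization) estimates and so continue to be sampled occasionally, which plays the role of exploration.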

    Online Variance Reduction for Stochastic Optimization

    Modern stochastic optimization methods often rely on uniform sampling, which is agnostic to the underlying characteristics of the data. This can degrade convergence by yielding estimates that suffer from high variance. A possible remedy is to employ non-uniform importance sampling techniques, which take the structure of the dataset into account. In this work, we investigate a recently proposed setting which poses variance reduction as an online optimization problem with bandit feedback. We devise a novel and efficient algorithm for this setting that finds a sequence of importance sampling distributions competitive with the best fixed distribution in hindsight, the first result of this kind. While we present our method for sampling datapoints, it naturally extends to selecting coordinates or even blocks thereof. Empirical validations underline the benefits of our method in several settings.
    Comment: COLT 201
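    A minimal sketch of the setting on least-squares SGD, assuming a norm-based score update and an ε-uniform mixture for exploration as stand-ins for the paper's regret-minimizing sampler; the 1/(n p_i) inverse-probability weight is what keeps each step unbiased:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 10
X = rng.standard_normal((n, d))
X[:20] *= 3.0                           # a few high-variance "important" points
y = X @ rng.standard_normal(d)
w = np.zeros(d)

scores = np.ones(n)                     # running per-point gradient-norm scores
lr, eps = 1e-3, 0.1

for t in range(5000):
    # Mix the adaptive distribution with uniform for exploration.
    p = (1 - eps) * scores / scores.sum() + eps / n
    i = rng.choice(n, p=p)
    g = 2 * (X[i] @ w - y[i]) * X[i]    # per-sample least-squares gradient
    w -= lr * g / (n * p[i])            # 1/(n p_i) weight keeps the step unbiased
    scores[i] = np.linalg.norm(g)       # bandit feedback for the sampled point

print("residual norm:", np.linalg.norm(X @ w - y))
```

    Since E[g_i / (n p_i)] = (1/n) Σ_i g_i for any strictly positive p, the adaptive sampler changes only the variance of the gradient estimate, not its mean.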

    Enhanced Federated Optimization: Adaptive Unbiased Sampling with Reduced Variance

    Federated Learning (FL) is a distributed learning paradigm for training a global model across multiple devices without collecting local data. In FL, a server typically selects a subset of clients for each training round to optimize resource usage. Central to this process is the technique of unbiased client sampling, which ensures a representative selection of clients. Current methods primarily utilize a random sampling procedure which, despite its effectiveness, achieves suboptimal efficiency owing to the loose upper bound caused by the sampling variance. In this work, by adopting an independent sampling procedure, we propose a federated optimization framework focused on adaptive unbiased client sampling, improving the convergence rate via an online variance reduction strategy. In particular, we present the first adaptive client sampler employing an independent sampling procedure, K-Vib. K-Vib achieves a linear speed-up on the regret bound $\tilde{\mathcal{O}}\big(N^{\frac{1}{3}}T^{\frac{2}{3}}/K^{\frac{4}{3}}\big)$ within a set communication budget $K$. Empirical studies indicate that K-Vib doubles the speed compared to baseline algorithms, demonstrating significant potential in federated optimization.
    Comment: Under review
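    A minimal sketch of the independent (Bernoulli) sampling primitive with inverse-probability reweighting, which is what keeps the aggregate unbiased for any choice of inclusion probabilities; the fixed norm-based probabilities below are an illustrative stand-in for K-Vib's adaptive choice:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 50, 5
client_updates = rng.standard_normal((N, d))  # stand-ins for local model updates
agg_weights = np.full(N, 1.0 / N)             # server aggregation weights

# Inclusion probabilities scaled so that roughly 5 clients are chosen per round.
p = np.clip(np.linalg.norm(client_updates, axis=1), 0.1, None)
p = np.minimum(1.0, 5 * p / p.sum())

included = rng.random(N) < p                  # independent Bernoulli sampling
estimate = np.zeros(d)
for i in np.flatnonzero(included):
    estimate += agg_weights[i] * client_updates[i] / p[i]   # 1/p_i reweighting

exact = client_updates.T @ agg_weights
print("exact:", exact[:3])
print("unbiased estimate:", estimate[:3])
```

    Each client contributes with probability p_i and is scaled by 1/p_i when it does, so the estimate matches the full aggregate in expectation regardless of how the p_i are chosen; the variance, and hence the convergence rate, is what the adaptive sampler optimizes.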

    Multi-Resolution Hashing for Fast Pairwise Summations

    A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector $y$ (query) that is unknown a priori. Given a set of points $X\subset \mathbb{R}^{d}$ and a pairwise function $w:\mathbb{R}^{d}\times \mathbb{R}^{d}\to [0,1]$, we study the problem of designing a data structure that enables sublinear-time approximation of the summation $Z_{w}(y)=\frac{1}{|X|}\sum_{x\in X}w(x,y)$ for any query $y\in \mathbb{R}^{d}$. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data structures through hashing that reaches far beyond what previous techniques allowed. A key design principle is a collection of $T\geq 1$ hashing schemes with collision probabilities $p_{1},\ldots,p_{T}$ such that $\sup_{t\in [T]}\{p_{t}(x,y)\} = \Theta(\sqrt{w(x,y)})$. This leads to a data structure that approximates $Z_{w}(y)$ using a sublinear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumuller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for any log-convex function $w(x,y)=e^{\phi(\langle x,y\rangle)}$ of the inner product on the unit sphere $x,y\in \mathcal{S}^{d-1}$. Our method leads to data structures with sublinear query time that significantly improve upon random sampling and can be used for Kernel Density or Partition Function Estimation. We provide extensions of our result from the sphere to $\mathbb{R}^{d}$ and from scalar functions to vector functions.
    Comment: 39 pages, 3 figures
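    A minimal sketch of the hashing-based-estimator primitive the framework builds on, using a single random-hyperplane (SimHash) bit whose collision probability 1 − θ(x, y)/π is known in closed form; the paper's multi-resolution collection of families achieving sup_t p_t = Θ(√w) is omitted, and the Gaussian-style weight function is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m = 5000, 16, 400
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # dataset on the unit sphere
y = rng.standard_normal(d)
y /= np.linalg.norm(y)                          # query on the unit sphere

def w(x, q):                                    # pairwise weight in (0, 1]
    return np.exp(-np.sum((x - q) ** 2))

def collision_prob(x, q):                       # one SimHash hyperplane bit
    return 1.0 - np.arccos(np.clip(x @ q, -1.0, 1.0)) / np.pi

estimates = []
for _ in range(m):                              # m independent one-bit hash tables
    r = rng.standard_normal(d)                  # shared random hyperplane
    bucket = np.flatnonzero((X @ r >= 0) == (y @ r >= 0))
    if bucket.size == 0:
        estimates.append(0.0)
        continue
    x = X[rng.choice(bucket)]                   # uniform point from y's bucket
    estimates.append(w(x, y) * bucket.size / (n * collision_prob(x, y)))

exact = np.mean([w(x, y) for x in X])
print("exact Z_w(y):", exact, " HBE estimate:", np.mean(estimates))
```

    The estimate is unbiased: each point lands in the query's bucket with probability p(x, y), is drawn uniformly from the bucket with probability 1/|bucket|, and the |bucket|/(n·p(x, y)) reweighting cancels both factors in expectation.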

    Stochastic gradient Markov chain Monte Carlo

    Markov chain Monte Carlo (MCMC) algorithms are generally regarded as the gold standard technique for Bayesian inference. They are theoretically well-understood and conceptually simple to apply in practice. The drawback of MCMC is that performing exact inference generally requires all of the data to be processed at each iteration of the algorithm. For large data sets, the computational cost of MCMC can be prohibitive, which has led to recent developments in scalable Monte Carlo algorithms that have a significantly lower computational cost than standard MCMC. In this paper, we focus on a particular class of scalable Monte Carlo algorithms, stochastic gradient Markov chain Monte Carlo (SGMCMC), which utilises data subsampling techniques to reduce the per-iteration cost of MCMC. We provide an introduction to some popular SGMCMC algorithms and review the supporting theoretical results, as well as comparing the efficiency of SGMCMC algorithms against MCMC on benchmark examples. The supporting R code is available online at https://github.com/chris-nemeth/sgmcmc-review-paper
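    A minimal sketch of stochastic gradient Langevin dynamics (SGLD), the prototypical SGMCMC algorithm, on a conjugate Gaussian mean model: the full-data gradient of the log posterior is replaced by a rescaled minibatch estimate, and Gaussian noise with variance equal to the step size is injected. The review's supporting code is in R (linked above); the Python below and all constants are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10_000
data = rng.normal(loc=2.0, scale=1.0, size=N)   # y_i ~ N(theta, 1)
theta, step, batch = 0.0, 1e-4, 100
samples = []

for t in range(5000):
    idx = rng.choice(N, size=batch, replace=False)
    # Unbiased estimate of grad log posterior: N(0, 10) prior term plus the
    # minibatch likelihood gradient rescaled by N / batch.
    grad = -theta / 10.0 + (N / batch) * np.sum(data[idx] - theta)
    # Langevin step: half-step along the gradient plus N(0, step) noise.
    theta += 0.5 * step * grad + np.sqrt(step) * rng.standard_normal()
    samples.append(theta)

print("posterior mean estimate:", np.mean(samples[1000:]))
```

    Only a minibatch of size 100 is touched per iteration rather than all N points, which is precisely the per-iteration saving that motivates SGMCMC, at the cost of extra gradient noise that the theory reviewed in the paper quantifies.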