
    Coin Sampling: Gradient-Based Bayesian Inference without Learning Rates

    In recent years, particle-based variational inference (ParVI) methods such as Stein variational gradient descent (SVGD) have grown in popularity as scalable methods for Bayesian inference. Unfortunately, the properties of such methods invariably depend on hyperparameters such as the learning rate, which must be carefully tuned by the practitioner in order to ensure convergence to the target measure at a suitable rate. In this paper, we introduce a suite of new particle-based methods for scalable Bayesian inference based on coin betting, which are entirely learning-rate free. We illustrate the performance of our approach on a range of numerical examples, including several high-dimensional models and datasets, demonstrating comparable performance to other ParVI algorithms with no need to tune a learning rate.
    Comment: ICML 202
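
    A minimal sketch may clarify how coin betting removes the learning rate from a ParVI update. Everything below (the fixed RBF bandwidth, the running bound `L`, the toy Gaussian target, and this particular coin-betting variant) is an illustrative assumption, not the paper's exact algorithm:

```python
import numpy as np

def svgd_direction(X, grad_logp, h=1.0):
    # Standard SVGD update direction with an RBF kernel (bandwidth h assumed fixed).
    diff = X[:, None, :] - X[None, :, :]                  # (n, n, d)
    K = np.exp(-np.sum(diff**2, axis=-1) / (2 * h**2))    # kernel matrix
    drive = K @ grad_logp(X)                              # kernel-weighted scores
    repulse = (diff / h**2 * K[:, :, None]).sum(axis=1)   # repulsive kernel gradients
    return (drive + repulse) / X.shape[0]

def coin_svgd(grad_logp, X0, n_steps=500):
    """Learning-rate-free particle updates via coin betting (hedged sketch)."""
    X = X0.copy()
    c_sum = np.zeros_like(X0)      # running sum of "coin outcomes"
    reward = np.zeros_like(X0)     # cumulative betting reward
    L = 1e-12                      # running estimate of the outcome bound
    for t in range(1, n_steps + 1):
        c = svgd_direction(X, grad_logp)   # outcome of round t
        L = max(L, np.abs(c).max())
        reward += c * (X - X0)
        c_sum += c
        # Coin-betting wealth update; note there is no step size to tune.
        X = X0 + c_sum / (L * t) * (L + np.maximum(reward, 0.0))
    return X

# Toy usage: particles should settle around a standard 2-D Gaussian,
# whose score function is simply -x.
rng = np.random.default_rng(0)
particles = coin_svgd(lambda X: -X, rng.normal(size=(50, 2)) * 3)
```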

    Best-Arm Identification for Quantile Bandits with Privacy

    We study the best-arm identification problem in multi-armed bandits with stochastic, potentially private rewards, when the goal is to identify the arm with the highest quantile at a fixed, prescribed level. First, we propose a (non-private) successive elimination algorithm for strictly optimal best-arm identification; we show that our algorithm is $\delta$-PAC and characterize its sample complexity. Further, we provide a lower bound on the expected number of pulls, showing that the proposed algorithm is essentially optimal up to logarithmic factors. Both the upper and lower complexity bounds depend on a special definition of the associated suboptimality gap, designed specifically for the quantile bandit problem; as we show, best-arm identification becomes impossible as the gap approaches zero. Second, motivated by applications where the rewards are private, we provide a differentially private successive elimination algorithm whose sample complexity is finite even for distributions with infinite support size, and we characterize its sample complexity as well. Our algorithms do not require prior knowledge of either the suboptimality gap or other statistical information related to the bandit problem at hand.
    Comment: 24 pages, 4 figure
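
    To make the elimination scheme concrete, here is a hedged sketch of non-private successive elimination on quantiles. The DKW-based confidence radius and the batched sampling schedule are simple illustrative choices, not the paper's tuned bounds, and the sampler interface is assumed:

```python
import numpy as np

def quantile_best_arm(samplers, tau=0.5, delta=0.05, batch=200, max_rounds=100):
    """delta-PAC-style successive elimination on quantiles (illustrative sketch).

    samplers: list of zero-argument reward samplers, one per arm.
    tau: target quantile level; delta: allowed failure probability.
    """
    active = list(range(len(samplers)))
    obs = [[] for _ in samplers]
    for r in range(1, max_rounds + 1):
        for a in active:
            obs[a].extend(samplers[a]() for _ in range(batch))
        n = r * batch
        # DKW bound: the empirical CDF is uniformly within eps of the truth,
        # so the true tau-quantile lies between the empirical (tau - eps)-
        # and (tau + eps)-quantiles (union bound over arms and rounds).
        eps = np.sqrt(np.log(2 * len(samplers) * max_rounds / delta) / (2 * n))
        lo = {a: np.quantile(obs[a], max(tau - eps, 0.0)) for a in active}
        hi = {a: np.quantile(obs[a], min(tau + eps, 1.0)) for a in active}
        best_lo = max(lo.values())
        # Eliminate any arm whose upper bound falls below the best lower bound.
        active = [a for a in active if hi[a] >= best_lo]
        if len(active) == 1:
            return active[0]
    return max(active, key=lambda a: np.quantile(obs[a], tau))
```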

    Towards Understanding the Condensation of Neural Networks at Initial Training

    Implicit regularization is important for understanding the learning of neural networks (NNs). Empirical work shows that the input weights of hidden neurons (the input weight of a hidden neuron consists of the weight from the input layer to that neuron together with its bias term) condense on isolated orientations under small initialization. This condensation dynamics implies that training implicitly regularizes a NN towards one with a much smaller effective size. In this work, we use multilayer networks to show that the maximal number of condensed orientations in the initial training stage is twice the multiplicity of the activation function, where the "multiplicity" is the multiplicity of the activation function's root at the origin. Our theoretical analysis confirms experiments in two cases: one for activation functions of multiplicity one with arbitrary-dimensional input, which covers many common activation functions, and the other for layers with one-dimensional input and arbitrary multiplicity. This work takes a step towards understanding how small initialization implicitly leads NNs to condense at the initial training stage, laying a foundation for future study of the nonlinear dynamics of NNs and their implicit regularization effect at later stages of training.
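
    The phenomenon is easy to probe numerically. The toy script below (a two-layer tanh network trained by plain gradient descent; the sizes and the 1e-3 initialization scale are illustrative choices, not the paper's setup) checks whether hidden-neuron input weights align to a few orientations:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 64, 5, 50                        # samples, input dim, hidden width
X = rng.normal(size=(n, d))
y = np.tanh(X @ rng.normal(size=(d, 1)))   # teacher targets

scale = 1e-3                               # small initialization drives condensation
W = rng.normal(size=(d, m)) * scale        # input weights (biases omitted for brevity)
a = rng.normal(size=(m, 1)) * scale        # output weights
lr = 0.05

for step in range(5000):                   # plain gradient descent on squared loss
    H = np.tanh(X @ W)
    err = H @ a - y
    a -= lr * H.T @ err / n
    W -= lr * X.T @ ((err @ a.T) * (1 - H**2)) / n

# tanh has multiplicity one, so at most two condensed orientations (one
# direction and its negation) are expected early in training: mean |cosine|
# between neuron directions near 1 signals condensation.
U = W / np.linalg.norm(W, axis=0, keepdims=True)
print("mean |cos| between neuron directions:", np.abs(U.T @ U).mean().round(3))
```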

    Online Learning and Bandits with Queried Hints

    We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at each step, the online policy can probe and find out which of a small number ($k$) of choices has better reward (or loss) before making its choice. In this model, we derive algorithms whose regret bounds have exponentially better dependence on the time horizon compared to the classic regret bounds. In particular, we show that probing with $k=2$ suffices to achieve time-independent regret bounds for online linear and convex optimization. The same number of probes improves the regret bound of stochastic MAB with independent arms from $O(\sqrt{nT})$ to $O(n^2 \log T)$, where $n$ is the number of arms and $T$ is the horizon length. For stochastic MAB, we also consider a stronger model where a probe reveals the reward values of the probed arms, and show that in this case, $k=3$ probes suffice to achieve parameter-independent constant regret, $O(n^2)$. Such regret bounds cannot be achieved even with full feedback after the play, showcasing the power of limited "advice" via probing before making the play. We also present extensions to the setting where the hints can be imperfect, and to the case of stochastic MAB where the rewards of the arms can be correlated.
    Comment: To appear in ITCS 202
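
    A hedged sketch of the probe model itself (not the paper's algorithm) may help fix ideas: before each pull, the policy probes $k$ arms, learns which of them has the higher realized reward this round, and then plays the winner. The uniform probing rule below is purely illustrative:

```python
import random

def play_with_probes(arms, T=10_000, k=2):
    """Simulate the k-probe hint model on stochastic arms (illustrative).

    arms: list of zero-argument reward samplers (assumed interface).
    The environment realizes this round's rewards; the hint reveals only
    which probed arm is best, and the policy receives that arm's reward.
    """
    n, total = len(arms), 0.0
    for t in range(T):
        probed = random.sample(range(n), k)        # naive uniform probing rule
        realized = {a: arms[a]() for a in probed}  # simulator internals, not observed
        best = max(realized, key=realized.get)     # the queried hint
        total += realized[best]                    # play the probe's winner
    return total / T

# Toy usage: probing lifts the average reward above the better arm's mean,
# since each round we collect the max of the probed rewards.
mean_reward = play_with_probes([lambda: random.random() * 0.5,
                                lambda: random.random()])
```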

    Differentially private sampling from distributions

    We initiate an investigation of private sampling from distributions. Given a dataset with $n$ independent observations from an unknown distribution $P$, a sampling algorithm must output a single observation from a distribution that is close in total variation distance to $P$ while satisfying differential privacy. Sampling abstracts the goal of generating small amounts of realistic-looking data. We provide tight upper and lower bounds on the dataset size needed for this task for three natural families of distributions: arbitrary distributions on $\{1,\ldots,k\}$, arbitrary product distributions on $\{0,1\}^d$, and product distributions on $\{0,1\}^d$ with the bias in each coordinate bounded away from 0 and 1. We demonstrate that, in some parameter regimes, private sampling requires asymptotically fewer observations than learning a description of $P$ nonprivately; in other regimes, however, private sampling proves to be as difficult as private learning. Notably, for some classes of distributions, the overhead in the number of observations needed for private learning compared to non-private learning is completely captured by the number of observations needed for private sampling.
    https://proceedings.neurips.cc/paper/2021/hash/f2b5e92f61b6de923b063588ee6e7c48-Abstract.htm
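
    As a concrete baseline for the discrete case, the sketch below draws one $\varepsilon$-DP sample from a distribution on $\{0,\ldots,k-1\}$ via a Laplace-noised histogram. This is a simple illustration of the task, not the paper's tight algorithm:

```python
import numpy as np

def dp_sample_discrete(data, k, eps=1.0, rng=None):
    """Return one eps-DP sample from a distribution on {0, ..., k-1}.

    data: array of observed integer draws in {0, ..., k-1}.
    Simple Laplace-histogram baseline, for illustration only.
    """
    rng = rng or np.random.default_rng()
    counts = np.bincount(data, minlength=k).astype(float)
    # Swapping one observation changes the histogram by L1 distance 2,
    # so Laplace(2/eps) noise per bin makes the released histogram eps-DP;
    # sampling from it afterwards is post-processing and costs no privacy.
    noisy = counts + rng.laplace(scale=2.0 / eps, size=k)
    probs = np.clip(noisy, 0.0, None)
    probs = probs / probs.sum() if probs.sum() > 0 else np.full(k, 1.0 / k)
    return rng.choice(k, p=probs)

# Toy usage: sample privately from an empirical distribution on {0,1,2}.
draw = dp_sample_discrete(np.array([0, 0, 1, 2, 2, 2]), k=3, eps=1.0)
```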

    Structured Semidefinite Programming for Recovering Structured Preconditioners

    We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems, including the following. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, computes an $\epsilon$-optimal diagonal preconditioner in time $\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(\kappa^\star, \epsilon^{-1}))$, where $\kappa^\star$ is the optimal condition number of the rescaled matrix. We give an algorithm which, given $\mathbf{M} \in \mathbb{R}^{d \times d}$ that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in $\mathbf{M}$ in $\widetilde{O}(d^2)$ time. Our diagonal preconditioning results improve state-of-the-art runtimes of $\Omega(d^{3.5})$ attained by general-purpose semidefinite programming, and our solvers improve state-of-the-art runtimes of $\Omega(d^{\omega})$, where $\omega > 2.3$ is the current matrix multiplication constant. We attain our results via new algorithms for a class of semidefinite programs (SDPs) we call matrix-dictionary approximation SDPs, which we leverage to solve an associated problem we call matrix-dictionary recovery.
    Comment: Merge of arXiv:1812.06295 and arXiv:2008.0172
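
    To see what problem the framework targets, the snippet below measures how a diagonal rescaling changes a condition number. Jacobi scaling ($\mathbf{D} = \mathrm{diag}(\mathbf{K})^{-1}$) is the classical baseline used here for illustration; the paper's SDP-based algorithm instead computes an $\epsilon$-optimal diagonal:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 200
A = rng.normal(size=(d, d))
# A badly scaled positive definite matrix: PSD part plus a wild diagonal.
K = A @ A.T + np.diag(rng.uniform(1, 1000, size=d))

def cond(M):
    # Condition number of a symmetric PD matrix via its eigenvalue extremes.
    w = np.linalg.eigvalsh(M)
    return w[-1] / w[0]

# Jacobi scaling: rescale K to unit diagonal, D^{1/2} K D^{1/2}.
Dhalf = np.diag(1.0 / np.sqrt(np.diag(K)))
print("kappa(K)              =", cond(K))
print("kappa(D^.5 K D^.5)    =", cond(Dhalf @ K @ Dhalf))
```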