Coin Sampling: Gradient-Based Bayesian Inference without Learning Rates
In recent years, particle-based variational inference (ParVI) methods such as
Stein variational gradient descent (SVGD) have grown in popularity as scalable
methods for Bayesian inference. Unfortunately, the properties of such methods
invariably depend on hyperparameters such as the learning rate, which must be
carefully tuned by the practitioner in order to ensure convergence to the
target measure at a suitable rate. In this paper, we introduce a suite of new
particle-based methods for scalable Bayesian inference based on coin betting,
which are entirely learning-rate free. We illustrate the performance of our
approach on a range of numerical examples, including several high-dimensional
models and datasets, demonstrating comparable performance to other ParVI
algorithms with no need to tune a learning rate.
Comment: ICML 202
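For intuition, here is a minimal sketch of how a Krichevsky-Trofimov (KT) coin-betting update can replace the SVGD learning rate. The drift computation is standard SVGD with an RBF kernel; the betting normalisation and all function names are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def svgd_drift(X, grad_logp, h=1.0):
    """Standard SVGD direction with RBF kernel k(x, y) = exp(-||x - y||^2 / h):
    phi(x_i) = (1/n) * sum_j [k(x_j, x_i) grad_logp(x_j) + grad_{x_j} k(x_j, x_i)]."""
    diff = X[:, None, :] - X[None, :, :]                     # (n, n, d): x_i - x_j
    K = np.exp(-np.sum(diff ** 2, axis=-1) / h)              # kernel matrix
    repulsion = 2.0 / h * np.einsum('ij,ijd->id', K, diff)   # sum_j grad_{x_j} k
    return (K @ grad_logp(X) + repulsion) / X.shape[0]

def coin_svgd(grad_logp, X0, n_steps=500, h=1.0):
    """Learning-rate-free particle updates via KT coin betting: each particle
    bets a fraction of its accumulated 'wealth' along the summed past drifts.
    A sketch of the idea, not the paper's exact normalisation."""
    X, wealth = X0.copy(), np.ones(X0.shape[0])
    drift_sum = np.zeros_like(X0)
    for t in range(1, n_steps + 1):
        c = svgd_drift(X, grad_logp, h)            # 'coin outcome' at step t
        wealth += np.sum(c * (X - X0), axis=1)     # payoff of the current bet
        drift_sum += c
        X = X0 + drift_sum / t * wealth[:, None]   # KT betting fraction
    return X

# Usage: sample from a standard 2-D Gaussian, where grad log p(x) = -x.
X = coin_svgd(lambda X: -X, np.random.default_rng(0).normal(size=(100, 2)))
```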
Best-Arm Identification for Quantile Bandits with Privacy
We study the best-arm identification problem in multi-armed bandits with
stochastic, potentially private rewards, when the goal is to identify the arm
with the highest quantile at a fixed, prescribed level. First, we propose a
(non-private) successive elimination algorithm for strictly optimal best-arm
identification; we show that our algorithm is $\delta$-PAC and we characterize
its sample complexity. Further, we provide a lower bound on the expected number
of pulls, showing that the proposed algorithm is essentially optimal up to
logarithmic factors. Both upper and lower complexity bounds depend on a special
definition of the associated suboptimality gap, designed in particular for the
quantile bandit problem; as we show, when the gap approaches zero, best-arm
identification is impossible. Second, motivated by applications where the
rewards are private, we provide a differentially private successive elimination
algorithm whose sample complexity is finite even for distributions with
infinite support size, and we characterize its sample complexity as well. Our
algorithms do not require prior knowledge of either the suboptimality gap or
other statistical information related to the bandit problem at hand.
Comment: 24 pages, 4 figures
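As a rough illustration of the non-private algorithm's structure, the following sketch runs successive elimination on empirical quantiles, using a DKW-style confidence radius to bracket each arm's $\tau$-quantile. The specific confidence bound and stopping rule are illustrative assumptions; the paper's bounds are tailored to its quantile-specific suboptimality gap.

```python
import numpy as np

def quantile_successive_elimination(arms, tau=0.5, delta=0.1, batch=50, max_rounds=200):
    """Sketch of successive elimination on empirical quantiles. `arms` is a
    list of callables, each returning one i.i.d. reward sample."""
    samples = [[] for _ in arms]
    active = set(range(len(arms)))
    for r in range(1, max_rounds + 1):
        for i in active:
            samples[i].extend(arms[i]() for _ in range(batch))
        n = r * batch
        # DKW: the empirical CDF is within eps of the true CDF w.h.p., so the
        # tau-quantile lies between the (tau - eps)- and (tau + eps)-empirical quantiles.
        eps = np.sqrt(np.log(4 * len(arms) * r * r / delta) / (2 * n))
        lo, hi = {}, {}
        for i in active:
            xs = np.sort(samples[i])
            lo[i] = xs[int(np.floor(max(tau - eps, 0.0) * n))]
            hi[i] = xs[min(int(np.ceil((tau + eps) * n)), n - 1)]
        best_lo = max(lo.values())
        active = {i for i in active if hi[i] >= best_lo}  # eliminate dominated arms
        if len(active) == 1:
            return active.pop()
    return max(active, key=lambda i: lo[i])

# Usage: three Gaussian arms; identify the arm with the largest median.
rng = np.random.default_rng(3)
arms = [lambda m=m: rng.normal(m) for m in (0.0, 0.3, 1.0)]
print(quantile_successive_elimination(arms, tau=0.5, delta=0.1))
```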
Towards Understanding the Condensation of Neural Networks at Initial Training
Implicit regularization is important for understanding the learning of neural
networks (NNs). Empirical works show that input weights of hidden neurons (the
input weight of a hidden neuron consists of the weight from its input layer to
the hidden neuron and its bias term) condense on isolated orientations with a
small initialization. The condensation dynamics imply that training
implicitly regularizes an NN towards one with a much smaller effective size. In
this work, we utilize multilayer networks to show that the maximal number of
condensed orientations in the initial training stage is twice the multiplicity
of the activation function, where the "multiplicity" of an activation function
is the multiplicity of its root at the origin, i.e., the order of its first
nonvanishing derivative at zero. Our theoretical analysis is confirmed by
experiments in two cases: activation functions of multiplicity one with input
of arbitrary dimension, a class that contains many common activation functions,
and layers with one-dimensional input and arbitrary multiplicity. This work
takes a step towards understanding how small initialization implicitly leads
NNs to condense at the initial training stage, which lays a foundation for
future study of the nonlinear dynamics of NNs and their implicit regularization
effect at later stages of training.
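To make the multiplicity notion concrete, the following hypothetical snippet computes it symbolically as the order of the first nonvanishing derivative at zero: tanh has multiplicity one, while x*tanh(x) has multiplicity two.

```python
import sympy as sp

def multiplicity_at_origin(expr, x, max_order=5):
    # Order of the first nonvanishing derivative at 0, for activations
    # with expr(0) == 0; e.g. tanh(x) -> 1, x*tanh(x) -> 2.
    for p in range(1, max_order + 1):
        if sp.diff(expr, x, p).subs(x, 0) != 0:
            return p
    return None

x = sp.symbols('x')
print(multiplicity_at_origin(sp.tanh(x), x))      # 1
print(multiplicity_at_origin(x * sp.tanh(x), x))  # 2
```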
Online Learning and Bandits with Queried Hints
We consider the classic online learning and stochastic multi-armed bandit
(MAB) problems, when at each step, the online policy can probe and find out
which of a small number ($k$) of choices has better reward (or loss) before
making its choice. In this model, we derive algorithms whose regret bounds have
exponentially better dependence on the time horizon compared to the classic
regret bounds. In particular, we show that probing with $k=2$ suffices to
achieve time-independent regret bounds for online linear and convex
optimization. The same number of probes improves the regret bound of stochastic
MAB with independent arms from $O(\sqrt{nT})$ to $O(n^2 \log T)$, where $n$ is
the number of arms and $T$ is the horizon length. For stochastic MAB, we also
consider a stronger model where a probe reveals the reward values of the probed
arms, and show that in this case, $k=3$ probes suffice to achieve
parameter-independent constant regret, $O(n^2)$. Such regret bounds cannot be
achieved even with full feedback after the play, showcasing the power of
limited ``advice'' via probing before making the play. We also present
extensions to the setting where the hints can be imperfect, and to the case of
stochastic MAB where the rewards of the arms can be correlated.
Comment: To appear in ITCS 202
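A toy simulation of the weaker probe model ($k=2$, where the probe reveals only which candidate is better) might look as follows; choosing candidates by empirical mean is an illustrative assumption, not the paper's algorithm.

```python
import numpy as np

def probed_mab(means, T, seed=0):
    """Sketch of the probe-before-play model with k = 2: before committing,
    the policy asks which of two candidate arms has the higher realized
    reward this round, then plays the winner."""
    rng = np.random.default_rng(seed)
    n = len(means)
    counts, sums, regret = np.zeros(n), np.zeros(n), 0.0
    for t in range(T):
        if t < n:
            cand = [t]                                   # initialise each arm once
        else:
            cand = list(np.argsort(sums / counts)[-2:])  # top-2 empirical arms
        rewards = {i: rng.normal(means[i]) for i in cand}
        i = max(rewards, key=rewards.get)                # probe reveals the better arm
        counts[i] += 1
        sums[i] += rewards[i]                            # only the played arm is observed
        regret += max(means) - means[i]
    return regret

print(probed_mab([0.1, 0.5, 0.9], T=2000))
```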
Differentially private sampling from distributions
We initiate an investigation of private sampling from distributions. Given a dataset with $n$ independent observations from an unknown distribution $P$, a sampling algorithm must output a single observation from a distribution that is close in total variation distance to $P$ while satisfying differential privacy. Sampling abstracts the goal of generating small amounts of realistic-looking data. We provide tight upper and lower bounds for the dataset size needed for this task for three natural families of distributions: arbitrary distributions on $\{1,\ldots,k\}$, arbitrary product distributions on $\{0,1\}^d$, and product distributions on $\{0,1\}^d$ with bias in each coordinate bounded away from 0 and 1. We demonstrate that, in some parameter regimes, private sampling requires asymptotically fewer observations than learning a description of $P$ nonprivately; in other regimes, however, private sampling proves to be as difficult as private learning. Notably, for some classes of distributions, the overhead in the number of observations needed for private learning compared to non-private learning is completely captured by the number of observations needed for private sampling.
https://proceedings.neurips.cc/paper/2021/hash/f2b5e92f61b6de923b063588ee6e7c48-Abstract.htm
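One simple mechanism in the spirit of the finite-domain setting is to output a single draw from the empirical histogram mixed with the uniform distribution; the sketch below is an illustrative $\epsilon$-DP sampler for distributions on a domain of size $k$, not necessarily the paper's construction.

```python
import numpy as np

def private_sample(data, k, eps, seed=0):
    """Output one draw from the empirical histogram mixed with uniform.
    Changing one of the n records shifts each output probability by at
    most 1/n, and the uniform floor gamma/k caps the likelihood ratio at
    1 + (1/n)/(gamma/k) <= exp(eps) for the gamma chosen below."""
    n = len(data)
    hist = np.bincount(data, minlength=k) / n
    gamma = min(1.0, k / (n * (np.exp(eps) - 1.0)))
    p = (1.0 - gamma) * hist + gamma / k
    return np.random.default_rng(seed).choice(k, p=p)

# Usage: n = 1000 samples from a skewed distribution on {0, ..., 4}.
rng = np.random.default_rng(1)
data = rng.choice(5, size=1000, p=[0.5, 0.2, 0.15, 0.1, 0.05])
print(private_sample(data, k=5, eps=1.0))
```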
Structured Semidefinite Programming for Recovering Structured Preconditioners
We develop a general framework for finding approximately-optimal
preconditioners for solving linear systems. Leveraging this framework we obtain
improved runtimes for fundamental preconditioning and linear system solving
problems including the following. We give an algorithm which, given positive
definite $K \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(K)$ nonzero
entries, computes an $\epsilon$-optimal diagonal preconditioner in time
$\widetilde{O}(\mathrm{nnz}(K) \cdot \mathrm{poly}(\kappa^\star, \epsilon^{-1}))$,
where $\kappa^\star$ is the optimal condition number of the rescaled matrix. We
give an algorithm which, given $M \in \mathbb{R}^{d \times d}$ that is either
the pseudoinverse of a graph Laplacian matrix or a constant spectral
approximation of one, solves linear systems in $M$ in
$\widetilde{O}(\mathrm{nnz}(M))$ time. Our diagonal preconditioning results
improve state-of-the-art runtimes of $\Omega(d^{3.5})$ attained by
general-purpose semidefinite programming, and our solvers improve
state-of-the-art runtimes of $\Omega(d^{\omega})$, where $\omega > 2.3$ is the
current matrix multiplication constant. We attain our results via new
algorithms for a class of semidefinite programs (SDPs) we call
matrix-dictionary approximation SDPs, which we leverage to solve an associated
problem we call matrix-dictionary recovery.
Comment: Merge of arXiv:1812.06295 and arXiv:2008.0172
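For contrast with the paper's SDP-based $\epsilon$-optimal preconditioner, the snippet below applies classical Jacobi (diagonal) rescaling, a cheap baseline that already removes row/column scaling artifacts; it is a toy illustration of the problem being solved, not the paper's algorithm.

```python
import numpy as np

def jacobi_precondition(K):
    # Diagonal rescaling: return D^{-1/2} K D^{-1/2} with D = diag(K).
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def cond(A):
    # Condition number of a symmetric PD matrix via its eigenvalues.
    w = np.linalg.eigvalsh(A)
    return w[-1] / w[0]

# A well-conditioned PD core made ill-conditioned by row/column scaling.
rng = np.random.default_rng(2)
A = rng.standard_normal((100, 100))
core = A @ A.T + 100 * np.eye(100)
s = 10.0 ** rng.uniform(-3, 3, size=100)
K = core * np.outer(s, s)
print(cond(K), cond(jacobi_precondition(K)))  # conditioning improves drastically
```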