Connections Between Mirror Descent, Thompson Sampling and the Information Ratio
The information-theoretic analysis by Russo and Van Roy (2014) in combination
with minimax duality has proved a powerful tool for the analysis of online
learning algorithms in full and partial information settings. In most
applications there is a tantalising similarity to the classical analysis based
on mirror descent. We make a formal connection, showing that the
information-theoretic bounds in most applications can be derived from existing
techniques for online convex optimisation. Besides this, for $k$-armed
adversarial bandits we provide an efficient algorithm with regret that matches
the best information-theoretic upper bound, and improve the best known regret
guarantees for online linear optimisation on $\ell_p$-balls and for bandits with
graph feedback.
First-Order Regret Analysis of Thompson Sampling
We address online combinatorial optimization when the player has a prior over
the adversary's sequence of losses. In this framework, Russo and Van Roy
proposed an information-theoretic analysis of Thompson Sampling based on the
{\em information ratio}, resulting in optimal worst-case regret bounds. In this
paper we introduce three novel ideas to this line of work. First we propose a
new quantity, the scale-sensitive information ratio, which allows us to obtain
more refined first-order regret bounds (i.e., bounds of the form
$\widetilde{O}(\sqrt{L^*})$, where $L^*$ is the loss of the best combinatorial action). Second we replace
the entropy over combinatorial actions by a coordinate entropy, which allows us
to obtain the first optimal worst-case bound for Thompson Sampling in the
combinatorial setting. Finally, we introduce a novel link between Bayesian
agents and frequentist confidence intervals. Combining these ideas we show that
the classical multi-armed bandit first-order regret bound still holds true in the more challenging and more general semi-bandit
scenario. This latter result improves on the previous state-of-the-art bound
by Lykouris, Sridharan and Tardos. Comment: 27 pages
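As background for this abstract (an illustrative sketch, not the paper's combinatorial variant): Thompson Sampling for a Bernoulli multi-armed bandit samples a plausible mean for each arm from its posterior and plays the argmax.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Thompson Sampling for a Bernoulli bandit with Beta(1, 1) priors.

    Returns per-arm pull counts, so the concentration of play
    on the best arm is visible."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # 1 + observed successes per arm
    beta = [1] * k   # 1 + observed failures per arm
    counts = [0] * k
    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior,
        # then play the arm whose sampled mean is largest.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        counts[arm] += 1
        if rng.random() < true_means[arm]:
            alpha[arm] += 1
        else:
            beta[arm] += 1
    return counts
```

Over a moderate horizon, the posterior for the best arm concentrates and that arm receives the large majority of pulls.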
Personalized Federated Learning with Hidden Information on Personalized Prior
Federated learning (FL) is a distributed machine learning
technique that utilizes global servers and collaborative clients to achieve
privacy-preserving global model training without direct data sharing. However,
data heterogeneity, one of FL's main problems, makes it difficult
for the global model to perform effectively on each client's local data. Thus,
personalized federated learning (PFL) aims to improve the
performance of the model on local data as much as possible. Bayesian learning,
in which the model parameters are treated as random variables with a prior,
is a feasible solution to the heterogeneous data problem: the more local data
the model uses, the more it focuses on that data, and otherwise it falls
back on the prior. When Bayesian learning is applied
to PFL, the global model provides global knowledge as a prior to the local
training process. In this paper, we employ Bayesian learning to model PFL by
assuming a prior in the scaled exponential family, and therefore propose
pFedBreD, a framework to solve the problem we model using Bregman divergence
regularization. Empirically, our experiments show that, under a spherical
Gaussian prior assumption and a first-order strategy of mean selection,
our proposal significantly outperforms other PFL algorithms on
multiple public benchmarks. Comment: 19 pages, 6 figures, 3 tables
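To make the Bregman-regularization idea concrete (a generic sketch under assumed names, not the authors' pFedBreD implementation): a client can personalize by minimizing its local loss plus a Bregman divergence to the global model. With the generator $\phi(w) = \tfrac{1}{2}\|w\|^2$, the divergence is half the squared Euclidean distance, giving a proximal-style update.

```python
def bregman_euclidean(x, y):
    """Bregman divergence generated by phi(w) = 0.5 * ||w||^2,
    which equals half the squared Euclidean distance."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))

def local_update(grad_fn, global_model, mu, lr, steps):
    """One client's personalized update: gradient descent on
    local_loss(w) + mu * D_phi(w, global_model).

    grad_fn: gradient of the client's local loss at w (hypothetical callback).
    mu: regularization strength pulling w toward the global prior mean."""
    w = list(global_model)
    for _ in range(steps):
        g = grad_fn(w)
        # For phi = 0.5||.||^2 the Bregman term's gradient is (w - global_model).
        w = [wi - lr * (gi + mu * (wi - gwi))
             for wi, gi, gwi in zip(w, g, global_model)]
    return w
```

For a quadratic local loss the minimizer sits between the local optimum and the global model, with `mu` controlling how strongly the client trusts the global prior.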
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Multi-armed bandit problems are the most basic examples of sequential
decision problems with an exploration-exploitation trade-off. This is the
balance between staying with the option that gave highest payoffs in the past
and exploring new options that might give higher payoffs in the future.
Although the study of bandit problems dates back to the 1930s,
exploration-exploitation trade-offs arise in several modern applications, such
as ad placement, website optimization, and packet routing. Mathematically, a
multi-armed bandit is defined by the payoff process associated with each
option. In this survey, we focus on two extreme cases in which the analysis of
regret is particularly simple and elegant: i.i.d. payoffs and adversarial
payoffs. Besides the basic setting of finitely many actions, we also analyze
some of the most important variants and extensions, such as the contextual
bandit model. Comment: To appear in Foundations and Trends in Machine Learning
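For the i.i.d. case this survey covers, the exploration-exploitation balance is often illustrated with the UCB1 index policy, sketched here (illustrative code, not from the survey): each arm's empirical mean is inflated by a confidence bonus that shrinks as the arm is pulled more.

```python
import math
import random

def ucb1(true_means, horizon, seed=0):
    """UCB1 for a stochastic (i.i.d.) Bernoulli bandit.

    Returns per-arm pull counts."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # initialization: play each arm once
        else:
            # Optimism in the face of uncertainty: empirical mean
            # plus a confidence-width bonus sqrt(2 log t / n_i).
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t + 1) / counts[i]))
        if rng.random() < true_means[arm]:
            sums[arm] += 1.0
        counts[arm] += 1
    return counts
```

The bonus forces every arm to be revisited occasionally, but suboptimal arms are pulled only logarithmically often, which is the source of the logarithmic regret bounds in the i.i.d. setting.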