Prior-free and prior-dependent regret bounds for Thompson Sampling
We consider the stochastic multi-armed bandit problem with a prior
distribution on the reward distributions. We are interested in studying
prior-free and prior-dependent regret bounds, very much in the same spirit as
the usual distribution-free and distribution-dependent bounds for the
non-Bayesian stochastic bandit. Building on the techniques of Audibert and
Bubeck [2009] and Russo and Roy [2013] we first show that Thompson Sampling
attains an optimal prior-free bound in the sense that for any prior
distribution its Bayesian regret is bounded from above by . This
result is unimprovable in the sense that there exists a prior distribution such
that any algorithm has a Bayesian regret bounded from below by . We also study the case of priors for the setting of Bubeck et al.
[2013] (where the optimal mean is known as well as a lower bound on the
smallest gap) and we show that in this case the regret of Thompson Sampling is
in fact uniformly bounded over time, thus showing that Thompson Sampling can
greatly take advantage of the nice properties of these priors.
Comment: A previous version appeared under the title 'A note on the Bayesian regret of Thompson Sampling with an arbitrary prior'.
An Information-Theoretic Analysis of Thompson Sampling
We provide an information-theoretic analysis of Thompson sampling that
applies across a broad range of online optimization problems in which a
decision-maker must learn from partial feedback. This analysis inherits the
simplicity and elegance of information theory and leads to regret bounds that
scale with the entropy of the optimal-action distribution. This strengthens
preexisting results and yields new insight into how information improves
performance.
Bounded Regret for Finite-Armed Structured Bandits
We study a new type of K-armed bandit problem where the expected return of
one arm may depend on the returns of other arms. We present a new algorithm for
this general class of problems and show that under certain circumstances it is
possible to achieve finite expected cumulative regret. We also give
problem-dependent lower bounds on the cumulative regret showing that at least
in special cases the new algorithm is nearly optimal.
Comment: 16 pages
On the Suboptimality of Thompson Sampling in High Dimensions
In this paper we consider Thompson Sampling (TS) for combinatorial
semi-bandits. We demonstrate that, perhaps surprisingly, TS is sub-optimal for
this problem in the sense that its regret scales exponentially in the ambient
dimension, and its minimax regret scales almost linearly. This phenomenon
occurs under a wide variety of assumptions including both non-linear and linear
reward functions, with Bernoulli distributed rewards and uniform priors. We
also show that including a fixed amount of forced exploration to TS does not
alleviate the problem. We complement our theoretical results with numerical
results and show that in practice TS indeed can perform very poorly in some
high dimensional situations.
Comment: NeurIPS 2021 - 34 pages
Efficient approximate Thompson sampling for search query recommendation
Query suggestions have been a valuable feature for e-commerce sites in helping shoppers refine their search intent. In this paper, we develop an algorithm that helps e-commerce sites like eBay mingle the output of different recommendation algorithms. Our algorithm is based on “Thompson Sampling”, a technique designed for solving multi-arm bandit problems where the best results are not known in advance but instead are tried out to gather feedback. Our approach is to treat query suggestions as a competition among data resources: we have many query suggestion candidates competing for limited space on the search results page. An “arm” is played when a query suggestion candidate is chosen for display, and our goal is to maximize the expected reward (user clicks on a suggestion). Our experiments have shown promising results in using the click-based user feedback to drive success by enhancing the quality of query suggestions.
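The arm-and-reward framing in this abstract can be illustrated with a minimal sketch of standard Beta-Bernoulli Thompson Sampling. This is not the paper's efficient approximate algorithm, which the abstract does not specify. It assumes each suggestion candidate has a fixed, unknown click probability and a uniform Beta(1, 1) prior; the function names `thompson_select` and `simulate` and the click rates are hypothetical, chosen only for illustration.

```python
import random


def thompson_select(successes, failures, rng):
    """Sample one value from each arm's Beta posterior and return the
    index of the arm with the largest sample (assumed Beta(1, 1) prior)."""
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(click_rates, rounds, seed=0):
    """Simulate showing one suggestion per round under hypothetical,
    fixed click rates; return per-arm click and no-click counts."""
    rng = random.Random(seed)
    k = len(click_rates)
    successes = [0] * k  # clicks observed per arm
    failures = [0] * k   # displays without a click per arm
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        # Bernoulli reward: the user clicks with the arm's click rate.
        if rng.random() < click_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    clicks, skips = simulate([0.05, 0.10, 0.60], rounds=3000)
    pulls = [c + s for c, s in zip(clicks, skips)]
    print("displays per candidate:", pulls)
```

In this sketch, candidates with more observed clicks draw larger posterior samples more often and so get displayed more, while rarely shown candidates keep wide posteriors and are still tried occasionally.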
Under-representation in America: Special Interest Groups, Referendums, and Election Reform
Americans are inadequately represented. Despite being such an important part of political science, social choice theory remains an area of study seldom incorporated into political dialogue. Special interest groups and gerrymandering insidiously affect political substructures and can have long-lasting impacts. Referendums often produce paradoxical results and frequently fail to satisfy voters. They can also restrict minority rights when political participation is in question. Voting systems around the world have remained unchanged for over two centuries and poorly express voter desires. Improving upon elements encompassed by social choice theory has the potential to ensure more accurate representation. The issue of gerrymandering can be mitigated using new identification and districting methods. Additionally, policy makers should take note that referendums are most useful with single-issue topics. Lastly, voting systems like Majority Judgement offer to revolutionize the way voting is accomplished in America. This thesis showcases numerous correlations demonstrating representation shortfalls in each of these areas and details how aspects of these elements can be improved.