
    Prior-free and prior-dependent regret bounds for Thompson Sampling

    We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. We are interested in studying prior-free and prior-dependent regret bounds, very much in the same spirit as the usual distribution-free and distribution-dependent bounds for the non-Bayesian stochastic bandit. Building on the techniques of Audibert and Bubeck [2009] and Russo and Van Roy [2013], we first show that Thompson Sampling attains an optimal prior-free bound in the sense that for any prior distribution its Bayesian regret is bounded from above by $14\sqrt{nK}$. This result is unimprovable in the sense that there exists a prior distribution such that any algorithm has a Bayesian regret bounded from below by $\frac{1}{20}\sqrt{nK}$. We also study the case of priors for the setting of Bubeck et al. [2013] (where the optimal mean is known as well as a lower bound on the smallest gap) and we show that in this case the regret of Thompson Sampling is in fact uniformly bounded over time, thus showing that Thompson Sampling can greatly take advantage of the nice properties of these priors. Comment: A previous version appeared under the title 'A note on the Bayesian regret of Thompson Sampling with an arbitrary prior'.
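
    To make the quantity behind the $14\sqrt{nK}$ bound concrete, here is a minimal Python sketch (not the authors' code) that estimates the Bayesian regret of Thompson Sampling by Monte Carlo, with $n$ the horizon and $K$ the number of arms. It assumes Bernoulli rewards and independent uniform Beta(1, 1) priors on the arm means, which is just one example prior; the function names are hypothetical. Bayesian regret is computed as the pseudo-regret $\sum_t (\mu^* - \mu_{A_t})$, averaged over problem instances drawn from that prior.

```python
import math
import random


def thompson_bernoulli(means, n, rng):
    """Run Beta-Bernoulli Thompson Sampling for n rounds and return the pseudo-regret."""
    k = len(means)
    alpha = [1.0] * k  # Beta(1, 1) = uniform prior on each arm's mean (assumed prior)
    beta = [1.0] * k
    best = max(means)
    regret = 0.0
    for _ in range(n):
        # Draw one plausible mean per arm from its posterior, then act greedily on the draws.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        regret += best - means[arm]
    return regret


def bayesian_regret(n, k, runs, seed=0):
    """Monte Carlo estimate of Bayesian regret: average regret over instances drawn from the prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        means = [rng.random() for _ in range(k)]  # sample a problem instance from the uniform prior
        total += thompson_bernoulli(means, n, rng)
    return total / runs


if __name__ == "__main__":
    n, k = 2000, 10
    estimate = bayesian_regret(n, k, runs=10)
    print(f"estimated Bayesian regret: {estimate:.1f}")
    print(f"prior-free bound 14*sqrt(nK): {14 * math.sqrt(n * k):.1f}")
```

    Since the per-round pseudo-regret is at most 1, the total is at most $n$, so the bound is only informative when $14\sqrt{nK} < n$, i.e. $n > 196K$; the example sizes are chosen just past that point.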

    An Information-Theoretic Analysis of Thompson Sampling

    We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.
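
    The following Python sketch illustrates what "scale with the entropy of the optimal-action distribution" means in practice. It assumes the bound has the form $\sqrt{\bar{\Gamma}\, H(A^*)\, T}$, where $\bar{\Gamma}$ bounds the information ratio and $T$ is the horizon, and it uses $\bar{\Gamma} = K/2$ for a $K$-armed bandit with rewards in $[0, 1]$; treat both as assumptions to check against the paper. $H(A^*)$ is estimated by Monte Carlo under two hypothetical priors, and all function names are illustrative.

```python
import math
import random
from collections import Counter


def optimal_action_entropy(sample_means, draws, rng):
    """Monte Carlo estimate (in nats) of H(A*), the entropy of the optimal arm under a prior.

    `sample_means(rng)` must return one vector of arm means drawn from the prior.
    """
    counts = Counter()
    for _ in range(draws):
        means = sample_means(rng)
        counts[max(range(len(means)), key=means.__getitem__)] += 1
    return -sum((c / draws) * math.log(c / draws) for c in counts.values())


def info_theoretic_bound(entropy, horizon, info_ratio):
    """Bound of the assumed form sqrt(Gamma * H(A*) * T)."""
    return math.sqrt(info_ratio * entropy * horizon)


def informative_prior(rng, k=10):
    """Hypothetical prior: arm 0 is very likely the best; the other k - 1 arms are likely poor."""
    return [rng.betavariate(20, 2)] + [rng.betavariate(2, 5) for _ in range(k - 1)]


def uniform_prior(rng, k=10):
    """Independent uniform priors: every arm is equally likely to be optimal."""
    return [rng.random() for _ in range(k)]


if __name__ == "__main__":
    K, T = 10, 10_000
    rng = random.Random(0)
    for name, prior in [("informative", informative_prior), ("uniform", uniform_prior)]:
        h = optimal_action_entropy(prior, 20_000, rng)
        # Gamma = K / 2 is the assumed finite-armed information-ratio bound (see lead-in).
        print(f"{name}: H(A*) = {h:.3f} (log K = {math.log(K):.3f}), "
              f"bound = {info_theoretic_bound(h, T, K / 2):.0f}")
```

    Under the uniform prior $H(A^*)$ approaches $\log K$, its maximum, while a prior that already points to one arm shrinks the entropy and therefore the bound.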

    Bounded Regret for Finite-Armed Structured Bandits

    We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. We also give problem-dependent lower bounds on the cumulative regret, showing that, at least in special cases, the new algorithm is nearly optimal. Comment: 16 pages.
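
    A toy Python example can show why dependence between arms can make finite regret possible. This is not the paper's algorithm: it assumes a hypothetical two-armed structure in which the means are $\theta$ and $1 - \theta$ for one unknown $\theta$. Every pull, from either arm, is evidence about $\theta$, so a purely greedy policy keeps learning without deliberate exploration.

```python
import random


def greedy_structured(theta, n, rng):
    """Greedy play on two arms with means theta and 1 - theta (hypothetical structure).

    Arm 0 pays 1 with probability theta and arm 1 with probability 1 - theta, so
    flipping arm 1's reward turns every observation into a Bernoulli(theta) sample.
    Returns (pseudo-regret, list of chosen arms).
    """
    means = (theta, 1.0 - theta)
    successes, pulls = 1, 2  # Laplace pseudo-counts: the theta estimate starts at 0.5
    regret = 0.0
    choices = []
    for _ in range(n):
        theta_hat = successes / pulls
        arm = 0 if theta_hat >= 0.5 else 1  # play the arm that is best for the current estimate
        reward = 1 if rng.random() < means[arm] else 0
        successes += reward if arm == 0 else 1 - reward  # map the observation back to theta
        pulls += 1
        regret += max(means) - means[arm]
        choices.append(arm)
    return regret, choices
```

    Because the estimate of $\theta$ concentrates whichever arm is played, the probability of a wrong choice decays with time, and the cumulative regret stops growing after an initial phase instead of growing logarithmically.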

    On the Suboptimality of Thompson Sampling in High Dimensions

    In this paper we consider Thompson Sampling (TS) for combinatorial semi-bandits. We demonstrate that, perhaps surprisingly, TS is sub-optimal for this problem in the sense that its regret scales exponentially in the ambient dimension, and its minimax regret scales almost linearly. This phenomenon occurs under a wide variety of assumptions, including both non-linear and linear reward functions, with Bernoulli-distributed rewards and uniform priors. We also show that adding a fixed amount of forced exploration to TS does not alleviate the problem. We complement our theoretical results with numerical results and show that in practice TS can indeed perform very poorly in some high-dimensional situations. Comment: NeurIPS 2021, 34 pages.
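
    For readers unfamiliar with the setting, here is a minimal Python sketch of TS for one simple combinatorial semi-bandit: choose $m$ of $d$ base arms, receive the sum of their Bernoulli outcomes, and observe each chosen arm's outcome. It uses uniform Beta priors and an optional forced-exploration probability `explore`; the top-$m$ action set, the function name, and the parameters are assumptions for illustration, and the sketch does not reproduce the paper's lower-bound constructions.

```python
import random


def ts_semibandit(means, m, n, rng, explore=0.0):
    """Thompson Sampling for a top-m combinatorial semi-bandit with linear reward.

    Each round plays m of the d base arms; the reward is the sum of their Bernoulli
    outcomes, and (semi-bandit feedback) every chosen arm's outcome is observed.
    With probability `explore`, a uniformly random m-subset is played instead.
    Returns the pseudo-regret after n rounds.
    """
    d = len(means)
    alpha = [1.0] * d  # uniform Beta(1, 1) priors on every base arm
    beta = [1.0] * d
    best = sum(sorted(means, reverse=True)[:m])
    regret = 0.0
    for _ in range(n):
        if rng.random() < explore:
            action = rng.sample(range(d), m)  # forced exploration round
        else:
            samples = [rng.betavariate(alpha[i], beta[i]) for i in range(d)]
            # Linear-reward oracle: the best m-subset for the sampled means is the top m arms.
            action = sorted(range(d), key=samples.__getitem__, reverse=True)[:m]
        for i in action:
            outcome = 1 if rng.random() < means[i] else 0
            alpha[i] += outcome
            beta[i] += 1 - outcome
        regret += best - sum(means[i] for i in action)
    return regret
```

    Running this for increasing $d$ is one way to probe how regret depends on dimension, although the instances in the paper are specific constructions rather than random ones.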

    Efficient Approximate Thompson Sampling for Search Query Recommendation

    Query suggestions have been a valuable feature for e-commerce sites in helping shoppers refine their search intent. In this paper, we develop an algorithm that helps e-commerce sites like eBay combine the output of different recommendation algorithms. Our algorithm is based on “Thompson Sampling”, a technique designed for solving multi-armed bandit problems, where the best options are not known in advance and must instead be tried out to gather feedback. Our approach is to treat query suggestions as a competition among data resources: we have many query suggestion candidates competing for limited space on the search results page. An “arm” is played when a query suggestion candidate is chosen for display, and our goal is to maximize the expected reward (user clicks on a suggestion). Our experiments show promising results for using click-based user feedback to improve the quality of query suggestions.
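
    The arm/reward mapping described above can be sketched as a small Python class. This is a hypothetical illustration, not eBay's system or the paper's approximate method: it uses exact Beta-Bernoulli posteriors on each candidate's click-through rate, treats a displayed suggestion that is clicked as reward 1 and a display without a click as reward 0, and fills `k` display slots with the highest sampled rates. The class, method names, and prior are assumptions; candidates from different recommendation sources simply share one pool.

```python
import random
from collections import defaultdict


class SuggestionSelector:
    """Hypothetical Beta-Bernoulli Thompson Sampling selector for query-suggestion slots."""

    def __init__(self, seed=None, prior=(1.0, 1.0)):
        self.rng = random.Random(seed)
        self.clicks = defaultdict(float)       # observed clicks per candidate
        self.impressions = defaultdict(float)  # times each candidate was displayed
        self.prior_a, self.prior_b = prior     # Beta prior on click-through rate (assumed)

    def select(self, candidates, k):
        """Return the k distinct candidates with the highest sampled click-through rates."""
        def sampled_ctr(c):
            a = self.prior_a + self.clicks[c]
            b = self.prior_b + self.impressions[c] - self.clicks[c]
            return self.rng.betavariate(a, b)

        scores = {c: sampled_ctr(c) for c in candidates}  # one posterior draw per candidate
        return sorted(scores, key=scores.__getitem__, reverse=True)[:k]

    def record(self, candidate, clicked):
        """Update the posterior after `candidate` was displayed (clicked or not)."""
        self.impressions[candidate] += 1
        if clicked:
            self.clicks[candidate] += 1
```

    New candidates start from the prior automatically, so unseen suggestions still get occasional display slots while well-performing ones win most of them.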

    Under-representation in America: Special Interest Groups, Referendums, and Election Reform

    Americans are inadequately represented. Despite being an important part of political science, social choice theory remains an area of study seldom incorporated into political dialogue. Special interest groups and gerrymandering insidiously affect political substructures and can have long-lasting impacts. Referendums often produce paradoxical results and frequently fail to satisfy voters. They can also restrict minority rights when political participation is in question. Voting systems around the world have remained unchanged for over two centuries and poorly express voter desires. Improving the elements that social choice theory addresses has the potential to ensure more accurate representation. The issue of gerrymandering can be mitigated using new identification and districting methods. Additionally, policy makers should note that referendums are most useful for single-issue topics. Lastly, voting systems such as Majority Judgement could revolutionize the way voting is conducted in America. This thesis presents numerous correlations demonstrating representation shortfalls in each of these areas and details how each of these elements can be improved.