3,330 research outputs found

    Thompson Sampling for Bandits with Clustered Arms

    Get PDF
    We propose algorithms based on a multi-level Thompson sampling scheme, for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost compared to using standard Thompson sampling. In the case of the stochastic multi-armed bandit we give upper bounds on the expected cumulative regret showing how it depends on the quality of the clustering. Finally, we perform an empirical evaluation showing that our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms

    Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms

    Full text link
    Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. \emph{Offline} evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their "partial-label" nature. Common practice is to create a simulator which simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating simulator itself is often difficult and modeling bias is usually unavoidably introduced. In this paper, we introduce a \emph{replay} methodology for contextual bandit algorithm evaluation. Different from simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method can provide provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from Yahoo! Front Page conform well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms show accuracy and effectiveness of our offline evaluation method.Comment: 10 pages, 7 figures, revised from the published version at the WSDM 2011 conferenc

    Enhancing Evolutionary Conversion Rate Optimization via Multi-armed Bandit Algorithms

    Full text link
    Conversion rate optimization means designing web interfaces such that more visitors perform a desired action (such as register or purchase) on the site. One promising approach, implemented in Sentient Ascend, is to optimize the design using evolutionary algorithms, evaluating each candidate design online with actual visitors. Because such evaluations are costly and noisy, several challenges emerge: How can available visitor traffic be used most efficiently? How can good solutions be identified most reliably? How can a high conversion rate be maintained during optimization? This paper proposes a new technique to address these issues. Traffic is allocated to candidate solutions using a multi-armed bandit algorithm, using more traffic on those evaluations that are most useful. In a best-arm identification mode, the best candidate can be identified reliably at the end of evolution, and in a campaign mode, the overall conversion rate can be optimized throughout the entire evolution process. Multi-armed bandit algorithms thus improve performance and reliability of machine discovery in noisy real-world environments.Comment: The Thirty-First Innovative Applications of Artificial Intelligence Conferenc

    Context Attentive Bandits: Contextual Bandit with Restricted Context

    Full text link
    We consider a novel formulation of the multi-armed bandit model, which we call the contextual bandit with restricted context, where only a limited number of features can be accessed by the learner at every iteration. This novel formulation is motivated by different online problems arising in clinical trials, recommender systems and attention modeling. Herein, we adapt the standard multi-armed bandit algorithm known as Thompson Sampling to take advantage of our restricted context setting, and propose two novel algorithms, called the Thompson Sampling with Restricted Context(TSRC) and the Windows Thompson Sampling with Restricted Context(WTSRC), for handling stationary and nonstationary environments, respectively. Our empirical results demonstrate advantages of the proposed approaches on several real-life datasetsComment: IJCAI 201
    • …
    corecore