Relational Boosted Bandits
Contextual bandit algorithms have become essential in real-world user
interaction problems in recent years. However, these algorithms rely on an
attribute-value representation of context, which makes them infeasible for
real-world domains such as social networks, which are inherently relational.
We propose Relational Boosted Bandits (RB2), a contextual bandit algorithm for relational domains
based on (relational) boosted trees. RB2 enables us to learn interpretable and
explainable models due to the more descriptive nature of the relational
representation. We empirically demonstrate the effectiveness and
interpretability of RB2 on tasks such as link prediction, relational
classification, and recommendations. Comment: 8 pages, 3 figures
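The loop below is a minimal, hedged sketch of a boosted-tree contextual bandit. It is not the authors' RB2: RB2 learns *relational* boosted trees over first-order representations, whereas this stand-in uses scikit-learn's propositional GradientBoostingRegressor as each arm's reward model; the epsilon schedule and refit threshold are illustrative assumptions.

```python
# Sketch of a boosted-tree contextual bandit (epsilon-greedy).
# NOT the authors' RB2: RB2 fits *relational* boosted trees; here a
# propositional GradientBoostingRegressor stands in per arm.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

class BoostedTreeBandit:
    def __init__(self, n_arms, epsilon=0.1, refit_at=5):
        self.n_arms, self.epsilon, self.refit_at = n_arms, epsilon, refit_at
        self.X = [[] for _ in range(n_arms)]  # contexts observed per arm
        self.y = [[] for _ in range(n_arms)]  # rewards observed per arm
        self.models = [None] * n_arms

    def select(self, context):
        if np.random.rand() < self.epsilon:     # uniform exploration
            return np.random.randint(self.n_arms)
        # Greedy step: pull the arm whose tree ensemble predicts the
        # highest reward; arms with no fitted model yet are optimistic.
        scores = [m.predict([context])[0] if m is not None else np.inf
                  for m in self.models]
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.X[arm].append(context)
        self.y[arm].append(reward)
        if len(self.y[arm]) >= self.refit_at:   # refit once data suffices
            self.models[arm] = GradientBoostingRegressor(
                n_estimators=50).fit(self.X[arm], self.y[arm])
```

One call to select followed by update completes a round; the interpretability the abstract claims comes from inspecting the fitted trees, which in RB2 branch on relational features rather than flat attributes.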
Contextual Bandits in a Survey Experiment on Charitable Giving: Within-Experiment Outcomes versus Policy Learning
We design and implement an adaptive experiment (a ``contextual bandit'') to
learn a targeted treatment assignment policy, where the goal is to use a
participant's survey responses to determine which charity to expose them to in
a donation solicitation. The design balances two competing objectives:
optimizing the outcomes for the subjects in the experiment (``cumulative regret
minimization'') and gathering data that will be most useful for policy
learning, that is, for learning an assignment rule that will maximize welfare
if used after the experiment (``simple regret minimization''). We evaluate
alternative experimental designs by collecting pilot data and then conducting a
simulation study. Next, we implement our selected algorithm. Finally, we
perform a second simulation study anchored to the collected data that evaluates
the benefits of the algorithm we chose. Our first result is that the value of a
learned policy in this setting is higher when data is collected via uniform
randomization than when it is collected adaptively using standard cumulative regret
minimization or policy learning algorithms. We propose a simple heuristic for
adaptive experimentation that improves upon uniform randomization from the
perspective of policy learning at the expense of increasing cumulative regret
relative to alternative bandit algorithms. The heuristic modifies an existing
contextual bandit algorithm by (i) imposing a slowly decaying lower bound on
assignment probabilities, so that no arm is discarded too quickly, and
(ii) after adaptively collecting data, restricting policy learning to select
from arms where sufficient data has been gathered.
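The two modifications can be stated as standalone helpers, as in the hedged sketch below; the floor schedule c / t**alpha and the 30-observation threshold are illustrative assumptions, not the paper's exact choices.

```python
# Sketch of the heuristic's two modifications to a probability-based
# contextual bandit; schedule and threshold values are assumptions.
import numpy as np

def floor_probabilities(p, t, c=0.2, alpha=0.5):
    """Shrink assignment probabilities p (summing to 1) toward uniform so
    every arm keeps probability at least p_min(t) = c / t**alpha, a floor
    that decays slowly and prevents any arm from being dropped early."""
    n = len(p)
    p_min = min(c / t ** alpha, 1.0 / n)       # keep the floor feasible
    return p_min + (1.0 - n * p_min) * np.asarray(p)

def eligible_arms(counts, threshold=30):
    """After adaptive collection, restrict policy learning to arms that
    received at least `threshold` observations."""
    return [a for a, n in enumerate(counts) if n >= threshold]

# Example: at round t = 100 the bandit proposes (0.9, 0.05, 0.05); the
# floored distribution still sums to 1 and no arm falls below p_min.
print(floor_probabilities([0.9, 0.05, 0.05], t=100))
```

Mixing toward the uniform distribution (rather than clipping and renormalizing) guarantees the floor exactly while preserving a valid probability vector.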
When Are Linear Stochastic Bandits Attackable?
We study adversarial attacks on linear stochastic bandits: by manipulating
the rewards, an adversary aims to control the behaviour of the bandit
algorithm. Perhaps surprisingly, we first show that some attack goals can never
be achieved. This is in sharp contrast to context-free stochastic bandits, and
is intrinsically due to the correlation among arms in linear stochastic
bandits. Motivated by this finding, this paper studies the attackability of a
$k$-armed linear bandit environment. We first provide a complete necessary and
sufficient characterization of attackability based on the geometry of the
arms' context vectors. We then propose a two-stage attack method against LinUCB
and Robust Phase Elimination. The method first checks whether the given
environment is attackable; if so, it poisons the rewards to force the
algorithm to pull a target arm a linear number of times at only a sublinear cost.
Numerical experiments further validate the effectiveness and cost-efficiency of
the proposed attack method. Comment: 27 pages, 3 figures, ICML 2022
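The fragment below is a deliberately simplified sketch of the reward-poisoning step, assuming the attackability check has already passed. The margin delta and the per-round interface are illustrative assumptions; the geometry-based certification and the LinUCB / Robust Phase Elimination specifics from the paper are omitted.

```python
# Simplified reward poisoning: leave the target arm's reward untouched
# and cap every other arm's observed reward below the target's estimated
# mean. `delta` is an illustrative margin, not the paper's choice.
def poison(reward, pulled_arm, target_arm, target_mean_est, delta=0.1):
    """One attack round; returns (corrupted_reward, attack_cost)."""
    if pulled_arm == target_arm:
        return reward, 0.0                        # never corrupt the target
    corrupted = min(reward, target_mean_est - delta)
    return corrupted, abs(reward - corrupted)     # cost = |perturbation|
```

Under such corruption a no-regret learner pulls non-target arms only o(T) times, so the total cost, paid only on those rounds, stays sublinear; whether any target arm can be made to look optimal at all is precisely what the geometric attackability condition decides.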