Relational Boosted Bandits
Contextual bandit algorithms have become essential in real-world user
interaction problems in recent years. However, these algorithms rely on an
attribute-value representation of context, which makes them infeasible for
real-world domains such as social networks, which are inherently relational.
We propose Relational Boosted Bandits (RB2), a contextual bandit algorithm for relational domains
based on (relational) boosted trees. RB2 enables us to learn interpretable and
explainable models due to the more descriptive nature of the relational
representation. We empirically demonstrate the effectiveness and
interpretability of RB2 on tasks such as link prediction, relational
classification, and recommendations. Comment: 8 pages, 3 figures
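The loop below is a minimal, hedged sketch of a boosted-tree contextual bandit. It is not the authors' RB2: RB2 learns *relational* boosted trees over first-order representations, whereas this stand-in uses scikit-learn's propositional GradientBoostingRegressor as each arm's reward model; the epsilon schedule and refit threshold are illustrative assumptions.

```python
# Sketch of a boosted-tree contextual bandit (epsilon-greedy).
# NOT the authors' RB2: RB2 fits *relational* boosted trees; here a
# propositional GradientBoostingRegressor stands in per arm.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

class BoostedTreeBandit:
    def __init__(self, n_arms, epsilon=0.1, refit_at=5):
        self.n_arms, self.epsilon, self.refit_at = n_arms, epsilon, refit_at
        self.X = [[] for _ in range(n_arms)]  # contexts observed per arm
        self.y = [[] for _ in range(n_arms)]  # rewards observed per arm
        self.models = [None] * n_arms

    def select(self, context):
        if np.random.rand() < self.epsilon:     # uniform exploration
            return np.random.randint(self.n_arms)
        # Greedy step: pull the arm whose tree ensemble predicts the
        # highest reward; arms with no fitted model yet are optimistic.
        scores = [m.predict([context])[0] if m is not None else np.inf
                  for m in self.models]
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.X[arm].append(context)
        self.y[arm].append(reward)
        if len(self.y[arm]) >= self.refit_at:   # refit once data suffices
            self.models[arm] = GradientBoostingRegressor(
                n_estimators=50).fit(self.X[arm], self.y[arm])
```

One call to select followed by update completes a round; the interpretability the abstract claims comes from inspecting the fitted trees, which in RB2 branch on relational features rather than flat attributes.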
Contextual Bandits in a Survey Experiment on Charitable Giving: Within-Experiment Outcomes versus Policy Learning
We design and implement an adaptive experiment (a ``contextual bandit'') to
learn a targeted treatment assignment policy, where the goal is to use a
participant's survey responses to determine which charity to expose them to in
a donation solicitation. The design balances two competing objectives:
optimizing the outcomes for the subjects in the experiment (``cumulative regret
minimization'') and gathering data that will be most useful for policy
learning, that is, for learning an assignment rule that will maximize welfare
if used after the experiment (``simple regret minimization''). We evaluate
alternative experimental designs by collecting pilot data and then conducting a
simulation study. Next, we implement our selected algorithm. Finally, we
perform a second simulation study anchored to the collected data that evaluates
the benefits of the algorithm we chose. Our first result is that the value of a
learned policy in this setting is higher when data is collected via uniform
randomization than when it is collected adaptively using standard cumulative regret
minimization or policy learning algorithms. We propose a simple heuristic for
adaptive experimentation that improves upon uniform randomization from the
perspective of policy learning at the expense of increasing cumulative regret
relative to alternative bandit algorithms. The heuristic modifies an existing
contextual bandit algorithm by (i) imposing a slowly decaying lower bound on
assignment probabilities, so that no arm is discarded too quickly, and
(ii) after adaptively collecting data, restricting policy learning to select
from arms where sufficient data has been gathered.
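The two modifications can be stated as standalone helpers, as in the hedged sketch below; the floor schedule c / t**alpha and the 30-observation threshold are illustrative assumptions, not the paper's exact choices.

```python
# Sketch of the heuristic's two modifications to a probability-based
# contextual bandit; schedule and threshold values are assumptions.
import numpy as np

def floor_probabilities(p, t, c=0.2, alpha=0.5):
    """Shrink assignment probabilities p (summing to 1) toward uniform so
    every arm keeps probability at least p_min(t) = c / t**alpha, a floor
    that decays slowly and prevents any arm from being dropped early."""
    n = len(p)
    p_min = min(c / t ** alpha, 1.0 / n)       # keep the floor feasible
    return p_min + (1.0 - n * p_min) * np.asarray(p)

def eligible_arms(counts, threshold=30):
    """After adaptive collection, restrict policy learning to arms that
    received at least `threshold` observations."""
    return [a for a, n in enumerate(counts) if n >= threshold]

# Example: at round t = 100 the bandit proposes (0.9, 0.05, 0.05); the
# floored distribution still sums to 1 and no arm falls below p_min.
print(floor_probabilities([0.9, 0.05, 0.05], t=100))
```

Mixing toward the uniform distribution (rather than clipping and renormalizing) guarantees the floor exactly while preserving a valid probability vector.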
When Are Linear Stochastic Bandits Attackable?
We study adversarial attacks on linear stochastic bandits: by manipulating
the rewards, an adversary aims to control the behaviour of the bandit
algorithm. Perhaps surprisingly, we first show that some attack goals can never
be achieved. This is in sharp contrast to context-free stochastic bandits, and
is intrinsically due to the correlation among arms in linear stochastic
bandits. Motivated by this finding, this paper studies the attackability of a
$k$-armed linear bandit environment. We first provide a complete necessary and
sufficient characterization of attackability based on the geometry of the
arms' context vectors. We then propose a two-stage attack method against LinUCB
and Robust Phase Elimination. The method first checks whether the given
environment is attackable; if so, it poisons the rewards to force the
algorithm to pull a target arm a linear number of times at only a sublinear cost.
Numerical experiments further validate the effectiveness and cost-efficiency of
the proposed attack method. Comment: 27 pages, 3 figures, ICML 2022
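The fragment below is a deliberately simplified sketch of the reward-poisoning step, assuming the attackability check has already passed. The margin delta and the per-round interface are illustrative assumptions; the geometry-based certification and the LinUCB / Robust Phase Elimination specifics from the paper are omitted.

```python
# Simplified reward poisoning: leave the target arm's reward untouched
# and cap every other arm's observed reward below the target's estimated
# mean. `delta` is an illustrative margin, not the paper's choice.
def poison(reward, pulled_arm, target_arm, target_mean_est, delta=0.1):
    """One attack round; returns (corrupted_reward, attack_cost)."""
    if pulled_arm == target_arm:
        return reward, 0.0                        # never corrupt the target
    corrupted = min(reward, target_mean_est - delta)
    return corrupted, abs(reward - corrupted)     # cost = |perturbation|
```

Under such corruption a no-regret learner pulls non-target arms only o(T) times, so the total cost, paid only on those rounds, stays sublinear; whether any target arm can be made to look optimal at all is precisely what the geometric attackability condition decides.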