Corrupted Contextual Bandits with Action Order Constraints
We consider a novel variant of the contextual bandit problem with corrupted
context, which we call the contextual bandit problem with corrupted context and
action correlation, where actions exhibit a relationship structure that can be
exploited to guide the exploration of viable next decisions. Our setting is
primarily motivated by adaptive mobile health interventions and related
applications, where users may transition through different stages requiring
more targeted action selection approaches. In such settings, keeping user
engagement is paramount for the success of interventions and therefore it is
vital to provide relevant recommendations in a timely manner. The context
provided by users might not always be informative at every decision point and
standard contextual approaches to action selection will incur high regret. We
propose a meta-algorithm using a referee that dynamically combines the policies
of a contextual bandit and multi-armed bandit, similar to previous work, as
well as a simple correlation mechanism that captures action-to-action
transition probabilities, allowing for more efficient exploration of
time-correlated actions. We empirically evaluate the performance of this
algorithm on a simulation where the sequence of best actions is determined by a
hidden state that evolves in a Markovian manner. We show that the proposed
meta-algorithm reduces regret in situations where the performance of the two
policies varies over time such that one is strictly superior to the other for a
given period. To demonstrate the practical applicability of our setting, we
evaluate our method on several real-world data sets, clearly showing better
empirical performance compared to a set of simple algorithms.
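To make the described mechanism concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a referee that mixes the scores of a contextual-bandit policy and a multi-armed-bandit policy, then biases action selection by empirical action-to-action transition probabilities; the linear scoring, the convex-combination referee, and all names are illustrative assumptions.

```python
import numpy as np

n_actions = 4

# Laplace-smoothed empirical action-to-action transition counts;
# updated online as (previous action, chosen action) pairs are observed.
trans_counts = np.ones((n_actions, n_actions))

def transition_probs(prev_action):
    # Normalize the row of counts for the previous action into probabilities.
    row = trans_counts[prev_action]
    return row / row.sum()

def contextual_scores(context, weights):
    # Stand-in contextual-bandit policy: linear score per action.
    return weights @ context

def mab_scores(value_estimates):
    # Stand-in multi-armed-bandit policy: context-free value estimates.
    return value_estimates

def referee_select(context, weights, value_estimates, prev_action, referee_weight):
    # The "referee" here is a simple convex combination of the two policies'
    # scores; in practice this weight could itself be learned from feedback.
    scores = (referee_weight * contextual_scores(context, weights)
              + (1 - referee_weight) * mab_scores(value_estimates))
    # Bias exploration toward actions that commonly follow the previous one.
    scores = scores * transition_probs(prev_action)
    return int(np.argmax(scores))
```

With uniform transition counts the referee reduces to the mixed policy scores alone; as transition statistics accumulate, actions that rarely follow the previous action are down-weighted.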