
Dynamic Learning of Sequential Choice Bandit Problem under Marketing Fatigue

Authors: Cao Junyu, Sun Wei
Publication date: 19/03/2019

Motivated by the observation that overexposure to unwanted marketing activities leads to customer dissatisfaction, we consider a setting where a platform offers a sequence of messages to its users and is penalized when users abandon the platform due to marketing fatigue. We propose a novel sequential choice model to capture multiple interactions taking place between the platform and its user: upon receiving a message, a user takes one of three actions: accept the message, skip it and receive the next message, or abandon the platform. Based on user feedback, the platform dynamically learns the users' abandonment distribution and their valuations of messages to determine the length of the sequence and the order of the messages, while maximizing the cumulative payoff over a horizon of length T. We refer to this online learning task as the sequential choice bandit problem. For the offline combinatorial optimization problem, we show that an efficient polynomial-time algorithm exists. For the online problem, we propose an algorithm that balances exploration and exploitation, and characterize its regret bound. Lastly, we demonstrate how to extend the model with user contexts to incorporate personalization.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
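
The sequential choice model in the first abstract can be made concrete with a small sketch. The parameterization here is an assumption for illustration and is not taken from the paper. At position k the user accepts message k with probability `accept_probs[k]` and earns `rewards[k]`. Otherwise the user abandons with probability `abandon_probs[k]`, which costs `abandon_penalty`, or skips to the next message. The hypothetical function `expected_payoff` scores one ordered sequence under these assumptions. It is not the paper's polynomial-time offline algorithm.

```python
from typing import Sequence


def expected_payoff(
    rewards: Sequence[float],
    accept_probs: Sequence[float],
    abandon_probs: Sequence[float],
    abandon_penalty: float,
) -> float:
    """Expected payoff of offering messages in the given order.

    Hypothetical model (assumption, not the paper's exact one): at position k
    the user accepts with probability accept_probs[k] and earns rewards[k];
    otherwise the user abandons with probability abandon_probs[k] (costing
    abandon_penalty) or skips to the next message. The interaction ends on
    accept, on abandon, or when the sequence runs out.
    """
    if not (len(rewards) == len(accept_probs) == len(abandon_probs)):
        raise ValueError("all per-message lists must have the same length")
    reach = 1.0  # probability the user is still engaged at this position
    total = 0.0
    for r, u, q in zip(rewards, accept_probs, abandon_probs):
        total += reach * u * r                          # accept: collect reward
        total -= reach * (1 - u) * q * abandon_penalty  # abandon: pay penalty
        reach *= (1 - u) * (1 - q)                      # skip: go to next message
    return total
```

For example, appending a low-value message at which users often abandon gives `expected_payoff([1.0, 0.1], [0.5, 0.1], [0.0, 0.9], 5.0)`, which is about -1.52. Stopping after the first message gives `expected_payoff([1.0], [0.5], [0.0], 5.0)`, which is 0.5. This illustrates why the sequence length, and not only the order, is a decision variable.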

Stochastic Linear Bandits Robust to Adversarial Attacks

Authors: Bogunovic Ilija, Krause Andreas, Losalka Arpan, Scarlett Jonathan
Publication date: 07/07/2020

We consider a stochastic linear bandit problem in which the rewards are subject not only to random noise but also to adversarial attacks subject to a suitable budget $C$ (i.e., an upper bound on the sum of corruption magnitudes across the time horizon). We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not. Both variants are shown to attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms having, respectively, a linear and a quadratic dependency on $C$ in general. We present algorithm-independent lower bounds showing that these additive terms are near-optimal. In addition, in a contextual setting, we revisit a setup of diverse contexts and show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.

arXiv.org e-Print Archive

Repository for Publications and Research Data
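
The second abstract defines the attack model only through the budget $C$. The hypothetical environment `CorruptedLinearBandit` below is a hedged sketch of what that constraint means. It is not the authors' Robust Phased Elimination algorithm, which is not reproduced here. Rewards are $\langle \theta, x \rangle$ plus noise plus a corruption $c_t$. The adversary's requested corruptions are clipped so that $\sum_t |c_t| \le C$ always holds. The Gaussian noise, the parameter names, and the clipping rule are assumptions for illustration.

```python
import random
from typing import Sequence, Tuple


class CorruptedLinearBandit:
    """Hypothetical stochastic linear bandit with a budgeted adversary.

    Pulling action x returns <theta, x> + Gaussian noise + c_t, where c_t is
    the adversary's requested corruption clipped so that the total corruption
    magnitude sum_t |c_t| never exceeds the budget C (assumed setup).
    """

    def __init__(self, theta: Sequence[float], budget: float,
                 noise_std: float = 0.1, seed: int = 0) -> None:
        if budget < 0:
            raise ValueError("corruption budget C must be non-negative")
        self.theta = list(theta)
        self.remaining = float(budget)  # unused part of the budget C
        self.noise_std = noise_std
        self.rng = random.Random(seed)

    def pull(self, x: Sequence[float],
             requested_corruption: float = 0.0) -> Tuple[float, float]:
        """Return (observed reward, corruption actually applied)."""
        if len(x) != len(self.theta):
            raise ValueError("action dimension does not match theta")
        mean = sum(t * xi for t, xi in zip(self.theta, x))
        noise = self.rng.gauss(0.0, self.noise_std)
        # Clip the adversary's request to what is left of the budget.
        c = max(-self.remaining, min(self.remaining, requested_corruption))
        self.remaining -= abs(c)
        return mean + noise + c, c
```

With `budget=0.0` every requested corruption is clipped to zero, which recovers the non-corrupted case $C = 0$. A learner facing this environment sees only the reward, never the applied corruption.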