
The multi-armed bandit, with constraints

Author: A. F. Veinott Jr.
D. A. Berry
D. Bergemann
D. Bertsimas
E. A. Feinberg
E. Altman
E. V. Denardo
Eric V. Denardo
Eugene A. Feinberg
G. Weiss
H. Kaspi
I. Sonin
J. C. Gittins
J. Niño-Mora
J. Tsitsiklis
K. Schlag
M. N. Katehakis
N. El Karoui
P. Varaiya
P. Whittle
R. Weber
Uriel G. Rothblum
Publication venue: Springer Science and Business Media LLC

Incorporating Behavioral Constraints in Online AI Systems

Author: Avinash Balakrishnan
Djallel Bouneffouf
Nicholas Mattei
Francesca Rossi
Publication date: 15/09/2018

AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have a significant impact on our daily lives. However, in many cases the online rewards should not be the only guiding criterion, as there are additional constraints and/or priorities imposed by regulations, values, preferences, or ethical principles. We detail a novel online agent that learns a set of behavioral constraints by observation and uses these learned constraints as a guide when making decisions in an online setting, while still being reactive to reward feedback. To define this agent, we propose a novel extension to the classical contextual multi-armed bandit setting and provide a new algorithm called Behavior Constrained Thompson Sampling (BCTS) that allows for online learning while obeying exogenous constraints. Our agent learns a constrained policy that implements the observed behavioral constraints demonstrated by a teacher agent, and then uses this constrained policy to guide the reward-based online exploration and exploitation. We characterize the upper bound on the expected regret of the contextual bandit algorithm that underlies our agent and provide a case study with real-world data in two application domains. Our experiments show that the designed agent is able to act within the set of behavioral constraints without significantly degrading its overall reward performance.

Comment: 9 pages, 6 figures

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
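As a rough illustration of the idea in the BCTS abstract above (reward-driven exploration that never leaves a constrained action set), the following Python sketch runs Beta-Bernoulli Thompson sampling over only the arms a constraint allows. It is a simplified, non-contextual stand-in and not the BCTS algorithm itself: the allowed-arm set is hard-coded here, whereas BCTS learns its constraints from a teacher agent, and the names `MaskedThompsonSampler` and `simulate` are hypothetical.

```python
import random
from typing import Dict, List, Set


class MaskedThompsonSampler:
    """Beta-Bernoulli Thompson sampling restricted to an allowed arm set.

    Hypothetical illustration: `allowed` stands in for a behavioral
    constraint; this is NOT the contextual BCTS algorithm.
    """

    def __init__(self, n_arms: int, seed: int = 0) -> None:
        self.alpha = [1.0] * n_arms  # Beta(1, 1) prior; alpha = 1 + successes
        self.beta = [1.0] * n_arms   # beta = 1 + failures
        self.rng = random.Random(seed)

    def select(self, allowed: Set[int]) -> int:
        if not allowed:
            raise ValueError("constraint leaves no admissible arm")
        # One posterior draw per admissible arm; forbidden arms are never considered.
        samples = {a: self.rng.betavariate(self.alpha[a], self.beta[a]) for a in allowed}
        return max(samples, key=samples.get)

    def update(self, arm: int, reward: int) -> None:
        # Bernoulli reward: a success raises alpha, a failure raises beta.
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1


def simulate(true_p: List[float], allowed: Set[int], steps: int, seed: int = 0) -> Dict[int, int]:
    """Play `steps` rounds against Bernoulli arms and count pulls per arm."""
    env = random.Random(seed + 1)
    agent = MaskedThompsonSampler(len(true_p), seed)
    pulls = {a: 0 for a in range(len(true_p))}
    for _ in range(steps):
        arm = agent.select(allowed)
        reward = 1 if env.random() < true_p[arm] else 0
        agent.update(arm, reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Arm 0 has the highest reward but is forbidden by the (hypothetical) constraint.
    print(simulate([0.9, 0.6, 0.3], allowed={1, 2}, steps=2000))
```

In this toy run the sampler concentrates on arm 1, the best admissible arm, and never pulls arm 0 even though arm 0 pays more; that trade-off between reward and constraint satisfaction is what the abstract's experiments measure.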

Maximizing Success Rate of Payment Routing using Non-stationary Bandits

Author: Aayush Chaudhary
Abhishek Gupta
Abhinav Rai
Publication date: 02/08/2023

This paper discusses the system architecture design and deployment of non-stationary multi-armed bandit approaches to determine a near-optimal payment routing policy based on the recent history of transactions. We propose a Routing Service architecture using a novel Ray-based implementation for optimally scaling bandit-based payment routing to over 10,000 transactions per second, adhering to the system design requirements and ecosystem constraints, including the Payment Card Industry Data Security Standard (PCI DSS). We first evaluate the effectiveness of multiple bandit-based payment routing algorithms on a custom simulator to benchmark non-stationary bandit approaches and identify the best hyperparameters. We then conduct live experiments on the payment transaction system of the fantasy sports platform Dream11. In these live experiments, our non-stationary bandit-based algorithm consistently improves the transaction success rate by 0.92% compared to traditional rule-based methods over one month.

Comment: 7 pages, 6 figures

arXiv.org e-Print Archive
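The abstract above routes payments using only the recent history of transactions but does not name its exact algorithm here. As one standard non-stationary technique, swapped in for illustration, the Python sketch below uses sliding-window Thompson sampling: each gateway's Beta posterior is built only from the last `window` outcomes, so a gateway whose success rate degrades loses traffic. The class name, the window size, and the simulated success rates are all hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, Tuple


class SlidingWindowThompson:
    """Thompson sampling whose posteriors use only the last `window` outcomes."""

    def __init__(self, n_arms: int, window: int, seed: int = 0) -> None:
        self.window = window
        self.history: Deque[Tuple[int, int]] = deque()  # (gateway, success) pairs
        self.succ = [0] * n_arms
        self.fail = [0] * n_arms
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Draw from Beta(1 + successes, 1 + failures) within the window for each gateway.
        draws = [self.rng.betavariate(1 + s, 1 + f) for s, f in zip(self.succ, self.fail)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm: int, success: int) -> None:
        self.history.append((arm, success))
        if success:
            self.succ[arm] += 1
        else:
            self.fail[arm] += 1
        # Evict the oldest outcome once the window is full, so stale data is forgotten.
        if len(self.history) > self.window:
            old_arm, old_success = self.history.popleft()
            if old_success:
                self.succ[old_arm] -= 1
            else:
                self.fail[old_arm] -= 1


def simulate(steps: int = 2000, switch_at: int = 1000, window: int = 200, seed: int = 0) -> List[int]:
    """Hypothetical scenario: gateway 0 drops from 0.95 to 0.50 success at `switch_at`;
    gateway 1 stays at 0.80. Returns the gateway chosen at each step."""
    env = random.Random(seed + 1)
    router = SlidingWindowThompson(n_arms=2, window=window, seed=seed)
    choices: List[int] = []
    for t in range(steps):
        p = [0.95 if t < switch_at else 0.50, 0.80]
        arm = router.select()
        router.update(arm, 1 if env.random() < p[arm] else 0)
        choices.append(arm)
    return choices


if __name__ == "__main__":
    c = simulate()
    print("gateway 0 share before switch:", c[500:1000].count(0) / 500)
    print("gateway 1 share after switch:", c[1500:].count(1) / 500)
```

The window length trades off adaptivity against noise: a short window reacts quickly to a degrading gateway but re-explores more often, while a long window gives steadier estimates but lags behind changes. Choosing such hyperparameters on a simulator is the kind of tuning the abstract describes before live deployment.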