
6 research outputs found

Maximizing Success Rate of Payment Routing using Non-stationary Bandits

Author: Chaudhary Aayush
Gupta Abhishek
Rai Abhinav
Publication date: 02/08/2023

This paper discusses the system architecture design and deployment of non-stationary multi-armed bandit approaches to determine a near-optimal payment routing policy based on the recent history of transactions. We propose a Routing Service architecture using a novel Ray-based implementation for scaling bandit-based payment routing to over 10000 transactions per second, adhering to the system design requirements and ecosystem constraints of the Payment Card Industry Data Security Standard (PCI DSS). We first evaluate the effectiveness of multiple bandit-based payment routing algorithms on a custom simulator to benchmark multiple non-stationary bandit approaches and identify the best hyperparameters. We then conduct live experiments on the payment transaction system of the fantasy sports platform Dream11. In the live experiments, we demonstrate that our non-stationary bandit-based algorithm consistently improves the success rate of transactions by 0.92% compared to the traditional rule-based methods over one month. Comment: 7 pages, 6 figures
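To make the idea of routing on "the recent history of transactions" concrete, here is a minimal sketch of one common non-stationary bandit, sliding-window UCB. The abstract does not say which algorithm the authors deployed, so this choice is an assumption; the class name, the `window` and `c` parameters, and the two-gateway simulation with made-up success probabilities are all hypothetical.

```python
import math
import random
from collections import deque


class SlidingWindowUCB:
    """Sliding-window UCB: statistics use only the last `window` outcomes."""

    def __init__(self, n_arms, window=500, c=1.0):
        self.n_arms = n_arms            # arms = hypothetical payment gateways
        self.window = window            # number of recent transactions kept
        self.c = c                      # exploration strength (assumed tunable)
        self.history = deque()          # (arm, reward) pairs, oldest first
        self.counts = [0] * n_arms      # pulls per arm inside the window
        self.sums = [0.0] * n_arms      # successes per arm inside the window

    def select(self):
        # Try any arm that has no observation left in the window.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm
        t = len(self.history)

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = self.c * math.sqrt(math.log(t) / self.counts[arm])
            return mean + bonus

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        self.history.append((arm, reward))
        self.counts[arm] += 1
        self.sums[arm] += reward
        # Forget the oldest outcome once the window is full.
        if len(self.history) > self.window:
            old_arm, old_reward = self.history.popleft()
            self.counts[old_arm] -= 1
            self.sums[old_arm] -= old_reward


def simulate(rounds=4000, seed=0):
    """Two gateways whose (made-up) success rates swap halfway through."""
    rng = random.Random(seed)
    policy = SlidingWindowUCB(n_arms=2, window=300, c=0.5)
    picks_late = [0, 0]
    for t in range(rounds):
        probs = (0.9, 0.5) if t < rounds // 2 else (0.3, 0.8)
        arm = policy.select()
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        policy.update(arm, reward)
        if t >= rounds - 500:
            picks_late[arm] += 1
    return picks_late
```

Calling `simulate()` returns how often each gateway was chosen in the last 500 rounds. Because old outcomes fall out of the window, the policy should shift toward the second gateway after the swap, which a stationary UCB with full history would do far more slowly.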

arXiv.org e-Print Archive

Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

Author: Besson Lilian
Kaufmann Emilie
Maillard Odalric-Ambrym
Seznec Julien
Publication venue: Microtome Publishing
Publication date: 01/03/2022

We introduce GLR-klUCB, a novel algorithm for the piecewise iid non-stationary bandit problem with bounded rewards. This algorithm combines an efficient bandit algorithm, kl-UCB, with an efficient, parameter-free, changepoint detector, the Bernoulli Generalized Likelihood Ratio Test, for which we provide new theoretical guarantees of independent interest. Unlike previous non-stationary bandit algorithms using a change-point detector, GLR-klUCB does not need to be calibrated based on prior knowledge on the arms' means. We prove that this algorithm can attain an $O(\sqrt{TA \Upsilon_T \log(T)})$ regret in $T$ rounds on some "easy" instances, where $A$ is the number of arms and $\Upsilon_T$ the number of change-points, without prior knowledge of $\Upsilon_T$. In contrast with recently proposed algorithms that are agnostic to $\Upsilon_T$, we perform a numerical study showing that GLR-klUCB is also very efficient in practice, beyond easy instances.
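The change-point detector named in this abstract can be illustrated with a short sketch of a Bernoulli generalized likelihood ratio test: for every split point, compare the empirical means before and after the split against the overall mean using the Bernoulli KL divergence. The threshold `log(3 * n * sqrt(n) / delta)` used below is a simplified stand-in chosen for illustration, not necessarily the threshold analysed in the paper, and the function names are hypothetical.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(xs):
    """Maximum over split points s of
    s * kl(mean(xs[:s]), mean(xs)) + (n - s) * kl(mean(xs[s:]), mean(xs))."""
    n = len(xs)
    if n < 2:
        return 0.0
    total = sum(xs)
    mean_all = total / n
    best = 0.0
    prefix = 0.0
    for s in range(1, n):
        prefix += xs[s - 1]
        mean_left = prefix / s
        mean_right = (total - prefix) / (n - s)
        stat = (s * kl_bernoulli(mean_left, mean_all)
                + (n - s) * kl_bernoulli(mean_right, mean_all))
        best = max(best, stat)
    return best


def detects_change(xs, delta=0.05):
    """Flag a change when the GLR statistic crosses an assumed threshold."""
    n = len(xs)
    if n < 2:
        return False
    threshold = math.log(3 * n * math.sqrt(n) / delta)  # illustrative choice
    return glr_statistic(xs) >= threshold
```

For example, `detects_change([0] * 50 + [1] * 50)` flags the abrupt shift, while a constant stream such as `[1] * 100` does not. In a bandit setting, a detection on one arm's reward stream would typically trigger a restart of that arm's (or all arms') statistics.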

INRIA a CCSD electronic archive server

Zeroth-order non-convex learning via hierarchical dual averaging

Author: Héliou Amélie
Martin Matthieu
Mertikopoulos Panayotis
Rahier Thibaud
Publication venue: HAL CCSD
Publication date: 18/07/2021

We propose a hierarchical version of dual averaging for zeroth-order online non-convex optimization, i.e., learning processes where, at each stage, the optimizer is facing an unknown non-convex loss function and only receives the incurred loss as feedback. The proposed class of policies relies on the construction of an online model that aggregates loss information as it arrives, and it consists of two principal components: (a) a regularizer adapted to the Fisher information metric (as opposed to the metric norm of the ambient space); and (b) a principled exploration of the problem's state space based on an adapted hierarchical schedule. This construction enables sharper control of the model's bias and variance, and allows us to derive tight bounds for both the learner's static and dynamic regret, i.e., the regret incurred against the best dynamic policy in hindsight over the horizon of play.
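For readers unfamiliar with the two regret notions, the following gives their standard textbook definitions. The notation is an assumption made here for illustration ($\ell_t$ for the loss at stage $t$, $x_t$ for the learner's action, $\mathcal{X}$ for the action set, $T$ for the horizon); the paper may state them in expectation or over mixed strategies.

```latex
% Static regret: compare against the best fixed action in hindsight.
\mathrm{Reg}(T) = \sum_{t=1}^{T} \ell_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} \ell_t(x)

% Dynamic regret: compare against the best action at every stage.
\mathrm{DynReg}(T) = \sum_{t=1}^{T} \ell_t(x_t) - \sum_{t=1}^{T} \min_{x \in \mathcal{X}} \ell_t(x)
```

Since the per-stage minimizer is at least as good as any fixed action, $\mathrm{DynReg}(T) \ge \mathrm{Reg}(T)$, which is why dynamic regret is the more demanding benchmark.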

INRIA a CCSD electronic archive server