6 research outputs found

    Maximizing Success Rate of Payment Routing using Non-stationary Bandits

    This paper discusses the system architecture design and deployment of non-stationary multi-armed bandit approaches to determine a near-optimal payment routing policy based on the recent history of transactions. We propose a Routing Service architecture using a novel Ray-based implementation for scaling bandit-based payment routing to over 10,000 transactions per second, while adhering to the system design requirements and the ecosystem constraints of the Payment Card Industry Data Security Standard (PCI DSS). We first evaluate multiple bandit-based payment routing algorithms on a custom simulator to benchmark several non-stationary bandit approaches and identify the best hyperparameters. We then conduct live experiments on the payment transaction system of the fantasy sports platform Dream11. In the live experiments, we demonstrate that our non-stationary bandit-based algorithm consistently improves the success rate of transactions by 0.92% compared to traditional rule-based methods over one month.
    Comment: 7 pages, 6 figures
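
    To make the idea of routing on "the recent history of transactions" concrete, here is a minimal sketch of one standard non-stationary bandit, sliding-window UCB. It is not taken from the paper, whose abstract does not name the algorithm it deployed. The class name, the window size, the exploration constant `c`, and the two simulated gateways with their success probabilities are all hypothetical choices made for illustration.

```python
import math
import random
from collections import deque


class SlidingWindowUCB:
    """Sliding-window UCB: statistics use only the last `window` outcomes."""

    def __init__(self, n_arms, window=300, c=1.0):
        self.n_arms = n_arms
        self.window = window          # assumed window length, not from the paper
        self.c = c                    # assumed exploration constant
        self.history = deque()        # (arm, reward) pairs, oldest first
        self.counts = [0] * n_arms    # plays per arm inside the window
        self.sums = [0.0] * n_arms    # summed rewards per arm inside the window

    def select(self):
        # Any arm with no observation left in the window is tried first.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm
        t = len(self.history)

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = self.c * math.sqrt(math.log(t) / self.counts[arm])
            return mean + bonus

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        self.history.append((arm, reward))
        self.counts[arm] += 1
        self.sums[arm] += reward
        # Forget the oldest outcome once the window is full.
        if len(self.history) > self.window:
            old_arm, old_reward = self.history.popleft()
            self.counts[old_arm] -= 1
            self.sums[old_arm] -= old_reward


def simulate(n_steps=4000, seed=0):
    """Two hypothetical gateways whose success rates change halfway through."""
    rng = random.Random(seed)
    policy = SlidingWindowUCB(n_arms=2)
    picks_late = [0, 0]  # routing choices over the final 500 transactions
    for t in range(n_steps):
        probs = (0.9, 0.6) if t < n_steps // 2 else (0.5, 0.9)
        arm = policy.select()
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        policy.update(arm, reward)
        if t >= n_steps - 500:
            picks_late[arm] += 1
    return picks_late
```

    Calling `simulate()` returns how often each gateway was chosen over the last 500 simulated transactions. Because old outcomes drop out of the window, the policy should shift most traffic to the second gateway after the change.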

    Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

    We introduce GLR-klUCB, a novel algorithm for the piecewise i.i.d. non-stationary bandit problem with bounded rewards. This algorithm combines an efficient bandit algorithm, kl-UCB, with an efficient, parameter-free change-point detector, the Bernoulli Generalized Likelihood Ratio Test, for which we provide new theoretical guarantees of independent interest. Unlike previous non-stationary bandit algorithms using a change-point detector, GLR-klUCB does not need to be calibrated based on prior knowledge of the arms' means. We prove that this algorithm can attain an $O(\sqrt{TA \Upsilon_T \log(T)})$ regret in $T$ rounds on some "easy" instances, where $A$ is the number of arms and $\Upsilon_T$ the number of change-points, without prior knowledge of $\Upsilon_T$. In contrast with recently proposed algorithms that are agnostic to $\Upsilon_T$, we perform a numerical study showing that GLR-klUCB is also very efficient in practice, beyond easy instances.
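
    As a rough illustration of the change-point detector named in this abstract, the sketch below computes a Bernoulli generalized likelihood ratio statistic over all split points of a reward sequence and compares it to a threshold. The function names are hypothetical, and the logarithmic threshold is a simple choice assumed here for illustration; the paper's calibrated threshold may differ.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(xs):
    """Maximum over split points s of
    s * kl(mean(xs[:s]), mean(xs)) + (n - s) * kl(mean(xs[s:]), mean(xs))."""
    n = len(xs)
    total = sum(xs)
    mean_all = total / n
    best = 0.0
    left = 0.0
    for s in range(1, n):
        left += xs[s - 1]
        mean_left = left / s
        mean_right = (total - left) / (n - s)
        stat = (s * kl_bernoulli(mean_left, mean_all)
                + (n - s) * kl_bernoulli(mean_right, mean_all))
        best = max(best, stat)
    return best


def detects_change(xs, delta=0.05):
    """Flag a change when the GLR statistic exceeds an assumed threshold."""
    n = len(xs)
    if n < 2:
        return False
    # Assumed illustrative threshold: log(3 * n^(3/2) / delta).
    threshold = math.log(3 * n * math.sqrt(n) / delta)
    return glr_statistic(xs) >= threshold
```

    For example, `detects_change([0] * 50 + [1] * 50)` flags the abrupt shift, while a constant sequence such as `[1] * 100` does not trigger the test. In a scheme of this kind, a detection on one arm would typically prompt the bandit to restart its statistics.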

    Zeroth-order non-convex learning via hierarchical dual averaging

    We propose a hierarchical version of dual averaging for zeroth-order online non-convex optimization, i.e., learning processes where, at each stage, the optimizer faces an unknown non-convex loss function and receives only the incurred loss as feedback. The proposed class of policies relies on the construction of an online model that aggregates loss information as it arrives, and it consists of two principal components: (a) a regularizer adapted to the Fisher information metric (as opposed to the metric norm of the ambient space); and (b) a principled exploration of the problem's state space based on an adapted hierarchical schedule. This construction enables sharper control of the model's bias and variance, and allows us to derive tight bounds for both the learner's static and dynamic regret (the latter being the regret incurred against the best dynamic policy in hindsight over the horizon of play).