
8 research outputs found

Efficient Real-world Testing of Causal Decision Making via Bayesian Experimental Design for Contextual Optimisation

Authors: Foster Adam; Ivanova Desi R.; Jennings Joel; Zhang Cheng
Publication date: 11/07/2022

The real-world testing of decisions made using causal machine learning models is an essential prerequisite for their successful application. We focus on evaluating and improving contextual treatment assignment decisions: these are personalised treatments applied to, e.g., customers, each with their own contextual information, with the aim of maximising a reward. In this paper we introduce a model-agnostic framework for gathering data to evaluate and improve contextual decision making through Bayesian Experimental Design. Specifically, our method is used for the data-efficient evaluation of the regret of past treatment assignments. Unlike approaches such as A/B testing, our method avoids assigning treatments that are known to be highly sub-optimal, whilst engaging in some exploration to gather pertinent information. We achieve this by introducing an information-based design objective, which we optimise end-to-end. Our method applies to both discrete and continuous treatments. Comparing our information-theoretic approach to baselines in several simulation studies demonstrates the superior performance of our proposed approach. Comment: ICML 2022 Workshop on Adaptive Experimental Design and Active Learning in the Real World. 16 pages, 5 figures
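The quantity this abstract targets, the regret of past treatment assignments, can be illustrated with a small Monte Carlo sketch. It assumes a hypothetical per-treatment linear reward model and a list of posterior samples of its weights (the names `reward`, `regret` and `regret_posterior` are illustrative, not the paper's). It only estimates the posterior mean and spread of the regret; it does not implement the paper's information-based design objective or its end-to-end optimisation.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

# Hypothetical reward model (assumption): one linear weight vector per treatment.
Weights = Dict[str, List[float]]


def reward(theta: Weights, x: Sequence[float], treatment: str) -> float:
    """Expected reward of `treatment` in context `x` under parameters `theta`."""
    return sum(w * xi for w, xi in zip(theta[treatment], x))


def regret(theta: Weights, contexts: Sequence[Sequence[float]], assigned: Sequence[str]) -> float:
    """Total regret of the past assignments if `theta` were the true parameters."""
    total = 0.0
    for x, a in zip(contexts, assigned):
        best = max(reward(theta, x, b) for b in theta)  # best treatment for this context
        total += best - reward(theta, x, a)
    return total


def regret_posterior(samples: Sequence[Weights], contexts, assigned) -> Tuple[float, float]:
    """Monte Carlo mean and spread of the regret across posterior samples of theta."""
    values = [regret(theta, contexts, assigned) for theta in samples]
    return mean(values), pstdev(values)


# Usage: two (made-up) posterior samples that disagree about which treatment is better.
samples = [{"A": [1.0, 0.0], "B": [0.0, 1.0]}, {"A": [0.0, 1.0], "B": [1.0, 0.0]}]
contexts = [[2.0, 1.0], [0.5, 3.0]]
assigned = ["B", "B"]
print(regret_posterior(samples, contexts, assigned))
```

A large spread across samples signals that the regret estimate is still uncertain; deciding which treatments to assign next in order to shrink that uncertainty is the job of the Bayesian experimental design described above.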

arXiv.org e-Print Archive

Robust Bandit Learning with Imperfect Context

Authors: Ren Shaolei; Yang Jianyi
Publication date: 04/04/2021

A standard assumption in contextual multi-armed bandits is that the true context is perfectly known before arm selection. Nonetheless, in many practical applications (e.g., cloud resource management), the context information available prior to arm selection can only be acquired by prediction, which is subject to errors or adversarial modification. In this paper, we study a contextual bandit setting in which only imperfect context is available for arm selection while the true context is revealed at the end of each round. We propose two robust arm selection algorithms: MaxMinUCB (Maximize Minimum UCB), which maximizes the worst-case reward, and MinWD (Minimize Worst-case Degradation), which minimizes the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and MinWD by deriving both regret and reward bounds compared to an oracle that knows the true context. Our results show that, as time goes on, MaxMinUCB and MinWD both perform asymptotically as well as their optimal counterparts that know the reward function. Finally, we apply MaxMinUCB and MinWD to online edge datacenter selection, and run synthetic simulations to validate our theoretical analysis.
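The max-min selection rule behind MaxMinUCB can be sketched as follows, assuming LinUCB-style upper confidence bounds per arm and a finite list of candidate true contexts standing in for the uncertainty set around the predicted context. Both are simplifications chosen for illustration: the paper's uncertainty set, confidence widths and exploration parameter (here a hypothetical `alpha`) may differ, MinWD is not shown, and all function names are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]
# Hypothetical per-arm state: (estimated weights theta_hat, inverse design matrix A^{-1}).
ArmState = Tuple[Vector, Matrix]


def ucb(theta_hat: Vector, a_inv: Matrix, x: Vector, alpha: float) -> float:
    """LinUCB-style index: estimated reward plus alpha * sqrt(x^T A^{-1} x)."""
    estimate = sum(t * xi for t, xi in zip(theta_hat, x))
    n = len(x)
    width = math.sqrt(sum(x[i] * a_inv[i][j] * x[j] for i in range(n) for j in range(n)))
    return estimate + alpha * width


def max_min_ucb(arms: Dict[str, ArmState], context_set: Sequence[Vector], alpha: float = 1.0) -> str:
    """Pick the arm whose worst-case UCB over the candidate true contexts is largest."""

    def worst_case(arm: str) -> float:
        theta_hat, a_inv = arms[arm]
        return min(ucb(theta_hat, a_inv, x, alpha) for x in context_set)

    return max(arms, key=worst_case)


# Usage with zero-width confidence bounds (A^{-1} = 0) to isolate the max-min rule.
zero = [[0.0, 0.0], [0.0, 0.0]]
arms = {"risky": ([2.0, 0.0], zero), "steady": ([1.0, 1.0], zero)}
print(max_min_ucb(arms, [[1.0, 0.0]]))              # risky
print(max_min_ucb(arms, [[1.0, 0.0], [0.0, 1.0]]))  # steady
```

Trusting the predicted context [1.0, 0.0] alone favours "risky", but once the true context might instead be [0.0, 1.0], the worst case of "risky" drops to 0 and the rule switches to "steady".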

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits

Authors: Ren Zhimei; Zhou Zhengyuan
Publication date: 27/08/2020

We study the problem of dynamic batch learning in high-dimensional sparse linear contextual bandits, where a decision maker, under a given maximum-number-of-batches constraint and only able to observe rewards at the end of each batch, can dynamically decide how many individuals to include in the next batch (at the end of the current batch) and what personalized action-selection scheme to adopt within each batch. Such batch constraints are ubiquitous in a variety of practical contexts, including personalized product offerings in marketing and medical treatment selection in clinical trials. We characterize the fundamental learning limit in this problem via a regret lower bound and provide a matching upper bound (up to log factors), thus prescribing an optimal scheme for this problem. To the best of our knowledge, our work provides the first inroad into a theoretical understanding of dynamic batch learning in high-dimensional sparse linear contextual bandits. Notably, even a special case of our result (when no batch constraint is present) yields the first minimax optimal $\tilde{O}(\sqrt{s_0 T})$ regret bound for standard online learning in high-dimensional linear contextual bandits (for the no-margin case), where $s_0$ is the sparsity parameter (or an upper bound thereof) and $T$ is the learning horizon. This result (both that $\tilde{O}(\sqrt{s_0 T})$ is achievable and that $\Omega(\sqrt{s_0 T})$ is a lower bound) appears to be unknown in the emerging literature of high-dimensional contextual bandits. Comment: 33 pages
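To see why a rate depending on the sparsity $s_0$ rather than the ambient dimension $d$ matters, the arithmetic below compares $\sqrt{s_0 T}$ with the $d\sqrt{T}$ rate familiar from standard (non-sparse) linear bandit algorithms such as OFUL, ignoring constants and log factors in both. The values $s_0 = 10$, $d = 1000$ and $T = 10^6$ are made up for illustration and do not come from the paper.

```python
import math


def sparse_rate(s0: int, horizon: int) -> float:
    """sqrt(s0 * T): the sparse regret rate, ignoring constants and log factors."""
    return math.sqrt(s0 * horizon)


def dense_rate(d: int, horizon: int) -> float:
    """d * sqrt(T): the dimension-dependent linear-bandit rate, same caveats."""
    return d * math.sqrt(horizon)


s0, d, T = 10, 1_000, 1_000_000  # illustrative values only
print(f"sqrt(s0*T) = {sparse_rate(s0, T):,.0f}")  # sqrt(s0*T) = 3,162
print(f"d*sqrt(T)  = {dense_rate(d, T):,.0f}")    # d*sqrt(T)  = 1,000,000
```

These are scaling comparisons, not predicted regret values: the hidden constants and logarithmic factors can differ between the two bounds.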

arXiv.org e-Print Archive

Online Learning of Energy Consumption for Navigation of Electric Vehicles

Authors: Åkerblom Niklas; Chen Yuxin; Haghir Chehreghani Morteza
Publication venue: Elsevier BV
Publication date: 01/01/2023

Energy-efficient navigation constitutes an important challenge in electric vehicles, due to their limited battery capacity. We employ a Bayesian approach to model the energy consumption at road segments for efficient navigation. In order to learn the model parameters, we develop an online learning framework and investigate several exploration strategies such as Thompson Sampling and Upper Confidence Bound. We then extend our online learning framework to the multi-agent setting, where multiple vehicles adaptively navigate and learn the parameters of the energy model. We analyze Thompson Sampling and establish rigorous regret bounds on its performance in the single-agent and multi-agent settings, through an analysis of the algorithm under batched feedback. Finally, we demonstrate the performance of our methods via experiments on several real-world city road networks.
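A single-agent Thompson Sampling loop for energy-aware routing can be sketched as below. The sketch assumes independent Normal beliefs over each road segment's mean energy use with a known noise variance, a small set of candidate routes enumerated up front instead of a shortest-path search (which also avoids any issue with negative segment energies), and a noisy observation on every segment of the chosen route. The class and function names, priors and toy network are hypothetical; the paper's energy model and its multi-agent, batched-feedback extensions are not reproduced.

```python
import random
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]
Route = Sequence[Edge]


class GaussianEdgeModel:
    """Independent Normal belief over each edge's mean energy use (known noise variance)."""

    def __init__(self, edges: Sequence[Edge], prior_mean: float, prior_var: float, noise_var: float):
        self.mean: Dict[Edge, float] = {e: prior_mean for e in edges}
        self.var: Dict[Edge, float] = {e: prior_var for e in edges}
        self.noise_var = noise_var

    def sample(self, rng: random.Random) -> Dict[Edge, float]:
        """Draw one plausible mean energy per edge from the current posterior."""
        return {e: rng.gauss(self.mean[e], self.var[e] ** 0.5) for e in self.mean}

    def update(self, edge: Edge, observed: float) -> None:
        """Conjugate Normal-Normal update after observing energy use on `edge`."""
        post_var = 1.0 / (1.0 / self.var[edge] + 1.0 / self.noise_var)
        self.mean[edge] = post_var * (self.mean[edge] / self.var[edge] + observed / self.noise_var)
        self.var[edge] = post_var


def thompson_route(model: GaussianEdgeModel, routes: Sequence[Route], rng: random.Random) -> int:
    """Thompson Sampling: pick the route that is cheapest under one posterior sample."""
    sampled = model.sample(rng)
    return min(range(len(routes)), key=lambda i: sum(sampled[e] for e in routes[i]))


def simulate(true_energy: Dict[Edge, float], routes: Sequence[Route], rounds: int,
             noise_sd: float = 0.5, seed: int = 0) -> List[int]:
    """Drive repeatedly, observing noisy energy use on every edge of the chosen route."""
    rng = random.Random(seed)
    model = GaussianEdgeModel(list(true_energy), prior_mean=2.0, prior_var=4.0,
                              noise_var=noise_sd ** 2)
    choices = []
    for _ in range(rounds):
        i = thompson_route(model, routes, rng)
        for edge in routes[i]:
            model.update(edge, rng.gauss(true_energy[edge], noise_sd))
        choices.append(i)
    return choices


# Usage on a made-up network: route 0 (via "a") is truly cheaper than route 1 (via "b").
true_energy = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "b"): 3.0, ("b", "t"): 3.0}
routes = [[("s", "a"), ("a", "t")], [("s", "b"), ("b", "t")]]
choices = simulate(true_energy, routes, rounds=300)
print("route 0 chosen in last 100 rounds:", choices[-100:].count(0))
```

Early on both routes look alike under the prior, so each gets tried; once the costlier route has been observed a few times, its posterior samples rarely undercut route 0 and exploration fades.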

Chalmers Research

Homomorphically Encrypted Linear Contextual Bandit

Authors: Garcelon Evrard; Perchet Vianney; Pirotta Matteo
Publication date: 17/03/2021

The contextual bandit is a general framework for online learning in sequential decision-making problems that has found application in a large range of domains, including recommendation systems, online advertising, clinical trials and many more. A critical aspect of bandit methods is that they require observing the contexts (i.e., individual or group-level data) and the rewards in order to solve the sequential problem. Large-scale deployment in industrial applications has increased interest in methods that preserve the privacy of users. In this paper, we introduce a privacy-preserving bandit framework based on asymmetric encryption. The bandit algorithm only observes encrypted information (contexts and rewards) and has no ability to decrypt it. Leveraging homomorphic encryption, we show that despite the complexity of the setting, it is possible to learn over encrypted data. We introduce an algorithm that achieves a $\widetilde{O}(d\sqrt{T})$ regret bound in any linear contextual bandit problem, while keeping data encrypted.
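To illustrate the primitive that learning over encrypted data relies on, the sketch below implements a toy version of the Paillier cryptosystem, a well-known additively homomorphic public-key scheme. Multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a party holding only the public key `n` could, for example, accumulate encrypted rewards without seeing them. Paillier is used here only as a familiar example; this is not a claim about the scheme used in the paper. The tiny hard-coded primes make it completely insecure, and the code needs Python 3.9+ for `math.lcm`.

```python
import math
import random

# Toy key material: tiny primes, for illustration only (NOT secure).
P, Q = 293, 433


def keygen(p: int = P, q: int = Q):
    """Paillier key generation with g = n + 1; returns public n and private (lam, mu)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, decryption recovers m * lam mod n, so mu = lam^-1
    return n, (lam, mu)


def encrypt(n: int, m: int, rng: random.Random) -> int:
    """Enc(m) = (n + 1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, private, c: int) -> int:
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = private
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n: int, c: int, k: int) -> int:
    """Raising a ciphertext to the power k multiplies the plaintext by k (mod n)."""
    return pow(c, k, n * n)


# Usage: sum encrypted rewards without decrypting any individual one.
rng = random.Random(1)  # use a cryptographic RNG (e.g. `secrets`) outside of a toy
n, private = keygen()
rewards = [3, 0, 5, 2]
total = encrypt(n, 0, rng)
for c in (encrypt(n, r, rng) for r in rewards):
    total = add_encrypted(n, total, c)
print(decrypt(n, private, total))  # 10
```

A linear bandit needs more than sums of scalars (for instance, encrypted vector and matrix updates), so this only shows the additive building block, not a full private bandit algorithm.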

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Polytechnique