4,070 research outputs found
Learning Contextual Bandits in a Non-stationary Environment
Multi-armed bandit algorithms have become a reference solution for handling
the explore/exploit dilemma in recommender systems, and many other important
real-world problems, such as display advertisement. However, such algorithms
usually assume a stationary reward distribution, which hardly holds in practice
as users' preferences are dynamic. This inevitably leads to consistently
suboptimal recommendation performance. In this paper, we consider the setting
where the underlying distribution of reward remains unchanged over (possibly
short) epochs and shifts at unknown time instants. Accordingly, we propose a
contextual bandit algorithm that detects possible changes of the environment
based on its reward estimation confidence and updates its arm selection
strategy in response. A rigorous upper regret bound analysis of the proposed algorithm
demonstrates its learning effectiveness in such a non-trivial environment.
Extensive empirical evaluations on both synthetic and real-world datasets for
recommendation confirm its practical utility in a changing environment.
Comment: 10 pages, 13 figures, to appear in ACM Special Interest Group on
Information Retrieval (SIGIR) 201
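The change-detection idea described above, restarting the reward estimator when observed rewards repeatedly fall outside the model's own confidence interval, can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm; the class name, window size, and threshold are invented for the example:

```python
import numpy as np

class ChangeDetectingLinUCB:
    """Contextual bandit sketch that resets its linear reward model when
    too many recent rewards fall outside the model's confidence interval."""

    def __init__(self, dim, alpha=1.0, window=50, error_threshold=0.5):
        self.dim = dim
        self.alpha = alpha                  # width of the UCB exploration bonus
        self.window = window                # recent rounds kept for change detection
        self.error_threshold = error_threshold
        self._reset()

    def _reset(self):
        # Ridge-regression sufficient statistics for the reward model.
        self.A = np.eye(self.dim)
        self.b = np.zeros(self.dim)
        self.errors = []                    # 1 if a reward fell outside the CI

    def select(self, contexts):
        """Pick the arm whose context maximizes the upper confidence bound."""
        theta = np.linalg.solve(self.A, self.b)
        A_inv = np.linalg.inv(self.A)
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
                  for x in contexts]
        return int(np.argmax(scores))

    def update(self, x, reward):
        theta = np.linalg.solve(self.A, self.b)
        A_inv = np.linalg.inv(self.A)
        width = self.alpha * np.sqrt(x @ A_inv @ x)
        # Flag rounds whose reward is not explained by the current model.
        self.errors.append(1 if abs(reward - x @ theta) > width else 0)
        self.errors = self.errors[-self.window:]
        self.A += np.outer(x, x)
        self.b += reward * x
        # Too many unexplained rewards: assume the environment has shifted.
        if (len(self.errors) == self.window
                and np.mean(self.errors) > self.error_threshold):
            self._reset()
```

The key design point is that the detector reuses the same confidence bound that drives exploration, so no separate change-point statistic has to be maintained.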
Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests
A contextual bandit problem is studied in a highly non-stationary
environment, which is ubiquitous in various recommender systems due to the
time-varying interests of users. Two models with disjoint and hybrid payoffs
are considered to characterize the phenomenon that users' preferences towards
different items vary differently over time. In the disjoint payoff model, the
reward of playing an arm is determined by an arm-specific preference vector,
which is piecewise-stationary with asynchronous and distinct changes across
different arms. An efficient learning algorithm that adapts to abrupt
reward changes is proposed, and a regret analysis shows that the regret
scales sublinearly in the time horizon. The
algorithm is further extended to a more general setting with hybrid payoffs
where the reward of playing an arm is determined by both an arm-specific
preference vector and a joint coefficient vector shared by all arms. Empirical
experiments are conducted on real-world datasets to verify the advantages of
the proposed learning algorithms against baseline ones in both settings.
Comment: Accepted by AAAI 2
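The disjoint payoff model above, one piecewise-stationary preference vector per arm with asynchronous changes, is commonly handled by estimating each arm from a sliding window of its own recent observations. The sketch below illustrates that device (it is an assumption-laden illustration, not the paper's algorithm; the class name and window size are invented):

```python
from collections import deque
import numpy as np

class SlidingWindowLinearArm:
    """Per-arm reward estimator that keeps only the last `window`
    observations, so the estimate forgets data from before the arm's
    most recent change point."""

    def __init__(self, dim, window=100, reg=1.0):
        self.dim, self.reg = dim, reg
        self.history = deque(maxlen=window)     # (context, reward) pairs

    def update(self, x, reward):
        self.history.append((np.asarray(x, dtype=float), float(reward)))

    def estimate(self):
        # Ridge regression over the retained window only.
        A = self.reg * np.eye(self.dim)
        b = np.zeros(self.dim)
        for x, r in self.history:
            A += np.outer(x, x)
            b += r * x
        return np.linalg.solve(A, b)
```

Because each arm carries its own window, arms whose preference vectors change at different times recover independently, matching the asynchronous-changes setting described in the abstract.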
Maximizing Success Rate of Payment Routing using Non-stationary Bandits
This paper discusses the system architecture design and deployment of
non-stationary multi-armed bandit approaches to determine a near-optimal
payment routing policy based on the recent history of transactions. We propose
a Routing Service architecture using a novel Ray-based implementation for
optimally scaling bandit-based payment routing to over 10,000 transactions per
second while adhering to the system design requirements and ecosystem
constraints of the Payment Card Industry Data Security Standard (PCI DSS). We first evaluate
the effectiveness of multiple bandit-based payment routing algorithms on a
custom simulator to benchmark multiple non-stationary bandit approaches and
identify the best hyperparameters. We then conduct live experiments on the
payment transaction system of the fantasy sports platform Dream11. In these
experiments, our non-stationary bandit-based algorithm consistently improved
the transaction success rate by 0.92% over one month compared to traditional
rule-based methods.
Comment: 7 pages, 6 figures
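A standard non-stationary bandit for tracking a recent success rate, which is the kind of approach this paper benchmarks, is discounted Thompson sampling over Bernoulli rewards. The sketch below is a generic illustration under that assumption, not Dream11's routing service; route names and the discount factor are invented:

```python
import random

class DiscountedThompsonRouter:
    """Discounted Thompson sampling over payment routes: success/failure
    pseudo-counts decay geometrically, so the sampled Beta posteriors
    track each route's *recent* success rate."""

    def __init__(self, routes, gamma=0.99):
        self.gamma = gamma
        # [discounted successes, discounted failures] per route.
        self.stats = {r: [0.0, 0.0] for r in routes}

    def choose(self):
        # Sample a success probability per route from Beta(s+1, f+1)
        # and route the transaction to the best sample.
        samples = {r: random.betavariate(s + 1.0, f + 1.0)
                   for r, (s, f) in self.stats.items()}
        return max(samples, key=samples.get)

    def record(self, route, success):
        # Discount every route, then credit the observed transaction.
        for counts in self.stats.values():
            counts[0] *= self.gamma
            counts[1] *= self.gamma
        self.stats[route][0 if success else 1] += 1.0
```

The discounting means a gateway whose success rate degrades is demoted within roughly 1/(1-gamma) transactions rather than being anchored by its entire history.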
On Limited-Memory Subsampling Strategies for Bandits
There has been a recent surge of interest in nonparametric bandit algorithms
based on subsampling. One drawback however of these approaches is the
additional complexity required by random subsampling and the storage of the
full history of rewards. Our first contribution is to show that a simple
deterministic subsampling rule, proposed in the recent work of Baudry et al.
(2020) under the name "last-block subsampling", is asymptotically optimal
in one-parameter exponential families. In addition, we prove that these
guarantees also hold when limiting the algorithm memory to a polylogarithmic
function of the time horizon. These findings open up new perspectives, in
particular for non-stationary scenarios in which the arm distributions evolve
over time. We propose a variant of the algorithm in which only the most recent
observations are used for subsampling, achieving optimal regret guarantees
under the assumption of a known number of abrupt changes. Extensive numerical
simulations highlight the merits of this approach, particularly when the
changes affect not only the means of the rewards.
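The core mechanics, comparing a leader and a challenger on the mean of the most recent block of rewards rather than a random subsample, with per-arm memory capped, can be sketched as follows. This is a toy illustration of the subsampling duel only, not the paper's full algorithm, and all names are invented:

```python
from collections import deque
import numpy as np

def last_block_mean(rewards, block_size):
    """Mean of the most recent `block_size` rewards: the deterministic
    'last-block' subsample used in place of random subsampling."""
    return float(np.mean(rewards[-block_size:]))

class LastBlockDuel:
    """Leader vs. challenger comparison on matched last-block subsamples,
    with memory per arm bounded by a fixed-size deque."""

    def __init__(self, n_arms, memory=200):
        self.rewards = [deque(maxlen=memory) for _ in range(n_arms)]

    def record(self, arm, reward):
        self.rewards[arm].append(float(reward))

    def challenger_wins(self, leader, challenger):
        lead = list(self.rewards[leader])
        chall = list(self.rewards[challenger])
        block = min(len(lead), len(chall))
        if block == 0:
            return True  # an unexplored challenger must be pulled
        # Compare both arms on subsamples of equal size.
        return last_block_mean(chall, block) >= last_block_mean(lead, block)
```

Bounding the deque realizes the limited-memory regime discussed above, and restricting it further to only the most recent observations is what yields the non-stationary variant.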
Simulations to benchmark time-varying connectivity methods for fMRI
Published: May 29, 2018
There is a current interest in quantifying time-varying connectivity (TVC) based on neuroimaging data such as fMRI. Many methods have been proposed and are being applied, revealing new insight into the brain's dynamics. However, given that the ground truth for TVC in the brain is unknown, many concerns remain regarding the accuracy of proposed estimates. Since there exist many TVC methods, it is difficult to assess differences in time-varying connectivity between studies. In this paper, we present tvc_benchmarker, a Python package containing four simulations to test TVC methods. Here, we evaluate five different methods that together represent a wide spectrum of current approaches to estimating TVC (sliding window, tapered sliding window, multiplication of temporal derivatives, spatial distance, and jackknife correlation). These simulations were designed to test each method's ability to track changes in covariance over time, which is a key property in TVC analysis. We found that all tested methods correlated positively with each other, but there were large differences in the strength of the correlations between methods. To facilitate comparisons with future TVC methods, we propose that the described simulations can act as benchmark tests for the evaluation of methods. Using tvc_benchmarker, researchers can easily add, compare, and submit their own TVC methods to evaluate their performance.
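The simplest of the five methods evaluated above, sliding-window correlation, computes a Pearson correlation within each window of the two time series. The sketch below is a minimal stand-alone illustration, not tvc_benchmarker's implementation:

```python
import numpy as np

def sliding_window_tvc(x, y, window):
    """Time-varying connectivity between two time series: the Pearson
    correlation inside each sliding window of length `window`."""
    n = len(x)
    out = []
    for t in range(n - window + 1):
        xw, yw = x[t:t + window], y[t:t + window]
        out.append(float(np.corrcoef(xw, yw)[0, 1]))
    return np.array(out)
```

The tapered variant evaluated in the paper differs only in weighting samples within each window (e.g. with a Gaussian taper) before computing the correlation.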