6 research outputs found
Bayesian Meta-Prior Learning Using Empirical Bayes
Adding domain knowledge to a learning system is known to improve results. In
multi-parameter Bayesian frameworks, such knowledge is incorporated as a prior.
On the other hand, various model parameters can have different learning rates
in real-world problems, especially with skewed data. Two often-faced challenges
in Operations Management and Management Science applications are the absence of
informative priors, and the inability to control parameter learning rates. In
this study, we propose a hierarchical Empirical Bayes approach that addresses
both challenges, and that can generalize to any Bayesian framework. Our method
learns empirical meta-priors from the data itself and uses them to decouple the
learning rates of first-order and second-order features (or any other given
feature grouping) in a Generalized Linear Model. As the first-order features
are likely to have a more pronounced effect on the outcome, focusing on
learning first-order weights first is likely to improve performance and
convergence time. Our Empirical Bayes method clamps features in each group
together and uses the deployed model's observed data to empirically compute a
hierarchical prior in hindsight. We report theoretical results for the
unbiasedness, strong consistency, and optimal frequentist cumulative regret
properties of our meta-prior variance estimator. We apply our method to a
standard supervised learning optimization problem, as well as an online
combinatorial optimization problem in a contextual bandit setting implemented
in an Amazon production system. Both during simulations and live experiments,
our method shows marked improvements, especially in cases of small traffic. Our
findings are promising, as optimizing over sparse data is often a challenge.
Comment: Expanded discussions on applications and extended literature review section. Forthcoming in the Management Science Journal.
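The grouped meta-prior idea can be pictured with a method-of-moments sketch: treat each group's fitted weights as draws from a shared prior and back out the prior's variance by subtracting the average sampling noise from the observed spread. This is a toy illustration under assumed numbers, not the paper's actual estimator.

```python
import statistics

def meta_prior_variance(estimates, sampling_vars):
    """Method-of-moments sketch: the observed spread of a group's weight
    estimates, minus their average sampling noise, approximates the
    variance of a shared (meta-)prior for that feature group."""
    total_var = statistics.pvariance(estimates)
    avg_noise = statistics.fmean(sampling_vars)
    return max(0.0, total_var - avg_noise)  # clip at zero: variances are nonnegative

# Hypothetical first-order weight estimates and their sampling variances.
first_order_weights = [0.5, 1.4, 0.9, 1.6, 0.6]
noise_vars = [0.05, 0.05, 0.05, 0.05, 0.05]
tau_sq = meta_prior_variance(first_order_weights, noise_vars)
```

Computing a separate `tau_sq` per feature group (first-order vs. second-order) is what lets a hierarchical prior shrink the groups at different rates.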
Multiple Treatment Modeling for Target Marketing Campaigns: A Large-Scale Benchmark Study
Machine learning and artificial intelligence (ML/AI) promise higher degrees of personalization and enhanced efficiency in marketing communication. The paper focuses on causal ML/AI models for campaign targeting. Such models estimate the change in customer behavior due to a marketing action, known as the individual treatment effect (ITE) or uplift. ITE estimates capture the value of a marketing action when applied to a specific customer and facilitate effective and efficient targeting. We consolidate uplift models for multiple treatments and continuous outcomes and perform a benchmarking study to demonstrate their potential to target promotional monetary campaigns. In this use case, the new models facilitate selecting the optimal discount amount to offer to a customer. Large-scale analysis based on eight marketing data sets from leading B2C retailers confirms significant gains in campaign return on marketing when using the new models, compared to relevant model benchmarks and conventional marketing practices.
Peer Reviewed
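The multi-treatment selection step can be sketched with a minimal T-learner-style example: one outcome model per discount level, then pick the discount with the largest estimated uplift over control. All names and numbers below are hypothetical, and segment-conditional averages stand in for the fitted regressors a real benchmark would use.

```python
from collections import defaultdict

def fit_uplift_models(rows):
    """One outcome model per discount level: here simply segment-conditional
    average spend, standing in for a fitted regressor per treatment."""
    sums, counts = defaultdict(float), defaultdict(int)
    for segment, discount, spend in rows:
        sums[(discount, segment)] += spend
        counts[(discount, segment)] += 1
    return {key: sums[key] / counts[key] for key in sums}

def best_discount(models, segment, discounts):
    """Pick the discount whose estimated uplift over the 0% control is largest."""
    control = models.get((0, segment), 0.0)
    uplift = {d: models.get((d, segment), control) - control
              for d in discounts if d != 0}
    return max(uplift, key=uplift.get)

# Hypothetical transactions: (customer segment, discount %, observed spend).
rows = [("loyal", 0, 50), ("loyal", 10, 58), ("loyal", 20, 55),
        ("new", 0, 20), ("new", 10, 24), ("new", 20, 31)]
models = fit_uplift_models(rows)
best_discount(models, "loyal", [0, 10, 20])  # -> 10
best_discount(models, "new", [0, 10, 20])    # -> 20
```

Note that the two segments get different optimal discounts, which is exactly the per-customer targeting the ITE framing enables.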
Online Causal Inference for Advertising in Real-Time Bidding Auctions
Real-time bidding (RTB) systems, which leverage auctions to programmatically
allocate user impressions to multiple competing advertisers, continue to enjoy
widespread success in digital advertising. Assessing the effectiveness of such
advertising remains a lingering challenge in research and practice. This paper
presents a new experimental design to perform causal inference on advertising
bought through such mechanisms. Our method leverages the economic structure of
first- and second-price auctions, which are ubiquitous in RTB systems, embedded
within a multi-armed bandit (MAB) setup for online adaptive experimentation. We
implement it via a modified Thompson sampling (TS) algorithm that estimates
causal effects of advertising while minimizing the costs of experimentation to
the advertiser by simultaneously learning the optimal bidding policy that
maximizes her expected payoffs from auction participation. Simulations show
that the proposed method not only successfully accomplishes the advertiser's
goals, but also does so at a much lower cost than more conventional
experimentation policies aimed at performing causal inference.
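The bandit component can be pictured with a plain Thompson-sampling loop over candidate bids. This is a toy sketch with made-up numbers, not the paper's modified TS (which additionally exploits the first- and second-price auction structure): a Beta posterior per bid's conversion rate, and each round the bid maximizing the sampled expected payoff, value * rate - bid, is played.

```python
import random

def run_ts(bids, true_rates, value, rounds, seed=0):
    """Toy Thompson sampling over candidate bids: keep a Beta posterior on
    each bid's conversion rate, sample a rate per bid each round, and play
    the bid with the highest sampled expected payoff value*rate - bid."""
    rng = random.Random(seed)
    k = len(bids)
    alpha, beta = [1.0] * k, [1.0] * k   # Beta(1, 1) uniform priors
    pulls = [0] * k
    for _ in range(rounds):
        sampled = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        i = max(range(k), key=lambda j: value * sampled[j] - bids[j])
        reward = 1 if rng.random() < true_rates[i] else 0  # simulated conversion
        alpha[i] += reward
        beta[i] += 1 - reward
        pulls[i] += 1
    return pulls

# Hypothetical bids and conversion rates; the middle bid has the best payoff.
pulls = run_ts(bids=[0.2, 0.5, 0.9], true_rates=[0.05, 0.6, 0.3],
               value=2.0, rounds=1000)
```

Because TS concentrates play on the best-performing bid, most of the experimentation budget is spent near the optimal policy, which is the cost-saving property the abstract highlights.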
Improved Confidence Bounds for the Linear Logistic Model and Applications to Linear Bandits
We propose improved fixed-design confidence bounds for the linear logistic
model. Our bounds significantly improve upon the state-of-the-art bound by Li
et al. (2017) via recent developments of the self-concordant analysis of the
logistic loss (Faury et al., 2020). Specifically, our confidence bound avoids a
direct dependence on $1/\kappa$, where $\kappa$ is the minimal variance over
all arms' reward distributions. In general, $1/\kappa$ scales exponentially
with the norm of the unknown linear parameter $\theta^*$. Instead of relying on
this worst-case quantity, our confidence bound for the reward of any given arm
depends directly on the variance of that arm's reward distribution. We present
two applications of our novel bounds to pure exploration and regret
minimization in logistic bandits, improving upon state-of-the-art performance
guarantees. For pure exploration, we also provide a lower bound highlighting a
dependence on $1/\kappa$ for a family of instances.
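To see why an arm-dependent variance helps, recall that the logistic link mu(z) = 1/(1 + exp(-z)) gives Bernoulli reward variance mu(z)*(1 - mu(z)), which is tiny at extreme logits. The sketch below (illustrative numbers, not the paper's actual bound) contrasts the worst-case quantity 1/kappa with a per-arm variance:

```python
import math

def mu(z):
    """Logistic link: expected (Bernoulli) reward for logit z."""
    return 1.0 / (1.0 + math.exp(-z))

def dmu(z):
    """Reward variance at logit z: mu(z) * (1 - mu(z))."""
    p = mu(z)
    return p * (1.0 - p)

# kappa is the minimal variance over all arms, so 1/kappa is driven by the
# most extreme arm and grows exponentially with the logit's magnitude; an
# arm-dependent bound uses the variance at the arm actually played instead.
inv_kappa = 1.0 / dmu(5.0)   # extreme arm: large (about 150 here)
per_arm = 1.0 / dmu(0.5)     # moderate arm: small (about 4.3 here)
```

A confidence width scaled by `per_arm` rather than `inv_kappa` is much tighter for every non-extreme arm, which is the source of the improved exploration and regret guarantees.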
Homomorphically Encrypted Linear Contextual Bandit
The contextual bandit is a general framework for online learning in sequential
decision-making problems that has found application in a wide range of
domains, including recommendation systems, online advertising, clinical trials,
and many more. A critical aspect of bandit methods is that they require
observing the contexts -- i.e., individual or group-level data -- and the rewards
in order to solve the sequential problem. Their broad deployment in industrial
applications has increased interest in methods that preserve the privacy of the
users. In this paper, we introduce a privacy-preserving bandit framework based
on asymmetric encryption. The bandit algorithm only observes encrypted
information (contexts and rewards) and has no ability to decrypt it. Leveraging
homomorphic encryption, we show that despite the complexity of the setting, it
is possible to learn over encrypted data. We introduce an algorithm that
achieves a sublinear regret bound in any linear contextual bandit problem,
while keeping the data encrypted.
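The additive structure that makes learning over ciphertexts possible can be sketched with a toy Paillier cryptosystem. This is an assumption for illustration: tiny fixed primes, utterly insecure, and only the reward-aggregation step rather than the paper's full bandit protocol. The key property is that multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a learner can accumulate encrypted rewards without ever decrypting an individual one.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier keypair from fixed tiny primes (insecure, demo only)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)            # valid because we take generator g = n + 1
    return (n,), (lam, mu, n)

def encrypt(pub, m, rng):
    """Encrypt integer m < n; with g = n + 1, g^m = 1 + m*n (mod n^2)."""
    (n,) = pub
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:      # random mask r must be coprime to n
        r = rng.randrange(1, n)
    return (1 + m * n) * pow(r, n, n2) % n2

def add_encrypted(pub, c1, c2):
    """Additive homomorphism: multiplying ciphertexts adds the plaintexts."""
    (n,) = pub
    return c1 * c2 % (n * n)

def decrypt(priv, c):
    lam, mu, n = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n    # L(x) = (x - 1) / n, then unblind by mu

# Aggregate two encrypted rewards without decrypting them individually.
rng = random.Random(7)
pub, priv = keygen()
total = add_encrypted(pub, encrypt(pub, 3, rng), encrypt(pub, 4, rng))
decrypt(priv, total)  # -> 7
```

A real deployment would use cryptographically sized primes and, as in the paper's setting, keep the decryption key entirely outside the learner.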