
    Bayesian Meta-Prior Learning Using Empirical Bayes

    Adding domain knowledge to a learning system is known to improve results. In multi-parameter Bayesian frameworks, such knowledge is incorporated as a prior. On the other hand, various model parameters can have different learning rates in real-world problems, especially with skewed data. Two often-faced challenges in Operations Management and Management Science applications are the absence of informative priors and the inability to control parameter learning rates. In this study, we propose a hierarchical Empirical Bayes approach that addresses both challenges and that can generalize to any Bayesian framework. Our method learns empirical meta-priors from the data itself and uses them to decouple the learning rates of first-order and second-order features (or any other given feature grouping) in a Generalized Linear Model. As the first-order features are likely to have a more pronounced effect on the outcome, focusing on learning the first-order weights first is likely to improve performance and convergence time. Our Empirical Bayes method clamps the features in each group together and uses the deployed model's observed data to empirically compute a hierarchical prior in hindsight. We report theoretical results for the unbiasedness, strong consistency, and optimal frequentist cumulative regret properties of our meta-prior variance estimator. We apply our method to a standard supervised learning optimization problem, as well as to an online combinatorial optimization problem in a contextual bandit setting implemented in an Amazon production system. In both simulations and live experiments, our method shows marked improvements, especially in low-traffic cases. Our findings are promising, as optimizing over sparse data is often a challenge.
    Comment: Expanded discussions on applications and extended literature review section. Forthcoming in the Management Science Journal.
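    A minimal sketch of the meta-prior idea, assuming a simple method-of-moments estimator over one feature group (the paper's exact estimator and its guarantees are not reproduced here; empirical_meta_prior_variance and all inputs below are illustrative):

```python
import numpy as np

def empirical_meta_prior_variance(post_means, post_vars):
    """Hypothetical method-of-moments estimate of a group-level prior variance.

    Treat the fitted weights of one feature group as draws from N(mu_g, tau_g^2)
    and back out tau_g^2 by subtracting the average estimation noise from the
    sample variance of the weights."""
    sample_var = np.var(post_means, ddof=1)   # total spread of the group's weights
    noise_var = np.mean(post_vars)            # spread attributable to estimation noise
    return max(sample_var - noise_var, 0.0)   # clip so the variance estimate stays valid

# Toy usage: first-order features get their own meta-prior, while the many
# second-order (interaction) features get a separate, typically tighter one,
# which decouples the two groups' effective learning rates.
rng = np.random.default_rng(0)
tau2_first = empirical_meta_prior_variance(rng.normal(0, 1.0, 50), np.full(50, 0.1))
tau2_second = empirical_meta_prior_variance(rng.normal(0, 0.2, 500), np.full(500, 0.1))
print(tau2_first, tau2_second)
```

    Each group's weights then receive a zero-mean prior with the estimated variance in the downstream Generalized Linear Model, so sparse interaction features shrink more aggressively than the dominant first-order ones.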

    Multiple Treatment Modeling for Target Marketing Campaigns: A Large-Scale Benchmark Study

    Machine learning and artificial intelligence (ML/AI) promise higher degrees of personalization and enhanced efficiency in marketing communication. This paper focuses on causal ML/AI models for campaign targeting. Such models estimate the change in customer behavior due to a marketing action, known as the individual treatment effect (ITE) or uplift. ITE estimates capture the value of a marketing action when applied to a specific customer and facilitate effective and efficient targeting. We consolidate uplift models for multiple treatments and continuous outcomes and perform a benchmarking study to demonstrate their potential for targeting promotional monetary campaigns. In this use case, the new models facilitate selecting the optimal discount amount to offer to a customer. A large-scale analysis based on eight marketing data sets from leading B2C retailers confirms significant gains in campaign return on marketing when using the new models compared to relevant model benchmarks and conventional marketing practices.
    Peer Reviewed
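    A hedged sketch of one standard way to estimate multi-treatment uplift with continuous outcomes, a T-learner with one response model per arm (the models benchmarked in the paper are more varied; all names and the synthetic data below are illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_multi_treatment_uplift(X, y, t, treatments):
    """Fit one outcome model per treatment arm (arm 0 = control)."""
    return {k: GradientBoostingRegressor().fit(X[t == k], y[t == k])
            for k in treatments}

def best_discount(models, X, control=0):
    """Per customer, pick the arm with the largest predicted uplift f_k(x) - f_0(x)."""
    base = models[control].predict(X)
    arms = [k for k in sorted(models) if k != control]
    uplifts = np.column_stack([models[k].predict(X) - base for k in arms])
    return np.array(arms)[uplifts.argmax(axis=1)], uplifts.max(axis=1)

# Synthetic demo: two discount levels with a heterogeneous treatment effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 5))
t = rng.integers(0, 3, size=3000)
y = X[:, 0] + 0.5 * t * (X[:, 1] > 0) + rng.normal(size=3000)
models = fit_multi_treatment_uplift(X, y, t, treatments=[0, 1, 2])
chosen_arm, expected_gain = best_discount(models, X[:100])
```

    Targeting then offers each customer the discount level with the highest expected uplift, or no discount when all estimated uplifts are negative.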

    Online Causal Inference for Advertising in Real-Time Bidding Auctions

    Real-time bidding (RTB) systems, which leverage auctions to programmatically allocate user impressions among multiple competing advertisers, continue to enjoy widespread success in digital advertising. Assessing the effectiveness of such advertising remains a lingering challenge in research and practice. This paper presents a new experimental design for performing causal inference on advertising bought through such mechanisms. Our method leverages the economic structure of first- and second-price auctions, which are ubiquitous in RTB systems, embedded within a multi-armed bandit (MAB) setup for online adaptive experimentation. We implement it via a modified Thompson sampling (TS) algorithm that estimates the causal effects of advertising while minimizing the advertiser's experimentation costs by simultaneously learning the optimal bidding policy that maximizes her expected payoffs from auction participation. Simulations show that the proposed method not only accomplishes the advertiser's goals but also does so at a much lower cost than more conventional experimentation policies aimed at performing causal inference.
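    A minimal sketch of the bandit layer only, assuming Gaussian Thompson sampling over a grid of candidate bids in a simulated second-price auction (the paper's modified TS additionally exploits the auction structure to estimate the ad's causal lift; the constants and distributions here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
bids = np.array([0.5, 1.0, 1.5, 2.0])   # candidate bids: the bandit's arms
n = np.zeros(len(bids))                 # plays per arm
mean = np.zeros(len(bids))              # running mean payoff per arm
VALUE, CVR = 3.0, 0.4                   # assumed value per conversion and true conversion rate

for _ in range(10_000):
    sigma = 1.0 / np.sqrt(n + 1.0)               # posterior width shrinks with plays
    k = int(np.argmax(rng.normal(mean, sigma)))  # Thompson draw, pick the best sampled arm
    d = rng.lognormal(0.0, 0.5)                  # highest competing bid (unknown to the advertiser)
    won = bids[k] > d
    payoff = VALUE * rng.binomial(1, CVR) - d if won else 0.0  # second price: pay the competitor's bid
    n[k] += 1
    mean[k] += (payoff - mean[k]) / n[k]         # incremental mean update

print(dict(zip(bids.tolist(), np.round(mean, 3))))  # argmax approximates the payoff-maximizing bid
```

    The experimentation cost is implicit in the exploration rounds spent on suboptimal bids, which TS drives down as the posteriors concentrate.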

    Improved Confidence Bounds for the Linear Logistic Model and Applications to Linear Bandits

    We propose improved fixed-design confidence bounds for the linear logistic model. Our bounds significantly improve upon the state-of-the-art bound of Li et al. (2017) via recent developments in the self-concordant analysis of the logistic loss (Faury et al., 2020). Specifically, our confidence bound avoids a direct dependence on 1/κ, where κ is the minimal variance over all arms' reward distributions. In general, 1/κ scales exponentially with the norm of the unknown linear parameter θ*. Instead of relying on this worst-case quantity, our confidence bound for the reward of any given arm depends directly on the variance of that arm's reward distribution. We present two applications of our novel bounds, to pure exploration and to regret minimization in logistic bandits, improving upon state-of-the-art performance guarantees. For pure exploration, we also provide a lower bound highlighting a dependence on 1/κ for a family of instances.
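    A hedged sketch of the variance-weighted quantity the abstract points at: the Fisher information of the logistic loss weights each data point by its own Bernoulli variance, so a per-arm width of the form ||x||_{H^{-1}} depends on realized variances rather than the worst-case 1/κ (the correct confidence scale comes from the paper's analysis and is left as a free parameter here; all names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fisher_information(X, theta, lam=1.0):
    """H(theta) = sum_i mu'(x_i^T theta) x_i x_i^T + lam * I.

    Each point is weighted by its own Bernoulli variance mu'(z) = mu(z)(1 - mu(z)),
    so the curvature reflects realized variances instead of the minimal one, kappa."""
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)
    return (X * w[:, None]).T @ X + lam * np.eye(X.shape[1])

def confidence_width(x, H, scale=1.0):
    """Width proportional to ||x||_{H^{-1}}; `scale` stands in for the paper's constant."""
    return scale * np.sqrt(x @ np.linalg.solve(H, x))

# Toy usage with an assumed design and parameter estimate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta_hat = np.array([0.5, -1.0, 0.2])
H = fisher_information(X, theta_hat)
print(confidence_width(X[0], H))
```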

    Homomorphically Encrypted Linear Contextual Bandit

    The contextual bandit is a general framework for online learning in sequential decision-making problems that has found application in a wide range of domains, including recommendation systems, online advertising, clinical trials, and many more. A critical aspect of bandit methods is that they must observe the contexts -- i.e., individual- or group-level data -- and the rewards in order to solve the sequential problem. Their large-scale deployment in industrial applications has increased interest in methods that preserve the privacy of users. In this paper, we introduce a privacy-preserving bandit framework based on asymmetric encryption. The bandit algorithm only observes encrypted information (contexts and rewards) and has no ability to decrypt it. Leveraging homomorphic encryption, we show that despite the complexity of the setting, it is possible to learn over encrypted data. We introduce an algorithm that achieves an Õ(d√T) regret bound in any linear contextual bandit problem, while keeping the data encrypted.
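    A plaintext sketch of why the linear contextual bandit is amenable to homomorphic encryption: its sufficient statistics are built from additions and multiplications only, exactly the operations a leveled HE scheme supports on ciphertexts (this numpy analogue only mirrors the arithmetic; the paper's algorithm performs it on encrypted values and must also handle the costly encrypted linear solve):

```python
import numpy as np

class LinearBanditStats:
    """Plaintext analogue of the statistics an encrypted learner would maintain."""

    def __init__(self, d, lam=1.0):
        self.A = lam * np.eye(d)   # regularized Gram matrix: lam*I + sum_t x_t x_t^T
        self.b = np.zeros(d)       # reward-weighted contexts: sum_t r_t x_t

    def update(self, x, r):
        # Both updates are ciphertext additions/multiplications under HE,
        # so the learner never needs the plaintext context or reward.
        self.A += np.outer(x, x)
        self.b += r * x

    def estimate(self):
        # The linear solve is the expensive step to perform under encryption.
        return np.linalg.solve(self.A, self.b)

stats = LinearBanditStats(d=4)
stats.update(np.array([1.0, 0.0, 0.5, -0.2]), r=0.7)
print(stats.estimate())
```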