7 research outputs found

    Online Pricing with Offline Data: Phase Transition and Inverse Square Law

    This paper investigates the impact of pre-existing offline data on online learning, in the context of dynamic pricing. We study a single-product dynamic pricing problem over a selling horizon of $T$ periods. The demand in each period is determined by the price of the product according to a linear demand model with unknown parameters. We assume that before the start of the selling horizon, the seller already has some pre-existing offline data. The offline data set contains $n$ samples, each of which is an input-output pair consisting of a historical price and an associated demand observation. The seller wants to utilize both the pre-existing offline data and the sequential online data to minimize the regret of the online learning process. We characterize the joint effect of the size, location, and dispersion of the offline data on the optimal regret of the online learning process. Specifically, the size, location, and dispersion of the offline data are measured by the number of historical samples $n$, the distance between the average historical price and the optimal price $\delta$, and the standard deviation of the historical prices $\sigma$, respectively. We show that the optimal regret is $\widetilde{\Theta}\left(\sqrt{T}\wedge \frac{T}{(n\wedge T)\delta^2+n\sigma^2}\right)$, and design a learning algorithm based on the "optimism in the face of uncertainty" principle, whose regret is optimal up to a logarithmic factor. Our results reveal surprising transformations of the optimal regret rate with respect to the size of the offline data, which we refer to as phase transitions. In addition, our results demonstrate that the location and dispersion of the offline data also have an intrinsic effect on the optimal regret, and we quantify this effect via the inverse-square law.
    Comment: Forthcoming in Management Science
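The regret characterization above can be evaluated numerically to see the phase transition in the offline sample size $n$. The following is a minimal sketch (the function name and parameter values are ours, not from the paper): with dispersion $\sigma = 0$ and a fixed gap $\delta$, the rate stays at $\sqrt{T}$ for small $n$ and then decays as $n$ grows.

```python
import math

def optimal_regret_rate(T, n, delta, sigma):
    """Evaluate the optimal regret rate from the paper, up to constants
    and log factors: sqrt(T) ∧ T / ((n ∧ T) * delta^2 + n * sigma^2)."""
    denom = min(n, T) * delta**2 + n * sigma**2
    if denom == 0:
        return math.sqrt(T)  # no informative offline data: classical sqrt(T) rate
    return min(math.sqrt(T), T / denom)

# Illustrative phase transition in n (sigma = 0, so only the gap delta helps):
T, delta, sigma = 10_000, 0.5, 0.0
for n in [0, 100, 400, 1_000_000]:
    print(n, optimal_regret_rate(T, n, delta, sigma))
```

Note how the `min(n, T)` term caps the benefit of the offline sample size once $n$ exceeds the horizon $T$, which is one source of the phase transitions the paper describes.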

    Data-Driven Dynamic Decision Making: Algorithms, Structures, and Complexity Analysis

    This thesis aims to advance the theory and practice of data-driven dynamic decision making, by synergizing ideas from machine learning and operations research. Throughout this thesis, we focus on three aspects: (i) developing new, practical algorithms that systematically empower data-driven dynamic decision making, (ii) identifying and utilizing key problem structures that lead to statistical and computational efficiency, and (iii) contributing to a general understanding of the statistical and computational complexity of data-driven dynamic decision making, which parallels our understanding of supervised machine learning and also accounts for the crucial roles of model structures and constraints for decision making. Specifically, the thesis consists of three parts. Part I of this thesis develops methodologies that automatically translate advances in supervised learning into effective dynamic decision making. Focusing on contextual bandits, a core class of online decision-making problems, we present the first optimal and efficient reduction from contextual bandits to offline regression. A remarkable consequence of our results is that advances in offline regression immediately translate to contextual bandits, statistically and computationally. We illustrate the advantages of our results through new guarantees in complex operational environments and experiments on real-world datasets. We also extend our results to more challenging setups, including reinforcement learning in large state spaces. Beyond the positive results, we establish new fundamental limits for general, unstructured reinforcement learning, emphasizing the importance of problem structures in reinforcement learning. Part II of this thesis develops a framework that incorporates offline data into online decision making, motivated by practical challenges in business and operations. 
In the context of dynamic pricing, the framework allows us to rigorously characterize the value of data and the synergy between online and offline learning in data-driven decision making. The theory provides important insights for practice. Part III of this thesis studies classical online decision-making problems in new settings where the decision maker may face a variety of long-term constraints. Such constraints are motivated by societal and operational considerations, and may limit the decision maker's ability to switch between actions, consume resources, or query accumulated data. We characterize the statistical and computational consequences brought by such long-term constraints, i.e., how the complexity of the problem changes with respect to different levels of constraints. The results provide precise characterizations of various intriguing trade-offs in data-driven dynamic decision making.
    Ph.D. thesis
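The abstract does not specify the reduction from contextual bandits to offline regression, but work in this line commonly converts a regression oracle's reward predictions into an action distribution via inverse-gap weighting. A minimal sketch of that selection rule (function name, parameter choices, and example values are ours):

```python
def inverse_gap_weighting(predicted_rewards, gamma):
    """Turn reward predictions for K actions into a sampling distribution.
    Each non-greedy action a gets probability 1 / (K + gamma * gap_a), where
    gap_a is its predicted shortfall versus the greedy action; the greedy
    action absorbs the remaining probability mass."""
    K = len(predicted_rewards)
    best = max(range(K), key=lambda a: predicted_rewards[a])
    probs = [0.0] * K
    for a in range(K):
        if a != best:
            gap = predicted_rewards[best] - predicted_rewards[a]
            probs[a] = 1.0 / (K + gamma * gap)
    probs[best] = 1.0 - sum(probs)
    return probs

p = inverse_gap_weighting([0.9, 0.5, 0.1], gamma=10.0)
print(p)  # greedy action keeps most of the mass
```

Larger `gamma` concentrates mass on the greedy action, trading exploration for exploitation; in the literature it is typically scheduled to grow over the horizon.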