Online Pricing with Offline Data: Phase Transition and Inverse Square Law
This paper investigates the impact of pre-existing offline data on online
learning, in the context of dynamic pricing. We study a single-product dynamic
pricing problem over a selling horizon of T periods. The demand in each
period is determined by the price of the product according to a linear demand
model with unknown parameters. We assume that before the start of the selling
horizon, the seller already has some pre-existing offline data. The offline
data set contains n samples, each of which is an input-output pair consisting
of a historical price and an associated demand observation. The seller wants to
utilize both the pre-existing offline data and the sequential online data to
minimize the regret of the online learning process.
We characterize the joint effect of the size, location and dispersion of the
offline data on the optimal regret of the online learning process.
Specifically, the size, location and dispersion of the offline data are
measured by the number of historical samples n, the distance δ between the
average historical price and the optimal price, and the standard deviation σ
of the historical prices, respectively. We show that the
optimal regret is Θ̃(√T ∧ T/((n ∧ T)δ² + nσ²)), and design a learning algorithm based on the
"optimism in the face of uncertainty" principle, whose regret is optimal up to
a logarithmic factor. Our results reveal surprising transformations of the
optimal regret rate with respect to the size of the offline data, which we
refer to as phase transitions. In addition, our results demonstrate that the
location and dispersion of the offline data also have an intrinsic effect on
the optimal regret, and we quantify this effect via the inverse-square law.
Comment: Forthcoming in Management Science
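The "optimism in the face of uncertainty" principle behind the algorithm can be illustrated with a minimal sketch: pool the offline and online samples, fit least squares for a linear demand model, and charge the price that maximizes an optimistic revenue estimate. The demand parameters, price grid, and heuristic bonus term below are illustrative assumptions, not the paper's actual algorithm or confidence bounds.

```python
import math
import random

def ols(prices, demands):
    """Closed-form simple linear regression: demand ≈ a + b * price."""
    n = len(prices)
    mp = sum(prices) / n
    md = sum(demands) / n
    sxx = sum((p - mp) ** 2 for p in prices)
    sxy = sum((p - mp) * (d - md) for p, d in zip(prices, demands))
    b = sxy / sxx if sxx > 0 else 0.0
    return md - b * mp, b

def ofu_price(prices, demands, p_min=0.5, p_max=2.0, grid=50):
    """Pick the grid price with the highest *optimistic* revenue estimate.

    The bonus is a heuristic stand-in for a proper confidence width: it
    shrinks with sample size and grows with distance from the mean price,
    so prices far from the historical data get explored first."""
    a, b = ols(prices, demands)
    n = len(prices)
    mp = sum(prices) / n
    sxx = sum((p - mp) ** 2 for p in prices) or 1e-9
    best_p, best_val = p_min, -math.inf
    for i in range(grid + 1):
        p = p_min + (p_max - p_min) * i / grid
        bonus = math.sqrt(1.0 / n + (p - mp) ** 2 / sxx)  # heuristic width
        rev = p * (a + b * p + bonus)                      # optimistic revenue
        if rev > best_val:
            best_p, best_val = p, rev
    return best_p

random.seed(0)
alpha, beta = 3.0, 1.0            # true (unknown) linear demand: d = alpha - beta * p
prices = [0.9, 1.0, 1.1]          # offline samples clustered near one historical price
demands = [alpha - beta * p + random.gauss(0, 0.1) for p in prices]

for t in range(200):              # online selling periods
    p = ofu_price(prices, demands)
    prices.append(p)
    demands.append(alpha - beta * p + random.gauss(0, 0.1))

# Revenue p * (alpha - beta * p) is maximized at alpha / (2 * beta) = 1.5,
# so the charged price should drift toward 1.5 as uncertainty shrinks.
final_price = ofu_price(prices, demands)
```

Note how the offline samples alone pin down the demand curve only near their mean price; the bonus term forces exploration elsewhere, mirroring the abstract's point that the dispersion of the offline prices matters, not just their number.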
Data-Driven Dynamic Decision Making: Algorithms, Structures, and Complexity Analysis
This thesis aims to advance the theory and practice of data-driven dynamic decision making, by synergizing ideas from machine learning and operations research. Throughout this thesis, we focus on three aspects: (i) developing new, practical algorithms that systematically empower data-driven dynamic decision making, (ii) identifying and utilizing key problem structures that lead to statistical and computational efficiency, and (iii) contributing to a general understanding of the statistical and computational complexity of data-driven dynamic decision making, which parallels our understanding of supervised machine learning and also accounts for the crucial roles of model structures and constraints for decision making.
Specifically, the thesis consists of three parts.
Part I of this thesis develops methodologies that automatically translate advances in supervised learning into effective dynamic decision making. Focusing on contextual bandits, a core class of online decision-making problems, we present the first optimal and efficient reduction from contextual bandits to offline regression. A remarkable consequence of our results is that advances in offline regression immediately translate to contextual bandits, statistically and computationally. We illustrate the advantages of our results through new guarantees in complex operational environments and experiments on real-world datasets. We also extend our results to more challenging setups, including reinforcement learning in large state spaces. Beyond the positive results, we establish new fundamental limits for general, unstructured reinforcement learning, emphasizing the importance of problem structures in reinforcement learning.
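The reduction described in Part I hinges on converting an offline regression oracle's predictions into an exploratory action distribution. A minimal sketch of the inverse-gap-weighting rule that underlies such reductions (the prediction values and learning-rate setting below are illustrative):

```python
def inverse_gap_weighting(predictions, gamma):
    """Map per-action reward predictions from a regression oracle to a
    sampling distribution: near-greedy on the predicted best action, with
    exploration of each other action inversely proportional to its
    predicted reward gap, scaled by the learning rate gamma."""
    K = len(predictions)
    best = max(range(K), key=lambda a: predictions[a])
    probs = [0.0] * K
    for a in range(K):
        if a != best:
            probs[a] = 1.0 / (K + gamma * (predictions[best] - predictions[a]))
    probs[best] = 1.0 - sum(probs)  # remaining mass on the predicted best arm
    return probs

preds = [0.9, 0.5, 0.4]   # hypothetical oracle predictions for 3 actions
probs = inverse_gap_weighting(preds, gamma=20.0)
```

Larger gamma concentrates mass on the predicted best action; actions whose predicted rewards are close to the best retain more exploration probability, which is what lets regression accuracy translate into low regret.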
Part II of this thesis develops a framework that incorporates offline data into online decision making, motivated by practical challenges in business and operations. In the context of dynamic pricing, the framework allows us to rigorously characterize the value of data and the synergy between online and offline learning in data-driven decision making. The theory provides important insights for practice.
Part III of this thesis studies classical online decision-making problems in new settings where the decision maker may face a variety of long-term constraints. Such constraints are motivated by societal and operational considerations, and may limit the decision maker's ability to switch between actions, consume resources, or query accumulated data. We characterize the statistical and computational consequences brought by such long-term constraints, i.e., how the complexity of the problem changes with respect to different levels of constraints. The results provide precise characterizations of various intriguing trade-offs in data-driven dynamic decision making.
Ph.D. thesis