Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach
Suppose an online platform wants to compare a treatment and control policy,
e.g., two different matching algorithms in a ridesharing system, or two
different inventory management algorithms in an online retail site. Standard
randomized controlled trials are typically not feasible, since the goal is to
estimate policy performance on the entire system. Instead, the typical current
practice involves dynamically alternating between the two policies for fixed
lengths of time, and comparing the average performance of each over the
intervals in which they were run as an estimate of the treatment effect.
However, this approach suffers from *temporal interference*: one algorithm
alters the state of the system as seen by the second algorithm, biasing
estimates of the treatment effect. Further, the simple non-adaptive nature of
such designs implies they are not sample efficient.
We develop a benchmark theoretical model in which to study optimal
experimental design for this setting. We view testing the two policies as the
problem of estimating the steady state difference in reward between two unknown
Markov chains (i.e., policies). We assume estimation of the steady state reward
for each chain proceeds via nonparametric maximum likelihood, and search for
consistent (i.e., asymptotically unbiased) experimental designs that are
efficient (i.e., asymptotically minimum variance). Characterizing such designs
is equivalent to a Markov decision problem with a minimum variance objective;
such problems generally do not admit tractable solutions. Remarkably, in our
setting, using a novel application of classical martingale analysis of Markov
chains via Poisson's equation, we characterize efficient designs via a succinct
convex optimization problem. We use this characterization to propose a
consistent, efficient online experimental design that adaptively samples the
two Markov chains.
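The estimation ingredients described above can be sketched in a few lines: the nonparametric MLE of a chain's transition matrix is just its empirical transition frequencies, and the steady-state reward estimate is the plug-in stationary reward of that empirical chain. This is a simplified illustration of those ingredients only, not the paper's adaptive design; the function names and the two-state example are ours.

```python
import numpy as np

def stationary_reward(P, r):
    """Steady-state expected reward of a Markov chain with transition
    matrix P and per-state reward vector r: solve pi P = pi, sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(pi @ r)

def empirical_estimate(states, rewards, n_states):
    """Plug-in estimate of the steady-state reward from one trajectory:
    the MLE of each transition probability is its empirical frequency."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(states[:-1], states[1:]):
        counts[s, s_next] += 1
    P_hat = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    # Average observed reward in each state as the reward estimate.
    r_hat = np.array([
        np.mean([rw for s, rw in zip(states, rewards) if s == k] or [0.0])
        for k in range(n_states)
    ])
    return stationary_reward(P_hat, r_hat)
```

Running this separately on the samples collected from each of the two chains, and differencing the two plug-in estimates, gives the naive estimate of the treatment effect whose bias and variance the paper's design is built to control.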
A survey of time consistency of dynamic risk measures and dynamic performance measures in discrete time: LM-measure perspective
In this work we give a comprehensive overview of the time consistency property of dynamic risk and performance measures, focusing on the discrete-time setup. The two key operational concepts used throughout are the notion of the LM-measure and the notion of the update rule, which, we believe, are the key tools for studying time consistency in a unified framework.
Tight Approximations of Dynamic Risk Measures
This paper compares two different frameworks recently introduced in the literature for measuring risk in a multi-period setting. The first corresponds to applying a single coherent risk measure to the cumulative future costs, while the second involves applying a composition of one-step coherent risk mappings. We summarize the relative strengths of the two methods, characterize several necessary and sufficient conditions under which one of the measurements always dominates the other, and introduce a metric to quantify how close the two risk measures are. Using this notion, we address the question of how tightly a given coherent measure can be approximated by lower- or upper-bounding compositional measures. We exhibit an interesting asymmetry between the two cases: the tightest possible upper bound can be exactly characterized, and corresponds to a popular construction in the literature, while the tightest possible lower bound is not readily available. We show that testing domination and computing the approximation factors is generally NP-hard, even when the risk measures in question are comonotonic and law-invariant. However, we characterize conditions and discuss several examples where polynomial-time algorithms are possible. One such case is the well-known Conditional Value-at-Risk measure, which is further explored in our companion paper [Huang, Iancu, Petrik and Subramanian, "Static and Dynamic Conditional Value at Risk" (2012)]. Our theoretical and algorithmic constructions exploit interesting connections between the study of risk measures and the theory of submodularity and combinatorial optimization, which may be of independent interest.
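The contrast between the two frameworks can be made concrete on a toy two-stage scenario tree with Conditional Value-at-Risk. The sketch below (all numbers, the level alpha = 0.5, and the function names are our illustrative choices, not taken from the paper) evaluates both measurements: CVaR applied once to the cumulative cost, and the nested composition of one-step CVaR mappings. In this example the compositional value upper-bounds the single-measure value, consistent with the asymmetry discussed above.

```python
import numpy as np

def cvar(values, probs, alpha):
    """CVaR_alpha of a discrete cost distribution: expected cost over the
    worst alpha-fraction of outcomes (costs: larger is worse)."""
    values, probs = np.asarray(values, float), np.asarray(probs, float)
    order = np.argsort(-values)  # worst outcomes first
    remaining, total = alpha, 0.0
    for i in order:
        take = min(probs[i], remaining)
        total += take * values[i]
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha

alpha = 0.5
# Two equally likely stage-1 nodes, each followed by two equally
# likely stage-2 costs.
stage1 = [1.0, 3.0]
stage2 = [[0.0, 4.0], [0.0, 4.0]]

# Framework 1: a single CVaR applied to the cumulative cost at the leaves.
leaves = [c1 + c2 for c1, c2s in zip(stage1, stage2) for c2 in c2s]
single = cvar(leaves, [0.25] * 4, alpha)

# Framework 2: composition of one-step CVaR mappings, evaluated backward.
node_values = [c1 + cvar(c2s, [0.5, 0.5], alpha)
               for c1, c2s in zip(stage1, stage2)]
composed = cvar(node_values, [0.5, 0.5], alpha)

print(single, composed)  # 6.0 7.0 -- compositional value is the larger one
```

The gap between the two values (7.0 vs. 6.0 here) is exactly the kind of discrepancy the paper's approximation factors are designed to bound.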