
    Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach

    Suppose an online platform wants to compare a treatment and control policy, e.g., two different matching algorithms in a ridesharing system, or two different inventory management algorithms in an online retail site. Standard randomized controlled trials are typically not feasible, since the goal is to estimate policy performance on the entire system. Instead, the typical current practice involves dynamically alternating between the two policies for fixed lengths of time, and comparing the average performance of each over the intervals in which they were run as an estimate of the treatment effect. However, this approach suffers from *temporal interference*: one algorithm alters the state of the system as seen by the second algorithm, biasing estimates of the treatment effect. Further, the simple non-adaptive nature of such designs implies they are not sample efficient. We develop a benchmark theoretical model in which to study optimal experimental design for this setting. We view testing the two policies as the problem of estimating the steady state difference in reward between two unknown Markov chains (i.e., policies). We assume estimation of the steady state reward for each chain proceeds via nonparametric maximum likelihood, and search for consistent (i.e., asymptotically unbiased) experimental designs that are efficient (i.e., asymptotically minimum variance). Characterizing such designs is equivalent to a Markov decision problem with a minimum variance objective; such problems generally do not admit tractable solutions. Remarkably, in our setting, using a novel application of classical martingale analysis of Markov chains via Poisson's equation, we characterize efficient designs via a succinct convex optimization problem. We use this characterization to propose a consistent, efficient online experimental design that adaptively samples the two Markov chains.
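    As a rough illustration of the estimation primitive described in the abstract (not the paper's adaptive design itself), the sketch below estimates each policy's long-run average reward by forming the empirical maximum-likelihood transition matrix from an observed trajectory and taking its stationary distribution. The function names, reward vector, and trajectories are hypothetical.

```python
import numpy as np

def mle_transition_matrix(trajectory, n_states):
    """Nonparametric MLE of a Markov chain's transition matrix:
    row-normalized empirical transition counts."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(trajectory[:-1], trajectory[1:]):
        counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # avoid dividing by zero for unvisited states
    return counts / row_sums

def stationary_distribution(P):
    """Stationary distribution pi with pi P = pi, via the eigenvector
    of P^T associated with eigenvalue 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    i = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, i])
    return pi / pi.sum()

def steady_state_reward(trajectory, rewards, n_states):
    """Plug-in estimate of one policy's long-run average reward."""
    P_hat = mle_transition_matrix(trajectory, n_states)
    pi_hat = stationary_distribution(P_hat)
    return float(pi_hat @ rewards)

# Hypothetical usage: one observed trajectory per policy over 3 states.
rewards = np.array([1.0, 0.0, 2.0])
traj_treatment = [0, 2, 2, 1, 0, 2, 2, 2, 1, 0]
traj_control = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
effect_estimate = (steady_state_reward(traj_treatment, rewards, 3)
                   - steady_state_reward(traj_control, rewards, 3))
print(effect_estimate)
```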

    A survey of time consistency of dynamic risk measures and dynamic performance measures in discrete time : LM-measure perspective

    In this work we give a comprehensive overview of the time consistency property of dynamic risk and performance measures, focusing on the discrete-time setup. The two key operational concepts used throughout are the notion of an LM-measure and the notion of an update rule, which, we believe, are the key tools for studying time consistency in a unified framework.
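    For orientation, the display below sketches one standard (strong) form of time consistency for a dynamic risk measure $(\rho_t)$; the LM-measure and update-rule machinery surveyed in the paper is built to handle conditions of this type, and weaker ones, in a unified way. This is the textbook condition, not necessarily the survey's exact formulation.

```latex
% Strong time consistency of a dynamic risk measure (rho_t):
% a comparison that holds at time t+1 must propagate back to time t.
\[
  \rho_{t+1}(X) \le \rho_{t+1}(Y) \ \text{a.s.}
  \quad \Longrightarrow \quad
  \rho_{t}(X) \le \rho_{t}(Y) \ \text{a.s.},
  \qquad \text{for all admissible } X, Y \text{ and all } t.
\]
```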

    Tight Approximations of Dynamic Risk Measures

    This paper compares two different frameworks recently introduced in the literature for measuring risk in a multi-period setting. The first corresponds to applying a single coherent risk measure to the cumulative future costs, while the second involves applying a composition of one-step coherent risk mappings. We summarize the relative strengths of the two methods, characterize several necessary and sufficient conditions under which one of the measurements always dominates the other, and introduce a metric to quantify how close the two risk measures are. Using this notion, we address the question of how tightly a given coherent measure can be approximated by lower- or upper-bounding compositional measures. We exhibit an interesting asymmetry between the two cases: the tightest possible upper bound can be exactly characterized, and corresponds to a popular construction in the literature, while the tightest possible lower bound is not readily available. We show that testing domination and computing the approximation factors is generally NP-hard, even when the risk measures in question are comonotonic and law-invariant. However, we characterize conditions and discuss several examples where polynomial-time algorithms are possible. One such case is the well-known Conditional Value-at-Risk measure, which is further explored in our companion paper [Huang, Iancu, Petrik and Subramanian, "Static and Dynamic Conditional Value at Risk" (2012)]. Our theoretical and algorithmic constructions exploit interesting connections between the study of risk measures and the theory of submodularity and combinatorial optimization, which may be of independent interest.
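    As a small numerical illustration of the two frameworks being compared (a sketch under simplified assumptions, not code or data from the paper), the snippet below evaluates Conditional Value-at-Risk both ways on a hypothetical two-period scenario tree: once applied to the cumulative cost, and once as a composition of one-step CVaR mappings. In this instance the two values differ, consistent with the abstract's point that neither measurement dominates the other in general.

```python
import numpy as np

def cvar(losses, probs, alpha):
    """CVaR_alpha of a discrete loss distribution via the Rockafellar-Uryasev
    representation: min_z  z + E[(L - z)^+] / (1 - alpha).
    For a discrete distribution the minimizer is a support point."""
    losses, probs = np.asarray(losses, float), np.asarray(probs, float)
    candidates = [z + np.dot(probs, np.maximum(losses - z, 0.0)) / (1.0 - alpha)
                  for z in losses]
    return min(candidates)

alpha = 0.5

# Hypothetical two-period tree: first-period node i occurs with prob p1[i]
# and cost c1[i]; given node i, the second-period cost is c2[i][j] w.p. p2[i][j].
p1 = np.array([0.5, 0.5])
c1 = np.array([0.0, 1.0])
p2 = [np.array([0.8, 0.2]), np.array([0.5, 0.5])]
c2 = [np.array([0.0, 10.0]), np.array([2.0, 4.0])]

# (1) Static: a single CVaR applied to the cumulative cost over all leaf scenarios.
joint_probs = np.concatenate([p1[i] * p2[i] for i in range(2)])
joint_costs = np.concatenate([c1[i] + c2[i] for i in range(2)])
static_risk = cvar(joint_costs, joint_probs, alpha)

# (2) Compositional: a one-step CVaR of the second-period cost at each node,
# folded into the first-period cost and risk-evaluated again.
inner = np.array([c1[i] + cvar(c2[i], p2[i], alpha) for i in range(2)])
nested_risk = cvar(inner, p1, alpha)

print(static_risk, nested_risk)  # 5.4 and 5.0 for this tree
```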