
    Dynamically optimal treatment allocation using Reinforcement Learning

    Devising guidance on how to assign individuals to treatment is an important goal in empirical research. In practice, individuals often arrive sequentially, and the planner faces various constraints such as a limited budget/capacity, borrowing constraints, or the need to place people in a queue. For instance, a governmental body may receive a budget outlay at the beginning of a year, and it may need to decide how best to allocate resources within the year to individuals who arrive sequentially. In this and other examples involving inter-temporal trade-offs, previous work on devising optimal policy rules in a static context is either not applicable or sub-optimal. Here we show how one can use offline observational data to estimate an optimal policy rule that maximizes expected welfare in this dynamic context. We allow the class of policy rules to be restricted for legal, ethical, or incentive-compatibility reasons. The problem is equivalent to one of optimal control under a constrained policy class, and we exploit recent developments in Reinforcement Learning (RL) to propose an algorithm to solve it. The algorithm is easily implementable, with speedups achieved through multiple RL agents learning in parallel processes. We also characterize the statistical regret from using our estimated policy rule by casting the evolution of the value function under each policy in a Partial Differential Equation (PDE) form and using the theory of viscosity solutions to PDEs. We find that the policy regret decays at a $n^{-1/2}$ rate in most examples; this is the same rate as in the static case.
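The budget-as-state formulation in this abstract can be sketched with a toy example. The names, tabular Q-learning update, and synthetic transition format below are illustrative assumptions, not the paper's actual algorithm (which uses parallel RL agents under a constrained policy class):

```python
import numpy as np

# Hypothetical sketch: estimating a dynamic treatment rule from offline data
# by tabular Q-learning, with the remaining budget as part of the state.
# A transition is (budget, action, reward, next_budget, done).
def fit_q(transitions, n_budget, n_actions, gamma=1.0, lr=0.1, epochs=50):
    Q = np.zeros((n_budget + 1, n_actions))
    for _ in range(epochs):
        for (b, a, r, b_next, done) in transitions:
            target = r if done else r + gamma * Q[b_next].max()
            Q[b, a] += lr * (target - Q[b, a])  # temporal-difference update
    return Q

def policy(Q, budget):
    """Estimated rule: pick the action with the highest Q-value at this budget."""
    return int(np.argmax(Q[budget]))
```

With synthetic transitions where treating (action 1) yields reward 1 and consumes one unit of budget, the estimated rule learns to treat whenever budget remains.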

    How to sample and when to stop sampling: The generalized Wald problem and minimax policies

    Acquiring information is expensive. Experimenters need to carefully choose how many units of each treatment to sample and when to stop sampling. The aim of this paper is to develop techniques for incorporating the cost of information into experimental design. In particular, we study sequential experiments where sampling is costly and a decision-maker aims to determine the best treatment for full-scale implementation by (1) adaptively allocating units to two possible treatments, and (2) stopping the experiment when the expected welfare (inclusive of sampling costs) from implementing the chosen treatment is maximized. Working under the diffusion limit, we describe the optimal policies under the minimax regret criterion. Under small-cost asymptotics, the same policies are also optimal under parametric and non-parametric distributions of outcomes. The minimax-optimal sampling rule is simply the Neyman allocation; it is independent of sampling costs and does not adapt to previous outcomes. The decision-maker stops sampling when the average difference between the treatment outcomes, multiplied by the number of observations collected until that point, exceeds a specific threshold. We also suggest methods for inference on the treatment effects using stopping times and discuss their optimality.
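The sampling and stopping rules described in this abstract are concrete enough to simulate. The sketch below is a minimal illustration under assumed Gaussian outcomes; the threshold value and function names are hypothetical, not taken from the paper:

```python
import numpy as np

def neyman_allocation(s0, s1):
    """Neyman allocation: sample each arm in proportion to its outcome
    standard deviation (independent of sampling costs, per the abstract)."""
    return s1 / (s0 + s1)  # probability of sampling treatment 1

def run_until_stop(mu0, mu1, s0, s1, threshold, max_n=100_000, seed=0):
    """Stop when |mean outcome difference| * n exceeds a threshold,
    then implement the treatment with the higher estimated mean."""
    rng = np.random.default_rng(seed)
    p1 = neyman_allocation(s0, s1)
    sum0 = sum1 = 0.0
    n0 = n1 = 0
    for n in range(1, max_n + 1):
        if rng.random() < p1:
            sum1 += rng.normal(mu1, s1); n1 += 1
        else:
            sum0 += rng.normal(mu0, s0); n0 += 1
        if n0 > 0 and n1 > 0:
            diff = sum1 / n1 - sum0 / n0
            if abs(diff) * n > threshold:
                return (1 if diff > 0 else 0), n
    return (1 if sum1 / max(n1, 1) >= sum0 / max(n0, 1) else 0), max_n
```

When the treatment effect is large relative to the noise, the experiment stops after only a handful of observations, reflecting the trade-off against sampling costs.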

    Design and analysis of movable boundary allocation protocol

    The increasing digital communications traffic will require very high-speed networks. The use of high communication speeds increases the ratio between the end-to-end propagation delay and the packet transmission time. This increase causes rapid performance deterioration and restricts the utilization of the high system bandwidth in broadcast-channel-based systems. Using several parallel channels in place of a single channel improves this ratio. For a given system bandwidth, the total system capacity is increased by bandwidth division and parallel communication. FTDMA protocols have been suggested for the parallel channel network, and these protocols are suitable for different loads. In this thesis, the movable boundary allocation protocol is suggested for the parallel communication architecture. This protocol is suitable for varying loads and yields better throughput-versus-delay characteristics. The analysis demonstrates the potential for improvement in the system capacity and the average message delay when compared to a conventional single-channel system.

    Optimal tests following sequential experiments

    Recent years have seen tremendous advances in the theory and application of sequential experiments. While these experiments are not always designed with hypothesis testing in mind, researchers may still be interested in performing tests after the experiment is completed. The purpose of this paper is to aid in the development of optimal tests for sequential experiments by analyzing their asymptotic properties. Our key finding is that the asymptotic power function of any test can be matched by a test in a limit experiment where a Gaussian process is observed for each treatment, and inference is made for the drifts of these processes. This result has important implications, including a powerful sufficiency result: any candidate test only needs to rely on a fixed set of statistics, regardless of the type of sequential experiment. These statistics are the number of times each treatment has been sampled by the end of the experiment, along with the final value of the score process (for parametric models) or efficient influence function process (for non-parametric models) for each treatment. We then characterize asymptotically optimal tests under various restrictions such as unbiasedness, α-spending constraints, etc. Finally, we apply our results to three key classes of sequential experiments: costly sampling, group sequential trials, and bandit experiments, and show how optimal inference can be conducted in these scenarios.
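The sufficiency result says a test need only depend on, for each treatment, the sample count and the final value of its score process. The sketch below merely illustrates building a statistic from those quantities, with hypothetical names; it is not the paper's optimal test, and a naive z-test is generally not valid after adaptive sampling without the corrections the paper develops:

```python
import numpy as np
from scipy.stats import norm

# Illustrative only: a two-sample z-statistic built from the sufficient
# statistics (n_k, score_k), where score_k is the sum of centered outcomes
# for treatment k under the null.
def z_test_from_sufficient_stats(n, score, sigma, alpha=0.05):
    mean0, mean1 = score[0] / n[0], score[1] / n[1]
    se = np.sqrt(sigma[0] ** 2 / n[0] + sigma[1] ** 2 / n[1])
    z = (mean1 - mean0) / se
    return z, abs(z) > norm.ppf(1 - alpha / 2)
```

The point is that however the sequential experiment was run, these few numbers are all a test needs to consume.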

    The Malangan Cult of New Ireland


    Immediate versus early soft tissue coverage for severe open grade III B tibia fractures: a comparative clinical study

    Controversy remains regarding timing in the management of grade III-B open tibia fractures. Many authors recommend immediate definitive soft tissue coverage within a critical period of 12 hours, yet in many patients this may be impossible due to concomitant injuries or delayed referral. The present case series aims to compare the role of immediate versus early soft tissue coverage for severe open grade III-B tibial fractures. Twenty cases of tibial fractures were divided into two groups of 10 cases each, an immediate group (within 12 hours) and an early group (3-7 days), according to the soft tissue coverage time. Strict criteria for inclusion in the first group included debridement within 12 hours of injury, no sewage or organic contamination, the presence of bleeding skin margins, and the absence of systemic illness. All 20 cases were treated by debridement and soft-tissue cover with a muscle pedicle or fasciocutaneous flap. Functional outcome measures included deep infection rate, stable soft tissue coverage, length of inpatient stay, number of surgical procedures, and union time. The mean follow-up period was 24 months. Mean inpatient time was 30 and 41 days, respectively. Mean surgical procedures were 2.2 and 3.4, respectively, and union time was 26 versus 34 weeks. Mean inpatient time, mean number of surgical procedures, and union time were markedly lower in the immediate flap coverage group, which significantly improves results concerning early union, healing time, and cost of hospitalization and rehabilitation.

    Risk and optimal policies in bandit experiments

    This paper provides a decision-theoretic analysis of bandit experiments. The bandit setting corresponds to a dynamic programming problem, but solving this directly is typically infeasible. Working within the framework of diffusion asymptotics, we define a suitable notion of asymptotic Bayes risk for bandit settings. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a nonlinear second-order partial differential equation (PDE). Using a limit-of-experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distributions of the rewards. The approach further describes the state variables it is asymptotically sufficient to restrict attention to, and therefore suggests a practical strategy for dimension reduction. The upshot is that we can approximate the dynamic programming problem defining the bandit setting with a PDE, which can be efficiently solved using sparse matrix routines. We derive near-optimal policies from the numerical solutions to these equations. The proposed policies substantially dominate existing methods such as Thompson sampling. The framework also allows for substantial generalizations of the bandit problem such as time discounting and pure exploration motives.
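"Efficiently solved using sparse matrix routines" refers to the kind of building block sketched below: one implicit (backward-Euler) time step for a diffusion term on a grid, assembled as a sparse tridiagonal system. This is a generic sketch of the numerical technique, under assumed names and a linear model problem; the paper's actual PDE is nonlinear:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

# One backward-Euler step for V_t = 0.5 * sigma^2 * V_xx on a uniform grid
# with zero Dirichlet boundaries; the tridiagonal system is solved with
# sparse routines, as a full dynamic program never needs to be formed.
def implicit_step(V, dx, dt, sigma=1.0):
    n = len(V)
    r = 0.5 * sigma ** 2 * dt / dx ** 2
    # (I - dt * 0.5*sigma^2 * D2) V_new = V_old
    A = diags(
        [-r * np.ones(n - 1), (1 + 2 * r) * np.ones(n), -r * np.ones(n - 1)],
        offsets=[-1, 0, 1], format="csc",
    )
    return spsolve(A, V)
```

Iterating steps like this one backward in time over a low-dimensional state grid is what makes the PDE approximation tractable where direct dynamic programming is not.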