Dynamically optimal treatment allocation using Reinforcement Learning
Devising guidance on how to assign individuals to treatment is an important
goal in empirical research. In practice, individuals often arrive sequentially,
and the planner faces various constraints such as limited budget/capacity, or
borrowing constraints, or the need to place people in a queue. For instance, a
governmental body may receive a budget outlay at the beginning of a year, and
it may need to decide how best to allocate resources within the year to
individuals who arrive sequentially. In this and other examples involving
inter-temporal trade-offs, previous work on devising optimal policy rules in a
static context is either not applicable, or sub-optimal. Here we show how one
can use offline observational data to estimate an optimal policy rule that
maximizes expected welfare in this dynamic context. We allow the class of
policy rules to be restricted for legal, ethical or incentive compatibility
reasons. The problem is equivalent to one of optimal control under a
constrained policy class, and we exploit recent developments in Reinforcement
Learning (RL) to propose an algorithm to solve this. The algorithm is easily
implementable with speedups achieved through multiple RL agents learning in
parallel processes. We also characterize the statistical regret from using our
estimated policy rule by casting the evolution of the value function under each
policy in a Partial Differential Equation (PDE) form and using the theory of
viscosity solutions to PDEs. We find that in most examples the policy regret
decays at the same rate as in the static case.
Comment: 67 pages
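The dynamic allocation idea can be illustrated with a toy sketch (not the paper's algorithm): a planner with a fixed budget treats sequentially arriving individuals of two types, and a tabular Q-learning agent learns a rule over the state (remaining budget, arrival type). All quantities here (budget, horizon, per-type welfare gains, learning rates) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: budget B, horizon T, two arrival types with
# different expected welfare gains from treatment.
B, T = 20, 50
n_types = 2
reward = np.array([0.3, 1.0])   # assumed welfare gain per treated type

# Tabular Q-learning over states (remaining budget, arrival type),
# actions {0: withhold, 1: treat}.
Q = np.zeros((B + 1, n_types, 2))
alpha, gamma, eps = 0.1, 0.99, 0.1

for episode in range(2000):
    b = B
    for t in range(T):
        s = rng.integers(n_types)  # arrivals are i.i.d. in this toy model
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[b, s]))
        if a == 1 and b == 0:
            a = 0                  # cannot treat once the budget is exhausted
        r = reward[s] if a == 1 else 0.0
        b2 = b - a
        # Bootstrap with an independently drawn next arrival type; this is
        # valid in expectation here because arrivals are i.i.d.
        s2 = rng.integers(n_types)
        Q[b, s, a] += alpha * (r + gamma * Q[b2, s2].max() - Q[b, s, a])
        b = b2

# The learned policy: for each (budget, type), whether to treat.
policy = Q.argmax(axis=2)
```

With scarce budget, a rule like this tends to reserve treatment for the high-gain type; the paper's actual method additionally handles constrained policy classes and parallel RL agents, which this sketch omits.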
How to sample and when to stop sampling: The generalized Wald problem and minimax policies
Acquiring information is expensive. Experimenters need to carefully choose
how many units of each treatment to sample and when to stop sampling. The aim
of this paper is to develop techniques for incorporating the cost of
information into experimental design. In particular, we study sequential
experiments where sampling is costly and a decision-maker aims to determine the
best treatment for full scale implementation by (1) adaptively allocating units
to two possible treatments, and (2) stopping the experiment when the expected
welfare (inclusive of sampling costs) from implementing the chosen treatment is
maximized. Working under the diffusion limit, we describe the optimal policies
under the minimax regret criterion. Under small cost asymptotics, the same
policies are also optimal under parametric and non-parametric distributions of
outcomes. The minimax optimal sampling rule is just the Neyman allocation; it
is independent of sampling costs and does not adapt to previous outcomes. The
decision-maker stops sampling when the average difference between the treatment
outcomes, multiplied by the number of observations collected until that point,
exceeds a specific threshold. We also suggest methods for inference on the
treatment effects using stopping times and discuss their optimality.
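The two policies described above can be sketched in simulation: sample arm 1 with probability proportional to its outcome standard deviation (the Neyman allocation), and stop once the number of observations times the average outcome difference crosses a threshold. The outcome distributions and the threshold value below are illustrative assumptions, not the paper's calibrated quantities.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical outcome distributions for treatments 0 and 1.
mu = (0.0, 0.2)
sigma = (1.0, 2.0)

# Neyman allocation: sample treatment 1 with probability s1 / (s0 + s1);
# note this does not depend on sampling costs or on observed outcomes.
p1 = sigma[1] / (sigma[0] + sigma[1])

threshold = 50.0  # illustrative stopping threshold
sums = [0.0, 0.0]
counts = [0, 0]
n = 0
while True:
    n += 1
    k = 1 if rng.random() < p1 else 0
    sums[k] += rng.normal(mu[k], sigma[k])
    counts[k] += 1
    if counts[0] and counts[1]:
        diff = sums[1] / counts[1] - sums[0] / counts[0]
        # Stop when n times the average treatment-outcome difference
        # exceeds the threshold in absolute value.
        if abs(n * diff) > threshold:
            break

# Recommend the treatment with the higher sample mean at stopping.
best = int(sums[1] / counts[1] > sums[0] / counts[0])
```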
Design and analysis of movable boundary allocation protocol
The increasing volume of digital communications traffic will require very high-speed networks. High communication speeds increase the ratio between the end-to-end propagation delay and the packet transmission time. This increase causes rapid performance deterioration and restricts the utilization of the high system bandwidth in broadcast-channel-based systems. Using several parallel channels in place of a single channel improves this ratio: for a given system bandwidth, the total system capacity is increased by bandwidth division and parallel communication. FTDMA protocols have been suggested for the parallel channel network, with different protocols suited to different loads. In this thesis, a movable boundary allocation protocol is proposed for the parallel communication architecture. This protocol is suitable for varying loads and yields better throughput-versus-delay characteristics. The analysis demonstrates the potential improvement in system capacity and average message delay compared to a conventional single-channel system.
Optimal tests following sequential experiments
Recent years have seen tremendous advances in the theory and application of
sequential experiments. While these experiments are not always designed with
hypothesis testing in mind, researchers may still be interested in performing
tests after the experiment is completed. The purpose of this paper is to aid in
the development of optimal tests for sequential experiments by analyzing their
asymptotic properties. Our key finding is that the asymptotic power function of
any test can be matched by a test in a limit experiment where a Gaussian
process is observed for each treatment, and inference is made for the drifts of
these processes. This result has important implications, including a powerful
sufficiency result: any candidate test only needs to rely on a fixed set of
statistics, regardless of the type of sequential experiment. These statistics
are the number of times each treatment has been sampled by the end of the
experiment, along with the final value of the score (for parametric models) or
efficient influence function (for non-parametric models) process for each
treatment. We then characterize asymptotically optimal tests under various
restrictions such as unbiasedness, \alpha-spending constraints, etc. Finally, we
apply our results to three key classes of sequential experiments: costly
sampling, group sequential trials, and bandit experiments, and show how optimal
inference can be conducted in these scenarios.
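The sufficiency result above is easy to make concrete: whatever the adaptive sampling scheme, a test only needs the per-arm sampling counts and the final value of each arm's score process. A minimal sketch for a Gaussian model with known unit variance (where the score at the null mean 0 is just the outcome sum), using simulated data:

```python
import numpy as np

# Illustrative record of a sequential experiment: at each step, the chosen
# treatment arm and the observed outcome (data here are simulated).
rng = np.random.default_rng(2)
arms = rng.integers(0, 2, size=200)
outcomes = rng.normal(loc=0.1 * arms, scale=1.0)

# Sufficient statistic (1): how many times each arm was sampled
# by the end of the experiment.
n_k = np.bincount(arms, minlength=2)

# Sufficient statistic (2): the final value of the score process per arm.
# For a Gaussian model with known unit variance and null mean 0, the score
# contribution of each observation is the outcome itself, so the final
# score is the per-arm outcome sum.
score_k = np.array([outcomes[arms == k].sum() for k in range(2)])

# Any candidate test can then be written as a function of (n_k, score_k)
# alone, regardless of how the arms were chosen adaptively.
```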
Immediate versus early soft tissue coverage for severe open grade III B tibia fractures: a comparative clinical study
Controversy remains regarding timing in the management of grade III-B open tibia fractures. Many authors recommend immediate definitive soft tissue coverage within a critical period of 12 hours, yet in many patients this may be impossible due to concomitant injuries or delayed referral. The present case series aims to compare the role of immediate versus early soft tissue coverage for severe open grade III-B tibial fractures. Twenty cases of tibial fractures were divided into two groups of 10 cases each: an immediate group (within 12 hours) and an early group (3-7 days), according to the time of soft tissue coverage. Strict criteria for inclusion in the first group were debridement within 12 hours of injury, no sewage or organic contamination, the presence of bleeding skin margins, and the absence of systemic illness. All 20 cases were treated by debridement and soft tissue cover with a muscle pedicle or fasciocutaneous flap. Functional outcome measures included deep infection rate, stable soft tissue coverage, length of inpatient stay, number of surgical procedures, and union time. The mean follow-up period was 24 months. Mean inpatient time was 30 and 41 days, respectively; the mean number of surgical procedures was 2.2 and 3.4, respectively; and union time was 26 versus 34 weeks. Mean inpatient time, mean number of surgical procedures, and union time were markedly lower in the immediate flap coverage group, which significantly improves results concerning early union, healing time, and cost of hospitalization and rehabilitation.
Risk and optimal policies in bandit experiments
This paper provides a decision theoretic analysis of bandit experiments. The
bandit setting corresponds to a dynamic programming problem, but solving this
directly is typically infeasible. Working within the framework of diffusion
asymptotics, we define a suitable notion of asymptotic Bayes risk for bandit
settings. For normally distributed rewards, the minimal Bayes risk can be
characterized as the solution to a nonlinear second-order partial differential
equation (PDE). Using a limit of experiments approach, we show that this PDE
characterization also holds asymptotically under both parametric and
non-parametric distribution of the rewards. The approach further describes the
state variables it is asymptotically sufficient to restrict attention to, and
therefore suggests a practical strategy for dimension reduction. The upshot is
that we can approximate the dynamic programming problem defining the bandit
setting with a PDE which can be efficiently solved using sparse matrix
routines. We derive near-optimal policies from the numerical solutions to these
equations. The proposed policies substantially dominate existing methods such
as Thompson sampling. The framework also allows for substantial generalizations
of the bandit problem, such as time discounting and pure exploration motives.
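The "solve the PDE with sparse matrix routines" step can be illustrated generically. The sketch below solves a linear second-order equation u''(x) = f(x) on [0, 1] with zero boundary conditions by finite differences and a sparse tridiagonal solve; it is a stand-in for the nonlinear Bayes-risk PDE, which would require an iterative scheme on top of solves like this one.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

# Discretize [0, 1] into N interior grid points.
N = 100
h = 1.0 / (N + 1)
x = np.linspace(h, 1 - h, N)
f = np.sin(np.pi * x)  # illustrative right-hand side

# Sparse tridiagonal second-difference operator approximating d^2/dx^2
# with u(0) = u(1) = 0 built into the boundary rows.
A = diags([1.0, -2.0, 1.0], offsets=[-1, 0, 1], shape=(N, N)) / h**2

# Solve the linear system A u = f without ever forming a dense matrix.
u = spsolve(A.tocsr(), f)

# The exact solution of u'' = sin(pi x) with these boundary conditions
# is u(x) = -sin(pi x) / pi^2; the discretization error is O(h^2).
err = np.max(np.abs(u + np.sin(np.pi * x) / np.pi**2))
```

Sparse storage is what keeps such solves cheap as the state space grows, which is the practical point of the dimension-reduction argument in the abstract.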