19 research outputs found

    Approximate Newton Methods for Policy Search in Markov Decision Processes

    Approximate Newton methods are standard optimization tools which aim to maintain the benefits of Newton's method, such as a fast rate of convergence, while alleviating its drawbacks, such as computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov decision processes (MDPs). We first analyse the structure of the Hessian of the total expected reward, which is a standard objective function for MDPs. We show that, like the gradient, the Hessian exhibits useful structure in the context of MDPs and we use this analysis to motivate two Gauss-Newton methods for MDPs. Like the Gauss-Newton method for non-linear least squares, these methods drop certain terms in the Hessian. The approximate Hessians possess desirable properties, such as negative definiteness, and we demonstrate several important performance guarantees including guaranteed ascent directions, invariance to affine transformation of the parameter space and convergence guarantees. We finally provide a unifying perspective of key policy search algorithms, demonstrating that our second Gauss-Newton algorithm is closely related to both the EM-algorithm and natural gradient ascent applied to MDPs, but performs significantly better in practice on a range of challenging domains.
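    The abstract above draws an analogy with the Gauss-Newton method for non-linear least squares, which drops the second-derivative terms of the residuals from the Hessian. The sketch below illustrates only that classical least-squares case, not the MDP algorithms of the paper. The model y = exp(a * t), the data, the starting point and the function name `gauss_newton_exp_fit` are all assumptions chosen for illustration.

```python
import math


def gauss_newton_exp_fit(ts, ys, a0=0.0, iterations=20):
    """Fit y = exp(a * t) by Gauss-Newton on a single parameter a.

    The true Hessian of 0.5 * sum(r_i^2) is sum(J_i^2) plus terms that
    involve second derivatives of the model. Gauss-Newton keeps only
    sum(J_i^2), which is never negative, so the step always points in a
    descent direction for the squared error.
    """
    a = a0
    for _ in range(iterations):
        # Residuals r_i = y_i - model(t_i) and model derivatives J_i = d model / d a.
        preds = [math.exp(a * t) for t in ts]
        residuals = [y - p for y, p in zip(ys, preds)]
        jac = [t * p for t, p in zip(ts, preds)]

        approx_hessian = sum(j * j for j in jac)      # dropped-term approximation
        gradient_term = sum(j * r for j, r in zip(jac, residuals))
        if approx_hessian == 0.0:
            break  # no curvature information; stop rather than divide by zero
        a += gradient_term / approx_hessian
    return a


# Hypothetical noise-free data generated with a = 0.5.
ts = [0.0, 1.0, 2.0, 3.0]
ys = [math.exp(0.5 * t) for t in ts]
a_hat = gauss_newton_exp_fit(ts, ys)
```

    Because the approximate curvature `sum(J_i^2)` cannot change sign, the update direction is well defined wherever the Jacobian is non-zero, which loosely mirrors the definiteness property that the abstract highlights for its approximate Hessians.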

    A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

    Parametric policy search algorithms are one of the methods of choice for the optimisation of Markov Decision Processes, with Expectation Maximisation and natural gradient ascent being considered the current state of the art in the field. In this article we provide a unifying perspective of these two algorithms by showing that their step-directions in the parameter space are closely related to the search direction of an approximate Newton method. This analysis leads naturally to the consideration of this approximate Newton method as an alternative gradient-based method for Markov Decision Processes. We are able to show that the algorithm has numerous desirable properties, absent in the naive application of Newton's method, that make it a viable alternative to either Expectation Maximisation or natural gradient ascent. Empirical results suggest that the algorithm has excellent convergence and robustness properties, performing strongly in comparison to both Expectation Maximisation and natural gradient ascent.

    Pravastatin for early-onset pre-eclampsia: a randomised, blinded, placebo-controlled trial

    Objective: Women with pre-eclampsia have elevated circulating levels of soluble fms-like tyrosine kinase-1 (sFlt-1). Statins can reduce sFlt-1 from cultured cells and improve pregnancy outcome in animals with a pre-eclampsia-like syndrome. We investigated the effect of pravastatin on plasma sFlt-1 levels during pre-eclampsia. Design: Blinded (clinician and participant), proof of principle, placebo-controlled trial. Setting: Fifteen UK maternity units. Population: We used a minimisation algorithm to assign 62 women with early-onset pre-eclampsia (24+0 to 31+6 weeks of gestation) to receive pravastatin 40 mg daily (n = 30) or matched placebo (n = 32), from randomisation to childbirth. Primary outcome: Difference in mean plasma sFlt-1 levels over the first 3 days following randomisation. Results: The difference in the mean maternal plasma sFlt-1 levels over the first 3 days after randomisation between the pravastatin (n = 27) and placebo (n = 29) groups was 292 pg/ml (95% CI −1175 to 592; P = 0.5), and over days 1–14 was 48 pg/ml (95% CI −1009 to 913; P = 0.9). Women who received pravastatin had a similar length of pregnancy following randomisation compared with those who received placebo (hazard ratio 0.84; 95% CI 0.50–1.40; P = 0.6). The median time from randomisation to childbirth was 9 days [interquartile range (IQR) 5–14 days] for the pravastatin group and 7 days (IQR 4–11 days) for the placebo group. There were three perinatal deaths in the placebo-treated group and no deaths or serious adverse events attributable to pravastatin. Conclusions: We found no evidence that pravastatin lowered maternal plasma sFlt-1 levels once early-onset pre-eclampsia had developed. Pravastatin appears to have no adverse perinatal effects. Tweetable abstract: Pravastatin does not improve maternal plasma sFlt-1 or placental growth factor levels following a diagnosis of early preterm pre-eclampsia #clinicaltrial finds

    Contract formation and letters of intent


    Lagrange Dual Decomposition for Finite Horizon Markov Decision Processes

    Solving finite-horizon Markov Decision Processes with stationary policies is a computationally difficult problem. Our dynamic dual decomposition approach uses Lagrange duality to decouple this hard problem into a sequence of tractable sub-problems. The resulting procedure is a straightforward modification of standard non-stationary Markov Decision Process solvers and gives an upper bound on the total expected reward. The empirical performance of the method suggests that not only is it a rapidly convergent algorithm, but that it also performs favourably compared to standard planning algorithms such as policy gradients and lower-bound procedures such as Expectation Maximisation.

    Pulse oximetry as a screening test for congenital heart defects in newborn infants: a cost-effectiveness analysis

    Objective: To undertake a cost-effectiveness analysis that compares pulse oximetry as an adjunct to clinical examination with clinical examination alone in newborn screening for congenital heart defects (CHDs). Design: Model-based economic evaluation using accuracy and cost data from a primary study supplemented from published sources taking an NHS perspective. Setting: Six large maternity units in the UK. Patients: 20 055 newborn infants prior to discharge from hospital. Intervention: Pulse oximetry as an adjunct to clinical examination. Main outcome measure: Cost effectiveness based on incremental cost per timely diagnosis. Results: Pulse oximetry as an adjunct to clinical examination is twice as costly but provides a timely diagnosis to almost 30 additional cases of CHD per 100 000 live births compared with a modelled strategy of clinical examination alone. The incremental cost-effectiveness ratio for this strategy compared with clinical examination alone is approximately £24 000 per case of timely diagnosis in a population in which antenatal screening for CHDs already exists. The probabilistic sensitivity analysis suggests that at a willingness-to-pay (WTP) threshold of £100 000, the probability of ‘pulse oximetry as an adjunct to clinical examination’ being cost effective is more than 90%. Such a WTP threshold is plausible if a newborn with timely diagnosis of a CHD gained just five quality-adjusted life years, even when treatment costs are taken into consideration. Conclusion: Pulse oximetry as an adjunct to current routine practice of clinical examination alone is likely to be considered a cost-effective strategy in the light of currently accepted thresholds.
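    The arithmetic behind the figures in the abstract above can be sketched as follows. The incremental cost-effectiveness ratio (ICER) is the extra cost divided by the extra effect. The abstract reports only rounded values ("almost 30" cases, "approximately £24 000"), so the implied costs computed here are rough back-calculations and not results from the study. The helper name `icer` and the variable names are hypothetical.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the strategies have equal effect")
    return delta_cost / delta_effect


# Rounded figures quoted in the abstract.
extra_cases_per_100k = 30          # "almost 30" additional timely diagnoses
icer_per_case = 24_000             # "approximately" GBP 24 000 per timely diagnosis
wtp_threshold = 100_000            # willingness-to-pay per timely diagnosis (GBP)
qalys_gained = 5                   # QALYs assumed gained per timely diagnosis

# Back-calculated incremental cost implied by those rounded figures.
implied_extra_cost_per_100k = icer_per_case * extra_cases_per_100k
implied_extra_cost_per_infant = implied_extra_cost_per_100k / 100_000

# The WTP threshold per diagnosis expressed per QALY.
implied_wtp_per_qaly = wtp_threshold / qalys_gained

print(implied_extra_cost_per_100k)    # 720000
print(implied_extra_cost_per_infant)  # 7.2
print(implied_wtp_per_qaly)           # 20000.0
```

    Read this way, a threshold of £100 000 per timely diagnosis with five quality-adjusted life years gained corresponds to £20 000 per quality-adjusted life year, which is the sense in which the abstract calls the threshold plausible.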