    Approximate policy iteration: A survey and some new methods

    We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic and distributed policy iteration. Our discussion of policy evaluation is couched in general terms and aims to unify the available methods in the light of recent research developments and to compare the two main policy evaluation approaches: projected equations and temporal differences (TD), and aggregation. In the context of these approaches, we survey two different types of simulation-based algorithms: matrix inversion methods, such as least-squares temporal difference (LSTD), and iterative methods, such as least-squares policy evaluation (LSPE) and TD(λ), and their scaled variants. We discuss a recent method, based on regression and regularization, which rectifies the unreliability of LSTD for nearly singular projected Bellman equations. An iterative version of this method belongs to the LSPE class of methods and provides the connecting link between LSTD and LSPE. Our discussion of policy improvement focuses on the role of policy oscillation and its effect on performance guarantees. We illustrate that policy evaluation, when done by the projected equation/TD approach, may lead to policy oscillation, but when done by aggregation it does not. This implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability. Hard aggregation provides the connecting link between projected equation/TD-based and aggregation-based policy evaluation, and is characterized by favorable error bounds.

    Funding: National Science Foundation (U.S.) (No. ECCS-0801549); Los Alamos National Laboratory, Information Science and Technology Institute; United States Air Force (No. FA9550-10-1-0412)
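
The matrix inversion idea behind LSTD, and the effect of regularizing it, can be illustrated with a small sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names `lstd` and `solve_linear`, the `(state, cost, next_state)` sample format, and the choice of a plain Tikhonov-style regression solve, r = (CᵀC + βI)⁻¹Cᵀd, with an identity weighting matrix and a zero prior guess. It may differ from the exact method the survey describes.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]  # fails if the system is singular
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def lstd(transitions, phi, gamma, beta=0.0):
    """LSTD(0) estimate of the weights r in J(s) ~ phi(s) . r.

    transitions: list of (state, cost, next_state) samples (assumed format).
    phi: maps a state to a list of feature values.
    beta: regularization weight; beta = 0 gives the unregularized solve.
    """
    k = len(phi(transitions[0][0]))
    n = len(transitions)
    C = [[0.0] * k for _ in range(k)]
    d = [0.0] * k
    # Sample averages of phi(s) (phi(s) - gamma phi(s'))^T and phi(s) * cost.
    for s, cost, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        for i in range(k):
            d[i] += f[i] * cost / n
            for j in range(k):
                C[i][j] += f[i] * (f[j] - gamma * f_next[j]) / n
    # Regression form of C r = d: (C^T C + beta I) r = C^T d.
    A = [[sum(C[p][i] * C[p][j] for p in range(k)) + (beta if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    rhs = [sum(C[p][i] * d[p] for p in range(k)) for i in range(k)]
    return solve_linear(A, rhs)


# Two-state deterministic chain with one-hot (tabular) features.
one_hot = lambda s: [1.0, 0.0] if s == 0 else [0.0, 1.0]
samples = [(0, 1.0, 1), (1, 0.0, 0)]
weights = lstd(samples, one_hot, gamma=0.5)
weights_reg = lstd(samples, one_hot, gamma=0.5, beta=1.0)
```

With tabular features and `beta=0`, the solve recovers the exact costs of this toy chain (4/3 and 2/3). A positive `beta` shrinks the weights toward zero, trading bias for a better-conditioned system when C is nearly singular.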

    Use of Multiscale Entropy to Characterize Fetal Autonomic Development

    The idea that the uterine environment and adverse events during fetal development could increase the chances of disease in adulthood was first published by David Barker in 1998. Since then, investigators have employed several methods for studying and characterizing the ontological development of the fetus, e.g., fetal movement, growth, and cardiac metrics. Even with the most recent and advanced methods, such as fetal magnetocardiography (fMCG), studying fetal development remains difficult because the fetus is inaccessible. Finding metrics that fully characterize fetal ontological development remains a technological challenge. In this thesis, the use and value of multiscale entropy (MSE) to characterize fetal maturation across the third trimester of gestation is studied. Using multiscale entropy obtained from participants of a clinical trial, we show that MSE can characterize increasing complexity due to maturation in the fetus, and can distinguish a growing and developing fetal system from a mature system where loss of irregularity is due to compromised complexity from increasing physiologic load. MSE scales add a nonlinear metric that seems to accurately reflect the ontological development of the fetus and hold promise for future use to investigate the effects of maternal stress or intrauterine growth restriction, or to predict risk for sudden infant death syndrome.
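
For readers unfamiliar with the metric, a minimal sketch of how multiscale entropy is commonly computed follows: coarse-grain the series at each scale by averaging non-overlapping windows, then take the sample entropy of each coarse-grained series. The abstract does not state the thesis's parameters, so the defaults below (`m=2`, tolerance of 0.15 times the standard deviation of the original series), the function names, and the use of the population standard deviation are assumptions for illustration only.

```python
import math
import statistics


def coarse_grain(x, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(x) // scale
    return [sum(x[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def sample_entropy(x, m, r):
    """Sample entropy: -ln(A / B), where B counts pairs of length-m
    templates within tolerance r (Chebyshev distance) and A counts the
    same for length m + 1. Self-matches are excluded."""
    n = len(x)

    def count_matches(length):
        c = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    c += 1
        return c

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("nan")  # undefined for this series and tolerance
    return -math.log(a / b)


def multiscale_entropy(x, max_scale, m=2, r_factor=0.15):
    """Sample entropy of the coarse-grained series at scales 1..max_scale.
    The tolerance is fixed from the original series (assumed convention)."""
    r = r_factor * statistics.pstdev(x)
    return [sample_entropy(coarse_grain(x, s), m, r)
            for s in range(1, max_scale + 1)]


# A perfectly regular alternating series has zero entropy at every scale.
regular = [0.0, 1.0] * 20
mse_regular = multiscale_entropy(regular, max_scale=3)
```

The pairwise template comparison is quadratic in the series length, which is fine for a sketch but slow for long recordings.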

    Convergence of Least Squares Temporal Difference Methods Under General Conditions

    We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) in the off-policy learning context and with the simulation-based least squares temporal difference algorithm, LSTD(λ). We establish for the discounted cost criterion that the off-policy LSTD(λ) converges almost surely under mild, minimal conditions. We also analyze other convergence and boundedness properties of the iterates involved in the algorithm, and based on them, we suggest a modification in its practical implementation. Our analysis uses theories of both finite space Markov chains and Markov chains on topological spaces, in particular, the e-chains.