
    Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

    We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by 'controlled' Markov noise. In particular, both the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time-scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Finally, we use our results to present a solution to the off-policy convergence problem for temporal difference learning with linear function approximation.
    Comment: 23 pages (relaxed some important assumptions from the previous version), accepted in Mathematics of Operations Research in Feb, 201
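For orientation, a two time-scale scheme of the kind described in this abstract is commonly written in the generic form below. This is only a sketch: the symbols $\theta_n$, $w_n$, $h$, $g$, $Z^{(i)}_n$, $M^{(i)}_{n+1}$, $a(n)$ and $b(n)$ are illustrative assumptions and are not taken from the abstract.

```latex
\begin{align*}
\theta_{n+1} &= \theta_n + a(n)\left[ h\big(\theta_n, w_n, Z^{(1)}_n\big) + M^{(1)}_{n+1} \right]
  && \text{(faster recursion)} \\
w_{n+1} &= w_n + b(n)\left[ g\big(\theta_n, w_n, Z^{(2)}_n\big) + M^{(2)}_{n+1} \right]
  && \text{(slower recursion)} \\
\sum_n a(n) &= \sum_n b(n) = \infty, \qquad
\sum_n \big( a(n)^2 + b(n)^2 \big) < \infty, \qquad
\frac{b(n)}{a(n)} \to 0 .
\end{align*}
```

Here $Z^{(1)}_n$ and $Z^{(2)}_n$ stand for the controlled Markov processes, which enter non-additively through $h$ and $g$, and $M^{(1)}_{n+1}$, $M^{(2)}_{n+1}$ stand for martingale difference noise. The condition $b(n)/a(n) \to 0$ is what separates the two time-scales.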

    Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint

    The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either as the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., mean-variance tradeoff, exponential utility, percentile performance, value at risk, conditional value at risk, prospect theory and its later enhancement, cumulative prospect theory. In this article, we focus on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost, while ensuring that an explicit risk constraint is satisfied. We introduce the risk-constrained RL framework, cover popular risk measures based on variance, conditional value-at-risk and cumulative prospect theory, and present a template for a risk-sensitive RL algorithm. We survey some of our recent work on this topic, covering problems encompassing discounted cost, average cost, and stochastic shortest path settings, together with the aforementioned risk measures in a constrained framework. This non-exhaustive survey is aimed at giving a flavor of the challenges involved in solving a risk-sensitive RL problem, and outlining some potential future research directions.
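The constrained setting described in this abstract can be written compactly as follows. This is a generic sketch, not necessarily the formulation used in the survey: the policy parameter $\theta$, the cost objective $J$, the risk measure $G$, the risk budget $\alpha$ and the multiplier $\lambda$ are all assumed notation.

```latex
\begin{align*}
&\min_{\theta} \; J(\theta)
  \quad \text{subject to} \quad G(\theta) \le \alpha, \\
&L(\theta, \lambda) = J(\theta) + \lambda \big( G(\theta) - \alpha \big), \\
&\max_{\lambda \ge 0} \; \min_{\theta} \; L(\theta, \lambda).
\end{align*}
```

A standard way to handle such a problem is the Lagrangian relaxation in the second and third lines, where the policy parameter is updated to decrease $L$ and the multiplier is updated to increase it.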

    Dynamic disorder in simple enzymatic reactions induces stochastic amplification of substrate

    A growing body of evidence indicates that many enzymes exhibit fluctuations in their catalytic activity, which are associated with conformational changes on a broad range of timescales. The experimental study of this phenomenon, termed dynamic disorder, has become possible due to advances in single-molecule enzymology measurement techniques, through which the catalytic activity of individual enzyme molecules can be tracked in time. The biological role and importance of these fluctuations in a system with a small number of enzymes, such as a living cell, have only recently started being explored. In this work, we examine a simple stochastic reaction system consisting of an inflowing substrate and an enzyme with a randomly fluctuating catalytic reaction rate that converts the substrate into an outflowing product. To describe analytically the effect of rate fluctuations on the average substrate abundance at steady state, we derive an explicit formula that connects the relative speed of enzymatic fluctuations with the mean substrate level. We demonstrate that the relative speed of rate fluctuations can have a dramatic effect on the mean substrate, and lead to large positive deviations from predictions based on the assumption of deterministic enzyme activity. Our results also establish an interesting connection between the amplification effect and the mixing properties of the Markov process describing the enzymatic activity fluctuations, which can be used to easily predict the fluctuation speed above which such deviations become negligible. As the techniques of single-molecule enzymology continuously evolve, it may soon be possible to study the stochastic phenomena due to enzymatic activity fluctuations within living cells. Our work can be used to formulate experimentally testable hypotheses regarding the magnitude of these fluctuations, as well as their phenotypic consequences.
    Comment: 7 Figure
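The direction of the effect described in this abstract can be illustrated with two limiting cases. This is a sketch under assumptions that do not come from the abstract: first-order kinetics, a constant inflow rate $k_{\mathrm{in}}$, and a stationary, strictly positive fluctuating rate constant $c(t)$. It is not the explicit formula derived in the paper.

```latex
\begin{align*}
\frac{d}{dt}\,\mathbb{E}\big[S(t) \mid c\big]
  &= k_{\mathrm{in}} - c(t)\,\mathbb{E}\big[S(t) \mid c\big], \\
\text{fast fluctuations:} \quad
  \langle S \rangle &\to \frac{k_{\mathrm{in}}}{\mathbb{E}[c]}, \\
\text{slow fluctuations:} \quad
  \langle S \rangle &\to k_{\mathrm{in}}\,\mathbb{E}\!\left[\frac{1}{c}\right]
  \;\ge\; \frac{k_{\mathrm{in}}}{\mathbb{E}[c]} .
\end{align*}
```

The inequality is Jensen's inequality applied to the convex function $c \mapsto 1/c$. Under these assumptions, slow rate fluctuations can only raise the mean substrate level above the deterministic prediction $k_{\mathrm{in}} / \mathbb{E}[c]$, and the gap vanishes as the fluctuations become fast.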

    Modelling and feedback control design for quantum state preparation

    The goal of this article is to provide a largely self-contained introduction to the modelling of controlled quantum systems under continuous observation, and to the design of feedback controls that prepare particular quantum states. We describe a bottom-up approach, where a field-theoretic model is subjected to statistical inference and is ultimately controlled. As an example, the formalism is applied to a highly idealized interaction of an atomic ensemble with an optical field. Our aim is to provide a unified outline for the modelling, from first principles, of realistic experiments in quantum control.

    Policy Gradients for CVaR-Constrained MDPs

    We study a risk-constrained version of the stochastic shortest path (SSP) problem, where the risk measure considered is Conditional Value-at-Risk (CVaR). We propose two algorithms that obtain a locally risk-optimal policy by employing four tools: stochastic approximation, mini-batches, policy gradients and importance sampling. Both algorithms incorporate a CVaR estimation procedure along the lines of Bardou et al. [2009], which in turn is based on the Rockafellar-Uryasev representation of CVaR, and both use the likelihood ratio principle to estimate the gradient of the sum of one cost function (the objective of the SSP) and the gradient of the CVaR of the sum of another cost function (in the constraint of the SSP). The algorithms differ in how they approximate the CVaR estimates and the necessary gradients: the first algorithm uses stochastic approximation, while the second employs mini-batches in the spirit of Monte Carlo methods. We establish asymptotic convergence of both algorithms. Further, since estimating CVaR is related to rare-event simulation, we incorporate an importance-sampling-based variance reduction scheme into our proposed algorithms.
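The Rockafellar-Uryasev representation mentioned in this abstract expresses CVaR at level alpha as the minimum over xi of xi + E[(X - xi)^+] / (1 - alpha), with the minimizer being a value-at-risk. The sketch below evaluates that representation on a fixed batch of sampled costs. It is a plain sample-average illustration, not either of the paper's algorithms, and the function names `ru_objective` and `empirical_var_cvar` are hypothetical.

```python
def ru_objective(xi, costs, alpha):
    """Empirical Rockafellar-Uryasev objective: xi + mean((c - xi)^+) / (1 - alpha)."""
    n = len(costs)
    excess = sum(max(c - xi, 0.0) for c in costs)
    return xi + excess / ((1.0 - alpha) * n)


def empirical_var_cvar(costs, alpha):
    """Return (VaR, CVaR) estimates from a batch of sampled costs.

    The empirical objective is convex and piecewise linear in xi with
    breakpoints at the sample points, so its minimum is attained at one
    of the samples. Scanning all samples costs O(n^2), which is fine for
    a small illustrative batch.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    best_xi = min(costs, key=lambda xi: ru_objective(xi, costs, alpha))
    return best_xi, ru_objective(best_xi, costs, alpha)


# Illustrative batch (an assumption, not data from the paper): costs 1, 2, ..., 100.
costs = [float(c) for c in range(1, 101)]
var_90, cvar_90 = empirical_var_cvar(costs, alpha=0.9)
```

For this batch the CVaR estimate is about 95.5, which matches the average of the ten largest costs (91 through 100).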