    Path integral policy improvement with differential dynamic programming

    Path Integral Policy Improvement with Covariance Matrix Adaptation (PI2-CMA) is a step-based, model-free reinforcement learning approach that combines statistical estimation techniques with fundamental results from Stochastic Optimal Control. In essence, a policy distribution is improved iteratively using reward-weighted averaging of the corresponding rollouts. It was assumed that PI2-CMA somehow exploited gradient information contained in the reward-weighted statistics. To our knowledge, we are the first to expose the principle of this gradient extraction rigorously. Our findings reveal that PI2-CMA essentially obtains gradient information similar to the forward and backward passes in the Differential Dynamic Programming (DDP) method. It is then straightforward to extend the analogy with DDP by introducing a feedback term in the policy update. This suggests a novel algorithm, which we coin Path Integral Policy Improvement with Differential Dynamic Programming (PI2-DDP). The resulting algorithm is similar to the previously proposed Sampled Differential Dynamic Programming (SaDDP), but we derive the method independently as a generalization of the framework of PI2-CMA. Our derivations suggest implementing some small variations to SaDDP so as to increase performance. We validated our claims on a robot trajectory learning task.
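
The reward-weighted averaging mentioned in this abstract can be illustrated with a minimal sketch. It is a simplified, episode-level variant (one weight per rollout rather than per time step) and is not the authors' implementation. The function names, the exponentiated min-max cost normalization with constant `h`, the small diagonal regularizer, and the quadratic test cost are all assumptions made for illustration.

```python
import numpy as np


def reward_weighted_update(mu, sigma, cost_fn, n_rollouts=20, h=10.0, rng=None):
    """One reward-weighted update of a Gaussian policy distribution.

    Simplified sketch: each rollout is summarized by a single scalar cost.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample perturbed policy parameters, one vector per rollout.
    thetas = rng.multivariate_normal(mu, sigma, size=n_rollouts)
    costs = np.array([cost_fn(theta) for theta in thetas])

    # Map costs to weights: low cost gives high weight (assumed normalization).
    spread = costs.max() - costs.min()
    if spread < 1e-12:
        weights = np.full(n_rollouts, 1.0 / n_rollouts)
    else:
        weights = np.exp(-h * (costs - costs.min()) / spread)
        weights /= weights.sum()

    # Weighted mean and weighted covariance around the previous mean.
    new_mu = weights @ thetas
    diffs = thetas - mu
    new_sigma = (weights[:, None] * diffs).T @ diffs
    new_sigma += 1e-8 * np.eye(len(mu))  # keep the covariance usable for sampling
    return new_mu, new_sigma


def quadratic_cost(theta):
    """Hypothetical stand-in for a rollout cost, minimized at theta = (1, 1)."""
    return float(np.sum((theta - 1.0) ** 2))


rng = np.random.default_rng(0)
mu, sigma = np.array([3.0, -2.0]), np.eye(2)
initial_cost = quadratic_cost(mu)
for _ in range(30):
    mu, sigma = reward_weighted_update(mu, sigma, quadratic_cost, rng=rng)
final_cost = quadratic_cost(mu)
```

No gradient of the cost is evaluated anywhere in the loop; the mean moves toward lower cost purely through the weighted statistics, which is the behavior the abstract sets out to explain.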

    Prescribed Performance Control Guided Policy Improvement for Satisfying Signal Temporal Logic Tasks

    Signal temporal logic (STL) provides a user-friendly interface for defining complex tasks for robotic systems. Recent efforts aim at designing control laws or using reinforcement learning methods to find policies which guarantee satisfaction of these tasks. While the former suffer from the trade-off between task specification and computational complexity, the latter encounter difficulties in exploration as the tasks become more complex and challenging to satisfy. This paper proposes to combine the benefits of the two approaches and use an efficient prescribed performance control (PPC) base law to guide exploration within the reinforcement learning algorithm. The potential of the method is demonstrated in a simulated environment through two sample navigational tasks. Comment: This is the extended version of the paper accepted to the 2019 American Control Conference (ACC), Philadelphia (to be published

    Path integrals and symmetry breaking for optimal control theory

    This paper considers linear-quadratic control of a non-linear dynamical system subject to arbitrary cost. I show that for this class of stochastic control problems the non-linear Hamilton-Jacobi-Bellman equation can be transformed into a linear equation. The transformation is similar to the transformation used to relate the classical Hamilton-Jacobi equation to the Schrödinger equation. As a result of the linearity, the usual backward computation can be replaced by a forward diffusion process that can be computed by stochastic integration or by the evaluation of a path integral. It is shown how the PMP formalism is recovered in the deterministic limit. The significance of the path integral approach is that it forms the basis for a number of efficient computational methods, such as MC sampling, the Laplace approximation and the variational approximation. We show the effectiveness of the first two methods in a number of examples. Examples are given that show the qualitative difference between stochastic and deterministic control and the occurrence of symmetry breaking as a function of the noise. Comment: 21 pages, 6 figures, submitted to JSTA
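
The linearizing transformation described in this abstract can be sketched as follows. The abstract itself gives no formulas, so the notation here (drift $b$, control cost matrix $R$, state cost $V$, end cost $\phi$, noise covariance $\nu$, scalar $\lambda$) is an assumption, following the form commonly used in the path-integral control literature.

```latex
% Assumed setting: dx = (b(x,t) + u)\,dt + d\xi, with noise covariance \nu\,dt,
% and cost  \phi(x(T)) + \int_t^T \big( \tfrac12 u^\top R u + V(x,s) \big)\,ds .
\begin{align}
  -\partial_t J &= \min_u \Big( \tfrac12 u^\top R u + V + (b+u)^\top \partial_x J
                   + \tfrac12 \operatorname{Tr}\big(\nu\, \partial_x^2 J\big) \Big),
  \qquad u^\ast = -R^{-1} \partial_x J, \\
  -\partial_t J &= -\tfrac12 (\partial_x J)^\top R^{-1} \partial_x J + V
                   + b^\top \partial_x J + \tfrac12 \operatorname{Tr}\big(\nu\, \partial_x^2 J\big).
\end{align}
% Substituting J = -\lambda \log \psi and assuming \nu = \lambda R^{-1}
% cancels the terms quadratic in \partial_x \psi, leaving a linear equation:
\begin{align}
  \partial_t \psi &= \Big( \frac{V}{\lambda} - b^\top \partial_x
                     - \tfrac12 \operatorname{Tr}\big(\nu\, \partial_x^2\big) \Big)\psi,
  \qquad \psi(x,T) = e^{-\phi(x)/\lambda}.
\end{align}
```

The cancellation depends on the assumed relation $\nu = \lambda R^{-1}$ between noise covariance and control cost: directions with little noise must be expensive to control.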

    Time-dependent opportunities in energy business : a comparative study of locally available renewable and conventional fuels

    This work investigates and compares energy-related private business strategies of potential interest to investors willing to exploit either local biomass sources or strategic conventional fuels. Two distinct fuels and their related power-production technologies are compared as a case study in terms of economic efficiency: cotton-stalk biomass and natural gas. The carbon capture and storage option is also investigated for power plants based on both fuel types. The model used in this study investigates important economic aspects using a "real options" method instead of traditional Discounted Cash Flow techniques, as it may handle more effectively the problems arising from the stochastic evolution of significant cash flow contributors such as electricity, fuel and CO2 allowance prices. Capital costs also have a functional relationship with time, which provides an additional reason for implementing "real options" as well as the learning-curves technique. The methodology and the results presented in this work may lead to interesting conclusions and affect potential private investment strategies and future decision making. This study indicates that both technologies lead to positive investment yields, with natural gas being more profitable for the case study examined, while carbon capture and storage does not seem to be cost-efficient at current CO2 allowance prices. Furthermore, low interest rates might encourage potential investors to wait before actualising their business plans, while higher interest rates favor immediate investment decisions. (C) 2009 Elsevier Ltd. All rights reserved.

    A General Theory of Price and Quantity Aggregation and Welfare Measurement

    The paper presents a general theory of the aggregation of prices and quantities that unifies the field and relates topics that in the past have been treated separately and unsatisfactorily, or not at all. The theory does without the common but unrealistic assumptions of homotheticity or representative agents and is valid with or without an explicit utility maximization assumption. Two different derivations are given, one in continuous time, using Divisia integrals, and one employing more traditional discrete arguments. The unifying concept is the money metric, which is interpreted as a partial welfare indicator, rather than as a comprehensive welfare measure. On this basis, a consistent set of chained price and quantity indexes for a set of additive time series, such as those in the national income and product accounts, is derived. All variants of the theory lead to Törnqvist indexes defined on the appropriate data set. A numerical example confirms that in the non-homothetic case, these indexes are superior both to Fisher’s ‘ideal’ index and to the consumer surplus approximation. Keywords: chain indexes, consumer surplus, cost-of-living, divisia integral, money metric price index, quantity index, real consumption, Törnqvist index
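
For readers unfamiliar with the two index formulas this abstract compares, the sketch below computes a single-link Törnqvist price index and a Fisher 'ideal' price index from their textbook definitions. It is not the paper's derivation or its numerical example: the function names and the two-good data are hypothetical, and a chained index would be the product of such links over consecutive periods.

```python
import numpy as np


def tornqvist_price_index(p0, p1, q0, q1):
    """Tornqvist price index between periods 0 and 1.

    Geometric mean of price relatives, weighted by the average of the
    expenditure shares in the two periods.
    """
    p0, p1, q0, q1 = (np.asarray(a, dtype=float) for a in (p0, p1, q0, q1))
    s0 = p0 * q0 / np.sum(p0 * q0)  # expenditure shares, period 0
    s1 = p1 * q1 / np.sum(p1 * q1)  # expenditure shares, period 1
    return float(np.exp(np.sum(0.5 * (s0 + s1) * np.log(p1 / p0))))


def fisher_price_index(p0, p1, q0, q1):
    """Fisher 'ideal' index: geometric mean of Laspeyres and Paasche."""
    p0, p1, q0, q1 = (np.asarray(a, dtype=float) for a in (p0, p1, q0, q1))
    laspeyres = np.sum(p1 * q0) / np.sum(p0 * q0)
    paasche = np.sum(p1 * q1) / np.sum(p0 * q1)
    return float(np.sqrt(laspeyres * paasche))


# Hypothetical two-good data, used only to exercise the functions.
p0, q0 = [1.0, 2.0], [10.0, 5.0]
p1, q1 = [1.2, 2.1], [9.0, 6.0]
tornqvist = tornqvist_price_index(p0, p1, q0, q1)
fisher = fisher_price_index(p0, p1, q0, q1)

# Sanity check: if every price doubles, both indexes should equal 2.
doubled_tornqvist = tornqvist_price_index(p0, [2.0, 4.0], q0, q1)
doubled_fisher = fisher_price_index(p0, [2.0, 4.0], q0, q1)
```

The doubling check at the end is a quick sanity test that both formulas pass regardless of how quantities change.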