5,129 research outputs found
Path integral policy improvement with differential dynamic programming
Path Integral Policy Improvement with Covariance Matrix Adaptation (PI2-CMA) is a step-based, model-free reinforcement learning approach that combines statistical estimation techniques with fundamental results from Stochastic Optimal Control. In essence, a policy distribution is improved iteratively using reward-weighted averaging of the corresponding rollouts. It was previously assumed that PI2-CMA somehow exploits gradient information contained in the reward-weighted statistics. To our knowledge, we are the first to expose the principle of this gradient extraction rigorously. Our findings reveal that PI2-CMA essentially obtains gradient information in a manner similar to the forward and backward passes of the Differential Dynamic Programming (DDP) method. It is then straightforward to extend the analogy with DDP by introducing a feedback term in the policy update. This suggests a novel algorithm, which we name Path Integral Policy Improvement with Differential Dynamic Programming (PI2-DDP). The resulting algorithm is similar to the previously proposed Sampled Differential Dynamic Programming (SaDDP), but we derive the method independently as a generalization of the PI2-CMA framework. Our derivations suggest small variations to SaDDP that increase performance. We validated our claims on a robot trajectory learning task.
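The reward-weighted averaging step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional Gaussian search distribution, an exponentiated normalized-cost weighting, and a variance update taken around the previous mean. The function name, the `temperature` parameter, and the toy cost are hypothetical.

```python
import math
import random


def reward_weighted_update(mean, sigma, cost_fn, rng, n_rollouts=50, temperature=10.0):
    """One reward-weighted averaging step on a 1-D Gaussian (illustrative sketch)."""
    # Sample candidate parameters ("rollouts") from the current distribution.
    samples = [rng.gauss(mean, sigma) for _ in range(n_rollouts)]
    costs = [cost_fn(s) for s in samples]

    # Map costs to weights: lower cost gives a larger weight.
    c_min, c_max = min(costs), max(costs)
    span = (c_max - c_min) or 1.0  # guard against identical costs
    weights = [math.exp(-temperature * (c - c_min) / span) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Weighted mean and weighted variance (assumed here: around the old mean).
    new_mean = sum(p * s for p, s in zip(probs, samples))
    new_var = sum(p * (s - mean) ** 2 for p, s in zip(probs, samples))
    return new_mean, math.sqrt(new_var)


def optimize(cost_fn, mean=0.0, sigma=1.0, iterations=20, seed=0):
    """Repeat the update; returns the final mean and standard deviation."""
    rng = random.Random(seed)
    for _ in range(iterations):
        mean, sigma = reward_weighted_update(mean, sigma, cost_fn, rng)
    return mean, sigma
```

For example, `optimize(lambda x: (x - 3.0) ** 2)` moves the mean of the search distribution toward the minimizer of the toy cost without ever computing a gradient explicitly.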
Prescribed Performance Control Guided Policy Improvement for Satisfying Signal Temporal Logic Tasks
Signal temporal logic (STL) provides a user-friendly interface for defining
complex tasks for robotic systems. Recent efforts aim at designing control laws
or using reinforcement learning methods to find policies which guarantee
satisfaction of these tasks. While the former suffer from the trade-off between
task specification and computational complexity, the latter encounter
difficulties in exploration as the tasks become more complex and challenging to
satisfy. This paper proposes to combine the benefits of the two approaches and
use an efficient prescribed performance control (PPC) base law to guide
exploration within the reinforcement learning algorithm. The potential of the
method is demonstrated in a simulated environment through two sample
navigational tasks. Comment: This is the extended version of the paper accepted to the 2019
American Control Conference (ACC), Philadelphia (to be published
Path integrals and symmetry breaking for optimal control theory
This paper considers linear-quadratic control of a non-linear dynamical
system subject to arbitrary cost. I show that for this class of stochastic
control problems the non-linear Hamilton-Jacobi-Bellman equation can be
transformed into a linear equation. The transformation is similar to the
transformation used to relate the classical Hamilton-Jacobi equation to the
Schrödinger equation. As a result of the linearity, the usual backward
computation can be replaced by a forward diffusion process, which can be
computed by stochastic integration or by the evaluation of a path integral. It
is shown how the Pontryagin minimum principle (PMP) formalism is recovered in the deterministic limit. The
significance of the path integral approach is that it forms the basis for a
number of efficient computational methods, such as MC sampling, the Laplace
approximation and the variational approximation. We show the effectiveness of
the first two methods in a number of examples. Examples are given that show the
qualitative difference between stochastic and deterministic control and the
occurrence of symmetry breaking as a function of the noise. Comment: 21 pages, 6 figures, submitted to JSTA
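The Monte Carlo sampling idea in this abstract can be sketched for the simplest case. The assumptions here are not taken from the abstract: scalar dynamics dx = u dt + noise with noise variance `nu`, control cost (R/2)u^2, an end cost only (no path cost), and the relation lambda = nu * R, so that the cost-to-go is J = -lambda log E[exp(-end_cost(x_T)/lambda)] under the uncontrolled diffusion. All names are hypothetical.

```python
import math
import random


def path_integral_control(x, time_to_go, end_cost, nu=1.0, R=1.0,
                          n_samples=20000, seed=0):
    """Estimate the control and cost-to-go by forward sampling (end cost only)."""
    rng = random.Random(seed)
    lam = nu * R  # assumed relation between noise level and control cost
    std = math.sqrt(nu * time_to_go)

    # End points of the uncontrolled diffusion started at x.
    ys = [rng.gauss(x, std) for _ in range(n_samples)]

    # Path weights exp(-end_cost / lambda), computed stably in log space.
    log_w = [-end_cost(y) / lam for y in ys]
    m = max(log_w)
    ws = [math.exp(lw - m) for lw in log_w]
    total = sum(ws)

    # J = -lambda * log(psi), with psi the sample average of the weights.
    cost_to_go = -lam * (m + math.log(total / n_samples))

    # With no path cost, the control steers toward the weighted mean end point.
    y_mean = sum(w * y for w, y in zip(ws, ys)) / total
    u = (y_mean - x) / time_to_go
    return u, cost_to_go
```

With a quadratic end cost such as `lambda y: 0.5 * (y - 1.0) ** 2`, the estimate can be compared against the closed-form Gaussian result, which is a convenient sanity check before trying non-quadratic costs where symmetry breaking can appear.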
Time-dependent opportunities in energy business : a comparative study of locally available renewable and conventional fuels
This work investigates and compares energy-related private business strategies of potential interest to investors willing to exploit either local biomass sources or strategic conventional fuels. Two distinct fuels and their related power-production technologies are compared as a case study in terms of economic efficiency: cotton-stalk biomass and natural gas. The carbon capture and storage option is also investigated for power plants based on both fuel types. The model used in this study investigates important economic aspects using a "real options" method instead of traditional Discounted Cash Flow techniques, as it may handle more effectively the problems arising from the stochastic evolution of significant cash-flow contributors such as electricity, fuel and CO2 allowance prices. The capital costs also have a functional relationship with time, which provides an additional reason for implementing the "real options" method as well as the learning-curves technique. The methodology and the results presented in this work may lead to interesting conclusions and affect potential private investment strategies and future decision making. This study indicates that both technologies lead to positive investment yields, with natural gas being more profitable for the case study examined, while carbon capture and storage does not seem to be cost-efficient at current CO2 allowance prices. Furthermore, low interest rates might encourage potential investors to wait before carrying out their business plans, while higher interest rates favor immediate investment decisions. (C) 2009 Elsevier Ltd. All rights reserved.
A General Theory of Price and Quantity Aggregation and Welfare Measurement
The paper presents a general theory of the aggregation of prices and quantities that unifies the field and relates topics that in the past have been treated separately and unsatisfactorily, or not at all. The theory does without the common but unrealistic assumptions of homotheticity or representative agents, and it is valid with or without an explicit utility maximization assumption. Two different derivations are given, one in continuous time, using Divisia integrals, and one employing more traditional discrete arguments. The unifying concept is the money metric, which is interpreted as a partial welfare indicator rather than as a comprehensive welfare measure. On this basis, a consistent set of chained price and quantity indexes for a set of additive time series, such as those in the national income and product accounts, is derived. All variants of the theory lead to Törnqvist indexes defined on the appropriate data set. A numerical example confirms that in the non-homothetic case, these indexes are superior both to Fisher’s ‘ideal’ index and to the consumer surplus approximation. Keywords: chain indexes, consumer surplus, cost-of-living, Divisia integral, money metric price index, quantity index, real consumption, Törnqvist index
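For readers unfamiliar with the two indexes compared in this abstract, the sketch below computes the standard textbook Törnqvist and Fisher price indexes between two periods. It does not reproduce the paper's theory or its numerical example; the function names and any input data are hypothetical.

```python
import math


def expenditure_shares(prices, quantities):
    """Share of each good in total expenditure for one period."""
    spend = [p * q for p, q in zip(prices, quantities)]
    total = sum(spend)
    return [s / total for s in spend]


def tornqvist_price_index(p0, q0, p1, q1):
    """Geometric mean of price relatives, weighted by average expenditure shares."""
    s0 = expenditure_shares(p0, q0)
    s1 = expenditure_shares(p1, q1)
    log_index = sum(0.5 * (a + b) * math.log(pb / pa)
                    for a, b, pa, pb in zip(s0, s1, p0, p1))
    return math.exp(log_index)


def fisher_price_index(p0, q0, p1, q1):
    """Geometric mean of the Laspeyres and Paasche price indexes."""
    laspeyres = (sum(pb * qa for pb, qa in zip(p1, q0))
                 / sum(pa * qa for pa, qa in zip(p0, q0)))
    paasche = (sum(pb * qb for pb, qb in zip(p1, q1))
               / sum(pa * qb for pa, qb in zip(p0, q1)))
    return math.sqrt(laspeyres * paasche)
```

A quick consistency check: if every price doubles between the two periods, both functions return 2 regardless of the quantities, since each index reduces to the common price relative.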