
    Discrete mechanics and optimal control: An analysis

    The optimal control of a mechanical system is of crucial importance in many application areas. Typical examples are the determination of a time-minimal path in vehicle dynamics, a minimal-energy trajectory in space mission design, or optimal motion sequences in robotics and biomechanics. In most cases, some sort of discretization of the original, infinite-dimensional optimization problem has to be performed to make the problem amenable to computation. The approach proposed in this paper is to directly discretize the variational description of the system's motion. The resulting optimization algorithm lets the discrete solution directly inherit characteristic structural properties of the continuous one, such as symmetries and integrals of the motion. We show that the DMOC (Discrete Mechanics and Optimal Control) approach is equivalent to a finite-difference discretization of Hamilton's equations by a symplectic partitioned Runge-Kutta scheme and employ this fact to give a proof of convergence. The numerical performance of DMOC and its relationship to other existing optimal control methods are investigated.
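
    For orientation, the variational discretization referred to above can be written schematically as follows (a sketch in standard variational-integrator notation, not necessarily the paper's exact statement). The action integral is approximated on each time interval by a discrete Lagrangian,

        $L_d(q_k, q_{k+1}) \approx \int_{t_k}^{t_{k+1}} L\big(q(t), \dot{q}(t)\big)\, dt,$

    and stationarity of the discrete action, together with discretized control forces $f_k^{\pm}$, yields the forced discrete Euler-Lagrange equations

        $D_2 L_d(q_{k-1}, q_k) + D_1 L_d(q_k, q_{k+1}) + f_{k-1}^{+} + f_k^{-} = 0,$

    which enter the finite-dimensional optimization problem as equality constraints on the discrete trajectory $\{q_k\}$ and controls $\{u_k\}$.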

    Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces

    Policy optimization methods have shown great promise in solving complex reinforcement and imitation learning tasks. While model-free methods are broadly applicable, they often require many samples to optimize complex policies. Model-based methods greatly improve sample-efficiency but at the cost of poor generalization, requiring a carefully handcrafted model of the system dynamics for each task. Recently, hybrid methods have been successful in trading off applicability for improved sample-complexity. However, these have been limited to continuous action spaces. In this work, we present a new hybrid method based on an approximation of the dynamics as an expectation over the next state under the current policy. This relaxation allows us to derive a novel hybrid policy gradient estimator, combining score function and pathwise derivative estimators, that is applicable to discrete action spaces. We show significant gains in sample complexity, ranging between 1.7× and 25×, when learning parameterized policies on Cart Pole, Acrobot, Mountain Car and Hand Mass. Our method is applicable to both discrete and continuous action spaces, whereas competing pathwise methods are limited to the latter.
    Comment: In AAAI 2018 proceedings.
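
    As a toy illustration of the relaxation described in the abstract (a minimal sketch: the model f, the two-action softmax policy, and the reward are all hypothetical, and only the pathwise half of the hybrid estimator is shown, not the score-function part), the sampled next state is replaced by its expectation under the current policy, so the rollout becomes a deterministic, differentiable function of the policy parameters:

    import jax
    import jax.numpy as jnp

    def f(s, a):
        # Hypothetical known dynamics: each discrete action applies a fixed drift.
        drifts = jnp.array([[-0.1], [0.1]])
        return s + drifts[a]

    def policy(theta, s):
        # Softmax policy over two discrete actions (toy parameterization).
        logits = jnp.stack([theta[0] * s[0], theta[1] * s[0]])
        return jax.nn.softmax(logits)

    def relaxed_return(theta, s0, T=10):
        s, ret = s0, 0.0
        for _ in range(T):
            probs = policy(theta, s)
            # Relaxation: next state = E_{a ~ pi}[f(s, a)], a smooth function of
            # theta, so pathwise gradients can flow through the dynamics.
            s = probs[0] * f(s, 0) + probs[1] * f(s, 1)
            ret = ret - jnp.sum(s ** 2)  # toy reward: keep the state near zero
        return ret

    theta = jnp.zeros(2)
    print(jax.grad(relaxed_return)(theta, jnp.array([1.0])))

    In the paper's estimator this differentiable path is combined with a score-function term; the sketch only demonstrates why the relaxation makes pathwise gradients available for discrete actions at all.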

    Sampling from a system-theoretic viewpoint: Part II - Noncausal solutions

    Get PDF
    This paper puts to use concepts and tools introduced in Part I to address a wide spectrum of noncausal sampling and reconstruction problems. In particular, we follow the system-theoretic paradigm by using systems as signal generators to account for available information and system norms (L2 and L∞) as performance measures. The proposed optimization-based approach recovers many known solutions, derived hitherto by different methods, as special cases under different assumptions about acquisition or reconstructing devices (e.g., polynomial and exponential cardinal splines for fixed samplers, and the Sampling Theorem and its modifications in the case when both sampler and interpolator are design parameters). We also derive new results, such as versions of the Sampling Theorem for downsampling and reconstruction from noisy measurements, the continuous-time invariance of a wide class of optimal sampling-and-reconstruction circuits, etc.
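
    Schematically (a sketch of the setup under assumed notation, not the paper's exact formulation), the problems addressed take the form: given a signal generator $G$ and a sampler $S$, choose the reconstructor $F$ to minimize the induced norm of the error system,

        $\min_{F}\; \|(I - F S)\, G\|,$

    with the $L^2$ or $L^\infty$ system norm as the performance measure. The classical Sampling Theorem then corresponds to the special case in which the optimal reconstruction from samples $x(nh)$ of a band-limited signal reduces to sinc interpolation,

        $\hat{x}(t) = \sum_{n \in \mathbb{Z}} x(nh)\, \mathrm{sinc}\!\left(\frac{t - nh}{h}\right).$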