1,520 research outputs found

    Benchmarking Deep Reinforcement Learning for Continuous Control

    Recently, researchers have made significant progress combining advances in deep learning for feature representation with reinforcement learning. Notable examples include training agents to play Atari games from raw pixel data and to acquire advanced manipulation skills from raw sensory inputs. However, it has been difficult to quantify progress in continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on a systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab to facilitate experimental reproducibility and to encourage adoption by other researchers. Comment: 14 pages, ICML 2016
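
    As a rough illustration of how the released benchmark is meant to be used, the sketch below mirrors the usage example from the rllab README: it trains a Gaussian MLP policy with TRPO on a cart-pole task. The module paths, class names and hyper-parameters reflect the repository layout around the time of release and may differ in later versions, so treat this as a sketch rather than a definitive recipe.

```python
from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

# Normalize observations/actions of the benchmark cart-pole task.
env = normalize(CartpoleEnv())

# Gaussian policy with a small two-layer MLP for the mean.
policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))

# Linear feature baseline used as the value-function estimator.
baseline = LinearFeatureBaseline(env_spec=env.spec)

algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,       # environment steps collected per iteration
    max_path_length=100,   # episode length cap
    n_itr=40,              # number of TRPO iterations
    discount=0.99,
    step_size=0.01,        # KL constraint for the trust region
)
algo.train()
```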

    Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

    Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is temporal credit assignment in reinforcement learning. Well-known reinforcement learning algorithms, such as AHC or Q-learning, may be viewed as instances of TD learning. This paper examines the efficient and general implementation of TD(lambda) for arbitrary lambda, for use with reinforcement learning algorithms that optimize the discounted sum of rewards. The traditional approach, based on eligibility traces, is argued to suffer from both inefficiency and lack of generality. The TTD (Truncated Temporal Differences) procedure is proposed as an alternative that only approximates TD(lambda) but requires very little computation per action and can be used with arbitrary function-representation methods. The idea from which it is derived is fairly simple and not new, but apparently unexplored so far. Encouraging experimental results are presented, suggesting that using lambda > 0 with the TTD procedure yields a significant learning speedup at essentially the same cost as standard TD(0) learning. Comment: See http://www.jair.org/ for any accompanying file
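
    To make the idea concrete, here is a minimal tabular sketch of a truncated TD(lambda) update (illustrative code, not taken from the paper): a window of the last m transitions is kept, and the value of the oldest state in the window is updated with the (gamma*lambda)-discounted sum of the m one-step TD errors, i.e. an m-step truncation of the TD(lambda) return. The dictionary-based value table and all names are assumptions for the sketch.

```python
from collections import deque

def ttd_update(V, window, alpha, gamma, lam):
    """Truncated TD(lambda) update of the oldest state in `window`.

    `window` holds the last m transitions (s, r, s_next); the correction is
    sum_k (gamma*lam)^k * delta_{t+k}, the m-step truncation of the
    TD(lambda) return minus V(s_t).
    """
    s0 = window[0][0]
    correction, coeff = 0.0, 1.0
    for s, r, s_next in window:
        delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
        correction += coeff * delta
        coeff *= gamma * lam
    V[s0] = V.get(s0, 0.0) + alpha * correction

# Usage: push each new transition into a bounded deque and call ttd_update
# once the window is full, before the oldest transition is evicted.
V, m = {}, 8
window = deque(maxlen=m)
```

    With lambda = 0 only the first TD error contributes, recovering ordinary TD(0); larger lambda lets each update propagate information up to m steps back, which is the source of the speedup reported above.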

    Chance-Constrained Control with Lexicographic Deep Reinforcement Learning

    This paper proposes a lexicographic Deep Reinforcement Learning (DeepRL)-based approach to chance-constrained Markov Decision Processes, in which the controller seeks to ensure that the probability of satisfying the constraint is above a given threshold. Standard DeepRL approaches require i) the constraints to be included as additional weighted terms in the cost function, in a multi-objective fashion, and ii) the tuning of the introduced weights during the training phase of the Deep Neural Network (DNN) according to the probability thresholds. The proposed approach, instead, requires training one constraint-free DNN and one DNN for each constraint separately and then, at each time step, selecting which DNN to use depending on the observed system state. The presented solution does not require any hyper-parameter tuning besides the standard DNN ones, even if the probability thresholds change. A lexicographic version of the well-known DeepRL algorithm DQN is also proposed and validated via simulations.
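
    The selection step can be pictured as follows. This is a hedged sketch of one plausible reading of such a lexicographic scheme, not the paper's implementation: it assumes the constraint-free DNN returns per-action Q-values, that each constraint DNN returns per-action estimates of the probability of satisfying its constraint, and that the action set is filtered constraint by constraint before maximising the unconstrained Q-values. All function names and the interface are hypothetical.

```python
import numpy as np

def lexicographic_action(state, q_reward, constraint_nets, thresholds):
    """Pick an action lexicographically: first satisfy each chance
    constraint's probability threshold, then maximise the constraint-free
    Q-values among the surviving actions (assumed interfaces)."""
    q = np.asarray(q_reward(state))           # (n_actions,) unconstrained Q
    candidates = np.arange(q.shape[0])
    for net, p_min in zip(constraint_nets, thresholds):
        sat = np.asarray(net(state))[candidates]   # satisfaction prob. per action
        keep = candidates[sat >= p_min]
        if keep.size == 0:
            # No remaining action meets the threshold: keep the safest ones.
            keep = candidates[sat == sat.max()]
        candidates = keep
    return int(candidates[np.argmax(q[candidates])])
```

    Because the thresholds enter only at selection time in this sketch, they can be changed without retraining any of the networks, consistent with the claim above.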

    Pseudorehearsal in actor-critic agents with neural network function approximation

    Catastrophic forgetting has a significant negative impact in reinforcement learning. The purpose of this study is to investigate how pseudorehearsal can change the performance of an actor-critic agent with neural-network function approximation. We tested the agent in a pole-balancing task and compared different pseudorehearsal approaches. We found that pseudorehearsal can assist learning and decrease forgetting.
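
    Pseudorehearsal itself is simple to sketch: random pseudo-inputs are labelled with the network's current outputs and rehearsed alongside new experience, so that old input-output mappings are preserved without storing past data. The snippet below is an illustrative sketch under assumed names, input range and mixing strategy, not the study's code.

```python
import numpy as np

def make_pseudo_items(forward, n_items, input_dim, rng, low=-1.0, high=1.0):
    """Build a pseudorehearsal buffer: random inputs paired with the
    network's *current* outputs, so later updates can rehearse them.

    `forward` is any callable mapping a batch of inputs to outputs; the
    uniform input range is an illustrative assumption.
    """
    x = rng.uniform(low, high, size=(n_items, input_dim))
    y = forward(x)
    return x, y

# During each actor or critic update, the real (input, target) pair would be
# trained jointly with a minibatch drawn from (x, y), e.g. by adding a
# squared-error term that keeps the network's outputs on x close to y.
```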

    The Importance of Clipping in Neurocontrol by Direct Gradient Descent on the Cost-to-Go Function and in Adaptive Dynamic Programming

    In adaptive dynamic programming, neurocontrol and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimise a total cost function. In this paper we show that when discretized time is used to model the motion of the agent, it can be very important to do "clipping" on the motion of the agent in the final time step of the trajectory. By clipping we mean that the final time step of the trajectory is truncated so that the agent stops exactly at the first terminal state reached, and no further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum, and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms which use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include Backpropagation Through Time for Control and methods based on Dual Heuristic Dynamic Programming. However, the clipping problem does not significantly affect methods based on Heuristic Dynamic Programming, Temporal Differences or Policy Gradient Learning algorithms. Similarly, the clipping problem does not affect fixed-length finite-horizon problems.
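
    The clipping rule is easy to illustrate in one dimension. The toy sketch below (an assumption for illustration, not code from the paper) shortens the final discretized step so the state lands exactly on the terminal boundary and scales the final-step cost by the fraction of the step actually travelled.

```python
def clipped_step(x, dx, x_terminal, step_cost):
    """Advance one discretized time step, clipping the final step so the
    trajectory ends exactly at the terminal boundary (toy 1-D illustration).

    Returns (new_position, incurred_cost, done). If the unclipped step would
    overshoot x_terminal, only the fraction of the step up to the boundary is
    taken and the step cost is scaled by that fraction.
    """
    if x < x_terminal <= x + dx:
        frac = (x_terminal - x) / dx   # fraction of the step actually taken
        return x_terminal, frac * step_cost, True
    return x + dx, step_cost, False
```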