199 research outputs found
Deep Reinforcement Learning: A Brief Survey
Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence (AI) and represents a step toward building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning (RL) to scale to problems that were previously intractable, such as learning to play video games directly from pixels. DRL algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of RL, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor-critic (A3C). In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via RL. To conclude, we describe several current areas of research within the field.
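As a reminder of the update rule at the heart of DQN-style methods, here is a minimal tabular Q-learning sketch (illustrative only, not code from the survey; DQN replaces the table with a neural network and adds experience replay and a target network):

```python
import numpy as np

# Tabular Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
# State/action counts and learning parameters below are toy values.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9

def q_update(s, a, r, s_next):
    td_target = r + gamma * Q[s_next].max()   # bootstrapped one-step target
    Q[s, a] += alpha * (td_target - Q[s, a])  # move estimate toward the target

q_update(0, 1, 1.0, 2)  # one observed transition with reward 1
```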
Automating Vehicles by Deep Reinforcement Learning using Task Separation with Hill Climbing
Within the context of autonomous driving, a model-based reinforcement learning
algorithm is proposed for the design of neural network-parameterized
controllers. Classical model-based control methods, which include sampling- and
lattice-based algorithms and model predictive control, suffer from the
trade-off between model complexity and computational burden required for the
online solution of expensive optimization or search problems at every short
sampling time. To circumvent this trade-off, a two-step procedure is motivated:
first, a controller is learned during offline training based on an arbitrarily
complicated mathematical system model; then the trained controller is evaluated
online by fast feedforward passes. The contribution of this paper is the
proposition of a simple gradient-free and model-based algorithm for deep
reinforcement learning using task separation with hill climbing (TSHC). In
particular, (i) simultaneous training on separate deterministic tasks with the
purpose of encoding many motion primitives in a neural network, and (ii) the
employment of maximally sparse rewards in combination with virtual velocity
constraints (VVCs) in setpoint proximity are advocated.
Comment: 10 pages, 6 figures, 1 table
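The gradient-free hill-climbing idea can be sketched as follows (a toy stand-in, not the paper's TSHC implementation: perturb the controller parameters, evaluate on all deterministic tasks, and keep a perturbation only if the summed return improves; the tasks and return function here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta, task):
    # toy deterministic "task": return is higher the closer theta is to the task target
    return -np.sum((theta - task) ** 2)

tasks = [np.array([1.0, -1.0]), np.array([0.5, 0.5])]  # separate deterministic tasks
theta = np.zeros(2)                                    # controller parameters
best = sum(episode_return(theta, t) for t in tasks)

for _ in range(200):
    candidate = theta + 0.1 * rng.standard_normal(2)   # random perturbation
    score = sum(episode_return(candidate, t) for t in tasks)
    if score > best:                                   # hill climbing: accept only improvements
        theta, best = candidate, score
```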
Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems
Recent successes combine reinforcement learning algorithms with deep neural
networks, yet reinforcement learning is still not widely applied to robotics
and real-world scenarios. This can be attributed to the fact that current
state-of-the-art, end-to-end reinforcement learning approaches still require
thousands or millions of data samples to converge to a satisfactory policy and
are subject to catastrophic failures during training. Conversely, in real world
scenarios and after just a few data samples, humans are able to either provide
demonstrations of the task, intervene to prevent catastrophic actions, or
simply evaluate if the policy is performing correctly. This research
investigates how to integrate these human interaction modalities into the
reinforcement learning loop, increasing sample efficiency and enabling
real-time reinforcement learning in robotics and real world scenarios. This
novel theoretical foundation is called Cycle-of-Learning, a reference to how
different human interaction modalities, namely, task demonstration,
intervention, and evaluation, are cycled and combined with reinforcement learning
algorithms. Results presented in this work show that the reward signal that is
learned based upon human interaction accelerates the rate of learning of
reinforcement learning algorithms and that learning from a combination of human
demonstrations and interventions is faster and more sample efficient when
compared to traditional supervised learning algorithms. Finally,
Cycle-of-Learning develops an effective transition from policies learned
using human demonstrations and interventions to reinforcement learning. The
theoretical foundation developed by this research opens new research paths to
human-agent teaming scenarios where autonomous agents are able to learn from
human teammates and adapt to mission performance metrics in real-time and in
real-world scenarios.
Comment: PhD thesis, Aerospace Engineering, Texas A&M (2020). For more
information, see https://vggoecks.com
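One ingredient described above, learning a reward signal from human interaction, can be illustrated with a minimal sketch (not the thesis code; the features, human scores, and linear model are toy assumptions):

```python
import numpy as np

# Fit a linear reward model r(s) = w * s to human evaluations of states,
# then use it to score states the agent visits.
states = np.array([[0.0], [0.5], [1.0]])   # 1-D state features (toy)
human_scores = np.array([0.0, 0.5, 1.0])   # human evaluation of each state

# least-squares fit of the reward model
w, *_ = np.linalg.lstsq(states, human_scores, rcond=None)

def learned_reward(s):
    return float(s @ w)
```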
Learning predictive cognitive maps with spiking neurons during behaviour and replays
The hippocampus has been proposed to encode environments using a representation that contains predictive information about likely future states, called the successor representation. However, it is not clear how such a representation could be learned in the hippocampal circuit. Here, we propose a plasticity rule that can learn this predictive map of the environment using a spiking neural network. We connect this biologically plausible plasticity rule to reinforcement learning, mathematically and numerically showing that it implements the TD-lambda algorithm. By spanning these different levels, we show how our framework naturally encompasses behavioral activity and replays, smoothly moving from rate to temporal coding, and allows learning over behavioral timescales with a plasticity rule acting on a timescale of milliseconds. We discuss how biological parameters such as dwelling times at states, neuronal firing rates and neuromodulation relate to the delay discounting parameter of the TD algorithm, and how they influence the learned representation. We also find that, in agreement with psychological studies and contrary to reinforcement learning theory, the discount factor decreases hyperbolically with time. Finally, our framework suggests a role for replays, in both aiding learning in novel environments and finding shortcut trajectories that were not experienced during behavior, in agreement with experimental data.
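The successor representation learned by a TD rule can be sketched in a few lines (the paper's plasticity rule implements TD(lambda); this toy sketch shows the simpler TD(0) case on a three-state linear track, with illustrative parameters):

```python
import numpy as np

# Successor matrix M[s, s'] ~ expected discounted future visits to s' from s.
n = 3                 # states on a tiny linear track
M = np.eye(n)
alpha, gamma = 0.1, 0.9

def sr_td_update(s, s_next):
    onehot = np.eye(n)[s]
    # TD(0) update toward the bootstrapped target onehot(s) + gamma * M[s_next]
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])

# repeatedly experience the trajectory 0 -> 1 -> 2
for _ in range(500):
    sr_td_update(0, 1)
    sr_td_update(1, 2)
```

After convergence, row M[0] approaches [1, 0.9, 0.81]: each future state is weighted by the discount raised to the number of steps away, which is the predictive map.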
Sample-Efficient Deep Reinforcement Learning for Continuous Control
Reinforcement learning (RL) is a powerful, generic approach to discovering optimal policies
in complex sequential decision-making problems. Recently, with flexible function approximators such as neural networks, RL has greatly expanded its realm of applications, from
playing computer games with pixel inputs, to mastering the game of Go, to learning parkour
movements by simulated humanoids. However, common RL approaches are known to be
sample-intensive, making them difficult to apply to real-world problems such
as robotics. This thesis makes several contributions toward developing RL algorithms for
learning in the wild, where sample-efficiency and stability are critical. The key contributions
include Normalized Advantage Functions (NAF), extending Q-learning for continuous action problems; Interpolated Policy Gradient (IPG), unifying prior policy gradient algorithm
variants through theoretical analyses on bias and variance; and Temporal Difference Models
(TDM), interpreting a parameterized Q-function as a generalized dynamics model for novel
temporally abstracted model-based planning. Importantly, this thesis highlights that these
algorithms can be seen as bridging gaps between branches of RL – model-based with model-free, and on-policy with off-policy. The proposed algorithms not only achieve substantial
improvements over the prior approaches, but also provide novel perspectives on how to mix
different branches of RL effectively to gain the best of both worlds. NAF has subsequently
been shown to be able to train two 7-DoF robot arms to open doors using only 2.5 hours of
real-world experience, making it one of the first demonstrations of deep RL approaches on
real robots.
- Cambridge-Tuebingen PhD Fellowship in Machine Learning
- Google Focused Research Award
- NSERC
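The NAF idea mentioned above can be sketched concretely (a toy stand-in, not the thesis code: the V, mu, and P networks are replaced by fixed illustrative functions). The advantage is constrained to a quadratic in the action, so the greedy continuous action is available in closed form as mu(s), with no inner-loop search:

```python
import numpy as np

# NAF parameterization: Q(s, a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)).
def V(s):
    return 1.0                         # toy state-value head

def mu(s):
    return np.array([0.3, -0.2]) * s   # toy policy head (greedy action)

def P(s):
    return np.eye(2)                   # positive-definite precision matrix

def Q(s, a):
    d = a - mu(s)
    return V(s) - 0.5 * d @ P(s) @ d   # quadratic advantage is always <= 0

s = 2.0
greedy_action = mu(s)                  # argmax_a Q(s, a), in closed form
```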
Deep Reinforcement Learning based Path-Planning for Multi-Agent Systems in Advection-Diffusion Field Reconstruction Tasks
Many environmental processes can be represented mathematically using spatially and temporally varying partial differential equations. Timely estimation and prediction of processes such as wildfires is critical for disaster management response, but is difficult to accomplish without the availability of a dense network of stationary sensors. In this work, we propose a deep reinforcement learning-based real-time path-planning algorithm for mobile sensor networks traveling in a formation through a spatially and temporally varying advection-diffusion field for the task of field reconstruction. A deep Q-network (DQN) agent is trained on simulated advection-diffusion fields to direct the mobile sensor network to travel along information-rich trajectories. The field measurements made by the mobile sensor network along their trajectories enable identification of field advection parameters, which are required for field reconstruction. A cooperative Kalman filter developed in previous works is employed to receive estimates of the field values and gradients, which are essential for reconstruction as well as for the estimation of the diffusion parameter. A mechanism is provided that encourages exploration in the field domain once a stationary state is reached, which allows the algorithm to identify other information-rich trajectories that may exist in the field, significantly improving reconstruction performance. Two simulation environments of different fidelities are provided to test the feasibility of the proposed algorithm. The low-fidelity simulation environment is used for training of the DQN agent. The high-fidelity simulation environment is based on Robot Operating System (ROS) and simulates real robots. We provide results of running sample test episodes in both environments, which demonstrate the effectiveness and feasibility of the proposed algorithm.
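The kind of field the sensors measure can be sketched with a toy 1-D advection-diffusion step, u_t = -c u_x + D u_xx, discretized with upwind advection and central differences on a periodic domain (illustrative parameters only; not the paper's simulation environments):

```python
import numpy as np

def step(u, c=0.5, D=0.1, dx=1.0, dt=0.5):
    # upwind advection (valid for c > 0) and central diffusion, periodic boundaries
    adv = -c * (u - np.roll(u, 1)) / dx
    diff = D * (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
    return u + dt * (adv + diff)

u = np.zeros(50)
u[25] = 1.0                 # initial concentration spike
for _ in range(20):
    u = step(u)             # spike advects rightward and diffuses
```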
Deciphering the Firing Patterns of Hippocampal Neurons During Sharp-Wave Ripples
The hippocampus is essential for learning and memory. Neurons in the rat hippocampus selectively fire when the animal is at specific locations - place fields - within an environment. Place fields corresponding to such place cells tile the entire environment, forming a stable spatial map supporting navigation and planning. Remarkably, the same place cells reactivate together outside of their place fields and in coincidence with sharp-wave ripples (SWRs) - dominant electrical field oscillations (150-250 Hz) in the hippocampus. These offline SWR events frequently occur during quiet wake periods in the middle of exploration and the subsequent slow-wave sleep, and are associated with spatial memory performance and stabilization of spatial maps. Therefore, deciphering the firing patterns during these events is essential to understanding offline memory processing. I provide two novel methods to analyze the SWR firing patterns in this dissertation project. The first method uses hidden Markov models (HMM), in which I model the dynamics of neural activity during SWRs in terms of transitions between distinct states of neuronal ensemble activity. This method detects consistent temporal structures over many instances of SWRs and, in contrast to standard approaches, relaxes the dependence on positional data during the behavior to interpret temporal patterns during SWRs. To validate this method, I applied the method to quiet wake SWRs. In a simple spatial memory task in which the animal ran on a linear track or in an open arena, the individual states corresponded to the activation of distinct groups of neurons with inter-state transitions that resembled the animal’s trajectories during the exploration. In other words, this method enabled us to identify the topology and spatial map of the explored environment by dissecting the firings occurring during the quiescence periods’ SWRs.
This result indicated that downstream brain regions may rely only on SWRs to uncover hippocampal code as a substrate for memory processing. I developed a second analysis method based on the principles of Bayesian learning. This method enabled us to track the spatial tunings over the sleep following exploration of an environment by taking neurons’ place fields in the environment as the prior belief and updating it using dynamic ensemble firing patterns unfolding over time. This method introduces a neuronal-ensemble-based approach that calculates tunings to the position encoded by ensemble firings during sleep rather than the animal’s actual position during exploration. When I applied this method to several datasets, I found that during the early slow-wave sleep after an experience, but not during late hours of sleep or sleep before the exploration, the spatial tunings highly resembled the place fields on the track. Furthermore, the fidelity of the spatial tunings to the place fields predicted the place fields’ stability when the animal was re-exposed to the same environment after ~9 h. Moreover, even for neurons with shifted place fields during re-exposure, the spatial tunings during early sleep were predictive of the place fields during the re-exposure. These results indicated that early sleep actively maintains or retunes the place fields of neurons, explaining the representational drift of place fields across multiple exposures.
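The prior-then-update idea behind the second method can be illustrated with a hedged sketch (not the dissertation's actual estimator: a Gaussian prior with standard conjugate updates stands in for it, and all numbers are illustrative). The place field acts as the prior belief about a neuron's preferred position, which positions decoded from ensemble firing during sleep then refine:

```python
import numpy as np

prior_mean, prior_var = 0.4, 0.04   # place-field centre and spread (track in [0, 1])
obs_var = 0.01                      # assumed noise of decoded positions

mean, var = prior_mean, prior_var
for decoded_pos in [0.45, 0.5, 0.48]:   # positions encoded by ensemble firing
    # Gaussian conjugate update: shift the tuning estimate toward the observation
    k = var / (var + obs_var)
    mean = mean + k * (decoded_pos - mean)
    var = (1 - k) * var                 # posterior uncertainty shrinks
```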