
    Deep Reinforcement Learning: A Brief Survey

    Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence (AI) and represents a step toward building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning (RL) to scale to problems that were previously intractable, such as learning to play video games directly from pixels. DRL algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of RL, then progress to the main streams of value-based and policy-based methods. Our survey covers central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and the asynchronous advantage actor-critic (A3C). In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via RL. To conclude, we describe several current areas of research within the field.
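    As a rough illustration of the value-based stream this survey covers, the sketch below shows the one-step Bellman target that DQN regresses toward. The `q_net`, `target_net`, and batch layout are hypothetical placeholders, not code from the survey.

    ```python
    import torch

    def dqn_targets(batch, target_net, gamma=0.99):
        """One-step TD targets y = r + gamma * max_a' Q_target(s', a') for non-terminal s'."""
        states, actions, rewards, next_states, dones = batch  # assumed batch layout
        with torch.no_grad():
            next_q = target_net(next_states).max(dim=1).values   # max_a' Q_target(s', a')
            targets = rewards + gamma * (1.0 - dones) * next_q   # bootstrap only if not terminal
        return targets

    def dqn_loss(batch, q_net, target_net, gamma=0.99):
        """Regression loss between Q(s, a) for the taken actions and the TD targets."""
        states, actions, rewards, next_states, dones = batch
        q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        return torch.nn.functional.mse_loss(q_sa, dqn_targets(batch, target_net, gamma))
    ```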

    Automating Vehicles by Deep Reinforcement Learning using Task Separation with Hill Climbing

    Within the context of autonomous driving, a model-based reinforcement learning algorithm is proposed for the design of neural-network-parameterized controllers. Classical model-based control methods, which include sampling- and lattice-based algorithms as well as model predictive control, suffer from a trade-off between model complexity and the computational burden of solving expensive optimization or search problems online at every short sampling time. To circumvent this trade-off, a two-step procedure is motivated: a controller is first learned offline from an arbitrarily complicated mathematical system model, and the trained controller is then evaluated online as a fast feedforward pass. The contribution of this paper is a simple, gradient-free, model-based algorithm for deep reinforcement learning using task separation with hill climbing (TSHC). In particular, the paper advocates (i) simultaneous training on separate deterministic tasks in order to encode many motion primitives in a neural network, and (ii) maximally sparse rewards combined with virtual velocity constraints (VVCs) in setpoint proximity.
    Comment: 10 pages, 6 figures, 1 table
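    A minimal sketch of the gradient-free hill-climbing idea: perturb the controller's parameters, evaluate the summed sparse return over the separate training tasks, and keep a perturbation only if it improves. The `evaluate_on_tasks` routine and the Gaussian perturbation scale are illustrative assumptions, not the paper's exact TSHC procedure.

    ```python
    import numpy as np

    def hill_climb(theta, evaluate_on_tasks, n_iters=1000, sigma=0.05, rng=None):
        """Gradient-free hill climbing on controller parameters theta (a flat numpy vector).

        evaluate_on_tasks(theta) -> total (sparse) return summed over the separate tasks.
        """
        rng = rng or np.random.default_rng(0)
        best_score = evaluate_on_tasks(theta)
        for _ in range(n_iters):
            candidate = theta + sigma * rng.standard_normal(theta.shape)  # random perturbation
            score = evaluate_on_tasks(candidate)
            if score > best_score:            # keep only improving perturbations
                theta, best_score = candidate, score
        return theta, best_score
    ```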

    Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems

    Recent successes combine reinforcement learning algorithms with deep neural networks, yet reinforcement learning is still not widely applied to robotics and real-world scenarios. This can be attributed to the fact that current state-of-the-art, end-to-end reinforcement learning approaches still require thousands or millions of data samples to converge to a satisfactory policy and are subject to catastrophic failures during training. Conversely, in real-world scenarios and after just a few data samples, humans are able to provide demonstrations of the task, intervene to prevent catastrophic actions, or simply evaluate whether the policy is performing correctly. This research investigates how to integrate these human interaction modalities into the reinforcement learning loop, increasing sample efficiency and enabling real-time reinforcement learning in robotics and real-world scenarios. This novel theoretical foundation is called Cycle-of-Learning, a reference to how the different human interaction modalities, namely task demonstration, intervention, and evaluation, are cycled and combined with reinforcement learning algorithms. Results presented in this work show that a reward signal learned from human interaction accelerates the rate of learning of reinforcement learning algorithms, and that learning from a combination of human demonstrations and interventions is faster and more sample-efficient than traditional supervised learning approaches. Finally, Cycle-of-Learning provides an effective transition from policies learned using human demonstrations and interventions to reinforcement learning. The theoretical foundation developed by this research opens new research paths toward human-agent teaming scenarios in which autonomous agents are able to learn from human teammates and adapt to mission performance metrics in real time and in real-world scenarios.
    Comment: PhD thesis, Aerospace Engineering, Texas A&M (2020). For more information, see https://vggoecks.com
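    To make the cycling of modalities concrete, here is a highly simplified sketch of how demonstrations, interventions, and a reward model learned from human evaluations could feed a single learner. All object and method names (`policy`, `reward_model`, `human`, `behavior_cloning_update`, `rl_update`) are placeholders, not the thesis' actual implementation.

    ```python
    def cycle_of_learning(env, policy, reward_model, human, n_episodes=100):
        """Illustrative loop: human demonstrations and interventions seed the policy and the
        learned reward; as interventions become rare, the agent continues with RL alone."""
        for _ in range(n_episodes):
            obs, done = env.reset(), False
            while not done:
                action = policy.act(obs)
                if human.wants_to_intervene(obs, action):        # human override
                    action = human.action(obs)
                    policy.add_demonstration(obs, action)        # treat overrides as demonstrations
                next_obs, _, done, _ = env.step(action)
                reward = reward_model.predict(obs, action)       # reward learned from human evaluations
                policy.add_transition(obs, action, reward, next_obs, done)
                obs = next_obs
            policy.behavior_cloning_update()   # supervised step on demonstrations/interventions
            policy.rl_update()                 # RL step on the learned reward
        return policy
    ```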

    Learning predictive cognitive maps with spiking neurons during behaviour and replays

    The hippocampus has been proposed to encode environments using a representation that contains predictive information about likely future states, called the successor representation. However, it is not clear how such a representation could be learned in the hippocampal circuit. Here, we propose a plasticity rule that can learn this predictive map of the environment using a spiking neural network. We connect this biologically plausible plasticity rule to reinforcement learning, showing mathematically and numerically that it implements the TD(λ) algorithm. By spanning these different levels, we show how our framework naturally encompasses behavioral activity and replays, smoothly moving from rate to temporal coding, and allows learning over behavioral timescales with a plasticity rule acting on a timescale of milliseconds. We discuss how biological parameters such as dwelling times at states, neuronal firing rates, and neuromodulation relate to the delay-discounting parameter of the TD algorithm, and how they influence the learned representation. We also find that, in agreement with psychological studies and contrary to reinforcement learning theory, the discount factor decreases hyperbolically with time. Finally, our framework suggests a role for replays, in both aiding learning in novel environments and finding shortcut trajectories that were not experienced during behavior, in agreement with experimental data.
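    For reference, the algorithm the plasticity rule is shown to implement, TD(λ), can be written compactly in the tabular case for the successor representation M, where M[s, s'] estimates the expected discounted future occupancy of s' starting from s. The state indexing and parameter values below are illustrative assumptions, not the spiking implementation from the paper.

    ```python
    import numpy as np

    def learn_successor_representation(episodes, n_states, alpha=0.1, gamma=0.95, lam=0.9):
        """Tabular TD(lambda) for the successor representation M.

        episodes: iterable of state-index trajectories, e.g. [[0, 1, 2, ...], ...]
        """
        M = np.eye(n_states)                       # initialise with immediate self-occupancy
        for trajectory in episodes:
            e = np.zeros(n_states)                 # eligibility trace over recently visited states
            for s, s_next in zip(trajectory[:-1], trajectory[1:]):
                e = gamma * lam * e
                e[s] += 1.0
                onehot = np.zeros(n_states)
                onehot[s] = 1.0
                td_error = onehot + gamma * M[s_next] - M[s]   # vector-valued TD error
                M += alpha * np.outer(e, td_error)             # credit all eligible start states
        return M
    ```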

    Deep Reinforcement Learning based Path-Planning for Multi-Agent Systems in Advection-Diffusion Field Reconstruction Tasks

    Many environmental processes can be represented mathematically by spatio-temporally varying partial differential equations. Timely estimation and prediction of processes such as wildfires is critical for disaster management response, but is difficult to accomplish without a dense network of stationary sensors. In this work, we propose a deep reinforcement learning-based real-time path-planning algorithm for mobile sensor networks traveling in formation through a spatio-temporally varying advection-diffusion field for the task of field reconstruction. A deep Q-network (DQN) agent is trained on simulated advection-diffusion fields to direct the mobile sensor network along information-rich trajectories. The field measurements made by the mobile sensor network along these trajectories enable identification of the field advection parameters, which are required for field reconstruction. A cooperative Kalman filter developed in previous work is employed to obtain estimates of the field values and gradients, which are essential both for reconstruction and for estimation of the diffusion parameter. A mechanism is provided that encourages exploration of the field domain once a stationary state is reached, allowing the algorithm to identify other information-rich trajectories that may exist in the field and significantly improving reconstruction performance. Two simulation environments of different fidelities are provided to test the feasibility of the proposed algorithm. The low-fidelity environment is used for training the DQN agent, while the high-fidelity environment is based on the Robot Operating System (ROS) and simulates real robots. We provide results from sample test episodes in both environments, demonstrating the effectiveness and feasibility of the proposed algorithm.
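    A compressed sketch of the kind of DQN training loop described above, with epsilon-greedy exploration over a discrete set of formation-steering actions and an experience replay buffer. The gym-style environment interface, network objects, and hyperparameters are assumptions for illustration, not the authors' implementation.

    ```python
    import random
    from collections import deque

    import numpy as np
    import torch

    def train_dqn(env, q_net, target_net, optimizer, n_steps=50_000, gamma=0.99,
                  batch_size=64, eps_start=1.0, eps_end=0.05, eps_decay_steps=20_000,
                  target_sync=1_000):
        """Epsilon-greedy DQN with experience replay, e.g. for choosing the formation's
        next heading from the current field measurements (environment is assumed)."""
        replay = deque(maxlen=100_000)
        obs = env.reset()
        for step in range(n_steps):
            eps = max(eps_end, eps_start - (eps_start - eps_end) * step / eps_decay_steps)
            if random.random() < eps:                        # explore
                action = env.action_space.sample()
            else:                                            # exploit the greedy action
                with torch.no_grad():
                    q = q_net(torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0))
                action = int(q.argmax(dim=1))
            next_obs, reward, done, _ = env.step(action)
            replay.append((obs, action, reward, next_obs, float(done)))
            obs = env.reset() if done else next_obs

            if len(replay) >= batch_size:
                batch = random.sample(replay, batch_size)
                s, a, r, s2, d = map(np.array, zip(*batch))
                s = torch.as_tensor(s, dtype=torch.float32)
                s2 = torch.as_tensor(s2, dtype=torch.float32)
                a = torch.as_tensor(a, dtype=torch.int64)
                r = torch.as_tensor(r, dtype=torch.float32)
                d = torch.as_tensor(d, dtype=torch.float32)
                with torch.no_grad():
                    target = r + gamma * (1 - d) * target_net(s2).max(dim=1).values
                q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                loss = torch.nn.functional.smooth_l1_loss(q_sa, target)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            if step % target_sync == 0:                      # periodic target-network update
                target_net.load_state_dict(q_net.state_dict())
        return q_net
    ```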

    Deciphering the Firing Patterns of Hippocampal Neurons During Sharp-Wave Ripples

    The hippocampus is essential for learning and memory. Neurons in the rat hippocampus selectively fire when the animal is at specific locations - place fields - within an environment. Place fields corresponding to such place cells tile the entire environment, forming a stable spatial map that supports navigation and planning. Remarkably, the same place cells reactivate together outside of their place fields and in coincidence with sharp-wave ripples (SWRs) - dominant electrical field oscillations (150-250 Hz) in the hippocampus. These offline SWR events frequently occur during quiet wake periods in the middle of exploration and during the subsequent slow-wave sleep, and are associated with spatial memory performance and the stabilization of spatial maps. Deciphering the firing patterns during these events is therefore essential to understanding offline memory processing.

    In this dissertation project, I provide two novel methods for analyzing SWR firing patterns. The first method uses hidden Markov models (HMMs), in which I model the dynamics of neural activity during SWRs in terms of transitions between distinct states of neuronal ensemble activity. This method detects consistent temporal structures over many instances of SWRs and, in contrast to standard approaches, relaxes the dependence on positional data during behavior to interpret temporal patterns during SWRs. To validate this method, I applied it to quiet-wake SWRs. In a simple spatial memory task in which the animal ran on a linear track or in an open arena, the individual states corresponded to the activation of distinct groups of neurons, with inter-state transitions that resembled the animal’s trajectories during exploration. In other words, this method enabled us to identify the topology and spatial map of the explored environment by dissecting the firings occurring during SWRs in quiescence periods. This result indicates that downstream brain regions may rely on SWRs alone to uncover the hippocampal code as a substrate for memory processing.

    I developed a second analysis method based on the principles of Bayesian learning. This method enabled us to track spatial tunings over the sleep following exploration of an environment by taking neurons’ place fields in the environment as the prior belief and updating it using the dynamic ensemble firing patterns unfolding over time. It introduces a neuronal-ensemble-based approach that calculates tunings to the position encoded by ensemble firings during sleep, rather than to the animal’s actual position during exploration. When I applied this method to several datasets, I found that during early slow-wave sleep after an experience, but not during late hours of sleep or during sleep before the exploration, the spatial tunings highly resembled the place fields on the track. Furthermore, the fidelity of the spatial tunings to the place fields predicted the place fields’ stability when the animal was re-exposed to the same environment after ~9 h. Moreover, even for neurons whose place fields shifted during re-exposure, the spatial tunings during early sleep were predictive of the place fields during the re-exposure. These results indicate that early sleep actively maintains or retunes the place fields of neurons, explaining the representational drift of place fields across multiple exposures.
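    As a concrete sketch of the first method, the code below runs the forward pass of a hidden Markov model with Poisson emissions over binned ensemble spike counts from a single SWR, yielding per-bin state probabilities and the event's likelihood under the model. The binning, number of states, and parameters are illustrative assumptions, not the dissertation's fitted model.

    ```python
    import numpy as np
    from scipy.stats import poisson

    def hmm_forward(spike_counts, log_pi, log_A, rates):
        """Forward algorithm for an HMM with independent Poisson emissions per neuron.

        spike_counts: (T, N) binned spike counts of N neurons over T time bins in one SWR
        log_pi:       (K,)   log initial state probabilities
        log_A:        (K, K) log transition matrix between the K ensemble states
        rates:        (K, N) expected spike count of each neuron in each state
        Returns the (T, K) log forward messages and the total log likelihood.
        """
        T, _ = spike_counts.shape
        K = log_pi.shape[0]
        # log emission probability of each bin under each state (neurons independent given state)
        log_b = np.array([poisson.logpmf(spike_counts, rates[k]).sum(axis=1)
                          for k in range(K)]).T              # shape (T, K)

        log_alpha = np.zeros((T, K))
        log_alpha[0] = log_pi + log_b[0]
        for t in range(1, T):
            prev = log_alpha[t - 1][:, None] + log_A                     # (K, K): prior state x next state
            log_alpha[t] = log_b[t] + np.logaddexp.reduce(prev, axis=0)  # marginalise previous state
        log_likelihood = np.logaddexp.reduce(log_alpha[-1])
        return log_alpha, log_likelihood
    ```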