39,054 research outputs found

    Deep Residual Reinforcement Learning

    Full text link
    We revisit residual algorithms in both model-free and model-based reinforcement learning settings. We propose the bidirectional target network technique to stabilize residual algorithms, yielding a residual version of DDPG that significantly outperforms vanilla DDPG in the DeepMind Control Suite benchmark. Moreover, we find the residual algorithm an effective approach to the distribution mismatch problem in model-based planning. Compared with the existing TD(kk) method, our residual-based method makes weaker assumptions about the model and yields a greater performance boost.Comment: AAMAS 202

    Is the Bellman residual a bad proxy?

    Get PDF
    This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, that are usually designed to maximize the mean value, and derive a method that minimizes the residual ∥T∗vπ−vπ∥1,ν\|T_* v_\pi - v_\pi\|_{1,\nu} over policies. A theoretical analysis shows how good this proxy is to policy optimization, and notably that it is better than its value-based counterpart. We also propose experiments on randomly generated generic Markov decision processes, specifically designed for studying the influence of the involved concentrability coefficient. They show that the Bellman residual is generally a bad proxy to policy optimization and that directly maximizing the mean value is much better, despite the current lack of deep theoretical analysis. This might seem obvious, as directly addressing the problem of interest is usually better, but given the prevalence of (projected) Bellman residual minimization in value-based reinforcement learning, we believe that this question is worth to be considered.Comment: Final NIPS 2017 version (title, among other things, changed

    Specialized Deep Residual Policy Safe Reinforcement Learning-Based Controller for Complex and Continuous State-Action Spaces

    Full text link
    Traditional controllers have limitations as they rely on prior knowledge about the physics of the problem, require modeling of dynamics, and struggle to adapt to abnormal situations. Deep reinforcement learning has the potential to address these problems by learning optimal control policies through exploration in an environment. For safety-critical environments, it is impractical to explore randomly, and replacing conventional controllers with black-box models is also undesirable. Also, it is expensive in continuous state and action spaces, unless the search space is constrained. To address these challenges we propose a specialized deep residual policy safe reinforcement learning with a cycle of learning approach adapted for complex and continuous state-action spaces. Residual policy learning allows learning a hybrid control architecture where the reinforcement learning agent acts in synchronous collaboration with the conventional controller. The cycle of learning initiates the policy through the expert trajectory and guides the exploration around it. Further, the specialization through the input-output hidden Markov model helps to optimize policy that lies within the region of interest (such as abnormality), where the reinforcement learning agent is required and is activated. The proposed solution is validated on the Tennessee Eastman process control

    Physical Deep Reinforcement Learning Towards Safety Guarantee

    Full text link
    Deep reinforcement learning (DRL) has achieved tremendous success in many complex decision-making tasks of autonomous systems with high-dimensional state and/or action spaces. However, the safety and stability still remain major concerns that hinder the applications of DRL to safety-critical autonomous systems. To address the concerns, we proposed the Phy-DRL: a physical deep reinforcement learning framework. The Phy-DRL is novel in two architectural designs: i) Lyapunov-like reward, and ii) residual control (i.e., integration of physics-model-based control and data-driven control). The concurrent physical reward and residual control empower the Phy-DRL the (mathematically) provable safety and stability guarantees. Through experiments on the inverted pendulum, we show that the Phy-DRL features guaranteed safety and stability and enhanced robustness, while offering remarkably accelerated training and enlarged reward.Comment: Working Pape

    A Layer-Wise Information Reinforcement Approach to Improve Learning in Deep Belief Networks

    Full text link
    With the advent of deep learning, the number of works proposing new methods or improving existent ones has grown exponentially in the last years. In this scenario, "very deep" models were emerging, once they were expected to extract more intrinsic and abstract features while supporting a better performance. However, such models suffer from the gradient vanishing problem, i.e., backpropagation values become too close to zero in their shallower layers, ultimately causing learning to stagnate. Such an issue was overcome in the context of convolution neural networks by creating "shortcut connections" between layers, in a so-called deep residual learning framework. Nonetheless, a very popular deep learning technique called Deep Belief Network still suffers from gradient vanishing when dealing with discriminative tasks. Therefore, this paper proposes the Residual Deep Belief Network, which considers the information reinforcement layer-by-layer to improve the feature extraction and knowledge retaining, that support better discriminative performance. Experiments conducted over three public datasets demonstrate its robustness concerning the task of binary image classification

    BlockDrop: Dynamic Inference Paths in Residual Networks

    Full text link
    Very deep convolutional neural networks offer excellent recognition results, yet their computational expense limits their impact for many real-world applications. We introduce BlockDrop, an approach that learns to dynamically choose which layers of a deep network to execute during inference so as to best reduce total computation without degrading prediction accuracy. Exploiting the robustness of Residual Networks (ResNets) to layer dropping, our framework selects on-the-fly which residual blocks to evaluate for a given novel image. In particular, given a pretrained ResNet, we train a policy network in an associative reinforcement learning setting for the dual reward of utilizing a minimal number of blocks while preserving recognition accuracy. We conduct extensive experiments on CIFAR and ImageNet. The results provide strong quantitative and qualitative evidence that these learned policies not only accelerate inference but also encode meaningful visual information. Built upon a ResNet-101 model, our method achieves a speedup of 20\% on average, going as high as 36\% for some images, while maintaining the same 76.4\% top-1 accuracy on ImageNet.Comment: CVPR 201

    Intelligent video anomaly detection and classification using faster RCNN with deep reinforcement learning model

    Get PDF
    Recently, intelligent video surveillance applications have become essential in public security by the use of computer vision technologies to investigate and understand long video streams. Anomaly detection and classification are considered a major element of intelligent video surveillance. The aim of anomaly detection is to automatically determine the existence of abnormalities in a short time period. Deep reinforcement learning (DRL) techniques can be employed for anomaly detection, which integrates the concepts of reinforcement learning and deep learning enabling the artificial agents in learning the knowledge and experience from actual data directly. With this motivation, this paper presents an Intelligent Video Anomaly Detection and Classification using Faster RCNN with Deep Reinforcement Learning Model, called IVADC-FDRL model. The presented IVADC-FDRL model operates on two major stages namely anomaly detection and classification. Firstly, Faster RCNN model is applied as an object detector with Residual Network as a baseline model, which detects the anomalies as objects. Besides, deep Q-learning (DQL) based DRL model is employed for the classification of detected anomalies. In order to validate the effective anomaly detection and classification performance of the IVADC-FDRL model, an extensive set of experimentations were carried out on the benchmark UCSD anomaly dataset. The experimental results showcased the better performance of the IVADC-FDRL model over the other compared methods with the maximum accuracy of 98.50% and 94.80% on the applied Test004 and Test007 dataset respectively

    Antipodal Robotic Grasping using Deep Learning

    Get PDF
    In this work, we discuss two implementations that predict antipodal grasps for novel objects: A deep Q-learning approach and a Generative Residual Convolutional Neural Network approach. We present a deep reinforcement learning based method to solve the problem of robotic grasping using visio-motor feedback. The use of a deep learning based approach reduces the complexity caused by the use of hand-designed features. Our method uses an off-policy reinforcement learning framework to learn the grasping policy. We use the double deep Q-learning framework along with a novel Grasp-Q-Network to output grasp probabilities used to learn grasps that maximize the pick success. We propose a visual servoing mechanism that uses a multi-view camera setup that observes the scene which contains the objects of interest. We performed experiments using a Baxter Gazebo simulated environment as well as on the actual robot. The results show that our proposed method outperforms the baseline Q-learning framework and increases grasping accuracy by adapting a multi-view model in comparison to a single-view model. The second method tackles the problem of generating antipodal robotic grasps for unknown objects from an n-channel image of the scene. We propose a novel Generative Residual Convolutional Neural Network (GR-ConvNet) model that can generate robust antipodal grasps from n-channel input at real-time speeds (20ms). We evaluate the proposed model architecture on standard dataset and previously unseen household objects. We achieved state-of-the-art accuracy of 97.7% on Cornell grasp dataset. We also demonstrate a 93.5% grasp success rate on previously unseen real-world objects
    • …
    corecore