Deep Residual Reinforcement Learning
We revisit residual algorithms in both model-free and model-based
reinforcement learning settings. We propose the bidirectional target network
technique to stabilize residual algorithms, yielding a residual version of DDPG
that significantly outperforms vanilla DDPG in the DeepMind Control Suite
benchmark. Moreover, we find the residual algorithm an effective approach to
the distribution mismatch problem in model-based planning. Compared with the
existing TD(k) method, our residual-based method makes weaker assumptions
about the model and yields a greater performance boost.
Comment: AAMAS 202
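To make the term "residual algorithm" concrete, the sketch below contrasts the usual semi-gradient TD(0) update with the classical residual-gradient update for a linear value function. It is only an illustration under stated assumptions: the function names, the linear features and the step size are hypothetical, and it implements neither the bidirectional target network nor the residual DDPG described in the abstract.

```python
import numpy as np

def semi_gradient_td_update(w, phi_s, phi_next, reward, gamma, lr):
    """Semi-gradient TD(0): the bootstrapped target is treated as a constant."""
    delta = reward + gamma * phi_next @ w - phi_s @ w
    return w + lr * delta * phi_s

def residual_gradient_update(w, phi_s, phi_next, reward, gamma, lr):
    """Residual gradient: full gradient descent on 0.5 * delta**2 with respect to w."""
    delta = reward + gamma * phi_next @ w - phi_s @ w
    # d(delta)/dw = gamma * phi_next - phi_s, so the descent direction is
    # delta * (phi_s - gamma * phi_next).
    return w + lr * delta * (phi_s - gamma * phi_next)
```

The only difference is the extra `- gamma * phi_next` term: the residual update also differentiates through the value of the next state, which is the property residual algorithms build on.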
Is the Bellman residual a bad proxy?
This paper aims at theoretically and empirically comparing two standard
optimization criteria for Reinforcement Learning: i) maximization of the mean
value and ii) minimization of the Bellman residual. For that purpose, we place
ourselves in the framework of policy search algorithms, that are usually
designed to maximize the mean value, and derive a method that minimizes the
residual over policies. A theoretical analysis
shows how good this proxy is to policy optimization, and notably that it is
better than its value-based counterpart. We also propose experiments on
randomly generated generic Markov decision processes, specifically designed for
studying the influence of the involved concentrability coefficient. They show
that the Bellman residual is generally a bad proxy to policy optimization and
that directly maximizing the mean value is much better, despite the current
lack of deep theoretical analysis. This might seem obvious, as directly
addressing the problem of interest is usually better, but given the prevalence
of (projected) Bellman residual minimization in value-based reinforcement
learning, we believe that this question is worth considering.
Comment: Final NIPS 2017 version (title, among other things, changed)
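The two criteria compared in this abstract can be computed side by side on a small tabular MDP. The sketch below is a minimal illustration, not the paper's experimental setup: the array layout (`P` of shape actions x states x states, `R` of shape actions x states), the state distribution `nu`, and the choice of a `nu`-weighted L1 norm for the residual are assumptions made for the example.

```python
import numpy as np

def policy_value(P, R, policy, gamma):
    """Solve v = R_pi + gamma * P_pi v for a deterministic policy (one action per state)."""
    n = P.shape[1]
    idx = np.arange(n)
    P_pi = P[policy, idx]  # row s is the transition row of the action chosen in state s
    R_pi = R[policy, idx]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)

def mean_value(v, nu):
    """Criterion (i): expected value under the state distribution nu (to be maximized)."""
    return nu @ v

def bellman_residual(P, R, v, nu, gamma):
    """Criterion (ii): nu-weighted L1 norm of T*v - v (to be minimized)."""
    t_star_v = np.max(R + gamma * (P @ v), axis=0)  # optimal Bellman operator
    return nu @ np.abs(t_star_v - v)
```

Evaluating both functions for every candidate policy of a small MDP shows directly whether the policy with the smallest residual is also the one with the largest mean value.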
Specialized Deep Residual Policy Safe Reinforcement Learning-Based Controller for Complex and Continuous State-Action Spaces
Traditional controllers have limitations as they rely on prior knowledge
about the physics of the problem, require modeling of dynamics, and struggle to
adapt to abnormal situations. Deep reinforcement learning has the potential to
address these problems by learning optimal control policies through exploration
in an environment. For safety-critical environments, it is impractical to
explore randomly, and replacing conventional controllers with black-box models
is also undesirable. Exploration is also expensive in continuous state and action
spaces unless the search space is constrained. To address these challenges, we
propose a specialized deep residual policy safe reinforcement learning with a
cycle of learning approach adapted for complex and continuous state-action
spaces. Residual policy learning allows learning a hybrid control architecture
where the reinforcement learning agent acts in synchronous collaboration with
the conventional controller. The cycle of learning initiates the policy through
the expert trajectory and guides the exploration around it. Further, the
specialization through the input-output hidden Markov model helps to optimize
the policy within the region of interest (such as an abnormality), where the
reinforcement learning agent is required and activated. The proposed
solution is validated on the Tennessee Eastman process control.
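The hybrid architecture described here, in which a learned agent acts alongside a conventional controller, can be sketched as a base action plus a learned correction. Everything in the block is hypothetical: a proportional controller stands in for the conventional controller, `residual_policy` stands in for the trained agent, and the boolean `active` flag stands in for whatever signal the specialization step would provide.

```python
import numpy as np

def proportional_controller(state, setpoint, gain=0.5):
    """Stand-in for a conventional controller (assumption: simple P control)."""
    return gain * (setpoint - state)

def residual_action(state, setpoint, residual_policy, active, low=-1.0, high=1.0):
    """Combine the conventional action with a learned correction.

    The correction is applied only when `active` is True, mimicking an agent
    that is switched on inside a region of interest.
    """
    base = proportional_controller(state, setpoint)
    correction = residual_policy(state) if active else np.zeros_like(base)
    return np.clip(base + correction, low, high)
```

Because the learned term is added to the base action rather than replacing it, the conventional controller remains in the loop and the agent only has to learn a correction around it.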
Physical Deep Reinforcement Learning Towards Safety Guarantee
Deep reinforcement learning (DRL) has achieved tremendous success in many
complex decision-making tasks of autonomous systems with high-dimensional state
and/or action spaces. However, safety and stability remain major
concerns that hinder the application of DRL to safety-critical autonomous
systems. To address these concerns, we propose Phy-DRL: a physical deep
reinforcement learning framework. Phy-DRL is novel in two architectural
designs: i) a Lyapunov-like reward, and ii) residual control (i.e., integration
of physics-model-based control and data-driven control). The concurrent
physical reward and residual control endow Phy-DRL with (mathematically)
provable safety and stability guarantees. Through experiments on the inverted
pendulum, we show that Phy-DRL features guaranteed safety and stability and
enhanced robustness, while offering remarkably accelerated training and
enlarged reward.
Comment: Working Paper
A Layer-Wise Information Reinforcement Approach to Improve Learning in Deep Belief Networks
With the advent of deep learning, the number of works proposing new methods
or improving existing ones has grown exponentially in recent years. In this
scenario, "very deep" models emerged, since they were expected to extract
more intrinsic and abstract features while supporting better performance.
However, such models suffer from the vanishing gradient problem, i.e.,
backpropagation values become too close to zero in their shallower layers,
ultimately causing learning to stagnate. This issue was overcome in the
context of convolutional neural networks by creating "shortcut connections"
between layers, in the so-called deep residual learning framework. Nonetheless, a
very popular deep learning technique called the Deep Belief Network still suffers
from vanishing gradients when dealing with discriminative tasks. Therefore, this
paper proposes the Residual Deep Belief Network, which reinforces
information layer by layer to improve feature extraction and
knowledge retention, which in turn support better discriminative performance.
Experiments conducted over three public datasets demonstrate its robustness
on the task of binary image classification.
BlockDrop: Dynamic Inference Paths in Residual Networks
Very deep convolutional neural networks offer excellent recognition results,
yet their computational expense limits their impact for many real-world
applications. We introduce BlockDrop, an approach that learns to dynamically
choose which layers of a deep network to execute during inference so as to best
reduce total computation without degrading prediction accuracy. Exploiting the
robustness of Residual Networks (ResNets) to layer dropping, our framework
selects on-the-fly which residual blocks to evaluate for a given novel image.
In particular, given a pretrained ResNet, we train a policy network in an
associative reinforcement learning setting for the dual reward of utilizing a
minimal number of blocks while preserving recognition accuracy. We conduct
extensive experiments on CIFAR and ImageNet. The results provide strong
quantitative and qualitative evidence that these learned policies not only
accelerate inference but also encode meaningful visual information. Built upon
a ResNet-101 model, our method achieves a speedup of 20% on average, going as
high as 36% for some images, while maintaining the same 76.4% top-1 accuracy
on ImageNet.
Comment: CVPR 201
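The idea of executing only some residual blocks can be sketched in a few lines: a dropped block reduces to its identity shortcut, so the forward pass stays well defined for any keep/drop pattern. The sketch below is illustrative only. The blocks are plain Python callables rather than a pretrained ResNet, the keep/drop vector is supplied by hand instead of a policy network, and the reward shape is an assumption that merely mirrors the stated trade-off between block usage and accuracy.

```python
import numpy as np

def forward_with_block_mask(x, blocks, keep):
    """Run a stack of residual blocks, skipping those whose keep flag is 0."""
    for block, k in zip(blocks, keep):
        if k:
            x = x + block(x)  # residual block: identity shortcut plus learned function
        # a dropped block contributes nothing, leaving only the identity shortcut
    return x

def usage_reward(keep, correct, penalty=1.0):
    """Hypothetical reward: larger when fewer blocks run and the prediction is correct."""
    if not correct:
        return -penalty
    return 1.0 - (sum(keep) / len(keep)) ** 2
```

A policy trained against such a reward is pushed toward sparse keep vectors, but only as long as the prediction stays correct.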
Intelligent video anomaly detection and classification using faster RCNN with deep reinforcement learning model
Intelligent video surveillance applications have recently become essential to public security, using computer vision technologies to investigate and understand long video streams. Anomaly detection and classification are considered a major element of intelligent video surveillance. The aim of anomaly detection is to automatically determine the existence of abnormalities within a short time period. Deep reinforcement learning (DRL) techniques, which integrate the concepts of reinforcement learning and deep learning and enable artificial agents to learn knowledge and experience directly from actual data, can be employed for anomaly detection. With this motivation, this paper presents an Intelligent Video Anomaly Detection and Classification using Faster RCNN with Deep Reinforcement Learning model, called the IVADC-FDRL model. The IVADC-FDRL model operates in two major stages, namely anomaly detection and classification. First, a Faster RCNN model with a Residual Network as the baseline model is applied as an object detector, which detects the anomalies as objects. Next, a deep Q-learning (DQL) based DRL model is employed to classify the detected anomalies. To validate the anomaly detection and classification performance of the IVADC-FDRL model, an extensive set of experiments was carried out on the benchmark UCSD anomaly dataset. The experimental results showed that the IVADC-FDRL model outperforms the other compared methods, with maximum accuracies of 98.50% and 94.80% on the Test004 and Test007 datasets, respectively.
Antipodal Robotic Grasping using Deep Learning
In this work, we discuss two implementations that predict antipodal grasps for novel objects: a deep Q-learning approach and a Generative Residual Convolutional Neural Network approach. We present a deep reinforcement learning based method to solve the problem of robotic grasping using visuo-motor feedback. The use of a deep learning based approach reduces the complexity caused by the use of hand-designed features. Our method uses an off-policy reinforcement learning framework to learn the grasping policy. We use the double deep Q-learning framework along with a novel Grasp-Q-Network to output grasp probabilities, which are used to learn grasps that maximize pick success. We propose a visual servoing mechanism that uses a multi-view camera setup to observe the scene containing the objects of interest. We performed experiments in a Baxter Gazebo simulated environment as well as on the actual robot. The results show that our proposed method outperforms the baseline Q-learning framework and increases grasping accuracy by using a multi-view model rather than a single-view model. The second method tackles the problem of generating antipodal robotic grasps for unknown objects from an n-channel image of the scene. We propose a novel Generative Residual Convolutional Neural Network (GR-ConvNet) model that can generate robust antipodal grasps from n-channel input at real-time speeds (20 ms). We evaluate the proposed model architecture on a standard dataset and on previously unseen household objects. We achieved state-of-the-art accuracy of 97.7% on the Cornell grasp dataset. We also demonstrate a 93.5% grasp success rate on previously unseen real-world objects.
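The double deep Q-learning framework mentioned in this abstract differs from plain Q-learning in how the bootstrap target is formed: the online network selects the next action and a separate target network evaluates it. The sketch below shows only that target computation on arrays of precomputed Q-values. The function name, argument layout and discount factor are assumptions for illustration, and the Grasp-Q-Network itself is not reproduced.

```python
import numpy as np

def double_q_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Double Q-learning targets for a batch of transitions.

    q_next_online and q_next_target have shape (batch, actions) and hold the
    Q-values of the next states under the online and target networks.
    """
    best_actions = np.argmax(q_next_online, axis=1)  # action selection: online network
    next_values = q_next_target[np.arange(len(best_actions)), best_actions]  # evaluation: target network
    return rewards + gamma * (1.0 - dones) * next_values
```

Separating selection from evaluation is what reduces the overestimation that arises when a single network both picks and scores the maximizing action.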