19,799 research outputs found
Deep Reinforcement Learning with Weighted Q-Learning
Overestimation of the maximum action-value is a well-known problem that
hinders Q-Learning performance, leading to suboptimal policies and unstable
learning. Among several Q-Learning variants proposed to address this issue,
Weighted Q-Learning (WQL) effectively reduces the bias and shows remarkable
results in stochastic environments. WQL uses a weighted sum of the estimated
action-values, where the weights correspond to the probability of each
action-value being the maximum; however, the computation of these probabilities
is only practical in the tabular settings. In this work, we provide the
methodological advances to benefit from the WQL properties in Deep
Reinforcement Learning (DRL), by using neural networks with Dropout Variational
Inference as an effective approximation of deep Gaussian processes. In
particular, we adopt the Concrete Dropout variant to obtain calibrated
estimates of epistemic uncertainty in DRL. We show that model uncertainty in
DRL can be useful not only for action selection, but also action evaluation. We
analyze how the novel Weighted Deep Q-Learning algorithm reduces the bias
w.r.t. relevant baselines and provide empirical evidence of its advantages on
several representative benchmarks.Comment: Corrected typo
Off-policy Maximum Entropy Deep Reinforcement Learning Algorithm Based on RandomlyWeighted Triple Q -Learning
Reinforcement learning is an important branch of machine learning.With the development of deep learning,deep reinforcement learning research has gradually developed into the focus of reinforcement learning research.Model-free off-policy deep reinforcement learning algorithms for continuous control attract everyone’s attention because of their strong practicality.Like Q-learning,algorithms based on actor-critic suffer from the problem of overestimations.To a certain extent,clipped double Q-lear-ning method solves the effect of the overestimation in actor-critic algorithms,but it also introduces underestimation to the lear-ning process.In order to further solve the problems of overestimation and underestimation in the actor-critic algorithms,a new learning method,randomly weighted triple Q-learning method is proposed.In addition,combining the new method with the soft actor critic algorithm,a new soft actor critic algorithm based on randomly weighted triple Q-learning is proposed.This algorithm not only limits the Q estimation value near the real Q value,but also increases the randomness of the Q estimation value through randomly weighted method,so as to solve the problems of overestimation and underestimation of action value in the learning process.Experiment results show that,compared to the SAC algorithm and other currently popular deep reinforcement learning algorithms such as DDPG,PPO and TD3,the SAC-RWTQ algorithm has better performance on several Mujoco tasks on the gym simulation platform
Deep Reinforcement Learning for Smart Queue Management
With the goal of meeting the stringent throughput and delay requirements of classified network flows, we propose a Deep Q-learning Network (DQN) for optimal weight selection in an active queue management system based on Weighted Fair Queuing (WFQ). Our system schedules flows belonging to different priority classes (Gold, Silver, and Bronze) into separate queues, and learns how and when to dequeue from each queue. The neural network implements deep reinforcement learning tools such as target networks and replay buffers to help learn the best weights depending on the network state. We show, via simulations, that our algorithm converges to an efficient model capable of adapting to the flow demands, producing thus lower delays with respect to traditional WFQ
Toward a Brain-Inspired System: Deep Recurrent Reinforcement Learning for a Simulated Self-Driving Agent
An effective way to achieve intelligence is to simulate various intelligent behaviors in the human brain. In recent years, bio-inspired learning methods have emerged, and they are different from the classical mathematical programming principle. From the perspective of brain inspiration, reinforcement learning has gained additional interest in solving decision-making tasks as increasing neuroscientific research demonstrates that significant links exist between reinforcement learning and specific neural substrates. Because of the tremendous research that focuses on human brains and reinforcement learning, scientists have investigated how robots can autonomously tackle complex tasks in the form of making a self-driving agent control in a human-like way. In this study, we propose an end-to-end architecture using novel deep-Q-network architecture in conjunction with a recurrence to resolve the problem in the field of simulated self-driving. The main contribution of this study is that we trained the driving agent using a brain-inspired trial-and-error technique, which was in line with the real world situation. Besides, there are three innovations in the proposed learning network: raw screen outputs are the only information which the driving agent can rely on, a weighted layer that enhances the differences of the lengthy episode, and a modified replay mechanism that overcomes the problem of sparsity and accelerates learning. The proposed network was trained and tested under a third-party OpenAI Gym environment. After training for several episodes, the resulting driving agent performed advanced behaviors in the given scene. We hope that in the future, the proposed brain-inspired learning system would inspire practicable self-driving control solutions
- …