ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages
In this paper, we introduce a novel method for enhancing the effectiveness of
on-policy Deep Reinforcement Learning (DRL) algorithms. Current on-policy
algorithms, such as Proximal Policy Optimization (PPO) and Asynchronous
Advantage Actor-Critic (A3C), do not sufficiently account for cautious
interaction with the environment. Our method addresses this gap by explicitly
integrating cautious interaction in two critical ways: by maximizing a
lower-bound on the true value function plus a constant, thereby promoting a
conservative value estimation, and by incorporating Thompson sampling
for cautious exploration. These features are realized through three
surprisingly simple modifications to the A3C algorithm: processing advantage
estimates through a ReLU function, spectral normalization, and dropout. We
provide a theoretical proof that our algorithm maximizes the lower bound, which also grounds Regret Matching Policy Gradients (RMPG), a discrete-action on-policy method for multi-agent reinforcement learning. Our rigorous empirical evaluations across various benchmarks consistently demonstrate our approach's improved performance over existing on-policy algorithms. This research represents a substantial step towards more cautious and effective DRL algorithms, with the potential to unlock applications to complex, real-world problems.
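The three modifications are simple enough to sketch. Below is a minimal, hypothetical PyTorch rendering of the ideas named in the abstract (positive advantages via a ReLU, spectral normalization, dropout); the network sizes, names, and loss shape are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical actor-critic network illustrating the paper's three
# modifications; layer sizes and names are assumptions, not the authors' code.
class CautiousActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128, p_drop=0.1):
        super().__init__()
        self.body = nn.Sequential(
            # Modification 2: spectral normalization bounds each layer's
            # Lipschitz constant, supporting the conservative value estimate.
            nn.utils.spectral_norm(nn.Linear(obs_dim, hidden)),
            nn.ReLU(),
            # Modification 3: dropout, kept active when sampling actions,
            # gives Thompson-sampling-style cautious exploration.
            nn.Dropout(p_drop),
            nn.utils.spectral_norm(nn.Linear(hidden, hidden)),
            nn.ReLU(),
            nn.Dropout(p_drop),
        )
        self.pi = nn.Linear(hidden, n_actions)  # policy logits
        self.v = nn.Linear(hidden, 1)           # value estimate

    def forward(self, obs):
        h = self.body(obs)
        return self.pi(h), self.v(h).squeeze(-1)

def policy_loss(logits, actions, advantages):
    # Modification 1: pass advantage estimates through a ReLU so that only
    # transitions with positive advantages drive the policy gradient.
    log_pi = F.log_softmax(logits, dim=-1)
    log_pi_a = log_pi.gather(1, actions.unsqueeze(-1)).squeeze(-1)
    return -(F.relu(advantages).detach() * log_pi_a).mean()
```

Keeping dropout active when sampling actions is what would give the Thompson-sampling flavor of exploration the abstract describes.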
Explaining Deep Q-Learning Experience Replay with SHapley Additive exPlanations
Reinforcement Learning (RL) has shown promise in optimizing complex control and decision-making processes, but Deep Reinforcement Learning (DRL) lacks interpretability, limiting its adoption in regulated sectors like manufacturing, finance, and healthcare. Difficulties arise from DRL's opaque decision-making, which hinders efficiency and resource use, and the issue is amplified with every advancement. While many seek to move from Experience Replay to A3C, the latter demands more resources. Despite efforts to improve Experience Replay selection strategies, there is a tendency to keep capacity high. This dissertation investigates training a Deep Convolutional Q-learning agent across 20 Atari games, a control task, a physics task, and an addition-simulation task, while intentionally reducing Experience Replay capacity from 1×10⁶ to 5×10². It was found that a reduction of over 40% in Experience Replay size is possible for 18 of the 23 simulations tested, offering a practical path to resource-efficient DRL. To illuminate agent decisions and align them with game mechanics, a novel method is employed: visualizing Experience Replay via the Deep SHAP Explainer. This approach fosters comprehension and transparent, interpretable explanations, though any capacity reduction must be cautious to avoid overfitting. This study demonstrates the feasibility of reducing Experience Replay capacity and advocates for transparent, interpretable decision explanations using the Deep SHAP Explainer to enhance resource efficiency in Experience Replay.
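As a rough illustration of the explanation step, the sketch below applies the `shap` library's `DeepExplainer` to a toy Q-network; the network, feature count, and replay sample are stand-ins for the dissertation's setup, not its actual code.

```python
import torch
import torch.nn as nn
import shap  # pip install shap

# Toy stand-ins: a small Q-network and random "replay" states take the place
# of the trained agent and its Experience Replay buffer.
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
replay_states = torch.randn(200, 8)  # 200 states, 8 features each

# Deep SHAP attributes each action's Q-value to the input features, using a
# background sample drawn from the replay buffer as the reference distribution.
explainer = shap.DeepExplainer(q_net, replay_states[:100])
shap_values = explainer.shap_values(replay_states[100:110])
# shap_values: per-feature contributions for each of the 4 actions
# (a list of arrays or a stacked array, depending on the shap version).
```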
Learning perception and planning with deep active inference
Active inference is a process theory of the brain which states that all living organisms infer actions in order to minimize their (expected) free energy. However, current experiments are limited to predefined, often discrete, state spaces. In this paper we use recent advances in deep learning to learn the state space and to approximate the necessary probability distributions to engage in active inference.
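A minimal sketch of the "learn the state space" idea, assuming a VAE-style generative model in PyTorch in which the variational free energy is the negative evidence lower bound; all architecture choices here are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative state model: the encoder approximates the posterior q(s|o) over
# hidden states, and the variational free energy is the negative ELBO.
class StateModel(nn.Module):
    def __init__(self, obs_dim=16, state_dim=4):
        super().__init__()
        self.enc = nn.Linear(obs_dim, 2 * state_dim)  # q(s|o): mean, log-variance
        self.dec = nn.Linear(state_dim, obs_dim)      # p(o|s): likelihood model

    def free_energy(self, obs):
        mu, logvar = self.enc(obs).chunk(2, dim=-1)
        s = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample
        recon = self.dec(s)
        # Accuracy term: -log p(o|s) up to a constant (unit-variance Gaussian)
        nll = F.mse_loss(recon, obs, reduction="none").sum(-1)
        # Complexity term: KL[q(s|o) || p(s)] against a standard-normal prior
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return (nll + kl).mean()  # free energy = inaccuracy + complexity
```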
Learning Negotiating Behavior Between Cars in Intersections using Deep Q-Learning
This paper concerns automated vehicles negotiating with other vehicles, typically human-driven, in crossings, with the goal of finding a decision algorithm by learning the typical behaviors of other vehicles. The vehicle observes the distance and speed of vehicles on the intersecting road and uses a policy that adapts its speed along its predefined trajectory to pass the crossing efficiently. Deep Q-learning is used on simulated traffic with different predefined driver behaviors and intentions. The results show a policy that is able to cross the intersection while avoiding collision with other vehicles 98% of the time, and without being too passive. Moreover, inferring information over time is important for distinguishing between different intentions, as shown by comparing collision rates: 0.85% for a Deep Recurrent Q-Network versus 1.75% for Deep Q-learning.
Comment: 6 pages, 7 figures, Accepted to IEEE International Conference on Intelligent Transportation Systems (ITSC) 201
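The gap between the recurrent and feed-forward collision rates suggests why history matters here; a minimal, hypothetical DRQN sketch in PyTorch (feature set and layer sizes are assumptions, not the paper's architecture) might look like:

```python
import torch
import torch.nn as nn

# Hypothetical DRQN: an LSTM over the history of observed distances and speeds
# lets the Q-network accumulate evidence about other drivers' intentions.
class DRQN(nn.Module):
    def __init__(self, obs_dim=4, n_actions=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)  # one Q-value per speed action

    def forward(self, obs_seq, hx=None):
        # obs_seq: (batch, time, obs_dim) history of intersection observations
        out, hx = self.lstm(obs_seq, hx)
        return self.q_head(out[:, -1]), hx  # Q-values from the latest hidden state
```

A plain DQN would instead map a single observation to Q-values, discarding the temporal evidence that distinguishes, say, a yielding driver from an aggressive one.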