
    ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

    In this paper, we introduce a novel method for enhancing the effectiveness of on-policy Deep Reinforcement Learning (DRL) algorithms. Current on-policy algorithms, such as Proximal Policy Optimization (PPO) and Asynchronous Advantage Actor-Critic (A3C), do not sufficiently account for cautious interaction with the environment. Our method addresses this gap by explicitly integrating cautious interaction in two critical ways: by maximizing a lower bound on the true value function plus a constant, thereby promoting a conservative value estimation, and by incorporating Thompson sampling for cautious exploration. These features are realized through three surprisingly simple modifications to the A3C algorithm: processing advantage estimates through a ReLU function, spectral normalization, and dropout. We provide theoretical proof that our algorithm maximizes the lower bound, which also grounds Regret Matching Policy Gradients (RMPG), a discrete-action on-policy method for multi-agent reinforcement learning. Our rigorous empirical evaluations across various benchmarks consistently demonstrate our approach's improved performance against existing on-policy algorithms. This research represents a substantial step towards more cautious and effective DRL algorithms, which has the potential to unlock applications to complex, real-world problems.
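
    The following is a minimal sketch, not the authors' code, of how the three modifications described in the abstract could look in a PyTorch actor-critic: a ReLU applied to advantage estimates in the policy loss, spectral normalization on the value network's layers, and dropout kept active to approximate Thompson sampling. Layer sizes and the dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm

class Critic(nn.Module):
    """Value network with spectral normalization and dropout (modifications 2 and 3)."""
    def __init__(self, obs_dim, hidden=64, p_drop=0.1):
        super().__init__()
        # Dropout left active at sampling time gives a Thompson-sampling-like
        # distribution over value estimates; spectral_norm bounds layer Lipschitz constants.
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(obs_dim, hidden)), nn.ReLU(), nn.Dropout(p_drop),
            spectral_norm(nn.Linear(hidden, hidden)), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def actor_loss(log_probs, advantages):
    # Modification 1: pass advantage estimates through a ReLU so that only
    # positive advantages contribute to the policy gradient.
    positive_adv = F.relu(advantages.detach())
    return -(log_probs * positive_adv).mean()
```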

    Explaining Deep Q-Learning Experience Replay with SHapley Additive exPlanations

    Reinforcement Learning (RL) has shown promise in optimizing complex control and decision-making processes, but Deep Reinforcement Learning (DRL) lacks interpretability, limiting its adoption in regulated sectors like manufacturing, finance, and healthcare. Difficulties arise from DRL's opaque decision-making, which hinders efficiency and resource use, and this issue is amplified with every advancement. While many seek to move from Experience Replay to A3C, the latter demands more resources. Despite efforts to improve Experience Replay selection strategies, there is a tendency to keep capacity high. This dissertation investigates training a Deep Convolutional Q-learning agent across 20 Atari games, a control task, a physics task, and a simulated addition task, while intentionally reducing Experience Replay capacity from 1×10^6 to 5×10^2. It was found that the Experience Replay size can be reduced by over 40% for 18 of the 23 simulations tested, offering a practical path to resource-efficient DRL. To illuminate agent decisions and align them with game mechanics, a novel method is employed: visualizing Experience Replay via the Deep SHAP Explainer. This approach fosters comprehension and transparent, interpretable explanations, though any capacity reduction must be cautious to avoid overfitting. This study demonstrates the feasibility of reducing Experience Replay and advocates for transparent, interpretable decision explanations using the Deep SHAP Explainer to promote resource efficiency in Experience Replay.
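
    As a hedged illustration of the idea, the sketch below attributes a Q-network's outputs to input features over states drawn from the replay buffer using SHAP's DeepExplainer. The names `q_network`, `replay_buffer`, and `transition.state` are placeholders, not the dissertation's actual interfaces, and the background/explained sample counts are arbitrary.

```python
import numpy as np
import shap
import torch

def explain_replay(q_network, replay_buffer, n_background=100, n_explain=10):
    # Collect states from the (reduced-capacity) Experience Replay buffer.
    states = np.stack([transition.state for transition in replay_buffer])
    background = torch.as_tensor(states[:n_background], dtype=torch.float32)
    to_explain = torch.as_tensor(states[n_background:n_background + n_explain],
                                 dtype=torch.float32)

    # DeepExplainer attributes each action's Q-value to input features
    # (e.g. pixels of Atari frames), which can then be visualized.
    explainer = shap.DeepExplainer(q_network, background)
    shap_values = explainer.shap_values(to_explain)
    return shap_values
```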

    Learning perception and planning with deep active inference

    Active inference is a process theory of the brain which states that all living organisms infer actions in order to minimize their (expected) free energy. However, current experiments are limited to predefined, often discrete, state spaces. In this paper, we use recent advances in deep learning to learn the state space and to approximate the necessary probability distributions needed to engage in active inference.
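
    A minimal sketch of one common deep formulation, under the assumption that the learned state space is a Gaussian latent as in a VAE: the variational free energy decomposes into a reconstruction term plus a KL divergence between the approximate posterior over states and the prior. The function and argument names are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def variational_free_energy(obs, recon_obs, mu, log_var):
    # Reconstruction term: negative expected log-likelihood of observations
    # under the decoder (Gaussian likelihood with fixed variance -> MSE).
    nll = F.mse_loss(recon_obs, obs, reduction="sum")
    # Complexity term: KL divergence between q(s|o) = N(mu, sigma^2)
    # and a standard normal prior over latent states.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return nll + kl
```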

    Learning Negotiating Behavior Between Cars in Intersections using Deep Q-Learning

    This paper concerns automated vehicles negotiating with other vehicles, typically human-driven, in crossings, with the goal of finding a decision algorithm by learning the typical behaviors of other vehicles. The vehicle observes the distance and speed of vehicles on the intersecting road and uses a policy that adapts its speed along its pre-defined trajectory to pass the crossing efficiently. Deep Q-learning is applied to simulated traffic with different predefined driver behaviors and intentions. The results show a policy that crosses the intersection while avoiding collisions with other vehicles 98% of the time, while at the same time not being too passive. Moreover, inferring information over time is important for distinguishing between different intentions, as shown by comparing the collision rates of a Deep Recurrent Q-Network (0.85%) and a Deep Q-Network (1.75%).
    Comment: 6 pages, 7 figures, Accepted to IEEE International Conference on Intelligent Transportation Systems (ITSC) 201
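
    An illustrative sketch, not the paper's implementation, of the kind of recurrent Q-network that lets the agent integrate observed distances and speeds over time to infer other drivers' intentions; the observation dimension, hidden size, and action count are assumptions.

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Recurrent Q-network over sequences of distance/speed observations."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # The LSTM aggregates observations over time, which is what allows the
        # agent to distinguish, e.g., a yielding driver from one holding speed.
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim) history of intersection observations.
        x = self.encoder(obs_seq)
        out, hidden_state = self.lstm(x, hidden_state)
        return self.q_head(out), hidden_state
```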