4 research outputs found

    Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization

    In this paper, a novel Multi-agent Reinforcement Learning (MARL) approach, Multi-Agent Continuous Dynamic Policy Gradient (MACDPP), is proposed to tackle the limited learning capability and sample efficiency in various scenarios controlled by multiple agents. It alleviates the inconsistency of multiple agents' policy updates by introducing relative entropy regularization into the Centralized Training with Decentralized Execution (CTDE) framework with an Actor-Critic (AC) structure. Evaluated on multi-agent cooperation and competition tasks as well as traditional control tasks, including OpenAI benchmarks and robot arm manipulation, MACDPP demonstrates significant superiority in learning capability and sample efficiency over both related multi-agent and widely implemented single-agent baselines, and therefore expands the potential of MARL for effectively learning challenging control scenarios.
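    The mechanism at the core of this abstract, regularizing each agent's policy update with the relative entropy to its previous policy under a centralized critic, can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical illustration of that general idea, not the authors' implementation: the network sizes, the kl_coef coefficient, and the stand-in centralized critic are all assumptions.

```python
# Minimal sketch: one actor update with relative entropy (KL) regularization
# in a CTDE actor-critic setting. Illustrative only; all sizes are assumed.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

obs_dim, act_dim, kl_coef = 8, 2, 0.1      # assumed dimensions and coefficient

class GaussianActor(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.mu = nn.Linear(64, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        return Normal(self.mu(self.body(obs)), self.log_std.exp())

actor, old_actor = GaussianActor(), GaussianActor()
old_actor.load_state_dict(actor.state_dict())       # frozen previous policy
for p in old_actor.parameters():
    p.requires_grad_(False)

# Stand-in for the centralized critic; in CTDE it would see the joint
# observations and actions of all agents, not just this agent's.
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
                       nn.Linear(64, 1))
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

obs = torch.randn(32, obs_dim)                      # dummy observation batch
pi = actor.dist(obs)
action = pi.rsample()                               # reparameterized sample
q = critic(torch.cat([obs, action], dim=-1))

# The relative entropy term penalizes abrupt policy shifts that would
# invalidate the other agents' concurrent updates.
kl = kl_divergence(old_actor.dist(obs), pi).sum(-1).mean()
loss = -q.mean() + kl_coef * kl
opt.zero_grad(); loss.backward(); opt.step()
```

    Keeping each update close to a frozen snapshot of the last policy is what makes the simultaneous updates of multiple agents more consistent with one another.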

    Practical Probabilistic Model-based Deep Reinforcement Learning by Integrating Dropout Uncertainty and Trajectory Sampling

    This paper addresses the prediction stability, prediction accuracy, and control capability of current probabilistic model-based reinforcement learning (MBRL) built on neural networks. A novel approach, dropout-based probabilistic ensembles with trajectory sampling (DPETS), is proposed, in which the system uncertainty is stably predicted by combining Monte-Carlo dropout and trajectory sampling in one framework. Its loss function is designed to correct the fitting error of the neural networks for more accurate prediction of probabilistic models. The state propagation in its policy is extended to filter out aleatoric uncertainty for superior control capability. Evaluated on several MuJoCo benchmark control tasks under additional disturbances and one practical robot arm manipulation task, DPETS outperforms related MBRL approaches in both average return and convergence speed while achieving better performance than well-known model-free baselines with significantly higher sample efficiency. The open-source code of DPETS is available at https://github.com/mrjun123/DPETS.
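    The combination described here can be sketched in a few lines: dropout is kept active at prediction time (Monte-Carlo dropout), and a set of state particles is rolled out so that each particle sees independent dropout masks at every step (trajectory sampling). The PyTorch code below is a hypothetical minimal sketch of that general idea; the dimensions, dropout rate, and particle count are assumed, and the actual implementation lives in the repository linked above.

```python
# Minimal sketch: MC-dropout dynamics model with trajectory sampling.
# Illustrative only; see the DPETS repository for the real implementation.
import torch
import torch.nn as nn

state_dim, act_dim = 4, 1                 # assumed dimensions

# The model predicts the state change; keeping dropout active at prediction
# time yields stochastic forward passes that reflect model uncertainty.
model = nn.Sequential(
    nn.Linear(state_dim + act_dim, 128), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(128, 128), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(128, state_dim),
)

def rollout(state, actions, n_particles=20):
    """Propagate n_particles copies of the state through the model; each
    forward pass draws fresh dropout masks, so the particles fan out."""
    model.train()                                    # keep dropout ON
    particles = state.repeat(n_particles, 1)         # (P, state_dim)
    trajectory = [particles]
    for a in actions:                                # one step per action
        act = a.expand(n_particles, -1)
        with torch.no_grad():
            particles = particles + model(torch.cat([particles, act], -1))
        trajectory.append(particles)
    return torch.stack(trajectory)                   # (H+1, P, state_dim)

state = torch.zeros(1, state_dim)
plan = [torch.randn(1, act_dim) for _ in range(10)]  # candidate action sequence
traj = rollout(state, plan)
mean, std = traj.mean(dim=1), traj.std(dim=1)        # per-step uncertainty
```

    The per-step spread of the particles is the uncertainty estimate that a planner can then penalize or filter.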

    Efficient Uncertainty Propagation in Model-Based Reinforcement Learning Unmanned Surface Vehicle Using Unscented Kalman Filter

    This article tackles the computational burden of propagating uncertainties in the model predictive controller-based policy of a probabilistic model-based reinforcement learning (MBRL) system for an unmanned surface vehicle (USV). We propose filtered probabilistic model predictive control using the unscented Kalman filter (FPMPC-UKF), which introduces the unscented Kalman filter (UKF) for more efficient uncertainty propagation in MBRL. A USV control system based on FPMPC-UKF is developed and evaluated on position-keeping and target-reaching tasks in a simulation driven by real USV data. The experimental results demonstrate the significant superiority of the proposed method in balancing control performance and computational burden under different levels of disturbance compared with related USV approaches, and therefore indicate its potential in more challenging USV scenarios with limited computational resources.
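    The ingredient that saves computation here is the unscented transform: instead of propagating many random particles through the dynamics, 2n+1 deterministic sigma points are pushed through and recombined into a mean and covariance. The NumPy sketch below shows one such prediction step; the dynamics function and its parameters are illustrative placeholders, not the paper's USV model.

```python
# Minimal sketch: uncertainty propagation via the unscented transform.
import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f
    using 2n+1 deterministic sigma points instead of random particles."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * cov)            # matrix square root
    sigma = np.vstack([mean, mean + S.T, mean - S.T])  # (2n+1, n) points

    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))     # mean weights
    wc = wm.copy()                                     # covariance weights
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)

    Y = np.array([f(s) for s in sigma])                # propagated points
    mean_y = wm @ Y
    diff = Y - mean_y
    cov_y = diff.T @ (wc[:, None] * diff)
    return mean_y, cov_y

# Placeholder vehicle dynamics (constant speed and turn rate); the paper's
# USV model is data-driven and far richer. x = [px, py, heading].
def f(x):
    v, w, dt = 1.0, 0.1, 0.5
    return x + dt * np.array([v * np.cos(x[2]), v * np.sin(x[2]), w])

m0 = np.zeros(3)
P0 = np.diag([0.1, 0.1, 0.05])
m1, P1 = unscented_transform(m0, P0, f)    # one prediction step
```

    Because the number of sigma points grows only linearly with the state dimension, this keeps the per-step cost of uncertainty propagation low enough for control loops with limited computational resources.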