
    Actor-Critic Reinforcement Learning for Control with Stability Guarantee

    Reinforcement Learning (RL) and its integration with deep learning have achieved impressive performance in various robotic control tasks, ranging from motion planning and navigation to end-to-end visual manipulation. However, model-free RL that relies solely on data cannot guarantee stability. From a control-theoretic perspective, stability is the most important property of any control system, since it is closely related to the safety, robustness, and reliability of robotic systems. In this paper, we propose an actor-critic RL framework for control that guarantees closed-loop stability by employing Lyapunov's method from classic control theory. First, a data-based stability theorem is proposed for stochastic nonlinear systems modeled by a Markov decision process. Then we show that the stability condition can be exploited as the critic in actor-critic RL to learn a controller/policy. Finally, the effectiveness of our approach is evaluated on several well-known 3-dimensional robot control tasks and a synthetic biology gene network tracking task, across three popular physics simulation platforms. As an empirical evaluation of the advantage of stability, we show that the learned policies enable the systems to recover, to a certain extent, to the equilibrium or way-points when perturbed by uncertainties such as system parametric variations and external disturbances.
    Comment: IEEE RA-L + IROS 202
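The abstract says a data-based Lyapunov stability condition is used as the critic, but does not state the condition itself. As a rough illustration only, the sketch below assumes a common expected-decrease form, mean of L(s') - L(s) + alpha * c(s) <= 0 over sampled transitions, and checks it for a hand-picked candidate L. The function name `lyapunov_decrease_gap`, the parameter `alpha`, and the toy 1-D systems are hypothetical; the paper's actual theorem and how it trains a learned Lyapunov critic may differ.

```python
from typing import Callable, Sequence


def lyapunov_decrease_gap(
    lyap: Callable[[float], float],
    cost: Callable[[float], float],
    states: Sequence[float],
    next_states: Sequence[float],
    alpha: float = 0.1,
) -> float:
    """Sample mean of L(s') - L(s) + alpha * c(s) over observed transitions.

    Assumed (illustrative) condition: the result should be <= 0, i.e. the
    candidate Lyapunov function must decrease on average by at least
    alpha times the stage cost. A positive result signals a violation.
    """
    if len(states) != len(next_states) or not states:
        raise ValueError("need equally sized, non-empty transition batches")
    gaps = [
        lyap(s1) - lyap(s0) + alpha * cost(s0)
        for s0, s1 in zip(states, next_states)
    ]
    return sum(gaps) / len(gaps)


def quadratic(x: float) -> float:
    # Hypothetical candidate Lyapunov function and stage cost: x**2.
    return x * x


# Toy 1-D closed-loop systems x' = a * x (hypothetical, for illustration).
xs = [-2.0, -1.0, 0.5, 1.5, 3.0]
stable_gap = lyapunov_decrease_gap(quadratic, quadratic, xs, [0.5 * x for x in xs])
unstable_gap = lyapunov_decrease_gap(quadratic, quadratic, xs, [1.2 * x for x in xs])
print(stable_gap <= 0, unstable_gap <= 0)  # True False
```

With a = 0.5 each sample contributes about -0.65 x**2, so the condition holds; with a = 1.2 each contributes about +0.54 x**2, so it fails. In an actor-critic setting, a gap like this would typically enter the actor's objective as a penalty or Lagrangian constraint rather than serve as a pass/fail check.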

    A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation

    In any classical value-based reinforcement learning method, an agent, despite its continuous interactions with the environment, cannot quickly generate a complete and independent description of the entire environment. This leaves the learning method facing a difficult dilemma: choosing between the two tasks of exploration and exploitation. The problem becomes more pronounced when the agent has to deal with a dynamic environment whose configuration and/or parameters change constantly. In this paper, the problem is approached by first mapping a reinforcement learning scheme to a directed graph, in which the set of all states already explored continues to be exploited. We prove that exploration and exploitation eventually converge in the decision-making process, so, unlike all existing reinforcement learning methods, there is no need to face the exploration vs. exploitation tradeoff. Instead, this observation indicates that a reinforcement learning scheme is essentially the same as searching for the shortest path in a dynamic environment, which is readily tackled by a modified Floyd-Warshall algorithm proposed in the paper. Experimental results confirm that the proposed graph-based reinforcement learning algorithm performs significantly better than both the standard Q-learning algorithm and an improved Q-learning algorithm in solving mazes, making it an algorithm of choice for applications involving dynamic environments.
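The abstract reduces maze solving to a shortest-path search solved by a modified Floyd-Warshall algorithm, but it does not describe the modification. The sketch below is therefore the standard textbook Floyd-Warshall algorithm on a hypothetical grid maze, with unit-cost directed edges between free cells. It handles a changing environment naively by recomputing from scratch. The helper names (`maze_edges`, `floyd_warshall`, `path`) and the maze encoding ('.' free, '#' wall) are illustrative assumptions, not the paper's method.

```python
from itertools import product

INF = float("inf")


def maze_edges(grid):
    """Directed unit-cost edges between 4-adjacent free cells ('.' free, '#' wall)."""
    rows, cols = len(grid), len(grid[0])

    def idx(r, c):
        return r * cols + c

    edges = {}
    for r, c in product(range(rows), range(cols)):
        if grid[r][c] == "#":
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] != "#":
                edges[(idx(r, c), idx(rr, cc))] = 1
    return rows * cols, edges


def floyd_warshall(n, edges):
    """All-pairs shortest paths; edges maps (u, v) -> weight.

    Returns (dist, nxt), where nxt[u][v] is the first hop on a shortest
    u -> v path, or None if v is unreachable from u.
    """
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    nxt = [[i if i == j else None for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        if w < dist[u][v]:
            dist[u][v] = w
            nxt[u][v] = v
    # Relax every pair through each intermediate vertex k in turn.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def path(nxt, u, v):
    """Reconstruct the vertex sequence of a shortest u -> v path ([] if none)."""
    if nxt[u][v] is None:
        return []
    p = [u]
    while u != v:
        u = nxt[u][v]
        p.append(u)
    return p


# Original maze: the centre cell (index 4) is a wall.
n, edges = maze_edges(["...", ".#.", "..."])
dist, nxt = floyd_warshall(n, edges)
print(dist[0][8])  # 4

# The environment changes (cell 1 becomes a wall); recompute from scratch.
n2, edges2 = maze_edges([".#.", ".#.", "..."])
dist2, nxt2 = floyd_warshall(n2, edges2)
print(dist2[0][2], path(nxt2, 0, 2))  # 6 [0, 3, 6, 7, 8, 5, 2]
```

Recomputing costs O(n^3) per change. Avoiding that full recomputation is presumably where a modified, incremental variant would pay off, but the abstract gives no details.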

    A survey of self organisation in future cellular networks

    This article surveys the literature of the last decade on the emerging field of self organisation as applied to wireless cellular communication networks. Self organisation has been extensively studied and applied in ad hoc networks, wireless sensor networks and autonomic computer networks; however, in the context of wireless cellular networks, this is the first attempt to put the various efforts in perspective in the form of a tutorial/survey. We provide a comprehensive survey of the existing literature, projects and standards in self organising cellular networks. Additionally, we aim to present a clear understanding of this active research area, identifying a clear taxonomy and guidelines for the design of self organising mechanisms. We compare the strengths and weaknesses of existing solutions and highlight the key research areas for further development. This paper serves as a guide and a starting point for anyone wishing to delve into research on self organisation in wireless cellular communication networks.