    Observational Robustness and Invariances in Reinforcement Learning via Lexicographic Objectives

    Policy robustness in Reinforcement Learning (RL) may not be desirable at any price; the alterations that robustness requirements cause to otherwise optimal policies should be explainable and quantifiable. Policy gradient algorithms with strong convergence guarantees are usually modified to obtain robust policies in ways that do not preserve those guarantees, which defeats the purpose of formal robustness requirements. In this work we study a notion of robustness in partially observable MDPs where state observations are perturbed by a noise-induced stochastic kernel. We characterise the set of maximally robust policies by analysing how policies are altered by this kernel. We then establish a connection between such robust policies and certain properties of the noise kernel, as well as structural properties of the underlying MDPs, and construct sufficient conditions for policy robustness. We use these notions to propose a robustness-inducing scheme, applicable to any policy gradient algorithm, that formally trades off the reward achieved by a policy against its robustness level through lexicographic optimisation and preserves the convergence properties of the original algorithm. We test the proposed approach through numerical experiments on safety-critical RL environments and show how the proposed method helps achieve high robustness when state errors are introduced in the policy roll-out.
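
A minimal sketch of the two ingredients named above, under assumptions that are ours and not the paper's: finite states and actions, a known transition model, and a noise kernel T[s][o] giving the probability that true state s is observed as o. The policy actually executed is the kernel applied to the nominal policy; a policy whose action distributions are unchanged by the kernel has zero robustness gap; and a lexicographic rule first keeps the policies whose return is within a tolerance tol of the best and then picks the most robust of them. The helper names, the total-variation gap measure and the filter-then-select rule are hypothetical illustrations; the paper embeds the lexicographic trade-off inside policy gradient updates rather than enumerating candidate policies.

```python
# Hypothetical sketch: observation noise kernel, robustness gap, lexicographic selection.

def perturbed_policy(pi, T):
    # pi[o][a]: probability of action a when the agent observes state o.
    # T[s][o]: probability that true state s is observed as o (assumed kernel).
    # Executed policy: pt[s][a] = sum_o T[s][o] * pi[o][a].
    n_obs, n_act = len(pi), len(pi[0])
    return [[sum(T[s][o] * pi[o][a] for o in range(n_obs)) for a in range(n_act)]
            for s in range(len(T))]


def robustness_gap(pi, T):
    # Worst-case total-variation distance between nominal and executed action
    # distributions; 0 means the policy is unchanged by the observation noise.
    pt = perturbed_policy(pi, T)
    return max(0.5 * sum(abs(p - q) for p, q in zip(pt[s], pi[s]))
               for s in range(len(pi)))


def policy_value(pi, P, R, gamma, mu, iters=2000):
    # Iterative policy evaluation of the nominal policy:
    # v(s) = sum_a pi[s][a] * (R[s][a] + gamma * sum_t P[s][a][t] * v(t)).
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [sum(pi[s][a] * (R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n)))
                 for a in range(len(pi[s])))
             for s in range(n)]
    return sum(m * x for m, x in zip(mu, v))


def lexicographic_select(policies, P, R, gamma, mu, T, tol):
    # Priority 1: expected return, accepted if within tol of the best.
    # Priority 2: smallest robustness gap among the admissible policies.
    values = [policy_value(pi, P, R, gamma, mu) for pi in policies]
    best = max(values)
    admissible = [i for i, val in enumerate(values) if val >= best - tol]
    return min(admissible, key=lambda i: robustness_gap(policies[i], T))


# Toy 2-state, 2-action example (all numbers are made up for illustration).
P = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]  # actions do not affect transitions
R = [[1.0, 1.05], [1.0, 0.95]]
mu = [0.5, 0.5]
T = [[0.8, 0.2], [0.2, 0.8]]
pi_state_dependent = [[0.0, 1.0], [1.0, 0.0]]  # greedy: action 1 in state 0, action 0 in state 1
pi_constant = [[1.0, 0.0], [1.0, 0.0]]         # always action 0, ignores observations
candidates = [pi_state_dependent, pi_constant]

print(lexicographic_select(candidates, P, R, 0.9, mu, T, tol=0.5))  # 1
print(lexicographic_select(candidates, P, R, 0.9, mu, T, tol=0.1))  # 0
```

In this toy MDP the state-dependent policy earns slightly more (a value of 10.25 versus 10.0) but has a gap of 0.2 under the kernel, so a loose tolerance selects the observation-invariant policy while a tight one keeps the higher-reward policy.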

    Event-triggered Consensus Control of Heterogeneous Multi-agent Systems: Model- and Data-based Analysis

    This article deals with model- and data-based consensus control of heterogeneous leader-following multi-agent systems (MASs) under an event-triggering transmission scheme. A dynamic periodic transmission protocol is developed to significantly reduce the transmission frequency and computational burden, while the followers interact locally with one another to track the dynamics of the leader. Capitalizing on a discrete-time looped-functional, a model-based consensus condition for the closed-loop MASs is derived in the form of linear matrix inequalities (LMIs), together with a design method for obtaining the distributed controllers and event-triggering parameters. Using noise-corrupted state-input measurements collected during open-loop operation, a data-driven leader-following MAS representation is presented and employed to solve the data-driven consensus control problem without requiring any knowledge of the agents' models. This result is then extended to the case of guaranteeing H∞ performance. Finally, a simulation example corroborates the efficacy of the proposed distributed event-triggering scheme in reducing data transmissions, as well as that of the data-driven design method.

    Comment: 13 pages, 6 figures. This draft was first submitted to IEEE Open Journal of Control Systems on April 30, 2022, but rejected on June 19, 2022. Later, on July 23, 2022, this paper was submitted to the journal SCIENCE CHINA information scienc
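
The transmission-saving idea can be illustrated with a deliberately simplified sketch. It is not the paper's scheme: it swaps in scalar single-integrator followers, a hand-picked network, a constant leader state and an exponentially decaying broadcast threshold (a standard event-triggering rule), whereas the paper treats heterogeneous agent dynamics, a dynamic periodic protocol, and controller and trigger parameters designed via LMIs. At every sampling instant k each follower broadcasts its state only if it has drifted more than c0 * rho**k from its last broadcast value, and every follower acts only on the last broadcast values. The parameters c0, rho, h and the network are hypothetical.

```python
# Hypothetical sketch: event-triggered leader-following consensus
# (scalar single-integrator followers, exponentially decaying threshold).

def simulate(c0=0.5, rho=0.99, h=0.1, steps=2000):
    leader = 1.0                             # constant leader state (assumed)
    neighbors = {1: [2], 2: [1, 3], 3: [2]}  # followers form the path 1 - 2 - 3
    pinned = {1: 1.0, 2: 0.0, 3: 0.0}        # only follower 1 receives the leader state
    x = {1: -2.0, 2: 3.0, 3: 0.5}            # initial follower states
    x_hat = dict(x)                          # last broadcast states (initial broadcast not counted)
    transmissions = 0
    for k in range(steps):
        # Periodic event check: broadcast only when the deviation from the
        # last broadcast value exceeds the decaying threshold.
        threshold = c0 * rho ** k
        for i in x:
            if abs(x[i] - x_hat[i]) > threshold:
                x_hat[i] = x[i]
                transmissions += 1
        # Local disagreement computed only from broadcast values.
        z = {i: sum(x_hat[i] - x_hat[j] for j in neighbors[i])
                + pinned[i] * (x_hat[i] - leader)
             for i in x}
        # Sampled-data update of each follower: x_i <- x_i - h * z_i.
        x = {i: x[i] - h * z[i] for i in x}
    return x, transmissions


final_states, sent = simulate()
print(final_states)                                     # all close to the leader state 1.0
print(sent, "broadcasts out of", 3 * 2000, "possible")
```

Because the gap between each follower's true and broadcast state is bounded by a geometrically decaying term, the followers still converge to the leader, while many sampling instants pass without any broadcast.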

    Design and Comprehensive Analysis of a Noise-Tolerant ZNN Model With Limited-Time Convergence for Time-Dependent Nonlinear Minimization

    The zeroing neural network (ZNN) is a powerful tool for addressing the mathematical and optimization problems that arise broadly in science and engineering. Convergence and robustness are always pursued together in ZNN design. However, no existing ZNN for time-dependent nonlinear minimization simultaneously achieves limited-time convergence and inherent noise suppression. In this article, to satisfy both requirements, a limited-time robust neural network (LTRNN) is devised and presented to solve time-dependent nonlinear minimization under various external disturbances. Unlike previous ZNN models for this problem, which offer either limited-time convergence or noise suppression, the proposed LTRNN model possesses both characteristics simultaneously. In addition, rigorous theoretical analyses are given to prove the superior performance of the LTRNN model when it is adopted to solve time-dependent nonlinear minimization under external disturbances. Comparative results on a time-dependent nonlinear minimization problem also substantiate the effectiveness and advantages of the LTRNN.
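
The zeroing idea behind such models can be sketched on a scalar toy problem. Assumptions, not taken from the paper: the cost f(x, t) = (x - sin t)^2 / 2, so the gradient is g = x - sin t, the Hessian is 1 and the partial time derivative of g is -cos t; forward-Euler integration; and the sign-bi-power activation sbp, a common choice for finite-time ZNN convergence. Forcing the total derivative of g to equal -gamma * sbp(g) gives dx/dt = -(partial dg/dt + gamma * sbp(g)) / Hessian. This is a plain finite-time ZNN with no noise-suppressing term, so it illustrates the baseline that LTRNN builds on rather than LTRNN itself; gamma, r and the step size are hypothetical values.

```python
import math


def sbp(e, r=0.5):
    # Sign-bi-power activation: 0.5 * (|e|^r + |e|^(1/r)) * sign(e), with 0 < r < 1.
    if e == 0.0:
        return 0.0
    return 0.5 * (abs(e) ** r + abs(e) ** (1.0 / r)) * math.copysign(1.0, e)


def znn_track(x0=2.0, gamma=10.0, dt=1e-4, t_end=5.0, tol=1e-6):
    # Toy problem (assumed): f(x, t) = 0.5 * (x - sin t)^2, minimiser x*(t) = sin t.
    # g = df/dx = x - sin t, Hessian = 1, partial dg/dt = -cos t.
    # ZNN design dg/dt = -gamma * sbp(g)  =>  dx/dt = -(-cos t + gamma * sbp(g)) / 1.
    x, t = x0, 0.0
    hit_time = None                      # first time |g| drops below tol
    for _ in range(int(round(t_end / dt))):
        g = x - math.sin(t)
        if hit_time is None and abs(g) < tol:
            hit_time = t
        x += dt * (math.cos(t) - gamma * sbp(g))   # forward-Euler step
        t += dt
    return x - math.sin(t), hit_time


final_error, hit = znn_track()
print(final_error, hit)
```

With these settings the gradient error falls below 1e-6 well within the first simulated second and then stays near zero while x keeps tracking the moving minimiser sin t; adding a constant disturbance to dx/dt would leave a residual error, which is the gap that noise-suppressing designs such as LTRNN target.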

    Optimal control approaches for consensus and path planning in multi-agent systems

    Optimal control is one of the most powerful and important topics in control engineering. The two challenges in every optimal control problem are defining a proper cost function and finding the best method to minimize it. In this study, innovative optimal control approaches are developed to solve two problems in multi-agent systems (MASs): consensus and path planning. The consensus problem for general linear time-invariant (LTI) systems is solved with an inverse optimal control approach, which first derives a control law from the stability and optimality conditions and then defines the cost function according to the derived control. Unlike conventional optimal control design, this method does not specify the cost function a priori, and it has the benefit that the resulting control law is guaranteed to be both stabilizing and optimal. Three new theorems in linear algebra are developed so that the algorithm can be used for all general LTI systems. The designed optimal control is distributed and needs only local neighbor-to-neighbor information, based on the communication topology, to make the agents achieve consensus and track a desired trajectory. The path planning problem is solved for a group of Unmanned Aerial Vehicles (UAVs) assigned to track the fronts of fires in a wildfire management process. We use a Partially Observable Markov Decision Process (POMDP) to minimize a cost function defined according to the tracking error. The challenge here is designing the algorithm such that (1) the UAVs can decide autonomously which fire front to track and (2) they can track fire fronts that evolve over time in random directions. We show that, with properly defined models, the designed algorithms provide real-time calculation of control variables, which enables the UAVs to track the fronts and find their way autonomously. Furthermore, by implementing the Nominal Belief-state Optimization (NBO) method, the dynamic constraints of the UAVs are taken into account and challenges such as collision avoidance are fully addressed within the POMDP framework.
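
The inverse-optimal idea can be checked numerically in its simplest special case. Assumptions, not from the study: single-integrator agents dx_i/dt = u_i on an undirected graph with Laplacian L, and the distributed law u = -c L x, in which each agent uses only its neighbours' states. For this law the cost can be chosen after the fact as J = integral of (c x^T L^2 x + u^T u / c) dt; the Hamilton-Jacobi-Bellman equation is then satisfied by V(x) = x^T L x, so the predicted optimal cost from x(0) is x(0)^T L x(0). The sketch integrates the closed loop with forward Euler and compares the accumulated cost with that prediction. The general-LTI construction and the study's three linear-algebra theorems are not reproduced; the graph, the gain c and the initial states are hypothetical.

```python
# Hypothetical sketch: inverse-optimal consensus for single integrators.

def laplacian(n, edges):
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:                  # undirected graph, unit edge weights
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def run(x_init, edges, c=1.0, dt=1e-3, t_end=20.0):
    L = laplacian(len(x_init), edges)
    x = list(x_init)
    cost = 0.0
    for _ in range(int(round(t_end / dt))):
        Lx = matvec(L, x)
        u = [-c * v for v in Lx]        # u_i = -c * sum_j a_ij (x_i - x_j): neighbours only
        # Running cost defined after the control law: c x^T L^2 x + u^T u / c
        # (x^T L^2 x = (Lx)^T (Lx) because L is symmetric).
        cost += dt * (c * dot(Lx, Lx) + dot(u, u) / c)
        x = [xi + dt * ui for xi, ui in zip(x, u)]   # forward-Euler step
    predicted = dot(x_init, matvec(L, x_init))       # V(x0) = x0^T L x0
    return x, cost, predicted


x_final, cost, predicted = run([1.0, -2.0, 4.0, 0.5], [(0, 1), (1, 2), (2, 3)])
print(x_final)            # all close to the initial average 0.875
print(cost, predicted)    # predicted = 57.25; cost agrees up to the Euler error
```

On this path graph the agents reach the initial average, and the accumulated cost matches x(0)^T L x(0) = 57.25 up to a discretization error of order dt.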