Observational Robustness and Invariances in Reinforcement Learning via Lexicographic Objectives
Policy robustness in Reinforcement Learning (RL) may not be desirable at any
price; the alterations caused by robustness requirements from otherwise optimal
policies should be explainable and quantifiable. Policy gradient algorithms
that have strong convergence guarantees are usually modified to obtain robust
policies in ways that do not preserve algorithm guarantees, which defeats the
purpose of formal robustness requirements. In this work we study a notion of
robustness in partially observable MDPs where state observations are perturbed
by a noise-induced stochastic kernel. We characterise the set of policies that
are maximally robust by analysing how the policies are altered by this kernel.
We then establish a connection between such robust policies and certain
properties of the noise kernel, as well as with structural properties of the
underlying MDPs, constructing sufficient conditions for policy robustness. We
use these notions to propose a robustness-inducing scheme, applicable to any
policy gradient algorithm, to formally trade off the reward achieved by a
policy with its robustness level through lexicographic optimisation, which
preserves convergence properties of the original algorithm. We test the
proposed approach through numerical experiments on safety-critical RL
environments, and show how the proposed method helps achieve high robustness
when state errors are introduced in the policy roll-out
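For orientation, a two-level lexicographic problem can be written generically as below. This is the standard textbook form, not necessarily the paper's exact formulation; the symbols $K_1$ (expected reward), $K_2$ (a robustness objective), and the tolerance $\epsilon \ge 0$ are assumptions introduced here for illustration.

```latex
\Theta_1^{\epsilon} = \Bigl\{ \theta : K_1(\theta) \ge \max_{\theta'} K_1(\theta') - \epsilon \Bigr\},
\qquad
\theta^{\ast} \in \arg\max_{\theta \in \Theta_1^{\epsilon}} K_2(\theta)
```

The secondary objective is optimised only over parameters that are already (near-)optimal for the primary one, so the cost of robustness in reward is bounded by $\epsilon$.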
Event-triggered Consensus Control of Heterogeneous Multi-agent Systems: Model- and Data-based Analysis
This article deals with model- and data-based consensus control of
heterogeneous leader-following multi-agent systems (MASs) under an
event-triggering transmission scheme. A dynamic periodic transmission protocol
is developed to significantly reduce the transmission frequency and
computational burden, where the followers interact locally with each other
to approach the dynamics of the leader. Capitalizing on a discrete-time
looped-functional, a model-based consensus condition for the closed-loop MASs
is derived in the form of linear matrix inequalities (LMIs), along with a design
method for obtaining the distributed controllers and event-triggering
parameters. Upon collecting noise-corrupted state-input measurements during
open-loop operation, a data-driven leader-following MAS representation is
presented, and employed to solve the data-driven consensus control problem
without requiring any knowledge of the agents' models. This result is then
extended to the case of guaranteeing an $\mathcal{H}_\infty$ performance. A
simulation example is finally given to corroborate the efficacy of the proposed
distributed event-triggering scheme in cutting off data transmissions and the
data-driven design method.
Comment: 13 pages, 6 figures. This draft was first submitted to IEEE Open
Journal of Control Systems on April 30, 2022, but rejected on June 19, 2022.
Later, on July 23, 2022, this paper was submitted to the journal SCIENCE
CHINA information scienc
Design and Comprehensive Analysis of a Noise-Tolerant ZNN Model With Limited-Time Convergence for Time-Dependent Nonlinear Minimization
The zeroing neural network (ZNN) is a powerful tool for solving the mathematical and optimization problems that arise broadly in science and engineering. Convergence and robustness are always pursued together in ZNN design. However, no existing work on ZNNs for time-dependent nonlinear minimization achieves limited-time convergence and inherent noise suppression simultaneously. In this article, to satisfy these two requirements, a limited-time robust neural network (LTRNN) is devised to solve time-dependent nonlinear minimization under various external disturbances. Unlike previous ZNN models for this problem, which offer either limited-time convergence or noise suppression, the proposed LTRNN model possesses both characteristics. In addition, rigorous theoretical analyses prove the superior performance of the LTRNN model when it is used to solve time-dependent nonlinear minimization under external disturbances. Comparative results from solving a time-dependent nonlinear minimization problem also substantiate the effectiveness and advantages of LTRNN
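To make the ZNN idea concrete, the sketch below implements the basic ZNN design with a linear activation, not the LTRNN model of the article: it has neither limited-time convergence nor noise suppression. It drives the gradient error e(t) = ∇f(x, t) to zero by imposing ė = -γe, which gives ẋ = -H⁻¹(γ∇f + ∂∇f/∂t). The function name `znn_track`, the gain `gamma`, the forward-Euler discretisation, and the quadratic test problem are all assumptions chosen for illustration.

```python
import numpy as np

def znn_track(grad, hess, grad_t, x0, gamma=10.0, dt=1e-3, t_end=5.0):
    """Basic (linear-activation) ZNN for min_x f(x, t), integrated with forward Euler.

    grad(x, t)   -> gradient of f with respect to x
    hess(x, t)   -> Hessian of f with respect to x
    grad_t(x, t) -> partial time derivative of the gradient
    """
    x = np.asarray(x0, dtype=float)
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        # ZNN design formula: H x_dot = -(gamma * grad + d(grad)/dt)
        rhs = -(gamma * grad(x, t) + grad_t(x, t))
        x = x + dt * np.linalg.solve(hess(x, t), rhs)
        t += dt
    return x, t

# Hypothetical test problem: f(x, t) = 0.5 * ||x - c(t)||^2 with a moving minimiser c(t).
def c(t):
    return np.array([np.sin(t), np.cos(t)])

grad = lambda x, t: x - c(t)
hess = lambda x, t: np.eye(2)
grad_t = lambda x, t: -np.array([np.cos(t), -np.sin(t)])  # equals -dc/dt

x_final, t_final = znn_track(grad, hess, grad_t, x0=[2.0, -1.0])
tracking_error = np.linalg.norm(x_final - c(t_final))
print(f"tracking error at t={t_final:.2f}: {tracking_error:.2e}")
```

In this noise-free setting the state locks onto the moving minimiser up to discretisation error. Adding a constant disturbance to `rhs` would leave a residual error, which is the gap that noise-tolerant designs aim to close.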
Optimal control approaches for consensus and path planning in multi-agent systems
Optimal control is one of the most powerful, important and advantageous topics in control engineering. The two challenges in every optimal control problem are defining the proper cost function and obtaining the best method to minimize it. In this study, innovative optimal control approaches are developed to solve the two problems of consensus and path planning in multi-agent systems (MASs). The consensus problem for general linear time-invariant (LTI) systems is solved by an inverse optimal control approach, which starts by deriving a control law from the stability and optimality conditions and then defines the cost function according to the derived control. Because the cost function is not specified a priori, as it is in conventional optimal control design, the resulting control law is guaranteed to be both stabilizing and optimal. Three new theorems in related linear algebra are developed so that the algorithm can be used for all general LTI systems. The designed optimal control is distributed and needs only local neighbor-to-neighbor information, based on the communication topology, for the agents to achieve consensus and track a desired trajectory. The path planning problem is solved for a group of Unmanned Aerial Vehicles (UAVs) that are assigned to track the fronts of fires during wildfire management. We use a Partially Observable Markov Decision Process (POMDP) to minimize a cost function that is defined according to the tracking error. Here the challenge is designing the algorithm such that (1) the UAVs are able to make decisions autonomously on which fire front to track and (2) they are able to track the fire fronts, which evolve over time in random directions. We will see that, with proper models, the designed algorithms provide real-time calculation of the control variables, which enables the UAVs to track the fronts and find their way autonomously. 
Furthermore, by implementing the Nominal Belief-state Optimization (NBO) method, the dynamic constraints of the UAVs are considered and challenges such as collision avoidance are addressed completely in the context of POMDP
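As a minimal illustration of distributed control that uses only neighbor-to-neighbor information, the sketch below simulates the standard Laplacian consensus protocol for single-integrator agents, ẋ = -Lx. This is a textbook protocol, not the inverse optimal control law derived in the thesis; the function name `simulate_consensus`, the path-graph topology, the initial states, and the forward-Euler step are assumptions for illustration.

```python
import numpy as np

def simulate_consensus(adjacency, x0, dt=0.01, t_end=30.0):
    """Simulate x_dot = -L x for single-integrator agents on an undirected graph."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian: degree matrix minus adjacency
    x = np.asarray(x0, dtype=float)
    for _ in range(int(round(t_end / dt))):
        # Row i of -L @ x is sum_j a_ij * (x_j - x_i): neighbour information only
        x = x + dt * (-L @ x)
    return x

# Hypothetical 4-agent path graph: 0 - 1 - 2 - 3
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
x0 = [1.0, 3.0, -2.0, 6.0]

x_final = simulate_consensus(adjacency, x0)
print(x_final)  # all entries approach the average of x0
```

For a connected undirected graph the states converge to the average of the initial values, and the convergence rate is set by the second-smallest eigenvalue of the Laplacian.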