258 research outputs found
Resilient Autonomous Control of Distributed Multi-agent Systems in Contested Environments
An autonomous and resilient controller is proposed for leader-follower
multi-agent systems under uncertainties and cyber-physical attacks. The leader
is assumed non-autonomous with a nonzero control input, which allows changing
the team behavior or mission in response to environmental changes. A resilient
learning-based control protocol is presented to find optimal solutions to the
synchronization problem in the presence of attacks and system dynamic
uncertainties. An observer-based distributed H_infinity controller is first
designed to prevent propagating the effects of attacks on sensors and actuators
throughout the network, as well as to attenuate the effect of these attacks on
the compromised agent itself. Non-homogeneous game algebraic Riccati equations
are derived to solve the H_infinity optimal synchronization problem and
off-policy reinforcement learning is utilized to learn their solution without
requiring any knowledge of the agent's dynamics. A trust-confidence based
distributed control protocol is then proposed to mitigate attacks that hijack
the entire node and attacks on communication links. A confidence value is
defined for each agent based solely on its local evidence. The proposed
resilient reinforcement learning algorithm employs the confidence value of each
agent to indicate the trustworthiness of its own information and broadcast it
to its neighbors to put weights on the data they receive from it during and
after learning. If the confidence value of an agent is low, it employs a trust
mechanism to identify compromised agents and remove the data it receives from
them from the learning process. Simulation results are provided to show the
effectiveness of the proposed approach
Event-triggered robust control for multi-player nonzero-sum games with input constraints and mismatched uncertainties
In this article, an event-triggered robust control (ETRC) method is investigated for multi-player nonzero-sum games of continuous-time input constrained nonlinear systems with mismatched uncertainties. By constructing an auxiliary system and designing an appropriate value function, the robust control problem of input constrained nonlinear systems is transformed into an optimal regulation problem. Then, a critic neural network (NN) is adopted to approximate the value function of each player for solving the event-triggered coupled Hamilton-Jacobi equation and obtaining control laws. Based on a designed event-triggering condition, control laws are updated when events occur only. Thus, both computational burden and communication bandwidth are reduced. We prove that the weight approximation errors of critic NNs and the closed-loop uncertain multi-player system states are all uniformly ultimately bounded thanks to the Lyapunov's direct method. Finally, two examples are provided to demonstrate the effectiveness of the developed ETRC method
Recommended from our members
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration for deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental v.s. exploitation trade-off. Then we review how deep RL has improved upon classical and summarize six categories of the latest exploration methods for deep RL, in the order increasing usage of prior information. We then explore representative works in three categories discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based via hashing, maps states to hash codes for counting and assigns higher exploration to less-encountered states. The third category utilizes hierarchy and is represented by modular architecture for RL agents to play StarCraft II. Finally, we conclude that exploration by prior knowledge is a promising research direction and suggest topics of potentially impact
Construction of Barrier in a Fishing Game With Point Capture
This paper addresses a particular pursuit-evasion game, called as “fishing game” where a faster evader attempts to pass the gap between two pursuers. We are concerned with the conditions under which the evader or pursuers can win the game. This is a game of kind in which an essential aspect, barrier, separates the state space into disjoint parts associated with each player's winning region. We present a method of explicit policy to construct the barrier. This method divides the fishing game into two subgames related to the included angle and the relative distances between the evader and the pursuers, respectively, and then analyzes the possibility of capture or escape for each subgame to ascertain the analytical forms of the barrier. Furthermore, we fuse the games of kind and degree by solving the optimal control strategies in the minimum time for each player when the initial state lies in their winning regions. Along with the optimal strategies, the trajectories of the players are delineated and the upper bounds of their winning times are also derived
Reinforcement Learning with Potential Functions Trained to Discriminate Good and Bad States
Reward shaping is an efficient way to incorporate domain knowledge into a reinforcement learning agent. Nev-ertheless, it is unpractical and inconvenient to require prior knowledge for designing shaping rewards. Therefore, learning the shaping reward function by the agent during training could be more effective. In this paper, based on the potential-based reward shaping framework, which guarantees policy invariance, we propose to learn a potential function concurrently with training an agent using a reinforcement learning algorithm. In the proposed method, the potential function is trained by examining states that occur in good and in bad episodes. We apply the proposed adaptive potential function while training an agent with Q-learning and develop two novel algorithms. One is APF-QMLP, which applies the good/bad state potential function combined with Q-learning and multi-layer perceptrons (MLPs) to estimate the Q-function. The other is APF-Dueling-DQN, which combines the novel potential function with Dueling DQN. In particular, an autoencoder is adopted in APF-Dueling-DQN to map image states from Atari games to hash codes. We evaluated the created algorithms empirically in four environments: a six-room maze, CartPole, Acrobot, and Ms-Pacman, involving low-dimensional or high-dimensional state spaces. The experimental results showed that the proposed adaptive potential function improved the performances of the selected reinforcement learning algorithms
- …