Data-Driven Integral Reinforcement Learning for Continuous-Time Non-Zero-Sum Games
This paper develops an integral value iteration (VI) method to efficiently find, online, the Nash equilibrium solution of two-player non-zero-sum (NZS) differential games for linear systems with partially unknown dynamics. To guarantee closed-loop stability at the Nash equilibrium, an explicit upper bound on the discount factor is given. To show the efficacy of the presented online model-free solution, the integral VI method is compared with the model-based offline policy iteration method. Moreover, a detailed theoretical analysis of the integral VI algorithm is provided, covering three aspects: the positive definiteness of the updated cost functions, the stability of the closed-loop systems, and the conditions that guarantee monotone convergence. Finally, simulation results demonstrate the efficacy of the presented algorithms.
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing use of prior information. Next, we explore representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns more exploration to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest topics of potential impact.
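The count-based category described above can be illustrated with a minimal sketch. It assumes a SimHash-style code (signs of random projections) and a bonus of the form beta / sqrt(count); the class name `HashCounter`, its parameters, and the bonus formula are illustrative assumptions, not details taken from the thesis.

```python
import math
import random
from collections import Counter


class HashCounter:
    """Count state visits through a SimHash-style code (illustrative sketch)."""

    def __init__(self, state_dim, n_bits=8, beta=1.0, seed=0):
        rng = random.Random(seed)
        # One random Gaussian projection vector per output bit (assumed scheme).
        self.projections = [
            [rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in range(n_bits)
        ]
        self.beta = beta
        self.counts = Counter()

    def code(self, state):
        # The sign of each projection gives one bit of the hash code.
        return tuple(
            int(sum(w * x for w, x in zip(row, state)) >= 0.0)
            for row in self.projections
        )

    def bonus(self, state):
        # Record the visit, then return a bonus that shrinks as the count grows.
        key = self.code(state)
        self.counts[key] += 1
        return self.beta / math.sqrt(self.counts[key])


counter = HashCounter(state_dim=2)
first = counter.bonus([0.5, -1.0])   # first visit to this hash bucket
second = counter.bonus([0.5, -1.0])  # repeat visit earns a smaller bonus
```

States that land in the same hash bucket share a count, so the number of bits trades off generalization against resolution in this sketch.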
VIME: Variational Information Maximizing Exploration
Scalable and effective exploration remains a key challenge in reinforcement
learning (RL). While there are methods with optimality guarantees in the
setting of discrete state and action spaces, these methods cannot be applied in
high-dimensional deep RL scenarios. As such, most contemporary RL relies on
simple heuristics such as epsilon-greedy exploration or adding Gaussian noise
to the controls. This paper introduces Variational Information Maximizing
Exploration (VIME), an exploration strategy based on maximization of
information gain about the agent's belief of environment dynamics. We propose a
practical implementation, using variational inference in Bayesian neural
networks which efficiently handles continuous state and action spaces. VIME
modifies the MDP reward function, and can be applied with several different
underlying RL algorithms. We demonstrate that VIME achieves significantly
better performance compared to heuristic exploration methods across a variety
of continuous control tasks and algorithms, including tasks with very sparse
rewards.
Comment: Published in Advances in Neural Information Processing Systems 29
(NIPS), pages 1109-111
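The idea of modifying the MDP reward with an information-gain term can be sketched as follows. This is only an illustration under stated assumptions: the agent's belief over dynamics parameters is taken to be a diagonal Gaussian, the information gain is taken to be the KL divergence between the belief after and before an update, and the function names and the weight `eta` are hypothetical. No Bayesian neural network is trained here.

```python
import math


def diag_gaussian_kl(mean_new, std_new, mean_old, std_old):
    """KL(N(mean_new, std_new^2) || N(mean_old, std_old^2)), summed over dimensions."""
    total = 0.0
    for m1, s1, m0, s0 in zip(mean_new, std_new, mean_old, std_old):
        total += math.log(s0 / s1) + (s1 ** 2 + (m1 - m0) ** 2) / (2 * s0 ** 2) - 0.5
    return total


def augmented_reward(reward, info_gain, eta=0.1):
    # Assumed form: external reward plus a weighted information-gain bonus.
    return reward + eta * info_gain


# A belief whose mean shifts by one standard deviation after seeing a transition.
gain = diag_gaussian_kl([1.0], [1.0], [0.0], [1.0])
bonus_reward = augmented_reward(0.0, gain, eta=0.1)
```

With a sparse external reward of zero, the agent in this sketch is still rewarded for transitions that change its belief about the dynamics.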
Stochastic Game Theory: Adjustment to Equilibrium Under Noisy Directional Learning
This paper presents a dynamic model in which agents adjust their decisions in the direction of higher payoffs, subject to random error. This process produces a probability distribution of players' decisions whose evolution over time is determined by the Fokker-Planck equation. The dynamic process is stable for all potential games, a class of payoff structures that includes several widely studied games. In equilibrium, the distributions that determine expected payoffs correspond to the distributions that arise from the logit function applied to those expected payoffs. This "logit equilibrium" forms a stochastic generalization of the Nash equilibrium and provides a possible explanation of anomalous laboratory data.
Keywords: bounded rationality, noisy directional learning, Fokker-Planck equation, potential games, logit equilibrium, stochastic potential.
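The fixed-point property of the logit equilibrium can be illustrated with a small sketch. It computes the static fixed point by damped iteration for a hypothetical symmetric 2x2 coordination game; it does not simulate the Fokker-Planck dynamics from the paper. The payoff matrix, the noise parameter `mu`, and the function names are assumptions made for illustration.

```python
import math


def logit_response(expected_payoffs, mu):
    """Logit (softmax) choice probabilities with noise parameter mu > 0."""
    top = max(expected_payoffs)  # subtract the max for numerical stability
    weights = [math.exp((u - top) / mu) for u in expected_payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def logit_equilibrium(payoff, mu, start, steps=500, damping=0.5):
    """Damped fixed-point iteration for a symmetric matrix game."""
    p = list(start)
    for _ in range(steps):
        # Expected payoff of each action against the population distribution p.
        expected = [sum(a * q for a, q in zip(row, p)) for row in payoff]
        target = logit_response(expected, mu)
        # Move part of the way toward the logit response to the current p.
        p = [(1 - damping) * old + damping * new for old, new in zip(p, target)]
    return p


coordination = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical payoffs
equilibrium = logit_equilibrium(coordination, mu=2.0, start=[0.9, 0.1])
```

At the returned distribution, applying the logit function to the expected payoffs reproduces the distribution itself, which is the defining property described in the abstract. For this payoff matrix and this level of noise, the iteration settles on equal probabilities for both actions.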
A survey of random processes with reinforcement
The models surveyed include generalized Pólya urns, reinforced random
walks, interacting urn models, and continuous reinforced processes. Emphasis is
on methods and results, with sketches provided of some proofs. Applications are
discussed in statistics, biology, economics and a number of other areas.
Comment: Published at http://dx.doi.org/10.1214/07-PS094 in the Probability
Surveys (http://www.i-journals.org/ps/) by the Institute of Mathematical
Statistics (http://www.imstat.org
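As a minimal illustration of reinforcement in a random process, the sketch below simulates the classic two-color Pólya urn: a ball is drawn at random and returned together with one extra ball of the same color. This is the textbook special case, not one of the generalized models in the survey; the function name and default arguments are assumptions.

```python
import random


def polya_urn(steps, red=1, black=1, seed=0):
    """Simulate a classic Pólya urn and return the final (red, black) counts."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Draw a ball with probability proportional to the current counts,
        # then reinforce the drawn color by adding one more ball of it.
        if rng.random() < red / (red + black):
            red += 1
        else:
            black += 1
    return red, black


red, black = polya_urn(steps=1000)
fraction_red = red / (red + black)
```

Running the simulation with different seeds gives different limiting fractions, which reflects the path dependence that reinforcement introduces.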
Reinforcement Learning, Intelligent Control and their Applications in Connected and Autonomous Vehicles
Reinforcement learning (RL) has attracted considerable attention over the past few years. Recently, we developed a data-driven algorithm to solve predictive cruise control (PCC) and game output regulation problems. This work integrates our recent contributions to the application of RL in game theory, output regulation problems, robust control, small-gain theory and PCC. The algorithm was developed for adaptive optimal output regulation of uncertain linear systems and uncertain partially linear systems, both to reject disturbances and to force the output of the systems to asymptotically track a reference. In the PCC problem, we determined the reference velocity for each autonomous vehicle in the platoon using the traffic information broadcast from the lights to reduce the vehicles' trip time. We then employed the algorithm to design an approximate optimal controller for the vehicles. This controller is able to regulate the headway, velocity and acceleration of each vehicle to the desired values. Simulation results validate the effectiveness of the algorithms.