3,373 research outputs found

    Quantum inspired algorithms for learning and control of stochastic systems

    Motivated by the limitations of current reinforcement learning and optimal control techniques, this dissertation proposes quantum-theory-inspired algorithms for learning and control of both single-agent and multi-agent stochastic systems. A common problem encountered in traditional reinforcement learning techniques is the exploration-exploitation trade-off. To address this issue, an action selection procedure inspired by a quantum search algorithm, Grover's iteration, is developed. This procedure does not require an explicit design parameter to specify the relative frequency of explorative/exploitative actions. The second part of this dissertation extends the powerful adaptive critic design methodology to solve finite-horizon stochastic optimal control problems. Numerically solving the stochastic Hamilton-Jacobi-Bellman equation, which characterizes the optimal expected cost function, requires a large number of trajectory samples. The proposed methodology overcomes this difficulty by using the path integral control formulation to adaptively sample trajectories of importance. The third part of this dissertation presents two quantum-inspired coordination models for dynamically assigning targets to agents operating in a stochastic environment. The first approach uses a quantum decision theory model that explains irrational action choices in human decision making. The second approach uses a quantum game theory model that exploits the quantum mechanical phenomenon of 'entanglement' to increase individual pay-off in multi-player games. The efficiency and scalability of the proposed coordination models are demonstrated through simulations of a large-scale multi-agent system. --Abstract, page iii
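
    To make the first idea concrete, below is a minimal, hypothetical sketch of a Grover-iteration-inspired action-selection rule: a classical amplitude vector over actions is repeatedly amplified towards the currently best-valued action, and the squared amplitudes are used as selection probabilities, so exploration and exploitation are balanced without an explicit epsilon-style parameter. The function name, the uniform initialisation, and the way the number of amplification steps is tied to the value estimate are illustrative assumptions, not the dissertation's exact procedure.

```python
import numpy as np

def grover_inspired_action_selection(q_values, max_iterations=3, rng=None):
    """Pick an action by amplifying the amplitude of the greedy action.

    Illustrative only: the mapping from value estimates to the number of
    amplification steps is an assumption, not the dissertation's scheme.
    """
    rng = np.random.default_rng() if rng is None else rng
    q = np.asarray(q_values, dtype=float)
    n = q.size
    amplitudes = np.full(n, 1.0 / np.sqrt(n))        # uniform "superposition"
    target = int(np.argmax(q))

    # Higher-valued greedy actions receive more amplification steps.
    spread = q.max() - q.min()
    weight = 0.0 if spread == 0.0 else (q[target] - q.min()) / spread
    iterations = int(round(weight * max_iterations))

    for _ in range(iterations):
        amplitudes[target] *= -1.0                            # "oracle": mark the target
        amplitudes = 2.0 * amplitudes.mean() - amplitudes     # inversion about the mean

    probs = amplitudes ** 2
    probs /= probs.sum()
    return int(rng.choice(n, p=probs))
```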

    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning. Comment: See http://www.jair.org/ for any accompanying file
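
    As a concrete illustration of the trial-and-error loop, delayed reinforcement, and the exploration/exploitation trade-off the survey discusses, here is a minimal tabular Q-learning episode. The Gym-style `reset()`/`step()` environment interface, the epsilon-greedy rule, and all parameter values are assumptions made for the sketch, not something prescribed by the survey.

```python
import numpy as np

def q_learning_episode(env, Q, alpha=0.1, gamma=0.99, epsilon=0.1, rng=None):
    """Run one episode of tabular Q-learning (a minimal sketch).

    Assumptions: `env` exposes a Gym-style interface where `reset()` returns
    a discrete state and `step(action)` returns (next_state, reward, done);
    `Q` is an (n_states, n_actions) array.
    """
    rng = np.random.default_rng() if rng is None else rng
    state, done = env.reset(), False
    while not done:
        # Epsilon-greedy: one simple way to trade exploration for exploitation.
        if rng.random() < epsilon:
            action = int(rng.integers(Q.shape[1]))      # explore
        else:
            action = int(np.argmax(Q[state]))           # exploit
        next_state, reward, done = env.step(action)
        # Bootstrapped TD target propagates delayed reinforcement backwards.
        target = reward + (0.0 if done else gamma * float(np.max(Q[next_state])))
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
    return Q
```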

    Deep Q-Learning for Nash Equilibria: Nash-DQN

    Model-free learning for multi-agent stochastic games is an active area of research. Existing reinforcement learning algorithms, however, are often restricted to zero-sum games, and are applicable only in small state-action spaces or other simplified settings. Here, we develop a new data-efficient Deep-Q-learning methodology for model-free learning of Nash equilibria for general-sum stochastic games. The algorithm uses a local linear-quadratic expansion of the stochastic game, which leads to analytically solvable optimal actions. The expansion is parametrized by deep neural networks to give it sufficient flexibility to learn the environment without the need to experience all state-action pairs. We study symmetry properties of the algorithm stemming from label-invariant stochastic games and, as a proof of concept, apply our algorithm to learning optimal trading strategies in competitive electronic markets. Comment: 16 pages, 4 figures
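
    To illustrate why a local linear-quadratic expansion makes the equilibrium analytically solvable, the sketch below writes each player's action-value as a quadratic in the joint action and solves the stacked first-order conditions as one linear system. The function signature, the two-player restriction, and the specific parametrisation are hypothetical; in Nash-DQN itself the expansion coefficients would be produced by deep networks.

```python
import numpy as np

def two_player_lq_nash(mu, g1, g2, P1, P2, d1, d2):
    """Local Nash actions of a two-player quadratic (general-sum) game.

    Hypothetical sketch: each player's action-value around a reference joint
    action `mu` (dimension d1 + d2) is assumed to take the form

        Q_i(a) ~ c_i + g_i . (a - mu) + 0.5 * (a - mu)^T P_i (a - mu).

    Setting player 1's gradient w.r.t. its own block a[:d1] and player 2's
    gradient w.r.t. a[d1:] to zero gives one linear system in (a - mu).
    """
    assert P1.shape == (d1 + d2, d1 + d2) and P2.shape == (d1 + d2, d1 + d2)
    # Stationarity rows: player 1 owns the first d1 rows, player 2 the rest.
    M = np.vstack([P1[:d1, :], P2[d1:, :]])
    b = -np.concatenate([g1[:d1], g2[d1:]])
    return mu + np.linalg.solve(M, b)

# Illustrative use with scalar actions and mild competition.
mu = np.zeros(2)
g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
P1 = np.array([[-2.0, 0.5], [0.5, 0.0]])
P2 = np.array([[0.0, 0.5], [0.5, -2.0]])
print(two_player_lq_nash(mu, g1, g2, P1, P2, 1, 1))   # -> approx [0.67, 0.67]
```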

    Advances in Reinforcement Learning

    Reinforcement Learning (RL) is a very dynamic area in terms of both theory and application. This book brings together many different aspects of current research in the several fields associated with RL, an area that has been growing rapidly and producing a wide variety of learning algorithms for different applications. Across 24 chapters, it covers a broad variety of topics in RL and their application in autonomous systems. A set of chapters provides a general overview of RL, while other chapters focus mostly on applications of RL paradigms: Game Theory, Multi-Agent Theory, Robotics, Networking Technologies, Vehicular Navigation, Medicine and Industrial Logistics.

    Putting artificial intelligence into wearable human-machine interfaces – towards a generic, self-improving controller

    The standard approach to creating a machine-learning-based controller is to provide users with a number of gestures that they need to make; record multiple instances of each gesture using specific sensors; extract the relevant sensor data and pass it through a supervised learning algorithm until the algorithm can successfully identify the gestures; and map each gesture to a control signal that performs a desired outcome. This approach is both inflexible and time-consuming. The primary contribution of this research was to investigate a new approach to putting artificial intelligence into wearable human-machine interfaces by creating a Generic, Self-Improving Controller. It was shown to learn two user-defined static gestures with an accuracy of 100% in fewer than 10 samples per gesture; three in fewer than 20 samples per gesture; and four in fewer than 35 samples per gesture. Pre-defined dynamic gestures were more difficult to learn. It learnt two with an accuracy of 90% in fewer than 6,000 samples per gesture, and four with an accuracy of 70% after 50,000 samples per gesture. The research has resulted in a number of additional contributions:
    • The creation of a source-independent hardware data capture, processing, fusion and storage tool for standardising the capture and storage of historical copies of data captured from multiple different sensors.
    • An improved Attitude and Heading Reference System (AHRS) algorithm for calculating orientation quaternions that is five orders of magnitude more precise.
    • The reformulation of the regularised TD learning algorithm; the reformulation of the TD learning algorithm applied to the artificial neural network back-propagation algorithm; and the combination of the two reformulations into a new, regularised TD learning algorithm applied to the artificial neural network back-propagation algorithm (see the sketch below).
    • The creation of a Generic, Self-Improving Predictor that can use different learning algorithms, and a Flexible Artificial Neural Network.
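
    The following is a minimal, hypothetical sketch of what a regularised TD(0) update combined with neural-network back-propagation might look like. The function names, the semi-gradient form, and the L2 penalty are assumptions made for illustration and are not claimed to match the thesis' exact reformulation.

```python
import numpy as np

def regularised_td_update(weights, features, next_features, reward,
                          value_fn, grad_fn, alpha=0.01, gamma=0.95, lam=1e-3):
    """One semi-gradient TD(0) step with an L2 penalty (illustrative only).

    `value_fn(weights, x)` is a neural-network value estimate and
    `grad_fn(weights, x)` returns its gradient with respect to `weights`
    (e.g. obtained by back-propagation). Names and the exact form of the
    regularisation are assumptions, not the thesis' reformulation.
    """
    td_error = (reward + gamma * value_fn(weights, next_features)
                - value_fn(weights, features))
    # Semi-gradient: only the current state's value estimate is differentiated;
    # the `lam * weights` term shrinks the weights each step (L2 regularisation).
    return weights + alpha * (td_error * grad_fn(weights, features) - lam * weights)

# Tiny usage example with a linear "network": v(w, x) = w . x
w = np.zeros(3)
v = lambda w, x: float(w @ x)
g = lambda w, x: x
w = regularised_td_update(w, np.array([1.0, 0.0, 1.0]),
                          np.array([0.0, 1.0, 0.0]), reward=1.0,
                          value_fn=v, grad_fn=g)
```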