
    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
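
    The survey's central topics can be made concrete with a small example. Below is a minimal sketch of tabular Q-learning with epsilon-greedy action selection, illustrating learning from delayed reinforcement and the exploration/exploitation trade-off; the environment interface (env.reset, env.step) is a hypothetical placeholder, not something defined in the paper.

```python
import random
from collections import defaultdict

def q_learning(env, n_actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: estimate action values from delayed reward
    by bootstrapping on the greedy value of the next state."""
    Q = defaultdict(float)  # (state, action) -> estimated return
    for _ in range(episodes):
        state, done = env.reset(), False  # hypothetical environment API
        while not done:
            # Epsilon-greedy: explore with probability epsilon,
            # otherwise exploit the current value estimates.
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Temporal-difference update toward the one-step target.
            best_next = max(Q[(next_state, a)] for a in range(n_actions))
            target = reward + (0.0 if done else gamma * best_next)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```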

    Density-operator evolution: Complete positivity and the Keldysh real-time expansion

    We study the reduced time-evolution of open quantum systems by combining quantum-information and statistical field theory. Inspired by prior work [EPL 102, 60001 (2013) and Phys. Rev. Lett. 111, 050402 (2013)], we establish the explicit structure guaranteeing the complete positivity (CP) and trace preservation (TP) of the real-time evolution expansion in terms of the microscopic system-environment coupling. This reveals a fundamental two-stage structure of the coupling expansion: whereas the first stage defines the dissipative timescales of the system -- before having integrated out the environment completely -- the second stage sums up elementary physical processes described by CP superoperators. This allows us to establish the nontrivial relation between the (Nakajima-Zwanzig) memory-kernel superoperator for the density operator and novel memory-kernel operators that generate the Kraus operators of an operator-sum. Importantly, this operational approach can be implemented in the existing Keldysh real-time technique and allows approximations for general time-nonlocal quantum master equations to be systematically compared and developed while keeping the CP and TP structure explicit. Our considerations build on the result that a Kraus operator for a physical measurement process on the environment can be obtained by 'cutting' a group of Keldysh real-time diagrams 'in half'. This naturally leads to Kraus operators lifted to the system plus environment which have a diagrammatic expansion in terms of time-nonlocal memory-kernel operators. These lifted Kraus operators obey coupled time-evolution equations which constitute an unraveling of the original Schrödinger equation for system plus environment. Whereas both equations lead to the same reduced dynamics, only the former explicitly encodes the operator-sum structure of the coupling expansion.
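
    As a concrete illustration of the operator-sum structure at the heart of this paper, the sketch below applies a set of Kraus operators to a density matrix and checks the TP condition with numpy. The amplitude-damping channel is a standard textbook example chosen here for illustration; it is not the paper's Keldysh construction.

```python
import numpy as np

def apply_kraus(rho, kraus_ops):
    """Operator-sum (Kraus) form of a CP map: rho -> sum_i K_i rho K_i^dag."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

def is_trace_preserving(kraus_ops, tol=1e-10):
    """TP condition: sum_i K_i^dag K_i = identity."""
    d = kraus_ops[0].shape[0]
    return np.allclose(sum(K.conj().T @ K for K in kraus_ops), np.eye(d), atol=tol)

# Amplitude damping on a qubit: a standard CPTP channel for illustration.
p = 0.3                                   # decay probability (arbitrary)
K0 = np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex)
K1 = np.array([[0, np.sqrt(p)], [0, 0]], dtype=complex)
rho = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # |+><+| state
rho_out = apply_kraus(rho, [K0, K1])
assert is_trace_preserving([K0, K1])            # Kraus operators sum to identity
assert np.isclose(np.trace(rho_out).real, 1.0)  # trace is preserved
```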

    Macro action selection with deep reinforcement learning in StarCraft

    StarCraft (SC) is one of the most popular and successful Real-Time Strategy (RTS) games. In recent years, SC has also become widely accepted as a challenging testbed for AI research because of its enormous state space, partially observed information, multi-agent collaboration, and so on. With the help of the annual AIIDE and CIG competitions, a growing number of SC bots have been proposed and continuously improved. However, a large gap remains between the top-level bots and professional human players. One vital reason is that current SC bots mainly rely on predefined rules to select macro actions during their games. These rules are not scalable or efficient enough to cope with the enormous yet partially observed state space of the game. In this paper, we propose a deep reinforcement learning (DRL) framework to improve the selection of macro actions. Our framework is based on the combination of the Ape-X DQN and a Long Short-Term Memory (LSTM) network. We use this framework to build our bot, named LastOrder. Our evaluation, based on training against all bots from the AIIDE 2017 StarCraft AI competition set, shows that LastOrder achieves an 83% win rate, outperforming 26 of the 28 entrants.
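
    The following is a minimal PyTorch sketch of the kind of recurrent value network such a framework combines: an LSTM that summarizes the partially observed game state, feeding a DQN-style head over macro actions. All names, layer sizes, and dimensions are illustrative assumptions, and the distributed Ape-X machinery (many actors, prioritized replay) is omitted.

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """Q-network with an LSTM memory: the recurrent state summarizes the
    observation history, which matters under partial observability."""
    def __init__(self, obs_dim, n_macro_actions, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_macro_actions)

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim)
        z = self.encoder(obs_seq)
        out, hidden_state = self.lstm(z, hidden_state)
        return self.q_head(out), hidden_state  # Q-values per timestep

# Greedy macro-action selection, carrying the LSTM state across steps.
net = RecurrentQNet(obs_dim=128, n_macro_actions=20)
with torch.no_grad():
    q, h = net(torch.randn(1, 1, 128))  # one observation of one episode
    action = q[0, -1].argmax().item()
```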

    Model-Based Deep Learning

    Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques. Such model-based methods utilize mathematical formulations that represent the underlying physics, prior information, and additional domain knowledge. Simple classical models are useful but sensitive to inaccuracies, and may lead to poor performance when real systems display complex or dynamic behavior. On the other hand, purely data-driven approaches that are model-agnostic are becoming increasingly popular as datasets become abundant and the power of modern deep learning pipelines increases. Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance, especially for supervised problems. However, DNNs typically require massive amounts of data and immense computational resources, limiting their applicability in some signal processing scenarios. We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches. Such model-based deep learning methods exploit both partial domain knowledge, via mathematical structures designed for specific problems, and learning from limited data. In this article we survey the leading approaches for studying and designing model-based deep learning systems. We divide hybrid model-based/data-driven systems into categories based on their inference mechanism, and provide a comprehensive review of the leading approaches for combining model-based algorithms with deep learning in a systematic manner, along with concrete guidelines and detailed signal-processing-oriented examples from recent literature. Our aim is to facilitate the design and study of future systems at the intersection of signal processing and machine learning that incorporate the advantages of both domains.
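
    One recurring pattern in this family of hybrid methods is deep unfolding: fixing a small number of iterations of a model-based algorithm and learning its per-iteration parameters from data. The sketch below unrolls ISTA for sparse recovery under a known measurement matrix; the layer count, initialization, and names are illustrative assumptions rather than a method prescribed by the article.

```python
import torch
import torch.nn as nn

class UnfoldedISTA(nn.Module):
    """Deep unfolding: each 'layer' is one ISTA iteration for
    min_x 0.5*||y - A x||^2 + lam*||x||_1, with step sizes and
    thresholds learned per layer instead of hand-tuned."""
    def __init__(self, A, n_layers=10):
        super().__init__()
        self.register_buffer("A", A)  # known (model-based) measurement matrix
        self.steps = nn.Parameter(torch.full((n_layers,), 0.1))
        self.thresholds = nn.Parameter(torch.full((n_layers,), 0.01))

    def forward(self, y):
        x = torch.zeros(self.A.shape[1], device=y.device)
        for t, lam in zip(self.steps, self.thresholds):
            # Gradient step on the data-fit term, using the physical model A.
            x = x - t * (self.A.T @ (self.A @ x - y))
            # Learned soft-thresholding enforces sparsity of the estimate.
            x = torch.sign(x) * torch.clamp(x.abs() - lam, min=0.0)
        return x

# Training would fit steps/thresholds on (y, x) pairs with a standard loss.
```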