Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying file
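The exploration/exploitation trade-off named in the abstract above can be illustrated with a small sketch. This is not taken from the survey itself: it is a minimal epsilon-greedy strategy on a Gaussian multi-armed bandit, and the function name `epsilon_greedy_bandit`, the reward model, and all parameter values are assumptions chosen for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Run epsilon-greedy on a bandit with unit-variance Gaussian rewards.

    Returns (estimates, counts): the running mean reward and the number
    of pulls for each arm. All names and defaults are illustrative.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts
```

With a small `epsilon` the agent mostly exploits its current best guess but keeps sampling the other arms often enough to correct a poor early estimate; setting `epsilon=0` removes exploration entirely and can lock the agent onto a suboptimal arm.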
Density-operator evolution: Complete positivity and the Keldysh real-time expansion
We study the reduced time-evolution of open quantum systems by combining
quantum-information and statistical field theory. Inspired by prior work [EPL
102, 60001 (2013) and Phys. Rev. Lett. 111, 050402 (2013)] we establish the
explicit structure guaranteeing the complete positivity (CP) and
trace-preservation (TP) of the real-time evolution expansion in terms of the
microscopic system-environment coupling.
This reveals a fundamental two-stage structure of the coupling expansion:
Whereas the first stage defines the dissipative timescales of the system
(before the environment has been completely integrated out), the second stage
sums up elementary physical processes described by CP superoperators. This
allows us to establish the nontrivial relation between the (Nakajima-Zwanzig)
memory-kernel superoperator for the density operator and novel memory-kernel
operators that generate the Kraus operators of an operator-sum. Importantly,
this operational approach can be implemented in the existing Keldysh real-time
technique and allows approximations for general time-nonlocal quantum master
equations to be systematically compared and developed while keeping the CP and
TP structure explicit.
Our considerations build on the result that a Kraus operator for a physical
measurement process on the environment can be obtained by 'cutting' a group of
Keldysh real-time diagrams 'in half'. This naturally leads to Kraus operators
lifted to the system plus environment which have a diagrammatic expansion in
terms of time-nonlocal memory-kernel operators. These lifted Kraus operators
obey coupled time-evolution equations which constitute an unraveling of the
original Schrödinger equation for system plus environment. Whereas both
equations lead to the same reduced dynamics, only the former explicitly encodes
the operator-sum structure of the coupling expansion.
Comment: Submission to SciPost Physics, 49 pages including 6 appendices, 13
figures. Significant improvement of introduction and conclusion, added
discussions, fixed typos, no change in results
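For readers unfamiliar with the operator-sum form mentioned in the abstract above, a map rho -> sum_i K_i rho K_i^dagger is completely positive by construction and is trace-preserving when sum_i K_i^dagger K_i equals the identity. The sketch below checks both properties numerically. It does not reproduce the paper's Keldysh construction: the single-qubit amplitude-damping channel is a standard textbook example substituted here, and every function name is hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dagger(a):
    """Conjugate transpose of a matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


def matadd(a, b):
    """Elementwise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def amplitude_damping_kraus(gamma):
    """Kraus operators of the amplitude-damping channel (illustrative choice)."""
    k0 = [[1.0, 0.0], [0.0, math.sqrt(1.0 - gamma)]]
    k1 = [[0.0, math.sqrt(gamma)], [0.0, 0.0]]
    return [k0, k1]


def apply_channel(kraus_ops, rho):
    """Operator-sum map: rho -> sum_i K_i rho K_i^dagger."""
    dim = len(rho)
    out = [[0j] * dim for _ in range(dim)]
    for k in kraus_ops:
        out = matadd(out, matmul(matmul(k, rho), dagger(k)))
    return out


def completeness_sum(kraus_ops):
    """Return sum_i K_i^dagger K_i, which equals the identity for a TP map."""
    dim = len(kraus_ops[0][0])
    total = [[0j] * dim for _ in range(dim)]
    for k in kraus_ops:
        total = matadd(total, matmul(dagger(k), k))
    return total
```

Calling `apply_channel(amplitude_damping_kraus(0.3), rho)` on any valid density matrix `rho` should leave its trace at 1 up to floating-point error, and `completeness_sum` should return the 2x2 identity.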
Macro action selection with deep reinforcement learning in StarCraft
StarCraft (SC) is one of the most popular and successful Real Time Strategy
(RTS) games. In recent years, SC has also been widely accepted as a challenging
testbed for AI research because of its enormous state space, partially observed
information, multi-agent collaboration, and so on. With the help of the annual
AIIDE and CIG competitions, a growing number of SC bots have been proposed and
continuously improved. However, a large gap remains between the top-level bot
and the professional human player. One vital reason is that current SC bots
mainly rely on predefined rules to select macro actions during their games.
These rules are not scalable and efficient enough to cope with the enormous yet
partially observed state space in the game. In this paper, we propose a deep
reinforcement learning (DRL) framework to improve the selection of macro
actions. Our framework is based on a combination of Ape-X DQN and
Long Short-Term Memory (LSTM). We use this framework to build our bot, named
LastOrder. Our evaluation, based on training against all bots from the AIIDE
2017 StarCraft AI competition set, shows that LastOrder achieves an 83% winning
rate, outperforming 26 of the 28 entrants.
Model-Based Deep Learning
Signal processing, communications, and control have traditionally relied on
classical statistical modeling techniques. Such model-based methods utilize
mathematical formulations that represent the underlying physics, prior
information and additional domain knowledge. Simple classical models are useful
but sensitive to inaccuracies and may lead to poor performance when real
systems display complex or dynamic behavior. On the other hand, purely
data-driven approaches that are model-agnostic are becoming increasingly
popular as datasets become abundant and the power of modern deep learning
pipelines increases. Deep neural networks (DNNs) use generic architectures
which learn to operate from data, and demonstrate excellent performance,
especially for supervised problems. However, DNNs typically require massive
amounts of data and immense computational resources, limiting their
applicability for some signal processing scenarios. We are interested in hybrid
techniques that combine principled mathematical models with data-driven systems
to benefit from the advantages of both approaches. Such model-based deep
learning methods exploit both partial domain knowledge, via mathematical
structures designed for specific problems, and learning from limited
data. In this article we survey the leading approaches for studying and
designing model-based deep learning systems. We divide hybrid
model-based/data-driven systems into categories based on their inference
mechanism. We provide a comprehensive review of the leading approaches for
combining model-based algorithms with deep learning in a systematic manner,
along with concrete guidelines and detailed signal processing oriented examples
from recent literature. Our aim is to facilitate the design and study of future
systems at the intersection of signal processing and machine learning that
incorporate the advantages of both domains.