Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying file
Stochastic Reinforcement Learning
In reinforcement learning episodes, the rewards and punishments are often
non-deterministic, and there are invariably stochastic elements governing the
underlying situation. Such stochastic elements are often numerous and cannot be
known in advance, and they tend to obscure the underlying reward
and punishment patterns. Indeed, if stochastic elements were absent, the same
outcome would occur every time and the learning problems involved could be
greatly simplified. In addition, in most practical situations, the cost of an
observation to receive either a reward or punishment can be significant, and
one would wish to arrive at the correct learning conclusion at minimum
cost. In this paper, we present a stochastic approach to reinforcement
learning which explicitly models the variability present in the learning
environment and the cost of observation. Criteria and rules for learning
success are quantitatively analyzed, and probabilities of exceeding the
observation cost bounds are also obtained.
Comment: AIKE 201