An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that varying the emphasis of linear TD(λ)'s updates in a particular way causes its expected update to become stable under off-policy training. The only prior model-free TD methods to achieve this with per-step computation linear in the number of function approximation parameters are the gradient-TD family of methods, including TDC, GTD(λ), and GQ(λ). Compared to these methods, our _emphatic TD(λ)_ is simpler and easier to use; it has only one learned parameter vector and one step-size parameter. Our treatment includes general state-dependent discounting and bootstrapping functions, and a way of specifying varying degrees of interest in accurately valuing different states.
Comment: 29 pages. This is a significant revision based on the first set of reviews. The most important change was to signal early that the main result is about stability, not convergence.
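The per-step recursion the abstract describes is easy to sketch. Below is a minimal, illustrative Python step of emphatic TD(λ) with linear function approximation; the function signature and variable names are our own assumptions, while the followon-trace and emphasis recursions follow the published algorithm, with state-dependent discount γ, bootstrapping λ, and interest i passed per step:

```python
import numpy as np

def emphatic_td_step(theta, e, F, phi, phi_next, reward,
                     rho, rho_prev, gamma, gamma_next,
                     lam, interest, alpha):
    """One illustrative step of emphatic TD(lambda) with linear features.

    theta: the single learned weight vector; e: eligibility trace;
    F: followon trace; rho, rho_prev: importance-sampling ratios
    pi(a|s)/mu(a|s) for the current and previous steps.
    (A sketch under our naming assumptions, not the authors' code.)
    """
    # Followon trace: discounted, importance-corrected accumulation
    # of the user-specified interest in past states.
    F = rho_prev * gamma * F + interest
    # Emphasis: blend immediate interest with the followon trace.
    M = lam * interest + (1.0 - lam) * F
    # TD error with state-dependent discounting.
    delta = reward + gamma_next * theta.dot(phi_next) - theta.dot(phi)
    # Emphasis-weighted eligibility trace, corrected by the current ratio.
    e = rho * (gamma * lam * e + M * phi)
    # One learned parameter vector, one step-size parameter.
    theta = theta + alpha * delta * e
    return theta, e, F
```

Every operation above is O(d) in the number of features, matching the linear per-step cost of the gradient-TD family while learning only the single vector theta.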
Competitive function approximation for reinforcement learning
The application of reinforcement learning to problems with continuous domains requires representing the value function by means of function approximation. We identify two aspects of reinforcement learning that make the function approximation process hard: non-stationarity of the target function and biased sampling. Non-stationarity is the result of the bootstrapping nature of dynamic programming, where the value function is estimated using its own current approximation. Biased sampling occurs when some regions of the state space are visited too often, causing repeated updates with similar values that wash out the occasional updates of infrequently sampled regions.
We propose a competitive approach to function approximation in which many different local approximators are available at a given input, and the one expected to give the best approximation is selected by means of a relevance function. The local nature of the approximators allows them to adapt quickly to non-stationary changes and mitigates the biased sampling problem. The coexistence of multiple approximators, updated and tried in parallel, permits obtaining a good estimate much faster than would be possible with a single approximator. Experiments on different benchmark problems show that the competitive strategy provides faster and more stable learning than non-competitive approaches.
Preprint.
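To make the competitive selection concrete, here is a small, hypothetical Python sketch: local linear models anchored at centres, a relevance score combining proximity and amount of training data, prediction by the most relevant model, and parallel updates of all locally active models. The class name, the radius gate, and the relevance score are our own illustrative assumptions, not the paper's concrete design:

```python
import numpy as np

class LocalLinearModel:
    """A local approximator anchored at a centre (illustrative design)."""
    def __init__(self, centre, lr=0.1):
        self.centre = np.asarray(centre, dtype=float)
        self.w = np.zeros(self.centre.size + 1)  # weights plus bias
        self.n_updates = 0
        self.lr = lr

    def _features(self, x):
        # Local coordinates relative to the centre, plus a bias term.
        return np.append(np.asarray(x, dtype=float) - self.centre, 1.0)

    def predict(self, x):
        return self.w.dot(self._features(x))

    def relevance(self, x):
        # Hypothetical relevance: prefer nearby models that have seen
        # more data; the paper's actual relevance function may differ.
        dist = np.linalg.norm(np.asarray(x, dtype=float) - self.centre)
        return self.n_updates / (1.0 + dist)

    def update(self, x, target, radius=1.0):
        f = self._features(x)
        # Only adapt when the sample falls in this model's neighbourhood,
        # so heavy sampling elsewhere cannot wash out local knowledge.
        if np.linalg.norm(f[:-1]) <= radius:
            self.w += self.lr * (target - self.w.dot(f)) * f
            self.n_updates += 1

def competitive_predict(models, x):
    # The model with the expectedly best approximation answers the query.
    return max(models, key=lambda m: m.relevance(x)).predict(x)

def competitive_update(models, x, target):
    # All approximators are offered the sample in parallel; only those
    # locally responsible actually adapt.
    for m in models:
        m.update(x, target)
```

Because each model only adapts within a neighbourhood of its centre, a burst of updates in one region leaves distant models untouched, which is what mitigates the biased-sampling problem described above.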
A Survey and Critique of Multiagent Deep Reinforcement Learning
Deep reinforcement learning (RL) has achieved outstanding results in recent
years. This has led to a dramatic increase in the number of applications and
methods. Recent works have explored learning beyond single-agent scenarios and
have considered multiagent learning (MAL) scenarios. Initial results report
successes in complex multiagent domains, although there are several challenges
to be addressed. The primary goal of this article is to provide a clear
overview of current multiagent deep reinforcement learning (MDRL) literature.
Additionally, we complement the overview with a broader analysis: (i) we
revisit previous key components, originally presented in MAL and RL, and
highlight how they have been adapted to multiagent deep reinforcement learning
settings. (ii) We provide general guidelines to new practitioners in the area:
describing lessons learned from MDRL works, pointing to recent benchmarks, and
outlining open avenues of research. (iii) We take a more critical tone, raising practical challenges of MDRL (e.g., implementation and computational demands).
We expect this article will help unify and motivate future research to take
advantage of the abundant literature that exists (e.g., RL and MAL) in a joint
effort to promote fruitful research in the multiagent community.
Comment: Under review since Oct 2018. Earlier versions of this work had the title: "Is multiagent deep reinforcement learning the answer or the question? A brief survey".