4 research outputs found

    Reinforcement learning in continuous state and action spaces

    Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action spaces, which can make learning a good decision policy even more involved. In this chapter we discuss how to automatically find good decision policies in continuous domains. Because analytically computing a good policy from a continuous model can be infeasible, we mainly focus on methods that explicitly update a representation of a value function, a policy, or both. We discuss considerations in choosing an appropriate representation for these functions and discuss gradient-based and gradient-free ways to update the parameters. We show how to apply these methods to reinforcement-learning problems and discuss many specific algorithms. Amongst others, we cover gradient-based temporal-difference learning, evolutionary strategies, policy-gradient algorithms and actor-critic methods. We discuss the advantages of the different approaches and empirically compare the performance of a state-of-the-art actor-critic method and a state-of-the-art evolutionary strategy.
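    As an illustration of the gradient-based temporal-difference methods mentioned in this abstract, the sketch below shows semi-gradient TD(0) learning of a state-value function over a continuous state space, using a linear function of radial-basis features. This is not code from the chapter; the feature map, step size, and discount factor are illustrative assumptions.

```python
import numpy as np

def rbf_features(state, centers, width=0.5):
    """Radial-basis features of a continuous state vector (illustrative feature map)."""
    dists = np.linalg.norm(centers - state, axis=1)
    return np.exp(-(dists / width) ** 2)

def td0_update(w, state, reward, next_state, centers, alpha=0.05, gamma=0.99):
    """One semi-gradient TD(0) step on the weights of a linear value estimate."""
    phi = rbf_features(state, centers)
    phi_next = rbf_features(next_state, centers)
    td_error = reward + gamma * np.dot(w, phi_next) - np.dot(w, phi)
    # For a linear approximator the gradient of the value estimate is simply phi.
    return w + alpha * td_error * phi
```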

    Double Q-learning

    In some stochastic environments the well-known reinforcement-learning algorithm Q-learning performs very poorly. This poor performance is caused by large overestimations of action values, which result from a positive bias that is introduced because Q-learning uses the maximum action value as an approximation for the maximum expected action value. We introduce an alternative way to approximate the maximum expected value for any set of random variables. The resulting double estimator method is shown to sometimes underestimate rather than overestimate the maximum expected value. We apply the double estimator to Q-learning to construct Double Q-learning, a new off-policy reinforcement-learning algorithm. We show that the new algorithm converges to the optimal policy and that it performs well in some settings in which Q-learning performs poorly due to its overestimations.
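    A minimal tabular sketch of the Double Q-learning update described in this abstract is given below: two value tables are kept, and each update selects the greedy next action with one table while evaluating it with the other, which avoids the positive bias of using a single maximum. The state and action spaces, step size, and discount factor are illustrative assumptions.

```python
import random
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One Double Q-learning update on the tables QA and QB (modified in place)."""
    if random.random() < 0.5:
        # Update QA: pick the greedy next action with QA, but evaluate it with QB.
        a_star = int(np.argmax(QA[s_next]))
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:
        # Update QB: pick the greedy next action with QB, but evaluate it with QA.
        b_star = int(np.argmax(QB[s_next]))
        QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])

# Behaviour actions are typically chosen epsilon-greedily with respect to QA + QB.
```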

    Stacking Under Uncertainty: We Know How To Predict, But How Should We Act?

    No full text

    Reinforcement Learning Algorithms for solving Classification Problems

    No full text