
    Policy evaluation with temporal differences: a survey and comparison

    Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, the quality assessment of states for a given policy, which can be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency, and probabilistic treatment of the uncertainty in the estimates have only been tackled recently, leading to a large number of new approaches. This paper aims to make these new developments accessible in a concise overview, with a focus on the underlying cost functions, the off-policy scenario, and regularization in high-dimensional feature spaces. By presenting the first extensive, systematic comparative evaluation of TD, LSTD, LSPE, FPKF, the residual-gradient algorithm, Bellman residual minimization, GTD, GTD2 and TDC, we shed light on the strengths and weaknesses of these methods. Moreover, we present alternative versions of LSTD and LSPE with drastically improved off-policy performance.
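    A minimal sketch of TD(0) policy evaluation with linear value-function approximation, the basic setting the survey compares methods in. Function and variable names (td0_linear, phi, theta) are illustrative and not taken from the paper:

```python
import numpy as np

def td0_linear(transitions, phi, n_features, alpha=0.05, gamma=0.99):
    """TD(0) policy evaluation with linear function approximation.

    transitions: iterable of (s, r, s_next) tuples generated by the policy
    phi:         feature map, phi(s) -> np.ndarray of shape (n_features,)
    Returns the weight vector theta, so that V(s) is approximated by theta @ phi(s).
    """
    theta = np.zeros(n_features)
    for s, r, s_next in transitions:
        # TD error: reward plus discounted bootstrap minus current estimate
        delta = r + gamma * theta @ phi(s_next) - theta @ phi(s)
        # Semi-gradient update on the current state's features
        theta += alpha * delta * phi(s)
    return theta
```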

    Ensemble Reinforcement Learning: A Survey

    Reinforcement Learning (RL) has emerged as a highly effective technique for addressing various scientific and applied problems. Despite its success, certain complex tasks remain difficult to address with a single model and algorithm. In response, ensemble reinforcement learning (ERL), a promising approach that combines the benefits of both RL and ensemble learning (EL), has gained widespread popularity. ERL leverages multiple models or training algorithms to comprehensively explore the problem space and possesses strong generalization capabilities. In this study, we present a comprehensive survey on ERL to provide readers with an overview of recent advances and challenges in the field. First, we introduce the background and motivation for ERL. Second, we analyze in detail the strategies that have been successfully applied in ERL, including model averaging, model selection, and model combination. Subsequently, we summarize the datasets and analyze the algorithms used in relevant studies. Finally, we outline several open questions and discuss future research directions of ERL. By providing a guide for future scientific research and engineering applications, this survey contributes to the advancement of ERL.
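    A small sketch of two of the combination strategies mentioned above (model averaging and majority voting) applied to action selection from several independently trained Q-value models. The names (ensemble_action, q_models) are illustrative assumptions, not an API from the survey:

```python
import numpy as np

def ensemble_action(q_models, state, combine="mean"):
    """Pick an action by combining Q-value estimates from several models.

    q_models: list of callables, each mapping a state to a vector of
              Q-values (one entry per action), trained independently.
    combine:  "mean" acts greedily on the averaged Q-values (model averaging);
              "vote"  lets each model vote for its own greedy action.
    """
    qs = np.stack([q(state) for q in q_models])   # shape: (n_models, n_actions)
    if combine == "mean":
        return int(np.argmax(qs.mean(axis=0)))    # greedy w.r.t. averaged estimates
    votes = np.argmax(qs, axis=1)                 # each model's greedy action
    return int(np.bincount(votes).argmax())       # majority vote
```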

    Adaptive Railway Traffic Control using Approximate Dynamic Programming

    Railway networks around the world have become challenging to operate in recent decades, with a mixture of track layouts running several different classes of trains at varying operational speeds. This complexity has come about as a result of the sustained increase in passenger numbers: in many countries, railways are now more popular than ever as a means of commuting to cities. To address operational challenges, governments and railway undertakings are encouraging the development of intelligent and digital transport systems that regulate and optimise train operations in real-time, increasing capacity and customer satisfaction through improved usage of existing railway infrastructure. Accordingly, this thesis presents an adaptive railway traffic control system for real-time operations based on a data-based approximate dynamic programming (ADP) approach with integrated reinforcement learning (RL). By assessing requirements and opportunities, the controller aims to reduce delays caused by trains that entered a control area behind schedule, re-scheduling control plans in real-time at critical locations in a timely manner. The present data-based approach relies on an approximation of the dynamic programming value function, i.e. the optimised cost-to-go from a specified state, which is estimated dynamically from operational experience using RL techniques. By using this approximation, ADP avoids extensive explicit evaluation of performance and so reduces the computational burden substantially. In this thesis, formulations of the approximation function and variants of the RL techniques used to estimate it are explored. Evaluation of this controller shows considerable reductions in delays by comparison with current industry practice.
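    A minimal sketch of the ADP pattern described above: one-step lookahead over candidate re-scheduling plans using an approximate cost-to-go, with the approximation updated from operational experience by a TD-style rule. All names (choose_plan, update_value, cost, transition, phi) are hypothetical placeholders, not the thesis's actual formulation:

```python
import numpy as np

def choose_plan(plans, state, theta, phi, cost, transition, gamma=1.0):
    """Greedy one-step lookahead: immediate delay cost plus approximate cost-to-go.

    cost(state, plan) and transition(state, plan) stand in for a model of the
    control area; phi is a feature map and theta the learned weight vector.
    """
    def q(plan):
        next_state = transition(state, plan)
        return cost(state, plan) + gamma * theta @ phi(next_state)
    return min(plans, key=q)

def update_value(theta, phi, state, observed_cost, next_state, alpha=0.01, gamma=1.0):
    """TD-style update of the cost-to-go approximation from observed operation."""
    delta = observed_cost + gamma * theta @ phi(next_state) - theta @ phi(state)
    return theta + alpha * delta * phi(state)
```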