A New Optimal Stepsize For Approximate Dynamic Programming
Approximate dynamic programming (ADP) has proven itself in a wide range of
applications spanning large-scale transportation problems, health care, revenue
management, and energy systems. The design of effective ADP algorithms has many
dimensions, but one crucial factor is the stepsize rule used to update a value
function approximation. Many operations research applications are
computationally intensive, and it is important to obtain good results quickly.
Furthermore, the most popular stepsize formulas use tunable parameters and can
produce very poor results if tuned improperly. We derive a new stepsize rule
that optimizes the prediction error in order to improve the short-term
performance of an ADP algorithm. With only one, relatively insensitive tunable
parameter, the new rule adapts to the level of noise in the problem and
produces faster convergence in numerical experiments.

Comment: Matlab files are included with the paper source.
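To make the role of the stepsize concrete, here is a minimal sketch of the exponential-smoothing update at the heart of such ADP value estimates, comparing a declining generalized harmonic rule (one tunable parameter `a`, an assumption for illustration, not the paper's new rule) against a fixed stepsize:

```python
import random

def harmonic_stepsize(n, a=10.0):
    """Generalized harmonic rule a / (a + n - 1): a common tunable baseline.

    This is the kind of formula the abstract warns about: a poorly chosen
    `a` either slows early learning or fails to damp noise.
    """
    return a / (a + n - 1)

def smooth_estimate(observations, stepsize_fn):
    """Stochastic approximation update v <- (1 - alpha_n) * v + alpha_n * obs."""
    v = 0.0
    for n, obs in enumerate(observations, start=1):
        alpha = stepsize_fn(n)
        v = (1 - alpha) * v + alpha * obs
    return v

random.seed(0)
# Noisy observations of a true value of 5.0
obs = [5.0 + random.gauss(0, 1) for _ in range(500)]
v_harmonic = smooth_estimate(obs, harmonic_stepsize)   # declining stepsize damps noise
v_fixed = smooth_estimate(obs, lambda n: 0.5)          # fixed stepsize keeps tracking noise
```

The declining rule averages out observation noise, while the fixed stepsize leaves a persistent variance floor; an adaptive rule like the paper's aims to get the benefits of both without tuning.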
Analysis of the influence of the learning rate and the discount factor on the performance of the Q-learning and SARSA algorithms: applying reinforcement learning to autonomous navigation
In reinforcement learning algorithms, the learning rate (alpha) and the discount factor (gamma) can be set to any value in the interval between 0 and 1. Building on the concepts of logistic regression, a statistical methodology is proposed for analyzing the influence of varying \alpha and \gamma in the Q-learning and SARSA algorithms. As a case study, reinforcement learning was applied to autonomous navigation experiments. The analysis of results showed that even simple variations in \alpha and \gamma can directly affect the performance of reinforcement learning.
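The two hyperparameters studied appear directly in the tabular Q-learning update. Below is a minimal sketch on a small deterministic chain environment (the environment, episode count, and default values are illustrative assumptions, not the paper's navigation setup):

```python
import random

def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a deterministic chain: actions move left/right,
    reward 1.0 only on reaching the rightmost state.

    alpha (learning rate) and gamma (discount factor) are exactly the two
    hyperparameters whose interaction the paper analyzes.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection (ties broken toward "right")
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 1 if Q[s][1] >= Q[s][0] else 0
            s_next = s + 1 if a == 1 else max(0, s - 1)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update: alpha scales the step, gamma discounts the future
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

Q = q_learning()
```

Rerunning with different `alpha` and `gamma` values changes how quickly the reward signal propagates back along the chain, which is the kind of performance effect the proposed statistical methodology quantifies. SARSA differs only in using the next action actually taken instead of `max(Q[s_next])`.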
Human Apprenticeship Learning via Kernel-based Inverse Reinforcement Learning
It has been well demonstrated that inverse reinforcement learning (IRL) is an
effective technique for teaching machines to perform tasks at human skill
levels given human demonstrations (i.e., human to machine apprenticeship
learning). This paper seeks to show that a similar application can be
demonstrated with human learners. That is, given demonstrations from human
experts, inverse reinforcement learning techniques can be used to teach other
humans to perform at higher skill levels (i.e., human to human apprenticeship
learning). To show this, two experiments were conducted using a simple,
real-time web game where players were asked to touch targets in order to earn
as many points as possible. For these experiments, player performance was
defined as the number of targets a player touched, irrespective of the points that a
player actually earned. This allowed for in-game points to be modified and the
effect of these alterations on performance measured. At no time were
participants told the true performance metric. To determine the point
modifications, IRL was applied to demonstrations of human experts playing the
game. The results show, with statistical significance, that performance
improved over the control for select treatment groups. Finally, in addition to
the experiment, we also detail the algorithmic challenges we faced when
conducting the experiments and the techniques we used to overcome them.

Comment: 31 pages, 23 figures. Submitted to the Journal of Artificial Intelligence Research. For source code, see https://github.com/mrucker/kpirl-kla
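The core IRL step here, inferring a reward from expert demonstrations, can be sketched in its simplest linear form (this is not the paper's kernel-based method; the feature map, trajectories, and update rule below are illustrative assumptions): assume the reward is linear in state features, R(s) = w . phi(s), and choose weights w so that expert feature expectations score at least as high as those of alternative behaviors.

```python
def feature_expectations(trajectories, phi, gamma=0.9):
    """Average discounted feature counts over demonstration trajectories."""
    mu = None
    for traj in trajectories:
        acc = None
        for t, s in enumerate(traj):
            f = [gamma ** t * x for x in phi(s)]
            acc = f if acc is None else [a + b for a, b in zip(acc, f)]
        mu = acc if mu is None else [m + a for m, a in zip(mu, acc)]
    return [m / len(trajectories) for m in mu]

def fit_reward_weights(mu_expert, mu_others, lr=0.1, iters=100):
    """Perceptron-style margin updates: push w toward the expert's feature
    expectations and away from those of non-expert behaviors."""
    w = [0.0] * len(mu_expert)
    for _ in range(iters):
        for mu in mu_others:
            score_expert = sum(a * b for a, b in zip(w, mu_expert))
            score_other = sum(a * b for a, b in zip(w, mu))
            if score_expert <= score_other:  # expert not yet preferred: update
                w = [wi + lr * (e - o) for wi, e, o in zip(w, mu_expert, mu)]
    return w
```

In the game described above, phi(s) might encode which target types a player touches; the fitted weights would then suggest how to reweight in-game points toward expert-like behavior, which is the role IRL plays in setting the point modifications.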