A New Optimal Stepsize For Approximate Dynamic Programming
Approximate dynamic programming (ADP) has proven itself in a wide range of
applications spanning large-scale transportation problems, health care, revenue
management, and energy systems. The design of effective ADP algorithms has many
dimensions, but one crucial factor is the stepsize rule used to update a value
function approximation. Many operations research applications are
computationally intensive, and it is important to obtain good results quickly.
Furthermore, the most popular stepsize formulas use tunable parameters and can
produce very poor results if tuned improperly. We derive a new stepsize rule
that optimizes the prediction error in order to improve the short-term
performance of an ADP algorithm. With only one, relatively insensitive tunable
parameter, the new rule adapts to the level of noise in the problem and
produces faster convergence in numerical experiments.

Comment: Matlab files are included with the paper source.
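To make the role of the stepsize concrete, here is a minimal sketch of the exponential-smoothing update at the heart of such ADP value estimates, comparing a declining generalized harmonic rule (one tunable parameter `a`, an assumption for illustration, not the paper's new rule) against a fixed stepsize:

```python
import random

def harmonic_stepsize(n, a=10.0):
    """Generalized harmonic rule a / (a + n - 1): a common tunable baseline.

    This is the kind of formula the abstract warns about: a poorly chosen
    `a` either slows early learning or fails to damp noise.
    """
    return a / (a + n - 1)

def smooth_estimate(observations, stepsize_fn):
    """Stochastic approximation update v <- (1 - alpha_n) * v + alpha_n * obs."""
    v = 0.0
    for n, obs in enumerate(observations, start=1):
        alpha = stepsize_fn(n)
        v = (1 - alpha) * v + alpha * obs
    return v

random.seed(0)
# Noisy observations of a true value of 5.0
obs = [5.0 + random.gauss(0, 1) for _ in range(500)]
v_harmonic = smooth_estimate(obs, harmonic_stepsize)   # declining stepsize damps noise
v_fixed = smooth_estimate(obs, lambda n: 0.5)          # fixed stepsize keeps tracking noise
```

The declining rule averages out observation noise, while the fixed stepsize leaves a persistent variance floor; an adaptive rule like the paper's aims to get the benefits of both without tuning.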
Analysis of the influence of the learning rate and the discount factor on the performance of the Q-learning and SARSA algorithms: applying reinforcement learning to autonomous navigation
In reinforcement learning algorithms, the learning rate (alpha) and the discount factor (gamma) can be set to any value in the interval between 0 and 1. Building on the concepts of logistic regression, a statistical methodology is proposed for analyzing the influence of varying \alpha and \gamma in the Q-learning and SARSA algorithms. As a case study, reinforcement learning was applied to autonomous navigation experiments. The analysis of results showed that even simple variations in \alpha and \gamma can directly affect the performance of reinforcement learning.
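The two hyperparameters studied appear directly in the tabular Q-learning update. Below is a minimal sketch on a small deterministic chain environment (the environment, episode count, and default values are illustrative assumptions, not the paper's navigation setup):

```python
import random

def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a deterministic chain: actions move left/right,
    reward 1.0 only on reaching the rightmost state.

    alpha (learning rate) and gamma (discount factor) are exactly the two
    hyperparameters whose interaction the paper analyzes.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection (ties broken toward "right")
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 1 if Q[s][1] >= Q[s][0] else 0
            s_next = s + 1 if a == 1 else max(0, s - 1)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update: alpha scales the step, gamma discounts the future
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

Q = q_learning()
```

Rerunning with different `alpha` and `gamma` values changes how quickly the reward signal propagates back along the chain, which is the kind of performance effect the proposed statistical methodology quantifies. SARSA differs only in using the next action actually taken instead of `max(Q[s_next])`.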
Human Apprenticeship Learning via Kernel-based Inverse Reinforcement Learning
It has been well demonstrated that inverse reinforcement learning (IRL) is an
effective technique for teaching machines to perform tasks at human skill
levels given human demonstrations (i.e., human to machine apprenticeship
learning). This paper seeks to show that a similar application can be
demonstrated with human learners. That is, given demonstrations from human
experts, inverse reinforcement learning techniques can be used to teach other
humans to perform at higher skill levels (i.e., human to human apprenticeship
learning). To show this, two experiments were conducted using a simple,
real-time web game where players were asked to touch targets in order to earn
as many points as possible. For these experiments, player performance was
defined as the number of targets a player touched, irrespective of the points that a
player actually earned. This allowed for in-game points to be modified and the
effect of these alterations on performance measured. At no time were
participants told the true performance metric. To determine the point
modifications, IRL was applied to demonstrations of human experts playing the
game. The results show, with statistical significance, that performance
improved over the control for select treatment groups. Finally, in addition to
the experiment, we also detail the algorithmic challenges we faced when
conducting the experiments and the techniques we used to overcome them.

Comment: 31 pages, 23 figures. Submitted to the Journal of Artificial Intelligence Research. For source code, see https://github.com/mrucker/kpirl-kla
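The core IRL step here, inferring a reward from expert demonstrations, can be sketched in its simplest linear form (this is not the paper's kernel-based method; the feature map, trajectories, and update rule below are illustrative assumptions): assume the reward is linear in state features, R(s) = w . phi(s), and choose weights w so that expert feature expectations score at least as high as those of alternative behaviors.

```python
def feature_expectations(trajectories, phi, gamma=0.9):
    """Average discounted feature counts over demonstration trajectories."""
    mu = None
    for traj in trajectories:
        acc = None
        for t, s in enumerate(traj):
            f = [gamma ** t * x for x in phi(s)]
            acc = f if acc is None else [a + b for a, b in zip(acc, f)]
        mu = acc if mu is None else [m + a for m, a in zip(mu, acc)]
    return [m / len(trajectories) for m in mu]

def fit_reward_weights(mu_expert, mu_others, lr=0.1, iters=100):
    """Perceptron-style margin updates: push w toward the expert's feature
    expectations and away from those of non-expert behaviors."""
    w = [0.0] * len(mu_expert)
    for _ in range(iters):
        for mu in mu_others:
            score_expert = sum(a * b for a, b in zip(w, mu_expert))
            score_other = sum(a * b for a, b in zip(w, mu))
            if score_expert <= score_other:  # expert not yet preferred: update
                w = [wi + lr * (e - o) for wi, e, o in zip(w, mu_expert, mu)]
    return w
```

In the game described above, phi(s) might encode which target types a player touches; the fitted weights would then suggest how to reweight in-game points toward expert-like behavior, which is the role IRL plays in setting the point modifications.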