Evolutionary Algorithms for Reinforcement Learning

Authors: Grefenstette J. J., Moriarty D. E., Schultz A. C.
Publication venue: 'AI Access Foundation'
Publication date: 01/06/2011

There are two distinct approaches to solving reinforcement learning problems: searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications.

arXiv.org e-Print Archive

Crossref

RLHF and IIA: Perverse Incentives

Authors: Dong Shi, Lam Grace, Lu Xiuyuan, Van Roy Benjamin, Wen Zheng, Xu Wanqiao
Publication date: 01/02/2024

Existing algorithms for reinforcement learning from human feedback (RLHF) can incentivize responses at odds with preferences because they are based on models that assume independence of irrelevant alternatives (IIA). The perverse incentives induced by IIA hinder innovations in query formats and learning algorithms.
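
The IIA property named in the abstract can be illustrated with a small sketch. It assumes a multinomial-logit (softmax) choice model, a standard example of a model that satisfies IIA; the abstract does not say which models the paper analyzes, and the reward values and the names `choice_probs` and `a_copy` are hypothetical.

```python
import math

def choice_probs(rewards):
    """Softmax (multinomial-logit) choice probabilities over the offered alternatives."""
    top = max(rewards.values())  # subtract the max for numerical stability
    weights = {name: math.exp(r - top) for name, r in rewards.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical rewards for two responses, "a" and "b".
rewards = {"a": 1.0, "b": 0.0}
p_two = choice_probs(rewards)

# Offer a third response that duplicates "a" (assumed identical reward).
p_three = choice_probs({**rewards, "a_copy": 1.0})

# Under IIA the odds of "a" versus "b" do not depend on what else is offered.
ratio_two = p_two["a"] / p_two["b"]
ratio_three = p_three["a"] / p_three["b"]
print(ratio_two, ratio_three)

# The duplicate nevertheless takes probability mass away from "b".
print(p_two["b"], p_three["b"])
```

In this toy model, adding a duplicate of "a" leaves the a-to-b odds unchanged while lowering the probability that "b" is chosen, which is the kind of behavior an IIA-based model builds in.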

arXiv.org e-Print Archive

Evolving FPS Game Players by Using Continuous EDA-RL

Authors: Handa Hisashi, Tsubota Hajime
Publication venue: IEEE SMC Hiroshima Chapter
Publication date: 01/11/2009

This paper extends EDA-RL, Estimation of Distribution Algorithms for Reinforcement Learning Problems, to continuous domains. The extended EDA-RL is used to constitute FPS game players. In order to cope with continuous input-output relations, a Gaussian network is employed, as in EBNA. Simulation results on Unreal Tournament 2004, one of the major FPS games, confirm the effectiveness of the proposed method.
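
The general idea of a continuous estimation-of-distribution algorithm can be sketched as follows. This is a deliberately simplified version, not the paper's method: it fits an independent Gaussian per parameter, whereas the abstract describes a Gaussian network. The function names, the toy return function, and all hyperparameters are assumptions made for illustration.

```python
import random
import statistics

def gaussian_eda(fitness, dim, generations=40, pop_size=50, elite_size=10, seed=0):
    """Univariate Gaussian EDA: sample, keep the best, refit the distribution."""
    rng = random.Random(seed)
    mean = [0.0] * dim   # assumed initial search distribution
    std = [2.0] * dim
    for _ in range(generations):
        # Sample candidate parameter vectors from the current Gaussian model.
        population = [
            [rng.gauss(mean[i], std[i]) for i in range(dim)]
            for _ in range(pop_size)
        ]
        # Keep the candidates with the highest fitness (episode return).
        elite = sorted(population, key=fitness, reverse=True)[:elite_size]
        # Re-estimate the model from the elite, one dimension at a time.
        for i in range(dim):
            column = [candidate[i] for candidate in elite]
            mean[i] = statistics.mean(column)
            # A floor on the spread keeps the search from collapsing entirely.
            std[i] = max(statistics.pstdev(column), 0.05)
    return mean

def toy_return(params):
    """Hypothetical stand-in for an episode return; maximized at (3, -1)."""
    return -((params[0] - 3.0) ** 2 + (params[1] + 1.0) ** 2)

best = gaussian_eda(toy_return, dim=2)
print(best)
```

In a reinforcement learning setting, `toy_return` would be replaced by the return obtained from running a policy with the sampled parameters in the environment.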

Hiroshima University Institutional Repository

Okayama University Scientific Achievement Repository

Selector-Actor-Critic and Tuner-Actor-Critic Algorithms for Reinforcement Learning

Authors: Kamal Ahmed, Masadeh Ala'eddin, Wang Zhengdao
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2019

This work presents two reinforcement learning (RL) architectures that mimic how rational humans analyze the available information and make decisions. The proposed algorithms are called selector-actor-critic (SAC) and tuner-actor-critic (TAC). They are obtained by modifying the well-known actor-critic (AC) algorithm. SAC is equipped with an actor, a critic, and a selector. The role of the selector is to determine the most promising action at the current state based on the last estimate from the critic. TAC is model-based and consists of a tuner, a model-learner, an actor, and a critic. After receiving the approximated value of the current state-action pair from the critic and the learned model from the model-learner, the tuner uses the Bellman equation to tune the value of the current state-action pair. This tuned value is then used by the actor to optimize the policy. We investigate the performance of the proposed algorithms and compare them with the AC algorithm, using numerical simulations to show their advantages.
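
The tuning step described for TAC can be sketched as a one-step Bellman backup over a learned model. This is a minimal tabular illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact update, so the expectation over the current policy, the discount factor, the data layout, and every name below are hypothetical.

```python
def bellman_tuned_value(state, action, q, model_p, model_r, policy, gamma=0.9):
    """One-step Bellman backup of Q(state, action) using a learned model.

    q        -- critic estimates, keyed by (state, action)
    model_p  -- learned transitions: (state, action) -> {next_state: probability}
    model_r  -- learned expected rewards, keyed by (state, action)
    policy   -- actor's action probabilities: state -> {action: probability}
    """
    expected_next = 0.0
    for next_state, prob in model_p[(state, action)].items():
        # Value of the next state under the current policy and critic.
        v_next = sum(
            policy[next_state][a] * q[(next_state, a)] for a in policy[next_state]
        )
        expected_next += prob * v_next
    return model_r[(state, action)] + gamma * expected_next

# Toy problem with two states (0, 1) and two actions ("l", "r").
q = {(0, "l"): 0.0, (0, "r"): 0.0, (1, "l"): 1.0, (1, "r"): 3.0}
policy = {0: {"l": 0.5, "r": 0.5}, 1: {"l": 0.5, "r": 0.5}}
model_p = {(0, "r"): {1: 1.0}}   # taking "r" in state 0 leads to state 1
model_r = {(0, "r"): 1.0}

tuned = bellman_tuned_value(0, "r", q, model_p, model_r, policy)
print(tuned)
```

Here the critic's raw estimate for the pair (0, "r") is 0.0, while the model-based backup gives 1.0 + 0.9 × 2.0; in the architecture the abstract describes, a tuned value of this kind is what the actor would use in place of the raw estimate.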

Digital Repository @ Iowa State University (ISU)

Crossref