4 research outputs found

    Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

    To improve the convergence rate and the sample efficiency, two learning methods, AC-HMLP and RAC-HMLP (AC-HMLP with L2-regularization), are proposed by combining the actor-critic algorithm with hierarchical model learning and planning. The hierarchical model consists of a local model and a global model, which are learned alongside the value function and the policy and are approximated by local linear regression (LLR) and linear function approximation (LFA), respectively. Both models are used to generate samples for planning: the local model is used at each time step only if its state-prediction error does not exceed a threshold, while the global model is used at the end of each episode. Using both models improves sample efficiency and accelerates the convergence of the whole algorithm by fully exploiting local and global information. Experimentally, AC-HMLP and RAC-HMLP are compared with three representative algorithms on two reinforcement learning (RL) benchmark problems. The results show that they perform best in terms of convergence rate and sample efficiency.
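    Below is a minimal Python sketch, not the authors' code, of the model structure the abstract describes: a local model fit by local linear regression, a global model fit by linear function approximation, and the threshold test that gates per-step planning with the local model while the global model is refit at episode end. The class and function names, the toy 1-D state, and the hyper-parameters are illustrative assumptions.

```python
import numpy as np

class LocalLinearModel:
    """Local linear regression (LLR): fit a linear map on the k nearest stored transitions."""
    def __init__(self, k=5):
        self.k = k
        self.inputs, self.targets = [], []

    def add(self, state, action, next_state):
        self.inputs.append(np.append(state, action))
        self.targets.append(np.asarray(next_state))

    def predict(self, state, action):
        x = np.append(state, action)
        X, Y = np.asarray(self.inputs), np.asarray(self.targets)
        idx = np.argsort(np.linalg.norm(X - x, axis=1))[: self.k]   # k nearest neighbours
        Xk = np.hstack([X[idx], np.ones((len(idx), 1))])            # add a bias column
        W, *_ = np.linalg.lstsq(Xk, Y[idx], rcond=None)             # local least-squares fit
        return np.append(x, 1.0) @ W

class GlobalLinearModel:
    """Linear function approximation (LFA): one global linear map (state, action) -> next state."""
    def __init__(self):
        self.inputs, self.targets, self.W = [], [], None

    def add(self, state, action, next_state):
        self.inputs.append(np.append(np.append(state, action), 1.0))
        self.targets.append(np.asarray(next_state))

    def fit(self):
        X, Y = np.asarray(self.inputs), np.asarray(self.targets)
        self.W, *_ = np.linalg.lstsq(X, Y, rcond=None)

    def predict(self, state, action):
        return np.append(np.append(state, action), 1.0) @ self.W

def run_episode(env_step, policy, local, global_, error_threshold=0.1, horizon=200):
    """Collect real transitions; keep local-model planning samples only while the
    one-step state-prediction error stays under the threshold."""
    state, planned = np.zeros(1), []            # toy 1-D start state (assumption)
    for _ in range(horizon):
        action = policy(state)
        next_state = env_step(state, action)
        if len(local.inputs) > local.k:
            pred = local.predict(state, action)
            if np.linalg.norm(pred - next_state) <= error_threshold:
                planned.append((state, action, pred))   # sample for per-step planning
        local.add(state, action, next_state)
        global_.add(state, action, next_state)
        state = next_state
    global_.fit()   # the global model is refit and used for planning at episode end
    return planned
```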

    Approximate Policy Iteration with Linear Action Models

    In this paper we consider the problem of finding a good policy given some batch data. We propose a new approach, LAM-API, which first builds a so-called linear action model (LAM) from the data and then uses the learned model together with the collected data in approximate policy iteration (API) to find a good policy. A natural choice for the policy evaluation step in this algorithm is the least-squares temporal difference (LSTD) learning algorithm. Empirical results on three benchmark problems show that this particular instance of LAM-API performs competitively with LSPI in terms of both data and computational efficiency.
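    The sketch below illustrates, under stated assumptions, the pipeline the abstract outlines: fit one linear model per discrete action from a fixed batch of (s, a, r, s') transitions, then alternate LSTD-style policy evaluation with greedy improvement that queries the learned action models. It is not the paper's implementation; the feature map phi, the batch format, the use of model-predicted next-state features inside the LSTD solve, and all hyper-parameters are simplifying assumptions.

```python
import numpy as np

def fit_linear_action_models(batch, num_actions, phi):
    """For each discrete action a, least-squares fit F_a (features -> next features)
    and r_a (features -> expected reward) from the batch of (s, a, r, s') tuples."""
    models = {}
    for a in range(num_actions):
        X = np.array([phi(s) for s, act, _r, _s2 in batch if act == a])
        Y = np.array([phi(s2) for _s, act, _r, s2 in batch if act == a])
        R = np.array([r for _s, act, r, _s2 in batch if act == a])
        F, *_ = np.linalg.lstsq(X, Y, rcond=None)
        w_r, *_ = np.linalg.lstsq(X, R, rcond=None)
        models[a] = (F, w_r)
    return models

def lstd_evaluate(batch, phi, policy, models, gamma=0.99, reg=1e-3):
    """LSTD-style evaluation: solve A w = b, with next-state features predicted
    by the linear action model of the action the current policy would choose."""
    d = len(phi(batch[0][0]))
    A, b = reg * np.eye(d), np.zeros(d)
    for s, _a, r, _s2 in batch:
        x = phi(s)
        F, _w_r = models[policy(s)]
        x_next = x @ F                          # model-predicted next-state features
        A += np.outer(x, x - gamma * x_next)
        b += r * x
    return np.linalg.solve(A, b)

def greedy_policy(models, w, phi, gamma=0.99):
    """Policy improvement: act greedily w.r.t. model-predicted reward plus discounted value."""
    def policy(s):
        x = phi(s)
        q = [float(x @ w_r) + gamma * float((x @ F) @ w)
             for F, w_r in (models[a] for a in sorted(models))]
        return int(np.argmax(q))
    return policy

def lam_api(batch, phi, num_actions, iterations=10, gamma=0.99):
    """Alternate LSTD evaluation and greedy improvement for a fixed number of iterations."""
    models = fit_linear_action_models(batch, num_actions, phi)
    policy = lambda s: 0                        # arbitrary initial policy
    for _ in range(iterations):
        w = lstd_evaluate(batch, phi, policy, models, gamma)
        policy = greedy_policy(models, w, phi, gamma)
    return policy
```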