Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
This paper presents the MAXQ approach to hierarchical reinforcement learning
based on decomposing the target Markov decision process (MDP) into a hierarchy
of smaller MDPs and decomposing the value function of the target MDP into an
additive combination of the value functions of the smaller MDPs. The paper
defines the MAXQ hierarchy, proves formal results on its representational
power, and establishes five conditions for the safe use of state abstractions.
The paper presents an online model-free learning algorithm, MAXQ-Q, and proves
that it converges with probability 1 to a kind of locally-optimal policy known
as a recursively optimal policy, even in the presence of the five kinds of
state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q
through a series of experiments in three domains and shows experimentally that
MAXQ-Q (with state abstractions) converges to a recursively optimal policy much
faster than flat Q-learning. The fact that MAXQ learns a representation of the
value function has an important benefit: it makes it possible to compute and
execute an improved, non-hierarchical policy via a procedure similar to the
policy improvement step of policy iteration. The paper demonstrates the
effectiveness of this non-hierarchical execution experimentally. Finally, the
paper concludes with a comparison to related work and a discussion of the
design tradeoffs in hierarchical reinforcement learning. Comment: 63 pages, 15 figures
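The additive decomposition described above can be illustrated with a small sketch. It assumes the usual MAXQ form, in which the value of invoking child a inside parent task i is Q(i, s, a) = V(a, s) + C(i, s, a): the value of running the child plus a "completion" term for finishing the parent afterwards. The toy hierarchy, the single state "s0", and all table values below are hypothetical; the sketch only evaluates stored tables and does not implement the MAXQ-Q learning rule.

```python
from typing import Dict, List, Tuple

# Hypothetical toy hierarchy: composite tasks map to their child actions;
# any name that is not a key is treated as a primitive action.
CHILDREN: Dict[str, List[str]] = {
    "Root": ["Get", "Put"],
    "Get": ["pickup"],
    "Put": ["putdown"],
}
# Hypothetical "learned" tables for a single state "s0".
PRIM_REWARD: Dict[Tuple[str, str], float] = {
    ("pickup", "s0"): -1.0,
    ("putdown", "s0"): -1.0,
}
COMPLETION: Dict[Tuple[str, str, str], float] = {
    ("Root", "s0", "Get"): -5.0,
    ("Root", "s0", "Put"): -8.0,
    ("Get", "s0", "pickup"): 0.0,
    ("Put", "s0", "putdown"): 0.0,
}


def value(task: str, s: str) -> float:
    """V(task, s): expected reward of executing `task` starting in state s."""
    if task not in CHILDREN:  # primitive action: its expected one-step reward
        return PRIM_REWARD[(task, s)]
    return max(q_value(task, s, a) for a in CHILDREN[task])


def q_value(task: str, s: str, a: str) -> float:
    """Q(task, s, a) = V(a, s) + C(task, s, a): run child a, then finish task."""
    return value(a, s) + COMPLETION[(task, s, a)]


def greedy_primitive(s: str, task: str = "Root") -> str:
    """Descend from `task`, choosing the child with the highest decomposed Q,
    until a primitive action is reached. Re-running this from the root at
    every time step roughly corresponds to the non-hierarchical execution."""
    while task in CHILDREN:
        task = max(CHILDREN[task], key=lambda a: q_value(task, s, a))
    return task


print(value("Root", "s0"))     # -6.0
print(greedy_primitive("s0"))  # pickup
```

Here Q(Root, s0, Get) = -1 + (-5) = -6 beats Q(Root, s0, Put) = -1 + (-8) = -9, so the greedy descent goes Root, then Get, then the primitive "pickup".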
Multi-Agent Deep Reinforcement Learning with Human Strategies
Deep learning has enabled traditional reinforcement learning methods to deal
with high-dimensional problems. However, one of the disadvantages of deep
reinforcement learning methods is the limited exploration capacity of learning
agents. In this paper, we introduce an approach that integrates human
strategies to increase the exploration capacity of multiple deep reinforcement
learning agents. We also report the development of our own multi-agent
environment called Multiple Tank Defence to simulate the proposed approach. The
results show a significant performance improvement for multiple agents that
have learned cooperatively with human strategies. This implies that there is a
critical need for human intellect teamed with machines to solve complex
problems. In addition, the success of this simulation indicates that our
multi-agent environment can be used as a testbed platform to develop and
validate other multi-agent control algorithms. Comment: 2019 IEEE International Conference on Industrial Technology (ICIT),
Melbourne, Australia
Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning
A mechanism called Eligibility Propagation is proposed to speed up the Time
Hopping technique used for faster Reinforcement Learning in simulations.
Eligibility Propagation gives Time Hopping abilities similar to those that
eligibility traces provide for conventional Reinforcement Learning. It
propagates values from one state to all of its temporal predecessors using a
state-transition graph. Experiments on a simulated biped crawling robot
confirm that Eligibility Propagation speeds up the learning process more than
threefold. Comment: 7 pages
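The abstract describes the mechanism only at a high level. As a rough illustration of "propagating values from one state to all of its temporal predecessors using a state-transition graph", the sketch below walks a recorded predecessor graph breadth-first and backs values up along it. The graph format, the deterministic max-style backup V(p) = max(V(p), r + gamma * V(s)), and the names `propagate`, `predecessors`, `gamma` and `tol` are all hypothetical assumptions, not the paper's actual algorithm.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical graph format: state -> list of (predecessor, transition reward).
Graph = Dict[str, List[Tuple[str, float]]]


def propagate(start: str, values: Dict[str, float], predecessors: Graph,
              gamma: float = 0.9, tol: float = 1e-9) -> int:
    """Push the value of `start` backward to all of its temporal predecessors.

    Assumes deterministic transitions and a max-style backup; a predecessor is
    re-queued only when its value improves by more than `tol`, so the walk
    stops once updates become negligible. Returns the number of updates made.
    """
    queue = deque([start])
    updates = 0
    while queue:
        s = queue.popleft()
        for p, r in predecessors.get(s, []):
            candidate = r + gamma * values[s]
            if candidate > values.get(p, float("-inf")) + tol:
                values[p] = candidate
                updates += 1
                queue.append(p)  # p's own predecessors must now be revisited
    return updates


# Usage: a chain a -> b -> c where c has just received a high value.
values = {"a": 0.0, "b": 0.0, "c": 10.0}
preds: Graph = {"c": [("b", 0.0)], "b": [("a", 0.0)]}
print(propagate("c", values, preds, gamma=0.5))  # 2
print(values)  # {'a': 2.5, 'b': 5.0, 'c': 10.0}
```

One new value at "c" reaches every earlier state in a single call, which is the effect eligibility traces achieve for ordinary sequential updates.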