Reachability and Differential based Heuristics for Solving Markov Decision Processes
The solution convergence of Markov Decision Processes (MDPs) can be
accelerated by prioritized sweeping of states ranked by their potential impact
on other states. In this paper, we present new heuristics to speed up the
solution convergence of MDPs. First, we quantify the reachability of every
state using the Mean First Passage Time (MFPT) and show that this reachability
characterization assesses the importance of states well, which we exploit for
effective state prioritization. Then, we introduce the notion of
backup differentials as an extension to the prioritized sweeping mechanism, in
order to evaluate the impacts of states at an even finer scale. Finally, we
extend the state prioritization to the temporal process, where only partial
sweeping can be performed during certain intermediate value iteration stages.
To validate our design, we have performed numerical evaluations by comparing
the proposed new heuristics with corresponding classic baseline mechanisms. The
evaluation results show that our reachability-based framework and its
differential variants outperform the state-of-the-art solutions in terms of
both practical runtime and number of iterations.

Comment: The paper was published in the 2017 International Symposium on
Robotics Research (ISRR).
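To make the first heuristic concrete: the MFPT to a goal state can be obtained by solving one linear system over the chain's transition matrix, and sorting states by it yields a sweep order for value backups. A minimal NumPy sketch, where the 4-state chain is a made-up example rather than anything from the paper:

```python
import numpy as np

def mfpt_to_goal(P, goal):
    """Mean First Passage Time from every state to `goal` for a Markov
    chain with row-stochastic transition matrix P.  Solves the linear
    system (I - Q) h = 1, where Q drops the goal row and column."""
    n = P.shape[0]
    others = [s for s in range(n) if s != goal]
    Q = P[np.ix_(others, others)]
    h = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    mfpt = np.zeros(n)
    mfpt[others] = h          # MFPT of the goal to itself is 0
    return mfpt

# Toy 4-state chain drifting toward the absorbing goal state 3.
P = np.array([[0.6, 0.4, 0.0, 0.0],
              [0.2, 0.4, 0.4, 0.0],
              [0.0, 0.2, 0.4, 0.4],
              [0.0, 0.0, 0.0, 1.0]])
mfpt = mfpt_to_goal(P, goal=3)
# Sweeping states in increasing-MFPT order backs up states nearest the
# goal first, so new value information propagates outward in one pass.
sweep_order = np.argsort(mfpt)
```

Here low MFPT means the goal is easily reached, so those states receive fresh value information early and a single prioritized sweep carries it to the rest of the state space.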
Accelerating Goal-Directed Reinforcement Learning by Model Characterization
We propose a hybrid approach aimed at improving the sample efficiency in
goal-directed reinforcement learning. We do this via a two-step mechanism:
first, we approximate a model from model-free reinforcement learning; then, we
leverage this approximate model, together with a notion of reachability based
on Mean First Passage Times, to perform model-based reinforcement learning.
Building on this observation, we design two new algorithms, Mean First Passage
Time based Q-Learning (MFPT-Q) and Mean First Passage Time based DYNA
(MFPT-DYNA), which fundamentally modify state-of-the-art
reinforcement learning techniques. Preliminary results have shown that our
hybrid approaches converge in far fewer iterations than their corresponding
state-of-the-art counterparts, and therefore require far fewer samples and
training trials to converge.

Comment: The paper was published in the 2018 IEEE/RSJ International Conference
on Intelligent Robots and Systems (IROS).
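The two-step mechanism can be sketched in miniature: run tabular Q-learning while accumulating transition counts, then use MFPTs computed from that empirical model to order DYNA-style planning backups, goal-nearest states first. Everything below, the toy chain MDP, the smoothing constant, and the helper names, is a hypothetical illustration under simplifying assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deterministic chain MDP: states 0..3, action 0 = left,
# action 1 = right; reward 1 for reaching the absorbing goal state 3.
N_S, N_A, GOAL = 4, 2, 3

def env_step(s, a):
    s2 = min(s + 1, N_S - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == GOAL)

Q = np.zeros((N_S, N_A))
counts = np.zeros((N_S, N_A, N_S))  # empirical transition counts
R = np.zeros((N_S, N_A))            # last observed reward per (s, a)
alpha, gamma = 0.5, 0.95

def mfpt_to_goal():
    """Expected steps to the goal under the empirical model built from
    the counts (actions pooled); a small uniform smoothing mass keeps
    the goal reachable from every state so the system stays solvable."""
    C = counts.sum(axis=1) + 1e-3
    P = C / C.sum(axis=1, keepdims=True)
    others = [s for s in range(N_S) if s != GOAL]
    Qm = P[np.ix_(others, others)]
    h = np.linalg.solve(np.eye(N_S - 1) - Qm, np.ones(N_S - 1))
    m = np.zeros(N_S)
    m[others] = h
    return m

for episode in range(30):
    s = 0
    for _ in range(20):
        a = int(rng.integers(N_A))          # random exploration
        s2, r = env_step(s, a)
        counts[s, a, s2] += 1
        R[s, a] = r
        # Step 1: model-free update (ordinary Q-learning backup).
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
        if s == GOAL:
            break
    # Step 2: model-based planning sweep, states ordered by MFPT so
    # that states closest to the goal are backed up first.
    for ps in np.argsort(mfpt_to_goal()):
        for pa in range(N_A):
            if counts[ps, pa].sum() == 0:
                continue
            ps2 = int(counts[ps, pa].argmax())   # most likely successor
            Q[ps, pa] += alpha * (R[ps, pa] + gamma * Q[ps2].max()
                                  - Q[ps, pa])

print(Q.argmax(axis=1)[:3])   # greedy policy at the non-goal states
```

On this chain the greedy policy converges to "always move right": the MFPT-ordered sweep backs up the state adjacent to the goal first, so each full sweep pushes the reward one more state outward, which is the intuition behind the reported sample-efficiency gains.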