18 research outputs found
Decentralized Cooperative Planning for Automated Vehicles with Hierarchical Monte Carlo Tree Search
Today's automated vehicles lack the ability to cooperate implicitly with
others. This work presents a Monte Carlo Tree Search (MCTS) based approach for
decentralized cooperative planning using macro-actions for automated vehicles
in heterogeneous environments. Based on cooperative modeling of other agents
and Decoupled-UCT (a variant of MCTS), the algorithm evaluates the
state-action-values of each agent in a cooperative and decentralized manner,
explicitly modeling the interdependence of actions between traffic
participants. Macro-actions allow for temporal extension over multiple time
steps and increase the effective search depth requiring fewer iterations to
plan over longer horizons. Without predefined policies for macro-actions, the
algorithm simultaneously learns policies over and within macro-actions. The
proposed method is evaluated under several conflict scenarios, showing that the
algorithm can achieve effective cooperative planning with learned macro-actions
in heterogeneous environments.
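The decoupled selection step described in the abstract above can be sketched as follows. This is a minimal illustration, assuming each agent keeps its own visit counts and returns per action at a tree node and picks its action independently with UCB1. The function names, the statistics layout, the exploration constant `c`, and the macro-action labels are all hypothetical and not taken from the paper.

```python
import math

def ucb1(mean_value, action_visits, node_visits, c=1.4):
    """UCB1 score; unvisited actions get +inf so they are tried first."""
    if action_visits == 0:
        return math.inf
    return mean_value + c * math.sqrt(math.log(node_visits) / action_visits)

def select_joint_action(stats, node_visits, c=1.4):
    """Pick one action per agent independently (the 'decoupled' part).

    stats: one dict per agent mapping action -> (visits, total_return).
    Returns the joint action as a tuple, one entry per agent.
    """
    joint = []
    for agent_stats in stats:
        def score(action, agent_stats=agent_stats):
            visits, total = agent_stats[action]
            mean = total / visits if visits else 0.0
            return ucb1(mean, visits, node_visits, c)
        joint.append(max(agent_stats, key=score))
    return tuple(joint)

# Hypothetical statistics for two vehicles at one tree node.
stats = [
    {"keep_lane": (10, 5.0), "brake": (10, 8.0)},
    {"keep_lane": (5, 1.0), "yield": (0, 0.0)},
]
joint_action = select_joint_action(stats, node_visits=20)
```

Because every agent maximizes over its own actions only, the cost per node grows with the sum of the action-set sizes rather than with their product.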
Model Learning for Look-ahead Exploration in Continuous Control
We propose an exploration method that incorporates look-ahead search over
basic learnt skills and their dynamics, and use it for reinforcement learning
(RL) of manipulation policies. Our skills are multi-goal policies learned in
isolation in simpler environments using existing multigoal RL formulations,
analogous to options or macroactions. Coarse skill dynamics, i.e., the state
transition caused by a (complete) skill execution, are learnt and are unrolled
forward during lookahead search. Policy search benefits from temporal
abstraction during exploration, yet itself operates over low-level primitive
actions, so the resulting policies do not suffer from the suboptimality and
inflexibility caused by coarse skill chaining. We show that the proposed
exploration strategy results in effective learning of complex manipulation
policies faster than current state-of-the-art RL methods, and converges to
better policies than methods that use options or parametrized skills as
building blocks of the policy itself, as opposed to guiding exploration.
Comment: This is a pre-print of our paper which is accepted in AAAI 201
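The look-ahead over coarse skill dynamics described in the abstract above can be sketched as a brute-force unroll. This is a toy illustration under strong assumptions: the state is a single number, each skill's learned dynamics is replaced by a hand-written function, and candidates are scored by distance to the goal. The skill names and the function `lookahead_first_skill` are hypothetical, and the paper's actual search procedure may differ.

```python
from itertools import product

def lookahead_first_skill(state, goal, skill_models, depth=2):
    """Unroll every skill sequence of the given depth through the coarse
    skill models and return the first skill of the sequence that ends
    closest to the goal, together with that final distance."""
    best_seq, best_dist = None, float("inf")
    for seq in product(skill_models, repeat=depth):
        s = state
        for name in seq:
            s = skill_models[name](s)  # one complete skill execution
        dist = abs(goal - s)
        if dist < best_dist:
            best_seq, best_dist = seq, dist
    return best_seq[0], best_dist

# Hypothetical coarse dynamics: each skill shifts a 1-D state.
skill_models = {
    "reach": lambda s: s + 1.0,
    "push": lambda s: s + 3.0,
    "retract": lambda s: s - 1.0,
}
first_skill, final_dist = lookahead_first_skill(0.0, 4.0, skill_models, depth=2)
```

In the setting the abstract describes, the selected skill would only guide exploration, while the policy being trained still acts through low-level primitive actions.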
Monte-Carlo Tree Search with Prioritized Node Expansion for Multi-Goal Task Planning
Symbolic task planning for robots is computationally challenging due to the
combinatorial complexity of the possible action space. This fact is amplified
if there are several sub-goals to be achieved due to the increased length of
the action sequences. In this work, we propose a multi-goal symbolic task
planner for deterministic decision processes based on Monte Carlo Tree Search.
We augment the algorithm with prioritized node expansion, which favors nodes
that have already fulfilled some sub-goals. Due to its linear complexity in the
number of sub-goals, our algorithm is able to identify symbolic action
sequences of 145 elements to reach the desired goal state with up to 48
sub-goals while the search tree is limited to under 6500 nodes. We use action
reduction based on a kinematic reachability criterion to further ease
computational complexity. We combine our algorithm with object localization and
motion planning and apply it to a real-robot demonstration with two
manipulators in an industrial bearing inspection setting.
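The idea of expanding nodes that already fulfill more sub-goals first can be illustrated with a priority queue. Note that this is a simplified best-first search over a deterministic symbolic domain and not the paper's Monte Carlo Tree Search. The fact names, the action format (preconditions, added facts), and the `max_nodes` budget are assumptions made for this sketch.

```python
import heapq
from itertools import count

def fulfilled(state, sub_goals):
    """Number of sub-goals already present in the state."""
    return sum(1 for g in sub_goals if g in state)

def prioritized_search(start, sub_goals, actions, max_nodes=1000):
    """Expand frontier nodes in order of fulfilled sub-goals (most first).

    start: frozenset of facts.
    actions: dict name -> (preconditions, added_facts), both frozensets.
    Returns a list of action names, or None if the budget runs out.
    """
    tie = count()  # tie-breaker so the heap never compares states
    frontier = [(-fulfilled(start, sub_goals), next(tie), start, [])]
    seen = {start}
    expanded = 0
    while frontier and expanded < max_nodes:
        neg_score, _, state, plan = heapq.heappop(frontier)
        if -neg_score == len(sub_goals):
            return plan
        expanded += 1
        for name, (pre, add) in actions.items():
            if pre <= state:  # action applicable in this state
                nxt = state | add
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(
                        frontier,
                        (-fulfilled(nxt, sub_goals), next(tie), nxt, plan + [name]),
                    )
    return None

# Hypothetical two-part inspection task.
actions = {
    "pick_a": (frozenset(), frozenset({"holding_a"})),
    "inspect_a": (frozenset({"holding_a"}), frozenset({"inspected_a"})),
    "pick_b": (frozenset(), frozenset({"holding_b"})),
    "inspect_b": (frozenset({"holding_b"}), frozenset({"inspected_b"})),
}
plan = prioritized_search(frozenset(), {"inspected_a", "inspected_b"}, actions)
```

Once one sub-goal is reached, states that keep it stay at the front of the queue, so the search tends to finish sub-goals one after another instead of exploring all interleavings.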
Hierarchical Policy Learning for Mechanical Search
Retrieving objects from clutter is a complex task, which requires multiple
interactions with the environment until the target object can be extracted.
These interactions involve executing action primitives like grasping or pushing
as well as setting priorities for the objects to manipulate and the actions to
execute. Mechanical Search (MS) is a framework for object retrieval, which uses
a heuristic algorithm for pushing and rule-based algorithms for high-level
planning. While rule-based policies profit from human intuition in how they
work, they often perform sub-optimally. Deep reinforcement
learning (RL) has shown great performance in complex tasks such as making
decisions directly from pixels, which makes it suitable for training
policies in the context of object retrieval. In this work, we first formulate
the MS problem in a principled way as a hierarchical POMDP. Based on
this formulation, we propose a hierarchical policy learning approach for the MS
problem. For demonstration, we present two main parameterized sub-policies: a
push policy and an action selection policy. When integrated into the
hierarchical POMDP's policy, our proposed sub-policies increase the success
rate of retrieving the target object from less than 32% to nearly 80%, while
reducing the computation time for push actions from multiple seconds to less
than 10 milliseconds.
Comment: ICRA 202
An efficient approach to model-based hierarchical reinforcement learning
National Research Foundation (NRF) Singapore under SMART and Future Mobility; Ministry of Education, Singapore under its Academic Research Funding Tier
O-MuZero : abstract planning models Induced by Options on the MuZero Algorithm
Training reinforcement learning agents that learn both the value function and the environment model can be very time-consuming. One of the main reasons is that these agents learn by acting one step at a time (primitive actions), while humans learn in a more abstract way. In this work we introduce O-MuZero: a method for guiding a Monte-Carlo Tree Search through the use of options (temporally-extended actions). Most related work uses options to guide planning but acts only with primitive actions. Our method, on the other hand, proposes to plan and act with the options used for planning. To achieve this, we modify the Monte-Carlo Tree Search structure so that each node of the tree still represents a state but each edge is an option transition. We expect that our method allows the agent to see further into the state space and therefore plan better. We show that our method can be combined with state-of-the-art on-line planning algorithms that use a learned model. We evaluate different variations of our technique on previously established grid-world benchmarks and compare to the MuZero baseline, an algorithm that plans under a learned model and traditionally does not use options. Our method not only helps the agent learn faster but also yields better results during on-line execution with limited time budgets. We empirically show that our method also improves model robustness, meaning the ability of the model to play on environments slightly different from the one it was trained on.
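The tree structure described in the abstract above, where nodes are states and edges are option transitions, can be sketched as follows. This is a minimal illustration, assuming an option is a (policy, termination test) pair and that a hand-written `step_fn` stands in for the learned model. The class and function names, the corridor environment, and the option labels are hypothetical and not taken from O-MuZero.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    state: int
    # option name -> (child Node, discounted reward, primitive steps taken)
    children: dict = field(default_factory=dict)

def run_option(state, step_fn, policy, terminates, gamma=0.99, max_steps=50):
    """Follow the option's policy until it terminates; return the final
    state, the discounted reward collected, and the number of steps."""
    total, discount, steps = 0.0, 1.0, 0
    while steps < max_steps:
        state, reward = step_fn(state, policy(state))
        total += discount * reward
        discount *= gamma
        steps += 1
        if terminates(state):
            break
    return state, total, steps

def expand(node, options, step_fn, gamma=0.99):
    """Add one child per option, so each edge spans a whole option."""
    for name, (policy, terminates) in options.items():
        nxt, reward, steps = run_option(node.state, step_fn, policy, terminates, gamma)
        node.children[name] = (Node(nxt), reward, steps)
    return node

# Hypothetical corridor with cells 0..10 and reward 1 on reaching cell 10.
def step_fn(state, action):
    nxt = min(10, max(0, state + action))
    return nxt, (1.0 if nxt == 10 else 0.0)

options = {
    "to_middle": (lambda s: 1, lambda s: s >= 5),
    "to_goal": (lambda s: 1, lambda s: s == 10),
}
root = expand(Node(0), options, step_fn, gamma=1.0)
```

A single expansion here reaches states five and ten primitive steps away, which is the sense in which option edges let a search of fixed depth look further into the state space.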