18 research outputs found

    Decentralized Cooperative Planning for Automated Vehicles with Hierarchical Monte Carlo Tree Search

    Full text link
    Today's automated vehicles lack the ability to cooperate implicitly with others. This work presents a Monte Carlo Tree Search (MCTS) based approach for decentralized cooperative planning using macro-actions for automated vehicles in heterogeneous environments. Based on cooperative modeling of other agents and Decoupled-UCT (a variant of MCTS), the algorithm evaluates the state-action values of each agent in a cooperative and decentralized manner, explicitly modeling the interdependence of actions between traffic participants. Macro-actions allow for temporal extension over multiple time steps and increase the effective search depth, requiring fewer iterations to plan over longer horizons. Without predefined policies for macro-actions, the algorithm simultaneously learns policies over and within macro-actions. The proposed method is evaluated in several conflict scenarios, showing that the algorithm can achieve effective cooperative planning with learned macro-actions in heterogeneous environments.
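
    The decoupled evaluation is the part most easily made concrete: each agent keeps its own UCB1 statistics at a shared node instead of statistics over the joint-action space. Below is a minimal Python sketch of that selection and backup step; the class name, the treatment of macro-actions as opaque labels, and the exploration constant are illustrative assumptions, not the authors' implementation.

        import math, random

        class DecoupledNode:
            """Joint search node in which every agent keeps its own UCB1 statistics
            over its own macro-actions (Decoupled-UCT), instead of statistics over
            the exponentially large joint-action space."""

            def __init__(self, macro_actions_per_agent):
                self.visits = 0
                # one dict per agent: macro-action -> [visit count, cumulative return]
                self.stats = [{a: [0, 0.0] for a in actions}
                              for actions in macro_actions_per_agent]

            def select_joint_action(self, c=1.4):
                """Each agent independently picks a macro-action via UCB1."""
                joint = []
                for agent_stats in self.stats:
                    untried = [a for a, (n, _) in agent_stats.items() if n == 0]
                    if untried:
                        joint.append(random.choice(untried))
                        continue
                    joint.append(max(agent_stats, key=lambda a:
                        agent_stats[a][1] / agent_stats[a][0]
                        + c * math.sqrt(math.log(self.visits) / agent_stats[a][0])))
                return tuple(joint)

            def update(self, joint_action, returns):
                """Back up one return per agent along its chosen macro-action."""
                self.visits += 1
                for agent_stats, action, ret in zip(self.stats, joint_action, returns):
                    agent_stats[action][0] += 1
                    agent_stats[action][1] += ret

    For example, a node built as DecoupledNode([["keep_lane", "merge"], ["yield", "accelerate"]]) yields a joint macro-action via select_joint_action() and is updated with one return per agent via update(joint_action, returns=(1.0, 0.5)).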

    Model Learning for Look-ahead Exploration in Continuous Control

    Full text link
    We propose an exploration method that incorporates look-ahead search over basic learned skills and their dynamics, and use it for reinforcement learning (RL) of manipulation policies. Our skills are multi-goal policies learned in isolation in simpler environments using existing multi-goal RL formulations, analogous to options or macro-actions. Coarse skill dynamics, i.e., the state transition caused by a (complete) skill execution, are learned and unrolled forward during look-ahead search. Policy search benefits from temporal abstraction during exploration, though the policy itself operates over low-level primitive actions, and thus the resulting policies do not suffer from the suboptimality and inflexibility caused by coarse skill chaining. We show that the proposed exploration strategy results in effective learning of complex manipulation policies faster than current state-of-the-art RL methods, and converges to better policies than methods that use options or parameterized skills as building blocks of the policy itself, as opposed to guiding exploration. Comment: This is a pre-print of our paper which is accepted at AAAI 201
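
    The core mechanism, unrolling learned coarse skill dynamics forward to pick which skill should bias exploration, can be sketched in a few lines. The function and argument names below are illustrative assumptions, and the exhaustive depth-limited enumeration is a simplification of whatever search the paper actually uses.

        import itertools

        def lookahead_over_skills(state, skill_models, value_fn, depth=2):
            """Score short skill sequences by unrolling learned coarse skill dynamics,
            then return the first skill of the best sequence.  The result is used only
            to bias exploration; the policy itself still acts with primitive actions.

            skill_models: dict skill name -> callable(state) -> predicted post-skill state
            value_fn:     callable(state) -> scalar estimate of how promising a state is
            """
            best_first, best_value = None, float("-inf")
            for sequence in itertools.product(skill_models, repeat=depth):
                s = state
                for skill in sequence:
                    s = skill_models[skill](s)   # one coarse transition per full skill
                v = value_fn(s)
                if v > best_value:
                    best_first, best_value = sequence[0], v
            return best_first

        # Toy usage with 1-D states and hand-written stand-ins for the learned models.
        skills = {"reach": lambda s: s + 1.0, "push": lambda s: s + 0.3}
        print(lookahead_over_skills(0.0, skills, value_fn=lambda s: -abs(s - 2.0)))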

    Monte-Carlo Tree Search with Prioritized Node Expansion for Multi-Goal Task Planning

    Full text link
    Symbolic task planning for robots is computationally challenging due to the combinatorial complexity of the possible action space. This fact is amplified if there are several sub-goals to be achieved, due to the increased length of the action sequences. In this work, we propose a multi-goal symbolic task planner for deterministic decision processes based on Monte Carlo Tree Search. We augment the algorithm with prioritized node expansion, which prioritizes nodes that have already fulfilled some sub-goals. Due to its linear complexity in the number of sub-goals, our algorithm is able to identify symbolic action sequences of 145 elements to reach the desired goal state with up to 48 sub-goals, while the search tree is limited to under 6500 nodes. We use action reduction based on a kinematic reachability criterion to further ease computational complexity. We combine our algorithm with object localization and motion planning and apply it to a real-robot demonstration with two manipulators in an industrial bearing inspection setting.
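
    The prioritization idea, expanding nodes that already satisfy more sub-goals first, can be shown independently of the full MCTS machinery. The sketch below substitutes a plain best-first expansion for the paper's Monte Carlo Tree Search in order to show only that priority ordering; function and argument names are illustrative assumptions.

        import heapq

        def prioritized_multigoal_search(start, actions, transition, subgoals, max_nodes=6500):
            """Best-first tree expansion for a deterministic decision process in which
            nodes that already satisfy more sub-goals are expanded first.

            transition: callable(state, action) -> next state, or None if inapplicable
            subgoals:   list of predicates state -> bool
            Returns an action sequence satisfying every sub-goal, or None.
            """
            tie = 0  # unique tie-breaker so heapq never has to compare states
            frontier = [(-sum(g(start) for g in subgoals), tie, start, [])]
            expanded = 0
            while frontier and expanded < max_nodes:
                neg_satisfied, _, state, plan = heapq.heappop(frontier)
                if -neg_satisfied == len(subgoals):
                    return plan                      # all sub-goals fulfilled
                expanded += 1
                for action in actions:
                    nxt = transition(state, action)
                    if nxt is None:
                        continue
                    tie += 1
                    heapq.heappush(frontier,
                                   (-sum(g(nxt) for g in subgoals), tie, nxt, plan + [action]))
            return None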

    Hierarchical Policy Learning for Mechanical Search

    Get PDF
    Retrieving objects from clutter is a complex task, which requires multiple interactions with the environment until the target object can be extracted. These interactions involve executing action primitives like grasping or pushing as well as setting priorities for the objects to manipulate and the actions to execute. Mechanical Search (MS) is a framework for object retrieval, which uses a heuristic algorithm for pushing and rule-based algorithms for high-level planning. While rule-based policies benefit from human intuition, they often perform sub-optimally. Deep reinforcement learning (RL) has shown great performance in complex tasks such as making decisions by evaluating pixels, which makes it suitable for training policies in the context of object retrieval. In this work, we first give the MS problem a principled formulation as a hierarchical POMDP. Based on this formulation, we propose a hierarchical policy learning approach for the MS problem. For demonstration, we present two main parameterized sub-policies: a push policy and an action-selection policy. When integrated into the hierarchical POMDP's policy, our proposed sub-policies increase the success rate of retrieving the target object from less than 32% to nearly 80%, while reducing the computation time for push actions from multiple seconds to less than 10 milliseconds. Comment: ICRA 202
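
    The hierarchical decomposition itself is easy to make concrete: a high-level action-selection policy chooses an action primitive from the observation, and the matching parameterized sub-policy produces that primitive's parameters. The sketch below is a structural illustration with random stand-ins for the learned networks; the class name and the two primitives mirror the abstract, but none of this is the authors' code.

        import numpy as np

        class HierarchicalMSPolicy:
            """Two-level policy: a selector picks an action primitive, then the
            corresponding parameterized sub-policy emits that primitive's parameters."""

            def __init__(self, selector, sub_policies):
                self.selector = selector          # obs -> scores over primitives
                self.sub_policies = sub_policies  # primitive name -> (obs -> parameters)

            def act(self, obs):
                names = list(self.sub_policies)
                choice = names[int(np.argmax(self.selector(obs)))]
                return choice, self.sub_policies[choice](obs)

        # Toy usage: random stand-ins replace the learned selection and push policies.
        rng = np.random.default_rng(0)
        policy = HierarchicalMSPolicy(
            selector=lambda obs: rng.normal(size=2),
            sub_policies={
                "push":  lambda obs: {"start": obs[:2], "end": obs[:2] + rng.normal(size=2)},
                "grasp": lambda obs: {"pose": obs[:3]},
            },
        )
        print(policy.act(np.zeros(4)))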

    Exploiting Action Categories in Learning Complex Games

    Get PDF

    An efficient approach to model-based hierarchical reinforcement learning

    Get PDF
    National Research Foundation (NRF) Singapore under SMART and Future Mobility; Ministry of Education, Singapore under its Academic Research Funding Tier

    O-MuZero: Abstract Planning Models Induced by Options on the MuZero Algorithm

    Get PDF
    Training reinforcement learning agents that learn both the value function and the environment model can be very time consuming; one of the main reasons is that these agents learn through actions one step at a time (primitive actions), while humans learn in a more abstract way. In this work we introduce O-MuZero: a method for guiding a Monte-Carlo Tree Search through the use of options (temporally-extended actions). Most related work uses options to guide the planning but only acts with primitive actions; our method, on the other hand, plans and acts with the options used for planning. To achieve this, we modify the Monte-Carlo Tree Search structure so that each node of the tree still represents a state but each edge is an option transition. We expect this to allow the agent to see further into the state space and therefore produce better-quality plans. We show that our method can be combined with state-of-the-art on-line planning algorithms that use a learned model. We evaluate different variations of our technique on previously established grid-world benchmarks and compare to the MuZero algorithm baseline, an algorithm that plans under a learned model and traditionally does not use options. Our method not only helps the agent learn faster but also yields better results during on-line execution with limited time budgets. We empirically show that our method also improves model robustness, i.e., the ability of the model to play in environments slightly different from the one it was trained on.
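
    The structural change described above, keeping states at the nodes while making every edge an option transition, can be sketched as follows. Here model_step stands in for the learned one-step dynamics model, and all class and function names are illustrative assumptions rather than the O-MuZero implementation.

        import math
        from dataclasses import dataclass, field
        from typing import Callable, Dict

        @dataclass
        class Option:
            """Temporally extended action: an internal policy plus a termination test."""
            policy: Callable      # state -> primitive action
            terminates: Callable  # state -> bool

        @dataclass
        class OptionNode:
            """Tree node: still represents a state, but its edges are option transitions."""
            state: object
            visits: int = 0
            value: float = 0.0
            children: Dict[str, "OptionNode"] = field(default_factory=dict)

        def expand_with_option(node, name, option, model_step, max_len=10):
            """Unroll the option's internal policy against the learned one-step model
            until it terminates; the reached state becomes the child of this edge."""
            state, total_reward = node.state, 0.0
            for _ in range(max_len):
                action = option.policy(state)
                state, reward = model_step(state, action)
                total_reward += reward
                if option.terminates(state):
                    break
            child = OptionNode(state)
            node.children[name] = child
            return child, total_reward

        def select_option_edge(node, c=1.25):
            """UCT-style selection over the option edges of an expanded node."""
            return max(node.children.items(),
                       key=lambda kv: kv[1].value / max(kv[1].visits, 1)
                       + c * math.sqrt(math.log(node.visits + 1) / (kv[1].visits + 1)))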