13 research outputs found

    Value Propagation Networks

    Full text link
    We present Value Propagation (VProp), a set of parameter-efficient differentiable planning modules built on Value Iteration that can be trained with reinforcement learning to solve unseen tasks, generalize to larger map sizes, and learn to navigate in dynamic environments. We show that the modules enable learning to plan when the environment also includes stochastic elements, providing a cost-efficient learning system for building low-level, size-invariant planners for a variety of interactive navigation problems. We evaluate on static and dynamic configurations of MazeBase grid-worlds, with randomly generated environments of several different sizes, and on a StarCraft navigation scenario with more complex dynamics and pixels as input. Comment: Updated to match the ICLR 2019 OpenReview version.
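
    To give a rough intuition for what such a module computes, the following is a minimal, hedged sketch of convolutional value propagation over a grid map in the spirit of VProp and Value Iteration Networks. The layer shapes, channel counts, and iteration budget are illustrative assumptions, not the authors' exact architecture.

        # Hedged sketch of a value-iteration-style planning module; sizes are illustrative.
        import torch
        import torch.nn as nn

        class ValuePropagationSketch(nn.Module):
            def __init__(self, n_actions: int = 8, n_iters: int = 40):
                super().__init__()
                self.n_iters = n_iters
                # Map raw observations (e.g. wall/goal channels) to a reward map.
                self.reward_head = nn.Conv2d(3, 1, kernel_size=3, padding=1)
                # One propagation step: combine reward and neighbouring values into per-action Q maps.
                self.q_head = nn.Conv2d(2, n_actions, kernel_size=3, padding=1)

            def forward(self, obs: torch.Tensor) -> torch.Tensor:
                r = self.reward_head(obs)                      # (B, 1, H, W)
                v = torch.zeros_like(r)                        # initial value map
                for _ in range(self.n_iters):                  # let values spread across the map
                    q = self.q_head(torch.cat([r, v], dim=1))  # (B, A, H, W)
                    v, _ = q.max(dim=1, keepdim=True)          # Bellman-style max over actions
                return v                                       # value map used to pick local actions

        # Usage: the values around the agent's cell drive its next move.
        values = ValuePropagationSketch()(torch.zeros(1, 3, 16, 16))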

    Cooperative Carrying Control for Mobile Robots in Indoor Scenario

    Get PDF
    In recent years, there has been growing interest in designing multi-robot systems that provide cost-effective, fault-tolerant and reliable solutions for a variety of automated applications. From an industrial perspective in particular, cooperative carrying techniques based on Reinforcement Learning (RL) have gained strong interest. Compared to a single-robot system, this approach improves the system's robustness and manipulation dexterity when transporting large objects. However, in the current state of the art, environment dynamism and the need for re-training are considerable limitations for most existing cooperative carrying RL-based solutions. In this thesis, we employ the Value Propagation Networks (VPN) algorithm for cooperative multi-robot transport scenarios. We extend and test the Delta-Q cooperation metric for V-value-based agents, and we investigate path generation algorithms and trajectory tracking controllers for differential drive robots. Moreover, we explore localization algorithms that exploit range sensors to mitigate the drift errors of wheel odometry, and we conduct experiments to derive key performance indicators of range-sensor precision. Lastly, we perform realistic industrial indoor simulations using the Robot Operating System (ROS) and the Gazebo 3D visualization tool, including physical objects and 6G communication constraints. Our results show that the proposed VPN-based algorithm outperforms the current state of the art, since trajectory planning and dynamic obstacle avoidance are performed in real time, without re-training the model, and under constant 6G network coverage.
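
    As a point of reference for the trajectory tracking part, here is a minimal, hedged sketch of a standard proportional waypoint-tracking controller for differential-drive (unicycle) kinematics. The gains, stopping tolerance, and function name are illustrative assumptions and do not reproduce the controllers evaluated in the thesis.

        # Hedged sketch: proportional waypoint tracking for a differential-drive robot.
        import math

        def track_waypoint(pose, waypoint, k_v=0.5, k_w=1.5, tol=0.05):
            """pose = (x, y, theta) of the robot; waypoint = (x_g, y_g).
            Returns (linear_velocity, angular_velocity) commands."""
            x, y, theta = pose
            dx, dy = waypoint[0] - x, waypoint[1] - y
            rho = math.hypot(dx, dy)                               # distance to the waypoint
            if rho < tol:
                return 0.0, 0.0                                    # close enough: stop
            alpha = math.atan2(dy, dx) - theta                     # heading error
            alpha = math.atan2(math.sin(alpha), math.cos(alpha))   # wrap to [-pi, pi]
            v = k_v * rho * math.cos(alpha)                        # slow down when misaligned
            w = k_w * alpha                                        # turn towards the waypoint
            return v, w

        # Example: from the origin facing +x, head towards (1, 1).
        v_cmd, w_cmd = track_waypoint((0.0, 0.0, 0.0), (1.0, 1.0))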

    Hierarchies of Planning and Reinforcement Learning for Robot Navigation

    Get PDF
    Solving robotic navigation tasks via reinforcement learning (RL) is challenging due to their sparse reward and long decision horizon nature. However, in many navigation tasks, high-level (HL) task representations, like a rough floor plan, are available. Previous work has demonstrated efficient learning by hierarchical approaches that combine path planning in the HL representation with sub-goals derived from the plan to guide the RL policy in the source task. However, these approaches usually neglect the complex dynamics and sub-optimal sub-goal-reaching capabilities of the robot during planning. This work overcomes these limitations by proposing a novel hierarchical framework that utilizes a trainable planning policy for the HL representation. Thereby, robot capabilities and environment conditions can be learned from collected rollout data. We specifically introduce a planning policy based on value iteration with a learned transition model (VI-RL). In simulated robotic navigation tasks, VI-RL yields consistent, strong improvements over vanilla RL, is on par with vanilla hierarchical RL on single layouts but more broadly applicable to multiple layouts, and is on par with trainable HL path planning baselines except for a parking task with difficult non-holonomic dynamics, where it shows marked improvements. Comment: 7 pages, 5 figures, 2021 IEEE International Conference on Robotics and Automation (ICRA); v2: DOI number added.
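
    The HL planning step amounts to ordinary value iteration in which the tabular transition probabilities come from a learned model of how reliably the robot reaches each sub-goal. Below is a minimal, hedged sketch under those assumptions; the tabular layout and the way the model is queried are illustrative, not the exact VI-RL formulation.

        # Hedged sketch: value iteration over a high-level abstraction with a learned transition model.
        import numpy as np

        def value_iteration(transition_probs, reward, gamma=0.99, n_iters=100):
            """transition_probs: (S, A, S) array, e.g. predicted by a learned model of
            sub-goal-reaching success. reward: (S,) array, positive at the goal state."""
            n_states, n_actions, _ = transition_probs.shape
            v = np.zeros(n_states)
            q = np.zeros((n_states, n_actions))
            for _ in range(n_iters):
                # Q(s, a) = sum_s' P(s' | s, a) * (R(s') + gamma * V(s'))
                q = np.einsum("sap,p->sa", transition_probs, reward + gamma * v)
                v = q.max(axis=1)
            greedy_subgoals = q.argmax(axis=1)   # HL plan: which sub-goal to attempt from each state
            return v, greedy_subgoals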

    The StarCraft Multi-Agent Challenge

    Full text link
    In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such problems are relevant to a large number of real-world systems and are also more amenable to evaluation than general-sum problems. Standardised environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains, such as grid worlds. However, there is no comparable benchmark for cooperative multi-agent RL. As a result, most papers in this field use one-off toy problems, making it difficult to measure real progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap. SMAC is based on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges where each unit is controlled by an independent agent that must act based on local observations. We offer a diverse set of challenge maps and recommendations for best practices in benchmarking and evaluations. We also open-source a deep multi-agent RL framework including state-of-the-art algorithms. We believe that SMAC can provide a standard benchmark environment for years to come. Videos of our best agents for several SMAC scenarios are available at: https://youtu.be/VZ7zmQ_obZ0
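
    For orientation, here is a minimal random-agent interaction loop in the style of the example in the SMAC repository. Method names follow the smac package's StarCraft2Env interface as documented there; the map name and episode count are arbitrary choices for illustration.

        # Hedged sketch: random agents on a SMAC micromanagement map.
        from smac.env import StarCraft2Env
        import numpy as np

        env = StarCraft2Env(map_name="8m")          # 8 Marines vs 8 Marines
        n_agents = env.get_env_info()["n_agents"]

        for episode in range(5):
            env.reset()
            terminated = False
            episode_reward = 0.0
            while not terminated:
                actions = []
                for agent_id in range(n_agents):
                    # Each agent may only pick from its currently available actions.
                    avail = np.nonzero(env.get_avail_agent_actions(agent_id))[0]
                    actions.append(np.random.choice(avail))
                reward, terminated, _info = env.step(actions)
                episode_reward += reward
            print(f"Episode {episode}: reward {episode_reward}")
        env.close()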

    The NetHack Learning Environment

    Get PDF
    Progress in Reinforcement Learning (RL) algorithms goes hand-in-hand with the development of challenging environments that test the limits of current methods. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both. Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player terminal-based roguelike game, NetHack. We argue that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience. We compare NLE and its task suite to existing alternatives, and discuss why it is an ideal medium for testing the robustness and systematic generalization of RL agents. We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration, alongside qualitative analysis of various agents trained in the environment. NLE is open source and available at https://github.com/facebookresearch/nle
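
    NLE exposes NetHack through the standard Gym interface; the snippet below sketches a random agent following the usage shown in the NLE repository. The task id and the older Gym step signature are assumptions tied to the version documented there.

        # Hedged sketch: a random agent on an NLE task via the Gym API.
        import gym
        import nle  # importing nle registers the NetHack tasks with Gym

        env = gym.make("NetHackScore-v0")
        obs = env.reset()                 # obs is a dict of NetHack observation arrays
        done = False
        total_reward = 0.0
        while not done:
            # Sample a random action from the task's discrete action space.
            obs, reward, done, info = env.step(env.action_space.sample())
            total_reward += reward
        print("Episode return:", total_reward)
        env.close()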

    Reinforcement learning and planning for autonomous agent navigation: With a focus on sparse reward settings

    Get PDF
    Being able to navigate our surroundings enables us humans to freely interact with our environment and is therefore an important skill for truly autonomous technical systems as well. The machine learning paradigm of reinforcement learning (RL) enables learning (neural network) policies for decision making through continuous interaction with the environment. However, if the rewards that are received as feedback are sparse, improving the policy becomes difficult and inefficient. Therefore, this thesis focuses on improving policy learning under sparse rewards for autonomous agents tasked with reaching dedicated goal locations. First, we present a novel spatial gradient (SG) strategy to select starting states at the boundary of the agents' capabilities, which results in a curriculum that improves learning progress. Afterwards, we combine planning over abstract sub-goals with reinforcement learning to obtain policies that reach these sub-goals; the resulting sub-tasks make policy learning easier. We first present our hierarchical VI-RL policy architecture, which utilizes a learned transition model for planning that captures agent capabilities and enables generalization. Subsequently, we improve the efficiency and performance of the sub-goal planning by learning to locally refine simple shortest-path plans based on detailed local state information. Our proposed RL-trained Value Refinement Network (VRN) architecture additionally enables navigating dynamic environments without repeated global re-planning. Finally, we address the practically relevant setting where continuous environment interaction is not possible. Our HORIBLe-VRN algorithm learns our hierarchical planning-based policies from pre-collected data, incorporating latent sub-goal inference as well as offline RL to improve over sub-optimal demonstrations.
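
    To make the first contribution concrete, here is a minimal, hedged sketch of a starting-state curriculum that prefers states at the boundary of the agent's current capabilities, i.e. neither reliably solved nor hopeless. The success-rate band and the bookkeeping are illustrative; this does not reproduce the thesis' spatial gradient (SG) criterion.

        # Hedged sketch: sample start states where the agent's success rate is intermediate.
        import numpy as np

        def select_start_states(candidate_states, success_history, band=(0.2, 0.8), n=32, rng=None):
            """candidate_states: list of possible start states (e.g. grid cells).
            success_history: dict mapping state -> list of recent episode outcomes (0/1)."""
            rng = rng or np.random.default_rng()
            boundary = []
            for s in candidate_states:
                outcomes = success_history.get(s, [])
                rate = np.mean(outcomes) if outcomes else 0.5  # untried states remain candidates
                if band[0] <= rate <= band[1]:                 # neither trivial nor hopeless
                    boundary.append(s)
            if not boundary:                                   # fall back to uniform sampling
                boundary = list(candidate_states)
            idx = rng.choice(len(boundary), size=min(n, len(boundary)), replace=False)
            return [boundary[i] for i in idx]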