10 research outputs found

    From Seeing to Moving: A Survey on Learning for Visual Indoor Navigation (VIN)

    The Visual Indoor Navigation (VIN) task has drawn increasing attention from the data-driven machine learning community, especially with the recently reported successes of learning-based methods. Due to the innate complexity of this task, researchers have approached the problem from a variety of angles, the full scope of which has not yet been captured in an overarching report. This survey first summarizes representative learning-based approaches to the VIN task, then identifies and discusses lingering issues impeding VIN performance, and motivates future research in key areas worth exploring for the community.

    Success Weighted by Completion Time: A Dynamics-Aware Evaluation Criteria for Embodied Navigation

    We present Success weighted by Completion Time (SCT), a new metric for evaluating the navigation performance of mobile robots. Several related works on navigation have used Success weighted by Path Length (SPL) as the primary method of evaluating the path an agent takes to a goal location, but SPL is limited in its ability to properly evaluate agents with complex dynamics. In contrast, SCT explicitly takes the agent's dynamics model into consideration, and aims to accurately capture how well the agent has approximated the fastest navigation behavior afforded by its dynamics. While several embodied navigation works use point-turn dynamics, we focus on unicycle-cart dynamics for our agent, which better exemplifies the dynamics model of popular mobile robotics platforms (e.g., LoCoBot, TurtleBot, Fetch). We also present RRT*-Unicycle, an algorithm for unicycle dynamics that estimates the fastest collision-free path and completion time from a starting pose to a goal location in an environment containing obstacles. We experiment with deep reinforcement learning and reward shaping to train and compare the navigation performance of agents with different dynamics models. In evaluating these agents, we show that, in contrast to SPL, SCT captures the advantage in navigation speed that a unicycle model has over a simpler point-turn model of dynamics. Lastly, we show that we can successfully deploy our trained models and algorithms outside of simulation in the real world. We embody our agents in a real robot to navigate an apartment, and show that they can generalize in a zero-shot manner.
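
    SPL and SCT share the same aggregate form; SCT swaps SPL's path-length ratio for a completion-time ratio. Below is a minimal sketch under that reading of the abstract; the exact formula and the episode record format are assumptions for illustration, not taken from the paper.

```python
def sct(episodes):
    """Success weighted by Completion Time, averaged over episodes.

    Each episode is a (success, fastest_time, agent_time) triple:
    `fastest_time` is the estimated minimum completion time the agent's
    dynamics allow (e.g., from RRT*-Unicycle), and `agent_time` is how
    long the agent actually took. Assumed to mirror SPL, with path
    lengths replaced by completion times.
    """
    total = 0.0
    for success, fastest_time, agent_time in episodes:
        if success:
            total += fastest_time / max(fastest_time, agent_time)
    return total / len(episodes)

# Two successes (one near-optimal, one slow) and one failure:
print(sct([(True, 10.0, 11.0), (True, 10.0, 25.0), (False, 8.0, 30.0)]))
# (10/11 + 10/25 + 0) / 3 ≈ 0.44
```

    As with SPL, failed episodes score zero, and an agent that matches the fastest achievable time earns the full 1.0 for that episode.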

    A Survey of Embodied AI: From Simulators to Research Tasks

    There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI", in which AI algorithms and agents no longer learn from datasets of images, videos, or text curated primarily from the internet. Instead, they learn through interactions with their environments from an egocentric perspective similar to humans. Consequently, there has been substantial growth in the demand for embodied AI simulators to support various embodied AI research tasks. This growing interest in embodied AI is beneficial to the greater pursuit of Artificial General Intelligence (AGI), but there has not been a contemporary and comprehensive survey of the field. This paper aims to provide an encyclopedic survey of embodied AI, from its simulators to its research tasks. By evaluating nine current embodied AI simulators against seven proposed features, the paper assesses how well each simulator serves embodied AI research and where its limitations lie. It then surveys the three main research tasks in embodied AI -- visual exploration, visual navigation, and embodied question answering (QA) -- covering state-of-the-art approaches, evaluation metrics, and datasets. Finally, drawing on the insights revealed through the survey, the paper offers suggestions for simulator-for-task selection and recommendations for future directions of the field.

    Emergence of Intelligent Navigation Behavior in Embodied Agents from Massive-Scale Simulation

    The goal of Artificial Intelligence is to build 'thinking machines' that 'use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.' In this dissertation, we argue that the intelligence required for this goal emerges from massive-scale simulation. We show a specific case: that intelligent navigation behavior emerges from massive-scale simulation and deep reinforcement learning. Towards this end, we introduce Decentralized Distributed PPO (DD-PPO), a method that scales reinforcement learning to multiple GPUs and machines. We use DD-PPO to train agents for PointGoal navigation (e.g., 'Go 5 meters north and 10 meters east relative to start') for the equivalent of 80 years of human experience. This massive-scale training results in near-perfect autonomous navigation in an unseen environment without access to a map. We then examine the inner workings of a special case of PointGoalNav agents. We find that (1) their memory enables shortcuts, i.e., efficient travel through previously unexplored parts of the environment, and (2) maps emerge in their memory, i.e., a detailed occupancy grid of the environment can be decoded from it. We then introduce Variable Experience Rollout (VER), a method that efficiently scales reinforcement learning on a single GPU or machine. We use VER to train chained skills for mobile manipulation, and find a surprising emergence of navigation in skills that do not ostensibly require any navigation. Specifically, the pick skill involves a robot picking an object from a table. During training, the robot was always spawned close to the table and never needed to navigate. However, we find that if navigation actions are part of the action space, the robot learns to navigate and then pick an object in new environments with 50% success, demonstrating surprisingly high out-of-distribution generalization.
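
    The synchronization pattern behind DD-PPO is simple to state: each GPU worker alternates between simulating experience and optimizing, and gradients are averaged across workers with an all-reduce, with no parameter server. The sketch below illustrates only that pattern; it is not the dissertation's implementation, `collect_rollout` and `compute_ppo_loss` are hypothetical placeholders, and the method's key scaling trick (preempting straggler workers once most rollouts finish) is omitted.

```python
import torch
import torch.distributed as dist

# Assumes dist.init_process_group(...) has already been called on every
# worker (e.g., when launched via torchrun), one worker per GPU.
def ddppo_step(policy, optimizer, env):
    # Each worker simulates in its own environment on its own GPU.
    rollout = collect_rollout(policy, env)      # hypothetical helper
    loss = compute_ppo_loss(policy, rollout)    # clipped PPO objective (placeholder)
    optimizer.zero_grad()
    loss.backward()
    # Decentralized step: sum gradients across all workers, then divide,
    # so every copy of the policy applies an identical update.
    world_size = dist.get_world_size()
    for p in policy.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad)
            p.grad /= world_size
    optimizer.step()
```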

    Bayesian State Tracking and Sim-to-Real Transfer for Vision-and-Language Navigation

    A visually grounded navigation instruction can be interpreted as a sequence of expected observations and actions an agent following the correct trajectory would encounter and perform. Based on this intuition, we formulate the problem of finding the goal location in Vision-and-Language Navigation (VLN) within the framework of Bayesian state tracking -- learning observation and motion models conditioned on these expectable events. Together with a mapper that constructs a semantic spatial map on the fly during navigation, we formulate an end-to-end differentiable Bayes filter and train it to identify the goal by predicting the most likely trajectory through the map according to the instructions. The resulting navigation policy constitutes a new approach to instruction following that explicitly models a probability distribution over states, encoding strong geometric and algorithmic priors while enabling greater explainability. Our experiments show that this approach outperforms a strong LingUNet baseline when predicting the goal location on the map. On the full VLN task, i.e., navigating to the goal location, it achieves promising results with less reliance on navigation constraints. In the second half of the thesis, we study the challenging problem of releasing a robot in a previously unseen environment and having it follow unconstrained natural-language navigation instructions. Recent work on VLN has achieved significant progress in simulation. To assess the implications of this work for robotics, we transfer a VLN agent trained in simulation to a physical robot. To bridge the gap between the high-level discrete action space learned by the VLN agent and the robot's low-level continuous action space, we propose a subgoal model to identify nearby waypoints, and use domain randomization to mitigate visual domain differences. For accurate comparisons in parallel sim and real environments, we annotate a 325 m² office space with 1.3 km of navigation instructions and create a digitized replica of it in simulation. We find that sim-to-real transfer to an environment not seen in training succeeds if an occupancy map and navigation graph can be collected and annotated in advance (success rate of 46.8%, vs. 55.9% in sim), but is much more challenging in the hardest setting with no prior mapping at all (success rate of 22.5%).
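
    At its core, the Bayes filter described above is the classic predict-update recursion over a belief about the agent's state on the map; the thesis makes the observation and motion models learned and differentiable so the whole filter can be trained end to end. Below is a minimal numpy sketch of that recursion with hand-set models; all states, numbers, and names are illustrative, not from the thesis.

```python
import numpy as np

def bayes_filter_step(belief, motion, obs_likelihood):
    """One predict-update step over a discrete belief about map states."""
    predicted = motion @ belief              # predict: propagate belief through the motion model P(s'|s)
    posterior = obs_likelihood * predicted   # update: reweight by the observation likelihood P(o|s')
    return posterior / posterior.sum()       # renormalize to a valid distribution

# Three map states; columns of `motion` sum to 1.
belief = np.array([0.5, 0.3, 0.2])
motion = np.array([[0.8, 0.1, 0.0],
                   [0.2, 0.8, 0.2],
                   [0.0, 0.1, 0.8]])
obs = np.array([0.1, 0.7, 0.2])              # the current observation favors state 1
print(bayes_filter_step(belief, motion, obs))  # belief mass shifts toward state 1
```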