
    DEUX: Active Exploration for Learning Unsupervised Depth Perception

    Depth perception models are typically trained on non-interactive datasets with predefined camera trajectories. However, this often introduces systematic biases into the learning process that are correlated with the specific camera paths chosen during data acquisition. In this paper, we investigate how the way data is collected affects learning depth completion, from a robot navigation perspective, by leveraging 3D interactive environments. First, we evaluate four depth completion models trained on data collected using conventional navigation techniques. Our key insight is that existing exploration paradigms do not necessarily provide task-specific data points for competent unsupervised depth completion learning. We then find that data collected with respect to photometric reconstruction has a direct positive influence on model performance. As a result, we develop an active, task-informed, depth uncertainty-based motion planning approach for learning depth completion, which we call DEpth Uncertainty-guided eXploration (DEUX). Training with data collected by our approach improves depth completion by more than 18% on average across four depth completion models, compared to existing exploration methods, on the MP3D test set. We show that our approach further improves zero-shot generalization, while offering new insights into integrating robot learning-based depth estimation.
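
    The abstract above describes depth-uncertainty-guided exploration only at a high level. As a rough illustration of the idea (not the paper's actual DEUX implementation; the model interface, the Monte Carlo dropout uncertainty estimate, and the candidate-waypoint scoring below are all assumptions), a planner might score candidate viewpoints by how uncertain the depth completion model is about them and move toward the most uncertain one:

```python
# Hedged sketch of uncertainty-guided waypoint selection for depth completion.
import torch

def predict_depth_uncertainty(model, rgb, sparse_depth, n_samples=8):
    """Per-pixel depth uncertainty via Monte Carlo dropout (an assumption;
    any predictive-uncertainty estimate could be substituted)."""
    model.train()  # keep dropout active for stochastic forward passes
    with torch.no_grad():
        preds = torch.stack([model(rgb, sparse_depth) for _ in range(n_samples)])
    return preds.var(dim=0).mean().item()  # scalar: mean per-pixel variance

def select_next_waypoint(model, candidate_views):
    """Choose the candidate pose whose view the depth-completion model is
    least certain about. candidate_views: list of (pose, rgb, sparse_depth)."""
    scored = [
        (predict_depth_uncertainty(model, rgb, sd), pose)
        for pose, rgb, sd in candidate_views
    ]
    return max(scored, key=lambda s: s[0])[1]
```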

    Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks

    It is common to implicitly assume access to intelligently captured inputs (e.g., photos from a human photographer), yet autonomously capturing good observations is itself a major challenge. We address the problem of learning to look around: if a visual agent has the ability to voluntarily acquire new views to observe its environment, how can it learn efficient exploratory behaviors to acquire informative observations? We propose a reinforcement learning solution, where the agent is rewarded for actions that reduce its uncertainty about the unobserved portions of its environment. Based on this principle, we develop a recurrent neural network-based approach to perform active completion of panoramic natural scenes and 3D object shapes. Crucially, the learned policies are not tied to any recognition task nor to the particular semantic content seen during training. As a result, 1) the learned "look around" behavior is relevant even for new tasks in unseen environments, and 2) training data acquisition involves no manual labeling. Through tests in diverse settings, we demonstrate that our approach learns useful generic policies that transfer to new unseen tasks and environments. Completion episodes are shown at https://goo.gl/BgWX3W.
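
    The reward principle in the abstract above (reward actions that reduce uncertainty about unobserved content) can be sketched as the drop in reconstruction error after adding a new glimpse. The completion network, view representation, and panorama target below are placeholders, not the authors' architecture:

```python
# Hedged sketch of "reward = reduction in uncertainty about unseen content".
import torch
import torch.nn.functional as F

def completion_error(completion_net, observed_views, target_panorama):
    """Error of reconstructing the full panorama from the views seen so far."""
    with torch.no_grad():
        reconstruction = completion_net(observed_views)
    return F.mse_loss(reconstruction, target_panorama).item()

def glimpse_reward(completion_net, views_before, new_view, target_panorama):
    """Reward a glimpse by how much it reduces reconstruction error, i.e. how
    much it reduces the agent's uncertainty about unobserved portions."""
    err_before = completion_error(completion_net, views_before, target_panorama)
    err_after = completion_error(completion_net, views_before + [new_view], target_panorama)
    return err_before - err_after  # positive when the new view is informative
```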

    Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

    A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset. Comment: CVPR 2018 Spotlight presentation.
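
    The abstract above frames instruction following as visually grounded sequence-to-sequence translation. A minimal sketch of that framing (illustrative only; not the R2R baseline's exact architecture, and all layer sizes are arbitrary assumptions) pairs an instruction encoder with an attention-based action decoder conditioned on per-step image features:

```python
# Hedged sketch of a seq2seq instruction-following agent.
import torch
import torch.nn as nn

class Seq2SeqNavigator(nn.Module):
    def __init__(self, vocab_size, n_actions, hidden=256, img_feat=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(hidden + img_feat, hidden)
        self.attn = nn.Linear(hidden, hidden)
        self.policy = nn.Linear(hidden, n_actions)

    def forward(self, instruction_tokens, image_feats):
        # Encode the natural-language instruction.
        ctx, (h, c) = self.encoder(self.embed(instruction_tokens))
        h, c = h.squeeze(0), c.squeeze(0)
        logits = []
        for img in image_feats.unbind(dim=1):  # one visual observation per step
            # Attend over the instruction with the current decoder state.
            scores = torch.bmm(ctx, self.attn(h).unsqueeze(2)).squeeze(2)
            attended = (ctx * scores.softmax(dim=1).unsqueeze(2)).sum(dim=1)
            h, c = self.decoder(torch.cat([attended, img], dim=1), (h, c))
            logits.append(self.policy(h))
        return torch.stack(logits, dim=1)  # action logits per time step
```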

    A Survey of Embodied AI: From Simulators to Research Tasks

    There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI", where AI algorithms and agents no longer learn from datasets of images, videos or text curated primarily from the internet. Instead, they learn through interactions with their environments from an egocentric perception similar to humans. Consequently, there has been substantial growth in the demand for embodied AI simulators to support various embodied AI research tasks. This growing interest in embodied AI is beneficial to the greater pursuit of Artificial General Intelligence (AGI), but there has not been a contemporary and comprehensive survey of this field. This paper aims to provide an encyclopedic survey of the field of embodied AI, from its simulators to its research. By evaluating nine current embodied AI simulators with our proposed seven features, this paper aims to understand how well they serve embodied AI research and where their limitations lie. Next, this paper surveys the three main research tasks in embodied AI -- visual exploration, visual navigation and embodied question answering (QA) -- covering the state-of-the-art approaches, evaluation metrics and datasets. Finally, with the new insights revealed through surveying the field, the paper provides suggestions for simulator-for-task selections and recommendations for the future directions of the field. Comment: Under review for IEEE TETC.