
    RE-MOVE: An Adaptive Policy Design Approach for Dynamic Environments via Language-Based Feedback

    Reinforcement learning-based policies for continuous-control robotic navigation tasks often fail to adapt to changes in the environment during real-time deployment, which may result in catastrophic failures. To address this limitation, we propose a novel approach called RE-MOVE (REquest help and MOVE on), which uses language-based feedback to adjust trained policies to real-time changes in the environment. In this work, we enable the trained policy to decide when to ask for feedback and how to incorporate feedback into trained policies. RE-MOVE incorporates epistemic uncertainty to determine the optimal time to request feedback from humans and uses language-based feedback for real-time adaptation. We perform extensive synthetic and real-world evaluations to demonstrate the benefits of our proposed approach in several test-time dynamic navigation scenarios. Our approach enables robots to learn from human feedback and adapt to previously unseen adversarial situations.
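    The abstract above describes using epistemic uncertainty to decide when the policy should request human feedback. As a minimal sketch (not the paper's implementation), one common way to estimate epistemic uncertainty is disagreement across an ensemble of policy heads; the function names and the threshold value below are illustrative assumptions:

    ```python
    import numpy as np

    def epistemic_uncertainty(action_means):
        """Disagreement (variance) across an ensemble of policy heads,
        averaged over the action dimensions."""
        return float(np.mean(np.var(action_means, axis=0)))

    def should_request_feedback(action_means, threshold=0.05):
        """Request human feedback only when ensemble disagreement is high."""
        return epistemic_uncertainty(action_means) > threshold

    # Toy example: 5 ensemble heads, 2-D continuous action.
    confident = np.tile([0.5, -0.2], (5, 1))                  # heads agree
    uncertain = np.random.default_rng(0).normal(size=(5, 2))  # heads disagree

    print(should_request_feedback(confident))   # False
    print(should_request_feedback(uncertain))   # True
    ```

    The design choice here is that querying a human is costly, so the agent asks only when its own predictions diverge; any threshold would in practice be tuned per task.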

    Core Challenges in Embodied Vision-Language Planning

    Recent advances in the areas of multimodal machine learning and artificial intelligence (AI) have led to the development of challenging tasks at the intersection of Computer Vision, Natural Language Processing, and Embodied AI. Whereas many approaches and previous surveys have characterised one or two of these dimensions, there has not been a holistic analysis at the center of all three. Moreover, even when combinations of these topics are considered, more focus is placed on describing, e.g., current architectural methods, as opposed to also illustrating high-level challenges and opportunities for the field. In this survey paper, we discuss Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language. We propose a taxonomy to unify these tasks and provide an in-depth analysis and comparison of the new and current algorithmic approaches, metrics, simulated environments, as well as the datasets used for EVLP tasks. Finally, we present the core challenges that we believe new EVLP works should seek to address, and we advocate for task construction that enables model generalizability and furthers real-world deployment. (Comment: 35 pages)

    Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning

    Mobile agents that can leverage help from humans can potentially accomplish more complex tasks than they could entirely on their own. We develop "Help, Anna!" (HANNA), an interactive photo-realistic simulator in which an agent fulfills object-finding tasks by requesting and interpreting natural language-and-vision assistance. An agent solving tasks in a HANNA environment can leverage simulated human assistants, called ANNA (Automatic Natural Navigation Assistants), which, upon request, provide natural language and visual instructions to direct the agent towards the goals. To address the HANNA problem, we develop a memory-augmented neural agent that hierarchically models multiple levels of decision-making, and an imitation learning algorithm that teaches the agent to avoid repeating past mistakes while simultaneously predicting its own chances of making future progress. Empirically, our approach is able to ask for help more effectively than competitive baselines and thus attains a higher task success rate in both previously seen and previously unseen environments. We publicly release code and data at https://github.com/khanhptnk/hanna. A video demo is available at https://youtu.be/18P94aaaLKg. (Comment: In EMNLP 201)

    Enriching Communication Between Humans and AI Agents

    Equipping AI agents with effective, human-compatible communication capabilities is pivotal to enabling them to effectively serve and aid humans. On one hand, agents should understand humans, being able to infer intentions and extract knowledge from language utterances. On the other hand, they should also help humans understand them, conveying (un)certainties and proactively consulting humans when facing difficult situations. This dissertation presents new training and evaluation frameworks that enrich communication between humans and AI agents. These frameworks improve two capabilities of an agent: (1) the ability to learn through natural communication with humans and (2) the ability to request and interpret information from humans during task execution. Regarding the first capability, I study the possibility and challenges of training agents with noisy human ratings. Providing humans with more expressive tools for teaching agents, I propose a framework that employs descriptive language as the teaching medium. On the second capability, I introduce new benchmarks that evaluate an agent's ability to exchange information with humans to successfully perform indoor navigation tasks. On these benchmarks, I build agents that are capable of requesting rich, contextually useful information and show that they significantly outperform those without such capability. I conclude the dissertation with discussions on how to develop more sophisticated communication capabilities for agents.