Language-Conditioned Change-point Detection to Identify Sub-Tasks in Robotics Domains
In this work, we present an approach to identify sub-tasks within a
demonstrated robot trajectory using language instructions. We identify these
sub-tasks using language provided during demonstrations as guidance to identify
sub-segments of a longer robot trajectory. Given a sequence of natural language
instructions and a long trajectory consisting of image frames and discrete
actions, we want to map an instruction to a smaller fragment of the trajectory.
Unlike previous instruction following works which directly learn the mapping
from language to a policy, we propose a language-conditioned change-point
detection method to identify sub-tasks in a problem. Our approach learns the
relationship between constituent segments of a long language command and
corresponding constituent segments of a trajectory. These constituent
trajectory segments can be used to learn subtasks or sub-goals for planning or
options as demonstrated by previous related work. Our insight in this work is
that the language-conditioned robot change-point detection problem is similar
to the existing video moment retrieval works used to identify sub-segments
within online videos. Through extensive experimentation, we demonstrate an
improvement over a baseline approach in accurately
identifying sub-tasks within a trajectory using our proposed method. Moreover,
we present a comprehensive study investigating sample complexity requirements
on learning this mapping, between language and trajectory sub-segments, to
understand if the video retrieval-based methods are realistic in real robot
scenarios.
Comment: 9 pages, 13 figures. Accepted paper at the RSS 2023 Workshop on
Articulate Robots: Utilizing Language for Robot Learning
Specifying and Interpreting Reinforcement Learning Policies through Simulatable Machine Learning
Human-AI collaborative policy synthesis is a procedure in which (1) a human
initializes an autonomous agent's behavior, (2) Reinforcement Learning improves
the human specified behavior, and (3) the agent can explain the final optimized
policy to the user. This paradigm leverages human expertise and facilitates a
greater insight into the learned behaviors of an agent. Existing approaches to
enabling collaborative policy specification involve black box methods which are
unintelligible and are not catered towards non-expert end-users. In this paper,
we develop a novel collaborative framework to enable humans to initialize and
interpret an autonomous agent's behavior, rooted in principles of
human-centered design. Through our framework, we enable humans to specify an
initial behavior model in the form of unstructured, natural language, which we
then convert to lexical decision trees. Next, we leverage these
human-specified policies to warm-start reinforcement learning, allowing the
agent to optimize the policies further.
Finally, to close the loop on human-specification, we produce explanations of
the final learned policy, in multiple modalities, to provide the user with a final
depiction of the agent's learned policy. We validate our approach by
showing that our model can produce >80% accuracy, and that human-initialized
policies are able to successfully warm-start RL. We then conduct a novel
human-subjects study quantifying the relative subjective and objective benefits
of varying XAI modalities (e.g., Tree, Language, and Program) for explaining
learned policies to end-users, in terms of usability and interpretability, and
identify the circumstances that influence these measures. Our findings
emphasize the need for personalized explainable systems that can facilitate
user-centric policy explanations for a variety of end-users.
A Tale of Two DRAGGNs: A Hybrid Approach for Interpreting Action-Oriented and Goal-Oriented Instructions
Robots operating alongside humans in diverse, stochastic environments must be
able to accurately interpret natural language commands. These instructions
often fall into one of two categories: those that specify a goal condition or
target state, and those that specify explicit actions, or how to perform a
given task. Recent approaches have used reward functions as a semantic
representation of goal-based commands, which allows for the use of a
state-of-the-art planner to find a policy for the given task. However, these
reward functions cannot be directly used to represent action-oriented commands.
We introduce a new hybrid approach, the Deep Recurrent Action-Goal Grounding
Network (DRAGGN), for task grounding and execution that handles natural
language from either category as input, and generalizes to unseen environments.
Our robot-simulation results demonstrate that a system successfully
interpreting both goal-oriented and action-oriented task specifications brings
us closer to robust natural language understanding for human-robot interaction.
Comment: Accepted at the 1st Workshop on Language Grounding for Robotics at
ACL 2017
Planning with Abstract Markov Decision Processes
Robots acting in human-scale environments must plan under uncertainty in large state–action spaces and face constantly changing reward functions as requirements and goals change. Planning under uncertainty in large state–action spaces requires hierarchical abstraction for efficient computation. We introduce a new hierarchical planning framework called Abstract Markov Decision Processes (AMDPs) that can plan in a fraction of the time needed for complex decision making in ordinary MDPs. AMDPs provide abstract states, actions, and transition dynamics in multiple layers above a base-level “flat” MDP. AMDPs decompose problems into a series of subtasks with both local reward and local transition functions used to create policies for subtasks. The resulting hierarchical planning method is independently optimal at each level of abstraction, and is recursively optimal when the local reward and transition functions are correct. We present empirical results showing significantly improved planning speed, while maintaining solution quality, in the Taxi domain and in a mobile-manipulation robotics problem. Furthermore, our approach allows specification of a decision-making model for a mobile-manipulation problem on a Turtlebot, spanning from low-level control actions operating on continuous variables all the way up through high-level object manipulation tasks.