A Tale of Two DRAGGNs: A Hybrid Approach for Interpreting Action-Oriented and Goal-Oriented Instructions
Robots operating alongside humans in diverse, stochastic environments must be
able to accurately interpret natural language commands. These instructions
often fall into one of two categories: those that specify a goal condition or
target state, and those that specify explicit actions, or how to perform a
given task. Recent approaches have used reward functions as a semantic
representation of goal-based commands, which allows for the use of a
state-of-the-art planner to find a policy for the given task. However, these
reward functions cannot be directly used to represent action-oriented commands.
We introduce a new hybrid approach, the Deep Recurrent Action-Goal Grounding
Network (DRAGGN), for task grounding and execution that handles natural
language from either category as input, and generalizes to unseen environments.
Our robot-simulation results demonstrate that a system successfully
interpreting both goal-oriented and action-oriented task specifications brings
us closer to robust natural language understanding for human-robot interaction.
Comment: Accepted at the 1st Workshop on Language Grounding for Robotics at ACL 2017
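The architecture sketched below is a hypothetical illustration of the hybrid idea described above, not the authors' implementation: a recurrent encoder reads the command, a small selector predicts whether it is action-oriented or goal-oriented, and a separate output head produces the corresponding representation. All module names, sizes, and vocabularies are assumptions.

```python
# Minimal sketch of a hybrid language-grounding network in the spirit of
# DRAGGN; hypothetical, not the authors' code. A GRU encodes the command,
# a selector decides whether it is action- or goal-oriented, and the
# matching head emits either action logits or a goal (reward) template.
import torch
import torch.nn as nn

class HybridGroundingNet(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128,
                 n_actions=16, n_goal_templates=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Selector: is this an action-oriented or goal-oriented command?
        self.selector = nn.Linear(hidden_dim, 2)
        # Separate output heads for the two semantic representations.
        self.action_head = nn.Linear(hidden_dim, n_actions)
        self.goal_head = nn.Linear(hidden_dim, n_goal_templates)

    def forward(self, token_ids):
        _, h = self.encoder(self.embed(token_ids))  # h: (1, B, hidden)
        h = h.squeeze(0)
        mode_logits = self.selector(h)              # action vs. goal
        return mode_logits, self.action_head(h), self.goal_head(h)

# Usage: route the command to a low-level controller (action logits) or
# to a planner over reward functions (goal logits), per the selector.
net = HybridGroundingNet(vocab_size=1000)
mode, actions, goals = net(torch.randint(0, 1000, (1, 7)))
```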
Plug in the Safety Chip: Enforcing Constraints for LLM-driven Robot Agents
Recent advancements in large language models (LLMs) have enabled a new
research domain, LLM agents, for solving robotics and planning tasks by
leveraging the world knowledge and general reasoning abilities of LLMs obtained
during pretraining. However, while considerable effort has been made to teach the robot the "dos," the "don'ts" have received comparatively little attention. We argue
that, for any practical usage, it is as crucial to teach the robot the
"don'ts": conveying explicit instructions about prohibited actions, assessing
the robot's comprehension of these restrictions, and, most importantly,
ensuring compliance. Moreover, verifiable safe operation is essential for deployments that must satisfy international standards such as IEC 61508, which defines requirements for safely deploying robots in industrial factory environments. Aiming to deploy LLM agents in a collaborative environment,
we propose a queryable safety constraint module based on linear temporal logic (LTL) that simultaneously supports encoding natural language (NL) into temporal constraints, reasoning about and explaining safety violations, and pruning unsafe actions. To demonstrate the effectiveness of our system, we conducted experiments in the VirtualHome environment and on a real robot. The experimental results show that our system strictly adheres to the safety constraints and scales well as safety constraints grow more complex, highlighting its potential for practical utility.
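As a rough illustration of the unsafe-action-pruning idea (not the paper's module), the sketch below checks candidate actions against the simplest LTL safety fragment, constraints of the form G(!p) ("never p"), by simulating each action's effect and rejecting any action whose successor state contains a forbidden proposition. The effects model, propositions, and action names are all hypothetical.

```python
# Illustrative sketch of unsafe-action pruning for the LTL safety fragment
# "globally never p" (G !p); the paper's queryable module is far more general.
from typing import Callable, Iterable, Set

def prune_unsafe(actions: Iterable[str],
                 effects: Callable[[str, Set[str]], Set[str]],
                 state: Set[str],
                 forbidden: Set[str]) -> list:
    """Keep only actions whose predicted successor state violates no
    G(!p) constraint, i.e. contains no forbidden proposition p."""
    safe = []
    for a in actions:
        next_props = effects(a, state)
        violated = next_props & forbidden
        if violated:
            # A real module would also explain which constraint failed and why.
            print(f"pruned {a!r}: would violate G(!{', !'.join(violated)})")
        else:
            safe.append(a)
    return safe

# Toy effects model: touching the stove adds the proposition "stove_touched".
effects = lambda a, s: s | {"stove_touched"} if a == "touch_stove" else s
print(prune_unsafe(["touch_stove", "open_fridge"], effects,
                   state=set(), forbidden={"stove_touched"}))
```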
Learning perceptually grounded word meanings from unaligned parallel data
In order for robots to effectively understand natural language commands, they must be able to acquire meaning representations that can be mapped to perceptual features in the external world. Previous approaches to learning these grounded meaning representations require detailed annotations at training time. In this paper, we present an approach to grounded language acquisition which is capable of jointly learning a policy for following natural language commands such as “Pick up the tire pallet,” as well as a mapping between specific phrases in the language and aspects of the external world; for example, the mapping between the words “the tire pallet” and a specific object in the environment. Our approach assumes a parametric form for the policy that the robot uses to choose actions in response to a natural language command that factors based on the structure of the language. We use a gradient method to optimize model parameters. Our evaluation demonstrates the effectiveness of the model on a corpus of commands given to a robotic forklift by untrained users.
Funding: U.S. Army Research Laboratory (Collaborative Technology Alliance Program, Cooperative Agreement W911NF-10-2-0016); United States Office of Naval Research (MURI N00014-07-1-0749); United States Army Research Office (MURI N00014-11-1-0688); United States Defense Advanced Research Projects Agency (DARPA BOLT program under contract HR0011-11-2-0008).
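A minimal sketch of the kind of factored log-linear policy this abstract describes, assuming a softmax over candidate actions with a weight vector fit by gradient ascent; in the full model the features would factor over phrases of the command, whereas the feature matrix and dimensions here are hypothetical stand-ins.

```python
# Hedged sketch of a log-linear policy: action probabilities proportional
# to exp(theta . phi), with theta fit by the standard likelihood gradient.
import numpy as np

def policy(theta, features):
    """features: (n_actions, n_feats) matrix, one row per candidate action."""
    scores = features @ theta
    scores -= scores.max()              # numerical stability
    p = np.exp(scores)
    return p / p.sum()

def gradient_step(theta, features, chosen, lr=0.1):
    """One likelihood-gradient step toward the demonstrated action:
    grad = phi(chosen) - E_pi[phi]."""
    p = policy(theta, features)
    grad = features[chosen] - p @ features
    return theta + lr * grad

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))      # 4 candidate actions, 8 features
theta = np.zeros(8)
for _ in range(50):                     # imitate demonstrated action 2
    theta = gradient_step(theta, features, chosen=2)
print(policy(theta, features))          # mass concentrates on action 2
```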
Is searching full text more effective than searching abstracts?
Background: With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE® abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined: bm25 and the ranking algorithm implemented in the open-source Lucene search engine.
Results: Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraph-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that the highest overall effectiveness may be achieved by combining evidence from spans and full articles.
Conclusion: Users searching full text are more likely to find relevant articles than those searching only abstracts. This finding affirms the value of full-text collections for text retrieval and provides a starting point for future work in exploring algorithms that take advantage of rapidly growing digital archives. Experimental results also highlight the need to develop distributed text retrieval algorithms, since full-text articles are significantly longer than abstracts and may require the computational resources of multiple machines in a cluster. The MapReduce programming model provides a convenient framework for organizing such computations.
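For concreteness, here is a compact implementation of bm25, the first of the two retrieval models the article compares (the toy corpus and the Lucene-style +1 inside the IDF log are my additions, not details from the paper). Span-level retrieval amounts to treating each paragraph as its own indexing unit.

```python
# Compact BM25 scorer; k1 and b are the common default parameter values.
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    docs = [d.lower().split() for d in docs]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for q in query.lower().split():
            df = sum(1 for doc in docs if q in doc)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # +1 avoids negatives
            s += idf * tf[q] * (k1 + 1) / (
                tf[q] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# Each "doc" could equally be a paragraph-sized span of a full-text article.
docs = ["gene expression in yeast", "retrieval of full text articles",
        "abstract only search of medline records"]
print(bm25_scores("full text retrieval", docs))
```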
Spoken language interaction with robots: Recommendations for future research
With robotics rapidly advancing, more effective human–robot interaction is increasingly needed to realize the full potential of robots for society. While spoken language must be part of the solution, our ability to provide spoken language interaction capabilities is still very limited. In this article, based on the report of an interdisciplinary workshop convened by the National Science Foundation, we identify key scientific and engineering advances needed to enable effective spoken language interaction with robotics. We make 25 recommendations, involving eight general themes: putting human needs first, better modeling the social and interactive aspects of language, improving robustness, creating new methods for rapid adaptation, better integrating speech and language with other communication modalities, giving speech and language components access to rich representations of the robot’s current knowledge and state, making all components operate in real time, and improving research infrastructure and resources. Research and development that prioritizes these topics will, we believe, provide a solid foundation for the creation of speech-capable robots that are easy and effective for humans to work with.
Role and task allocation framework for Multi-Robot Collaboration with latent knowledge estimation
In this work, a novel framework for modeling role and task allocation in Cooperative Heterogeneous Multi-Robot Systems (CHMRSs) is presented. This framework encodes a CHMRS as a set of multidimensional relational structures (MDRSs). This set of structures defines collaborative tasks through both temporal and spatial relations between processes of heterogeneous robots. These relations are enriched with tensors, which allow for geometrical reasoning about collaborative tasks. A learning schema is also proposed in order to derive the components of each MDRS. According to this schema, the components are learnt from data reporting the situated history of the processes executed by the team of robots. Data are organized as a multirobot collaboration treebank (MRCT) in order to support learning. Moreover, a generative approach, based on a probabilistic model, is combined with nonnegative tensor decomposition (NTD) for both building the tensors and estimating latent knowledge. A preliminary evaluation of the performance of this framework is performed in simulation with three heterogeneous robots, namely, two Unmanned Ground Vehicles (UGVs) and one Unmanned Aerial Vehicle (UAV).
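As a simplified stand-in for the nonnegative tensor decomposition (NTD) used here to estimate latent knowledge, the sketch below runs classic multiplicative-update nonnegative matrix factorization on an unfolded (matricized) tensor; a faithful NTD would factor the full multiway array (e.g., CP/PARAFAC), and all shapes and data are illustrative.

```python
# Lee & Seung multiplicative-update NMF for the squared-error objective,
# a matrix-case stand-in for the paper's nonnegative tensor decomposition.
import numpy as np

def nmf(V, rank, iters=200, eps=1e-9):
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(iters):
        # Updates preserve nonnegativity and monotonically reduce error.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "robot x process-feature" matrix unfolded from a relation tensor.
V = np.random.default_rng(1).random((6, 10))
W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H))  # reconstruction error shrinks with iters
```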