325 research outputs found
Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions
Comprehension of spoken natural language is an essential component for robots
to communicate with human effectively. However, handling unconstrained spoken
instructions is challenging due to (1) complex structures including a wide
variety of expressions used in spoken language and (2) inherent ambiguity in
interpretation of human instructions. In this paper, we propose the first
comprehensive system that can handle unconstrained spoken language and is able
to effectively resolve ambiguity in spoken instructions. Specifically, we
integrate deep-learning-based object detection together with natural language
processing technologies to handle unconstrained spoken instructions, and
propose a method for robots to resolve instruction ambiguity through dialogue.
Through our experiments on both a simulated environment as well as a physical
industrial robot arm, we demonstrate the ability of our system to understand
natural instructions from human operators effectively, and how higher success
rates of the object picking task can be achieved through an interactive
clarification process.Comment: 9 pages. International Conference on Robotics and Automation (ICRA)
2018. Accompanying videos are available at the following links:
https://youtu.be/_Uyv1XIUqhk (the system submitted to ICRA-2018) and
http://youtu.be/DGJazkyw0Ws (with improvements after ICRA-2018 submission
Learning to Generate Unambiguous Spatial Referring Expressions for Real-World Environments
Referring to objects in a natural and unambiguous manner is crucial for
effective human-robot interaction. Previous research on learning-based
referring expressions has focused primarily on comprehension tasks, while
generating referring expressions is still mostly limited to rule-based methods.
In this work, we propose a two-stage approach that relies on deep learning for
estimating spatial relations to describe an object naturally and unambiguously
with a referring expression. We compare our method to the state of the art
algorithm in ambiguous environments (e.g., environments that include very
similar objects with similar relationships). We show that our method generates
referring expressions that people find to be more accurate (30% better)
and would prefer to use (32% more often).Comment: International Conference on Intelligent Robots and Systems (IROS
2019), Demo 1: Finding the described object (https://youtu.be/BE6-F6chW0w),
Demo 2: Referring to the pointed object (https://youtu.be/nmmv6JUpy8M),
Supplementary Video (https://youtu.be/sFjBa_MHS98
Learning Object Placements For Relational Instructions by Hallucinating Scene Representations
Robots coexisting with humans in their environment and performing services
for them need the ability to interact with them. One particular requirement for
such robots is that they are able to understand spatial relations and can place
objects in accordance with the spatial relations expressed by their user. In
this work, we present a convolutional neural network for estimating pixelwise
object placement probabilities for a set of spatial relations from a single
input image. During training, our network receives the learning signal by
classifying hallucinated high-level scene representations as an auxiliary task.
Unlike previous approaches, our method does not require ground truth data for
the pixelwise relational probabilities or 3D models of the objects, which
significantly expands the applicability in practical applications. Our results
obtained using real-world data and human-robot experiments demonstrate the
effectiveness of our method in reasoning about the best way to place objects to
reproduce a spatial relation. Videos of our experiments can be found at
https://youtu.be/zaZkHTWFMKMComment: Accepted at the 2020 IEEE International Conference on Robotics and
Automation (ICRA). Video at https://www.youtube.com/watch?v=zaZkHTWFMK
Bayesian Inference of Self-intention Attributed by Observer
Most of agents that learn policy for tasks with reinforcement learning (RL)
lack the ability to communicate with people, which makes human-agent
collaboration challenging. We believe that, in order for RL agents to
comprehend utterances from human colleagues, RL agents must infer the mental
states that people attribute to them because people sometimes infer an
interlocutor's mental states and communicate on the basis of this mental
inference. This paper proposes PublicSelf model, which is a model of a person
who infers how the person's own behavior appears to their colleagues. We
implemented the PublicSelf model for an RL agent in a simulated environment and
examined the inference of the model by comparing it with people's judgment. The
results showed that the agent's intention that people attributed to the agent's
movement was correctly inferred by the model in scenes where people could find
certain intentionality from the agent's behavior
- …