Visually-Grounded Language Model for Human-Robot Interaction
Visually grounded human-robot interaction is recognized
to be an essential ingredient of socially intelligent robots, and the
integration of vision and language increasingly attracts the attention of
researchers in diverse fields. However, most systems lack the capability
to adapt and expand themselves beyond the preprogrammed set
of communicative behaviors. Their linguistic capabilities are still far
from satisfactory, which makes them unsuitable for real-world
applications. In this paper we present a system in which a robotic
agent can learn a grounded language model by actively interacting
with a human user. The model is grounded in the sense that the meaning
of words is linked to the concrete sensorimotor experience of the
agent, and linguistic rules are automatically extracted from the interaction
data. The system has been tested on the NAO humanoid robot
and it has been used to understand and generate appropriate natural
language descriptions of real objects. The system is also capable of
conducting a verbal interaction with a human partner in potentially
ambiguous situations.
Bayesian Approaches to Human-Robot Interaction: From Language Grounding to Action Learning and Understanding
In the field of human-robot interaction, the robot is no longer considered a tool but a
partner that supports the work of humans. Environments that feature the interaction
and collaboration of humans and robots present a number of challenges involving robot
learning and interactive capabilities. In order to operate in these environments, the robot
must not only be able to act, but also to interact and, especially, to "understand".
This thesis proposes a unified probabilistic framework that allows a robot to develop
basic cognitive skills essential for collaboration. To this end, we embrace the idea of motor
simulation, well established in cognitive science and neuroscience, in which the robot
reenacts in simulation its own internal models used for physically performing actions. This
particular view offers the possibility to unify apparently distinct cognitive phenomena such
as learning, interaction, understanding and dialogue, to name just a few. The ideas presented
here are corroborated by experiments performed both in simulation and on a
humanoid robotic platform.
The first contribution in this direction is a robust Bayesian method to estimate (i.e.
learn) the parameters of internal models by observing other skilled actors performing
goal-directed actions. In addition to deriving a theoretically sound solution for the learning
problem, our approach establishes theoretical links between Bayesian inference and
gradient-based optimization methods. Using the expectation propagation (EP) algorithm,
a similar algorithm is derived for the scenario with multiple internal models.
Once learned, internal models are reused in simulation to "understand" actions performed
by other actors, which is a necessary precondition for successful interaction. We
have proposed that action understanding can be cast as an approximate Bayesian inference
in which the covert activity of internal models produces hypotheses that are tested
in parallel through a sequential Monte Carlo approach. Here, approximate Bayesian inference
is offered as a plausible mechanistic implementation of the idea of motor simulation,
making it feasible in real time and with limited resources.
Finally, we have investigated how the robot can learn a grounded language model
in order to be bootstrapped into communication. Features extracted from the learned
internal models, as well as descriptors of various perceptual categories, are fed into a novel
multi-instance semi-supervised learning algorithm able to perform semantic clustering and
associate words, either nouns or verbs, with their grounded meaning.
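The idea of testing competing hypotheses in parallel with a sequential Monte Carlo method can be illustrated with a toy example. The sketch below is not the thesis's algorithm: the forward model, the Gaussian likelihood, the goal names and all parameter values are hypothetical, chosen only to show how particles that each carry one goal hypothesis are reweighted and resampled as an observed movement unfolds.

```python
import math
import random

def forward_model(position, goal, step=0.1):
    """Hypothetical internal model: move a fixed fraction of the way to the goal."""
    return tuple(p + step * (g - p) for p, g in zip(position, goal))

def likelihood(observed, predicted, sigma=0.05):
    """Unnormalized Gaussian likelihood of an observation given a prediction."""
    sq_dist = sum((o - q) ** 2 for o, q in zip(observed, predicted))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def infer_goal(trajectory, goals, n_particles=200, seed=0):
    """Estimate a posterior over goals with a simple resampling particle filter."""
    rng = random.Random(seed)
    names = list(goals)
    # Each particle is one hypothesis about the actor's goal (uniform prior).
    particles = [rng.choice(names) for _ in range(n_particles)]
    for prev, obs in zip(trajectory, trajectory[1:]):
        # Simulate each hypothesis one step ahead and score it against the observation.
        weights = [likelihood(obs, forward_model(prev, goals[g])) for g in particles]
        if sum(weights) == 0:
            continue  # no hypothesis explains this step; keep the current particle set
        # Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return {g: particles.count(g) / n_particles for g in names}

# Assumed scene: two possible reach targets in a 2D workspace.
goals = {"cup": (1.0, 0.0), "ball": (0.0, 1.0)}

# Observed movement: an actor reaching toward the cup from the origin.
trajectory = [(0.0, 0.0)]
for _ in range(5):
    trajectory.append(forward_model(trajectory[-1], goals["cup"]))

posterior = infer_goal(trajectory, goals)
print(posterior)
```

In this simplified version the hypotheses are static and the forward model is noise-free, so the particle set collapses onto the best-matching goal after a few observations; a realistic system would also propagate uncertainty in the model's state.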
Resolving ambiguities in a grounded human-robot interaction
In this paper we propose a trainable system that learns grounded language models from examples with a minimum of user intervention and without feedback. We focus on the acquisition of grounded meanings of spatial and adjective/noun terms. The system has been used to understand, and subsequently to generate, appropriate natural language descriptions of real objects and to engage in verbal interactions with a human partner. We also address the problem of resolving possible ambiguities that arise during verbal interaction through an information-theoretic approach.
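One common way to make an information-theoretic approach to disambiguation concrete is to ask about the attribute whose answer is expected to reduce uncertainty the most. The sketch below is only an illustration under that assumption, not the paper's method: the scene, the attribute names and the uniform belief are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information_gain(candidates, belief, attribute):
    """Expected entropy reduction over candidates from learning one attribute's value."""
    prior_entropy = entropy(belief)
    # Group the probability mass of the candidates by the answer they would yield.
    mass_by_answer = {}
    for obj, p in zip(candidates, belief):
        mass_by_answer.setdefault(obj[attribute], []).append(p)
    expected_posterior_entropy = 0.0
    for masses in mass_by_answer.values():
        total = sum(masses)
        if total > 0:
            expected_posterior_entropy += total * entropy([m / total for m in masses])
    return prior_entropy - expected_posterior_entropy

def best_question(candidates, belief, attributes):
    """Pick the attribute whose answer is expected to be most informative."""
    return max(attributes,
               key=lambda a: expected_information_gain(candidates, belief, a))

# Assumed scene: the user said "the red one", and four red objects match.
scene = [
    {"color": "red", "shape": "circle", "position": "left"},
    {"color": "red", "shape": "square", "position": "left"},
    {"color": "red", "shape": "circle", "position": "right"},
    {"color": "red", "shape": "triangle", "position": "right"},
]
belief = [0.25, 0.25, 0.25, 0.25]  # uniform over the matching objects

question = best_question(scene, belief, ["color", "shape", "position"])
print(question)
```

Here asking about color cannot help because every candidate is red, while shape splits the candidates more finely than position, so shape is selected.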
A Probabilistic Approach to Learning a Visually Grounded Language Model through Human-Robot Interaction
Language is among the most fascinating and complex cognitive activities, and it develops rapidly from the earliest months of an infant's life. The aim of the present work is to provide a humanoid robot with the cognitive, perceptual and motor skills fundamental for the acquisition of a rudimentary form of language. We present a novel probabilistic model, inspired by findings in the cognitive sciences, that is able to associate spoken words with their perceptually grounded meanings. The main focus is on acquiring the meaning of various perceptual categories (e.g. red, blue, circle, above, etc.), rather than specific world entities (e.g. an apple, a toy, etc.). Our probabilistic model is based on a variant of the multi-instance learning technique, and it enables a robotic platform to learn grounded meanings of adjective/noun terms. The system can be used to understand and generate appropriate natural language descriptions of real objects in a scene, and it has been successfully tested on the NAO humanoid robotic platform.
Toward a History of the University of Macerata
Monographic section of issue no. 13 of the journal "Annali di storia delle università italiane" (CISUI) E 183532
ISI code: 1127-825