Visually grounded human-robot interaction is recognized
as an essential ingredient of socially intelligent robots, and the
integration of vision and language increasingly attracts the attention
of researchers in diverse fields. However, most systems lack the
capability to adapt and expand beyond a preprogrammed set of
communicative behaviors. Their linguistic capabilities remain far
from satisfactory, which makes them unsuitable for real-world
applications. In this paper we present a system in which a robotic
agent can learn a grounded language model by actively interacting
with a human user. The model is grounded in the sense that meaning
of the words is linked to a concrete sensorimotor experience of the
agent, and linguistic rules are automatically extracted from the interaction
data. The system has been tested on the NAO humanoid robot
and it has been used to understand and generate appropriate natural
language descriptions of real objects. The system is also capable of
conducting a verbal interaction with a human partner in potentially
ambiguous situations