Continual planning for cross-modal situated clarification in human-robot interaction
Abstract — Robots do not fully understand the world they are situated in. This includes what humans talk to them about. A fundamental problem is thus how a robot can clarify such a lack of understanding. This paper addresses the issue of how a robot can create a plan for resolving a need for clarification. It characterises situated clarification as an information need which may arise in any sensory-motoric modality required to interpret the robot's situated context, or in any deliberative modality referring to that context. It then focuses on how, once a clarification need has been identified, the robot can create a plan in which one or more modalities are used to resolve it. Modalities are involved on the basis of the types of information they can provide. These information types are identified in the ontologies the modalities use to interconnect their content with the content of other modalities (via information fusion). We take a continual approach to planning and execution monitoring. This provides the ability to re-plan depending on modality availability and success in resolving (part of) a clarification need. We illustrate the implementation on several examples.
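The continual planning loop the abstract describes can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: modality selection by provided information type, execution, monitoring, and re-planning on failure. All names and data structures below are assumptions.

```python
# Hypothetical sketch of continual planning for clarification:
# modalities are chosen by the information types they can provide,
# and the plan is revised when a modality fails to resolve the need.

def plan(need, modalities):
    """Select the modalities that can provide the needed information type."""
    return [m for m in modalities if need["type"] in m["provides"]]

def resolve(need, modalities):
    """Continual loop: execute one plan step, monitor, re-plan on failure."""
    remaining = list(modalities)
    while remaining:
        candidates = plan(need, remaining)
        if not candidates:
            return None                      # no modality can resolve the need
        modality = candidates[0]
        result = modality["query"](need)     # execute the plan step
        if result is not None:
            return result                    # clarification need resolved
        remaining.remove(modality)           # monitoring: drop failed modality, re-plan

# Example: vision fails to ground the referent, so dialogue is tried next.
vision = {"provides": {"colour"}, "query": lambda n: None}
dialogue = {"provides": {"colour"}, "query": lambda n: "red"}
print(resolve({"type": "colour"}, [vision, dialogue]))  # red
```

The point of the loop is that the plan is not fixed up front: each failed or unavailable modality triggers a re-plan over the modalities that remain.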
MULTI-MODAL TASK INSTRUCTIONS TO ROBOTS BY NAIVE USERS
This thesis presents a theoretical framework for the design of user-programmable
robots. The objective of the work is to investigate multi-modal unconstrained natural
instructions given to robots in order to design a learning robot. A corpus-centred
approach is used to design an agent that can reason, learn and interact with a human in a
natural unconstrained way. The corpus-centred design approach is formalised and
developed in detail. It requires the developer to record a human during interaction and
analyse the recordings to find instruction primitives. These are then implemented in a
robot. The focus of this work is on how to combine speech and gesture using
rules extracted from the analysis of a corpus. A multi-modal integration algorithm is
presented that uses timing and semantics to group, match and unify gesture and
language. The algorithm always achieves correct pairings on the corpus and initiates
questions to the user in cases of ambiguity or missing information. The domain of card
games has been investigated, because of its variety of games which are rich in rules and
contain sequences. A further focus of the work is on the translation of rule-based
instructions. Most multi-modal interfaces to date have only considered sequential
instructions. The combination of frame-based reasoning, a knowledge base organised as
an ontology, and a problem-solver engine is used to store these rules. Understanding
rule instructions, which contain conditional and imaginary situations, requires an agent
with complex reasoning capabilities. A test system of the agent implementation is also
described. Tests to confirm the implementation by playing back the corpus are
presented. Furthermore, deployment test results with the implemented agent and human
subjects are presented and discussed. The tests showed that the rate of errors caused
by sentences not being covered by the grammar does not decrease at an acceptable rate
when new grammar is introduced. This was particularly the case for complex verbal rule
instructions, which can be expressed in a large variety of ways.
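The timing-and-semantics pairing described above can be sketched as follows. This is an illustrative reconstruction, not the thesis implementation: each deictic word is matched to the temporally closest, semantically compatible gesture, and leftovers trigger a clarification question. All names and the time-window parameter are assumptions.

```python
# Hypothetical sketch of multi-modal integration: pair speech and gesture
# by temporal proximity and semantic compatibility; unmatched words
# initiate a question to the user, as the abstract describes.

def integrate(words, gestures, max_gap=1.0):
    pairs, questions = [], []
    free = list(gestures)
    for word in words:
        # Candidates: semantically compatible gestures within the time window.
        cands = [g for g in free
                 if g["type"] in word["compatible"]
                 and abs(g["time"] - word["time"]) <= max_gap]
        if cands:
            best = min(cands, key=lambda g: abs(g["time"] - word["time"]))
            pairs.append((word["text"], best["target"]))
            free.remove(best)        # each gesture unifies with one word
        else:
            questions.append(f"Which object do you mean by '{word['text']}'?")
    return pairs, questions

# Example from a card-game setting: 'this' co-occurs with a pointing
# gesture; 'there' has no gesture nearby, so a question is generated.
words = [{"text": "this", "time": 0.4, "compatible": {"point"}},
         {"text": "there", "time": 2.1, "compatible": {"point"}}]
gestures = [{"type": "point", "time": 0.5, "target": "ace_of_spades"}]
print(integrate(words, gestures))
```

Removing a gesture once it unifies reflects the one-to-one grouping step; the question branch corresponds to the algorithm initiating clarification for ambiguous or missing information.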