3 research outputs found

    Continual planning for cross-modal situated clarification in human-robot interaction

    No full text
    Abstract — Robots do not fully understand the world they are situated in. This includes what humans talk to them about. A fundamental problem is thus how a robot can clarify such a lack of understanding. This paper addresses the issue of how a robot can create a plan for resolving a need for clarification. It characterises situated clarification as an information need which may arise in any sensory-motoric modality required to interpret the situated context of the robot, or any deliberative modality referring to that context. It then focuses on how, once a clarification need has been identified, the robot can create a plan in which one or more modalities are used to resolve it. Modalities are involved on the basis of the types of information they can provide. These information types are identified in the ontologies the modalities use to interconnect their content with the content of other modalities (via information fusion). We take a continual approach to planning and execution monitoring. This provides the ability to re-plan depending on modality availability and success in resolving (part of) a clarification need. We illustrate the implementation with several examples.
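
    As a rough illustration of the continual plan-execute-monitor idea described in this abstract, the following Python sketch (using hypothetical Modality and ClarificationNeed structures that are not from the paper) selects modalities by the information types they provide and re-plans when a modality is unavailable or fails to resolve its part of the need:

    # A minimal illustrative sketch, not the paper's implementation.
    from dataclasses import dataclass

    @dataclass
    class Modality:
        name: str
        provides: set           # information types this modality can supply
        available: bool = True

        def resolve(self, info_type):
            # Placeholder for querying the modality (vision, dialogue, ...).
            return self.available

    @dataclass
    class ClarificationNeed:
        missing: set            # information types still unresolved

    def make_plan(need, modalities):
        """Assign each missing information type to an available modality providing it."""
        return {t: m for t in need.missing
                for m in modalities if m.available and t in m.provides}

    def continual_resolve(need, modalities, max_rounds=5):
        """Plan, execute, monitor, and re-plan until resolved or no modality can help."""
        for _ in range(max_rounds):
            if not need.missing:
                return True
            plan = make_plan(need, modalities)        # (re)plan with current availability
            if not plan:
                return False                          # no remaining modality can contribute
            for info_type, modality in plan.items():
                if modality.resolve(info_type):
                    need.missing.discard(info_type)   # record partial success
                else:
                    modality.available = False        # monitoring outcome triggers re-planning
        return not need.missing

    # Example: vision clarifies the object's colour, dialogue clarifies the referent.
    vision = Modality("vision", {"object colour"})
    dialogue = Modality("dialogue", {"referent identity"})
    print(continual_resolve(ClarificationNeed({"object colour", "referent identity"}),
                            [vision, dialogue]))      # True
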

    MULTI-MODAL TASK INSTRUCTIONS TO ROBOTS BY NAIVE USERS

    This thesis presents a theoretical framework for the design of user-programmable robots. The objective of the work is to investigate multi-modal, unconstrained natural instructions given to robots in order to design a learning robot. A corpus-centred approach is used to design an agent that can reason, learn and interact with a human in a natural, unconstrained way. The corpus-centred design approach is formalised and developed in detail. It requires the developer to record a human during interaction and analyse the recordings to find instruction primitives, which are then implemented in a robot. The focus of this work has been on how to combine speech and gesture using rules extracted from the analysis of a corpus. A multi-modal integration algorithm is presented that uses timing and semantics to group, match and unify gesture and language. The algorithm always achieves correct pairings on the corpus and initiates questions to the user in ambiguous cases or when information is missing. The domain of card games has been investigated because of its variety of games, which are rich in rules and contain sequences. A further focus of the work is on the translation of rule-based instructions. Most multi-modal interfaces to date have only considered sequential instructions. The combination of frame-based reasoning, a knowledge base organised as an ontology and a problem-solver engine is used to store these rules. Understanding rule instructions, which contain conditional and imaginary situations, requires an agent with complex reasoning capabilities. A test system of the agent implementation is also described. Tests that confirm the implementation by playing back the corpus are presented. Furthermore, deployment test results with the implemented agent and human subjects are presented and discussed. The tests showed that the rate of errors due to sentences not being defined in the grammar does not decrease at an acceptable rate when new grammar is introduced. This was particularly the case for complex verbal rule instructions, which can be expressed in a large variety of ways.
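
    As a rough illustration of the timing-and-semantics integration described in this abstract, the following Python sketch (using hypothetical Utterance and Gesture records that are not from the thesis) pairs an utterance with a gesture inside a timing window and asks a clarification question when the pairing is ambiguous or information is missing:

    # An illustrative sketch, not the thesis implementation.
    from dataclasses import dataclass

    @dataclass
    class Utterance:
        text: str
        needs_referent: bool    # e.g. "put this card there" needs a deictic gesture
        time: float             # seconds from the start of the interaction

    @dataclass
    class Gesture:
        target: str             # object that the pointing gesture resolves to
        time: float

    def integrate(utterance, gestures, window=1.5):
        """Unify an utterance with at most one gesture inside the timing window."""
        if not utterance.needs_referent:
            return utterance.text, None               # language alone is sufficient
        candidates = [g for g in gestures
                      if abs(g.time - utterance.time) <= window]
        if len(candidates) == 1:
            return utterance.text, candidates[0].target    # unique pairing: unify
        if not candidates:
            return utterance.text, "ASK: which object do you mean?"            # missing info
        return utterance.text, "ASK: I saw several gestures; which one applies?"  # ambiguous

    # Example: one pointing gesture within 1.5 s of the utterance is accepted.
    print(integrate(Utterance("put this card on the pile", True, 2.0),
                    [Gesture("ace_of_spades", 2.4)]))
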