
    A Review of Verbal and Non-Verbal Human-Robot Interactive Communication

    In this paper, an overview of human-robot interactive communication is presented, covering verbal as well as non-verbal aspects of human-robot interaction. Following a historical introduction and a motivation for fluid human-robot communication, ten desiderata are proposed, which provide an organizational axis for both recent and future research on human-robot communication. The ten desiderata are then examined in detail, culminating in a unifying discussion and a forward-looking conclusion.

    Crossmodal content binding in information-processing architectures

    Operating in a physical context, an intelligent robot faces two fundamental problems. First, it needs to combine information from its different sensors to form a representation of the environment that is more complete than any single sensor could provide. Second, it needs to combine high-level representations (such as those for planning and dialogue) with its sensory information, to ensure that the interpretations of these symbolic representations are grounded in the situated context. Previous approaches to this problem have used techniques such as (low-level) information fusion, ontological reasoning, and (high-level) concept learning. This paper presents a framework in which these, and other, approaches can be combined to form a shared representation of the current state of the robot in relation to its environment and other agents. Preliminary results from an implemented system are presented to illustrate how the framework supports behaviours commonly required of an intelligent robot.
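
    As a rough illustration of the binding idea the abstract describes -- modality-specific representations merged into a shared one when their contents are compatible -- the Python sketch below uses invented Proxy/Union structures and a naive consistency check. It is not the paper's architecture, whose combination of fusion, ontological reasoning, and concept learning is far richer.

    # Minimal sketch of crossmodal binding: modality-specific "proxies" are
    # merged into a shared "union" when their features are compatible.
    # All names and the compatibility test are illustrative, not the paper's API.
    from dataclasses import dataclass, field

    @dataclass
    class Proxy:
        modality: str        # e.g. "vision", "dialogue"
        features: dict       # e.g. {"colour": "red", "shape": "ball"}

    @dataclass
    class Union:
        proxies: list = field(default_factory=list)

        def merged_features(self):
            # The shared representation combines all bound features.
            out = {}
            for p in self.proxies:
                out.update(p.features)
            return out

    def compatible(p, union):
        # Two representations may denote the same entity if no shared
        # feature contradicts another (a crude stand-in for probabilistic scoring).
        merged = union.merged_features()
        return all(merged.get(k, v) == v for k, v in p.features.items())

    def bind(proxies):
        unions = []
        for p in proxies:
            target = next((u for u in unions if compatible(p, u)), None)
            if target is None:
                target = Union()
                unions.append(target)
            target.proxies.append(p)
        return unions

    # A visual percept and a dialogue referent with consistent features end up
    # in the same union, grounding the phrase in the situated context.
    unions = bind([
        Proxy("vision", {"colour": "red", "shape": "ball"}),
        Proxy("dialogue", {"colour": "red"}),
        Proxy("vision", {"colour": "blue", "shape": "box"}),
    ])
    print(len(unions), [u.merged_features() for u in unions])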

    Merging multi-modal information and cross-modal learning in artificial cognitive systems

    Cross-modal binding is the ability to merge two or more modal representations of the same entity into a single shared representation. This ability is one of the fundamental properties of any cognitive system operating in a complex environment. In order to adapt successfully to changes in a dynamic environment, the binding mechanism has to be supplemented with cross-modal learning. But perhaps the most difficult task is the integration of both mechanisms into a cognitive system. Their role in such a system is two-fold: to bridge the semantic gap between modalities, and to mediate between the lower-level mechanisms for processing sensory data and the higher-level cognitive processes, such as motivation and planning. In this master's thesis, we present an approach to the probabilistic merging of multi-modal information in cognitive systems. Using this approach, we formulate a model of binding and cross-modal learning in Markov logic networks, and describe the principles of its integration into a cognitive architecture. We implement a prototype of the model and evaluate it with off-line experiments that simulate a cognitive architecture with three modalities. Based on our approach, we design, implement and integrate the belief layer -- a subsystem that bridges the semantic gap in a prototype cognitive system named George. George is an intelligent robot that is able to detect and recognise objects in its surroundings, and to learn about their properties in a situated dialogue with a human tutor. Its main purpose is to validate various paradigms of interactive concept learning. To this end, we have developed and performed on-line experiments that evaluate the mechanisms of the robot's behaviour. With these experiments, we were also able to test and evaluate our approach to merging multi-modal information as part of a functional cognitive system.
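
    A toy illustration of the Markov-logic flavour of the model: weighted formulas vote on whether two modal representations denote the same entity, and a log-linear score turns the votes into a binding probability. The formulas, weights, and bias below are invented for illustration; the thesis' actual MLN formulation is more elaborate, and the weights would be estimated by cross-modal learning rather than hand-set.

    # Toy log-linear scoring in the spirit of Markov logic: each satisfied
    # weighted formula contributes evidence that two representations bind.
    import math

    # (description, weight, test) -- all values are illustrative.
    formulas = [
        ("same colour label", 1.8, lambda a, b: a.get("colour") == b.get("colour")),
        ("same shape label",  1.2, lambda a, b: a.get("shape") == b.get("shape")),
        ("spatially close",   2.0, lambda a, b: abs(a.get("x", 0) - b.get("x", 0)) < 0.2),
    ]

    def p_bind(a, b):
        # P(bind) = sigmoid(sum of weights of satisfied formulas + bias);
        # the negative bias encodes a prior against binding.
        score = sum(w for _, w, f in formulas if f(a, b)) - 2.5
        return 1.0 / (1.0 + math.exp(-score))

    vision   = {"colour": "red", "shape": "ball", "x": 0.31}
    dialogue = {"colour": "red", "x": 0.25}
    print(f"P(same entity) = {p_bind(vision, dialogue):.2f}")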

    Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches

    People rely heavily on context to enrich meaning beyond what is literally said, enabling concise but effective communication. To interact successfully and naturally with people, user-facing artificial intelligence systems will require similar skills in pragmatics: relying on various types of context -- from shared linguistic goals and conventions, to the visual and embodied world -- to use language effectively. We survey existing grounded settings and pragmatic modeling approaches and analyze how the task goals, environmental contexts, and communicative affordances in each work enrich linguistic meaning. We present recommendations for future grounded task design to naturally elicit pragmatic phenomena, and suggest directions that focus on a broader range of communicative contexts and affordances.
    Comment: Findings of EMNLP 202

    Domain independent goal recognition

    Goal recognition is generally considered to follow plan recognition. The plan recognition problem is typically defined to be that of identifying which plan in a given library of plans is being executed, given a sequence of observed actions. Once a plan has been identified, the goal of the plan can be assumed to follow. In this work, we address the problem of goal recognition directly, without assuming a plan library. Instead, we start with a domain description, just as is used for plan construction, and a sequence of action observations. The task, then, is to identify which possible goal state is the ultimate destination of the trajectory being observed. We present a formalisation of the problem and motivate its interest, before describing some simplifying assumptions we have made to arrive at a first implementation of a goal recognition system, AUTOGRAPH. We discuss the techniques employed in AUTOGRAPH to arrive at a tractable approximation of the goal recognition problem and show results for the system we have implemented.
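
    The abstract does not spell out AUTOGRAPH's scoring, but a common cost-based formulation of library-free goal recognition, sketched below with hard-coded planner costs standing in for calls to a planner, conveys the idea: a goal is more plausible the less the observed actions deviate from an optimal plan for it. Goal names, costs, and the temperature parameter are invented.

    # Rank candidate goals by how much the observed action prefix inflates
    # the cheapest plan for each goal: P(G|O) ~ exp(-beta * (cost(G|O) - cost(G))).
    import math

    def rank_goals(goals, cost_with_obs, cost_free, beta=1.0):
        scores = {g: math.exp(-beta * (cost_with_obs[g] - cost_free[g])) for g in goals}
        z = sum(scores.values())
        return {g: s / z for g, s in scores.items()}

    goals = ["at(kitchen)", "at(office)"]
    cost_free     = {"at(kitchen)": 4, "at(office)": 5}  # cheapest plan, ignoring observations
    cost_with_obs = {"at(kitchen)": 4, "at(office)": 9}  # cheapest plan embedding the observed prefix
    # The observations cost the kitchen goal nothing extra, so it dominates.
    print(rank_goals(goals, cost_with_obs, cost_free))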

    Text to 3D Scene Generation with Rich Lexical Grounding

    The ability to map descriptions of scenes to 3D geometric representations has many applications in areas such as art, education, and robotics. However, prior work on the text to 3D scene generation task has used manually specified object categories and language that identifies them. We introduce a dataset of 3D scenes annotated with natural language descriptions and learn from this data how to ground textual descriptions to physical objects. Our method successfully grounds a variety of lexical terms to concrete referents, and we show quantitatively that our method improves 3D scene generation over previous work using purely rule-based methods. We evaluate the fidelity and plausibility of 3D scenes generated with our grounding approach through human judgments. To ease evaluation on this task, we also introduce an automated metric that strongly correlates with human judgments.
    Comment: 10 pages, 7 figures, 3 tables. To appear in ACL-IJCNLP 2015
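
    A minimal sketch of the lexical-grounding step, with invented co-occurrence counts standing in for the statistics learned from the annotated scene dataset; the paper's actual model and features are richer than this frequency estimate.

    # Estimate P(object category | lexical term) from co-occurrences between
    # description tokens and annotated scene objects, then ground a new term.
    from collections import Counter, defaultdict

    cooc = defaultdict(Counter)      # term -> category -> count (toy data)
    training = [
        ("mug", "cup"), ("mug", "cup"), ("mug", "bowl"),
        ("couch", "sofa"), ("couch", "sofa"),
    ]
    for term, category in training:
        cooc[term][category] += 1

    def ground(term):
        counts = cooc[term]
        total = sum(counts.values())
        if total == 0:
            return None              # unseen term: fall back to rules in practice
        category, n = counts.most_common(1)[0]
        return category, n / total   # best referent category and its probability

    print(ground("mug"))             # -> ('cup', 0.666...)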

    Resolving Perception Based Problems in Human-Computer Dialogue

    We investigate the effect of sensor errors on situated human-computer dialogues. If a human user instructs a robot to perform a task in a spatial environment, errors in the robot's sensor-based perception of the environment may result in divergences between the user's and the robot's understanding of the environment. If the user and the robot communicate through a language-based interface, these problems may result in complex misunderstandings. In this work we investigate such situations. We set up a simulation-based scenario in which a human user instructs a robot to perform a series of manipulation tasks, such as lifting, moving and re-arranging simple objects. We induce errors into the robot's perception, such as misclassification of shapes and colours, and record and analyse the user's attempts to resolve the problems. We evaluate a set of methods to alleviate the problems by allowing the operator to access the robot's understanding of the scene. We investigate a uni-directional language-based option, which is based on automatically generated scene descriptions; a visually based option, in which the system highlights objects and provides known properties; and a dialogue-based assistance option, in which the participant can ask simple questions about the robot's perception of the scene. As a baseline condition we perform the experiment without introducing any errors. We evaluate and compare the success and problems in all four conditions, and identify and compare the strategies the participants used in each condition. We find that the participants appreciate and use the information request options successfully, and that all options provide an improvement over the condition without information. We conclude that allowing the participants to access information about the robot's perception state is an effective way to resolve problems in the dialogue.
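
    A rough sketch of the error-injection setup as described: ground-truth object properties are corrupted at a fixed rate to produce the robot's erroneous scene model, and the dialogue-based assistance option answers property questions from that model. Object names, the error rate, and the query format are illustrative, not the study's actual protocol.

    # Corrupt colour/shape labels at a given rate to simulate perception errors,
    # then answer simple attribute queries from the robot's (possibly wrong) view.
    import random

    COLOURS = ["red", "green", "blue"]
    SHAPES  = ["cube", "ball", "cylinder"]

    def corrupt(scene, error_rate=0.2, rng=random.Random(0)):
        perceived = {}
        for name, (colour, shape) in scene.items():
            if rng.random() < error_rate:
                colour = rng.choice([c for c in COLOURS if c != colour])
            if rng.random() < error_rate:
                shape = rng.choice([s for s in SHAPES if s != shape])
            perceived[name] = (colour, shape)
        return perceived

    def answer(perceived, obj, attribute):
        # The dialogue option reports the robot's belief, not ground truth,
        # letting the user spot divergences from their own view of the scene.
        colour, shape = perceived[obj]
        return colour if attribute == "colour" else shape

    truth = {"obj1": ("red", "cube"), "obj2": ("blue", "ball")}
    robot_view = corrupt(truth)
    print(answer(robot_view, "obj1", "colour"))   # "What colour is obj1?"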