33,026 research outputs found

    Combining heterogeneous inputs for the development of adaptive and multimodal interaction systems

    Get PDF
    In this paper we present a novel framework for the integration of visual sensor networks and speech-based interfaces. Our proposal follows the standard reference architecture in fusion systems (JDL), and combines different techniques related to Artificial Intelligence, Natural Language Processing and User Modeling to provide an enhanced interaction with their users. Firstly, the framework integrates a Cooperative Surveillance Multi-Agent System (CS-MAS), which includes several types of autonomous agents working in a coalition to track and make inferences on the positions of the targets. Secondly, enhanced conversational agents facilitate human-computer interaction by means of speech interaction. Thirdly, a statistical methodology allows modeling the user conversational behavior, which is learned from an initial corpus and improved with the knowledge acquired from the successive interactions. A technique is proposed to facilitate the multimodal fusion of these information sources and consider the result for the decision of the next system action.This work was supported in part by Projects MEyC TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS S2009/TIC-1485Publicad

    A multi-agent architecture to combine heterogeneous inputs in multimodal interaction systems

    Get PDF
    Actas de: CAEPIA 2013, Congreso federado Agentes y Sistemas Multi-Agente: de la Teoría a la Práctica (ASMas). Madrid, 17-20 Septiembre 2013.In this paper we present a multi-agent architecture for the integration of visual sensor networks and speech-based interfaces. The proposed architecture combines different techniques related to Artificial Intelligence, Natural Language Processing and User Modeling to provide an enhanced interaction with their users. Firstly, the architecture integrates a Cooperative Surveillance Multi-Agent System (CS-MAS), which includes several types of autonomous agents working in a coalition to track and make inferences on the positions of the targets. Secondly, the proposed architecture incorporates enhanced conversational agents to facilitate human-computer interaction by means of speech interaction. Thirdly, a statistical methodology allows to model the user conversational behavior, which is learned from an initial corpus and posteriorly improved with the knowledge acquired from the successive interactions. A technique is proposed to facilitate the multimodal fusion of these information sources and consider the result for the decision of the next system action.This work was supported in part by Projects MINECO TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485).Publicad

    Symbol Emergence in Robotics: A Survey

    Full text link
    Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form a symbol system and obtain semiotic skills through their autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding human social interactions and developing a robot that can smoothly communicate with human users in the long term, requires an understanding of the dynamics of symbol systems and is crucially important. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system. The emergent symbol system is socially self-organized through both semiotic communications and physical interactions with autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-art research topics concerning SER, e.g., multimodal categorization, word discovery, and a double articulation analysis, that enable a robot to obtain words and their embodied meanings from raw sensory--motor information, including visual information, haptic information, auditory information, and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER.Comment: submitted to Advanced Robotic

    Multimodal Grounding for Language Processing

    Get PDF
    This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. We particularly focus on multimodal grounding of verbs which play a crucial role for the compositional power of language.Comment: The paper has been published in the Proceedings of the 27 Conference of Computational Linguistics. Please refer to this version for citations: https://www.aclweb.org/anthology/papers/C/C18/C18-1197

    Towards responsive Sensitive Artificial Listeners

    Get PDF
    This paper describes work in the recently started project SEMAINE, which aims to build a set of Sensitive Artificial Listeners – conversational agents designed to sustain an interaction with a human user despite limited verbal skills, through robust recognition and generation of non-verbal behaviour in real-time, both when the agent is speaking and listening. We report on data collection and on the design of a system architecture in view of real-time responsiveness
    • …
    corecore