76 research outputs found

    MultiModal semantic representation

    Multimodal Fusion as Communicative Acts during Human-Robot Interaction

    Research on dialog systems is a very active area in social robotics. Over the last two decades, these systems have evolved from ones based only on speech recognition and synthesis to modern systems that include new components and multimodality. By multimodal dialogue we mean the exchange of information among several interlocutors using not only voice as the means of transmission but all the available channels, such as gestures, facial expressions, touch, and sounds. These channels add information to the message to be transmitted in every dialogue turn. The dialogue manager (IDiM) is one of the components of the robotic dialog system (RDS) and is in charge of managing the dialogue flow across conversational turns. To do so, it must coherently handle the inputs and outputs of information that flow through different communication channels: audio, vision, radio frequency, touch, etc. In our approach, this multichannel input of information is temporally fused into communicative acts (CAs). Each CA groups the information that flows through the different input channels into the same pack, transmitting a single message or global idea. This temporal fusion of information therefore allows the IDiM to abstract away from the channels used during the interaction, focusing only on the message, not on the way it is transmitted. This article presents the whole RDS and describes how the multimodal fusion of information into CAs is performed. Finally, several scenarios where the multimodal dialogue is used are presented.
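
    The temporal grouping described in this abstract can be illustrated with a short hypothetical sketch (the class names, channels, and one-second window below are assumptions for illustration, not the authors' implementation): input events from different channels that arrive close together in time are packed into one communicative act.

    # Minimal sketch: group multichannel input events that arrive within a short
    # time window into a single communicative act (CA), so a dialogue manager can
    # reason about the message rather than the channel it arrived on.
    from dataclasses import dataclass, field

    @dataclass
    class InputEvent:
        channel: str      # e.g. "audio", "vision", "touch", "rfid"
        content: str      # recognized value, e.g. "hello", "wave", "head_touch"
        timestamp: float  # seconds

    @dataclass
    class CommunicativeAct:
        events: list = field(default_factory=list)

        def channels(self):
            return {e.channel for e in self.events}

    def fuse_into_cas(events, window=1.0):
        """Group events whose timestamps fall within `window` seconds of the
        first event of the current CA; a simplification of temporal fusion."""
        cas, current, start = [], None, None
        for e in sorted(events, key=lambda e: e.timestamp):
            if current is None or e.timestamp - start > window:
                current = CommunicativeAct()
                cas.append(current)
                start = e.timestamp
            current.events.append(e)
        return cas

    if __name__ == "__main__":
        stream = [
            InputEvent("audio", "hello", 0.10),
            InputEvent("vision", "wave", 0.35),
            InputEvent("touch", "head_touch", 3.20),
        ]
        for ca in fuse_into_cas(stream):
            print(sorted(ca.channels()), [e.content for e in ca.events])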

    Multimodal interaction with mobile devices : fusing a broad spectrum of modality combinations

    This dissertation presents a multimodal architecture for use in mobile scenarios such as shopping and navigation. It also analyses a wide range of feasible modality input combinations for these contexts. For this purpose, two interlinked demonstrators were designed for stand-alone use on mobile devices. Of particular importance was the design and implementation of a modality fusion module capable of combining input from a range of communication modes such as speech, handwriting, and gesture. The implementation is able to account for confidence value biases arising within and between modalities and also provides a method for resolving semantically overlapped input. Tangible interaction with real-world objects and symmetric multimodality are two further themes addressed in this work. The work concludes with the results of two usability field studies that provide insight into user preference and modality intuition for different modality combinations, as well as user acceptance of anthropomorphized objects.
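
    As an illustration only (the dissertation's fusion module is not reproduced here), a confidence-weighted late-fusion scheme along these lines might look as follows; the modality weights, slot names, and merge rule are assumptions for the example.

    # Late fusion of recognizer hypotheses from several modalities, with
    # per-modality weights that compensate for systematic confidence biases
    # (e.g. a recognizer that reports optimistic scores). Hypotheses that
    # overlap semantically on the same slot are merged by keeping the
    # best-supported value.
    from collections import defaultdict

    # Assumed bias-correction weights per modality (illustrative values only).
    MODALITY_WEIGHT = {"speech": 0.8, "handwriting": 1.0, "gesture": 0.9}

    def fuse(hypotheses):
        """hypotheses: list of (modality, slot, value, confidence in [0, 1]).
        Returns one value per slot, chosen by bias-corrected accumulated score."""
        scores = defaultdict(float)
        for modality, slot, value, conf in hypotheses:
            scores[(slot, value)] += conf * MODALITY_WEIGHT.get(modality, 1.0)
        best = {}
        for (slot, value), score in scores.items():
            if slot not in best or score > best[slot][1]:
                best[slot] = (value, score)
        return {slot: value for slot, (value, score) in best.items()}

    if __name__ == "__main__":
        # Speech and gesture overlap semantically on the "object" slot.
        print(fuse([
            ("speech", "object", "red shoe", 0.70),
            ("gesture", "object", "blue shoe", 0.65),
            ("handwriting", "quantity", "2", 0.90),
        ]))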

    A flexible and reusable framework for dialogue and action management in multi-party discourse

    This thesis describes a model for real-time, goal-directed dialogue and activity control for multiple conversation participants, who can be human users or virtual characters in multimodal dialogue systems, together with a framework implementing the model. It is concerned with two genres: task-oriented systems and interactive narratives. The model is based on a representation of participant behavior on three hierarchical levels: dialogue acts, dialogue games, and activities. Dialogue games make it possible to exploit social conventions and obligations to model the basic structure of dialogues. The interactions can be specified and implemented using recurring elementary building blocks. Expectations about the future behavior of other participants are derived from the state of active dialogue games; this can be useful for, e.g., input disambiguation. The knowledge base of the system is defined in an ontological format and supports individual knowledge and personal traits for the characters. The Conversational Behavior Generation Framework implements the model. It coordinates a set of conversational dialogue engines (CDEs), where each participant is represented by one CDE. The virtual characters can act autonomously or semi-autonomously follow goals assigned by an external story module (Narrative Mode). The framework allows alternative specification methods for the virtual characters' activities to be combined (implementation in a general-purpose programming language, by plan operators, or in the specification language Lisa that was developed for the model). The practical viability of the framework was tested and demonstrated through the realization of three systems with different purposes and scope.
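
    A toy sketch of the dialogue-game idea, written in Python rather than the thesis's own formalisms (the game, act names, and API below are hypothetical): the acts expected next are read off the state of an active game and could be used to bias the interpretation of ambiguous input.

    # A dialogue game as a small state machine over dialogue acts. The set of
    # acts licensed next is derived from the game state.
    QUESTION_ANSWER_GAME = {
        "start":           {"question": "awaiting_answer"},
        "awaiting_answer": {"answer": "done", "clarify": "awaiting_answer"},
        "done":            {},
    }

    class DialogueGame:
        def __init__(self, rules, state="start"):
            self.rules, self.state = rules, state

        def expected_acts(self):
            """Dialogue acts licensed by convention in the current state."""
            return set(self.rules[self.state])

        def advance(self, act):
            if act not in self.rules[self.state]:
                raise ValueError(f"unexpected act {act!r} in state {self.state!r}")
            self.state = self.rules[self.state][act]

    if __name__ == "__main__":
        game = DialogueGame(QUESTION_ANSWER_GAME)
        game.advance("question")
        print(game.expected_acts())   # {'answer', 'clarify'} -> bias input parsing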

    Developing Intelligent MultiMedia applications

    Multimodal agents for cooperative interaction

    2020 Fall. Includes bibliographical references. Embodied virtual agents offer the potential to interact with a computer in a more natural manner, similar to how we interact with other people. Reaching this potential requires multimodal interaction, including both speech and gesture. This project builds on earlier work at Colorado State University and Brandeis University on just such a multimodal system, referred to as Diana. I designed and developed a new software architecture to directly address some of the difficulties of the earlier system, particularly with regard to asynchronous communication, e.g., interrupting the agent after it has begun to act. Various other enhancements were made to the agent systems, including the model itself, as well as speech recognition, speech synthesis, motor control, and gaze control. Further refactoring and new code were developed to achieve software engineering goals that are not outwardly visible but no less important: decoupling, testability, improved networking, and independence from a particular agent model. This work, combined with the effort of others in the lab, has produced a "version 2" Diana system that is well positioned to serve the lab's research needs in the future. In addition, in order to pursue new research opportunities related to developmental and intervention science, a "Faelyn Fox" agent was developed. This is a different model, with a simplified cognitive architecture, and a system for defining an experimental protocol (for example, a toy-sorting task) based on Unity's visual state machine editor. This version, too, lays a solid foundation for future research.
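
    One common way to handle the interruption problem mentioned above, sketched here with Python's asyncio purely for illustration (this is not the Diana or Faelyn Fox code, and the action and timings are invented): the agent's ongoing action runs as a cancellable task that an incoming user event can interrupt cleanly.

    # The agent's current activity runs as an asyncio task; a newly arriving
    # user utterance cancels it mid-execution and the action cleans up.
    import asyncio

    async def reach_for_block():
        try:
            for step in range(10):           # stand-in for motor-control steps
                print(f"moving arm, step {step}")
                await asyncio.sleep(0.1)
            print("grasped block")
        except asyncio.CancelledError:
            print("action interrupted, retracting arm")
            raise

    async def main():
        action = asyncio.create_task(reach_for_block())
        await asyncio.sleep(0.35)            # user says "stop" partway through
        action.cancel()
        try:
            await action
        except asyncio.CancelledError:
            pass

    if __name__ == "__main__":
        asyncio.run(main())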

    Emotion recognition from speech: An implementation in MATLAB

    Capstone Project submitted to the Department of Engineering, Ashesi University, in partial fulfillment of the requirements for the award of the Bachelor of Science degree in Electrical and Electronic Engineering, April 2019. Human-computer interaction now focuses more on being able to relate to human emotions. Recognizing human emotions from speech is an area attracting a great deal of research with the rise of robots and virtual reality. In this paper, emotion recognition from speech is implemented in MATLAB. Feature extraction is based on the pitch and 13 MFCCs of the audio files. Two classification methods are used and compared to determine which achieves the highest accuracy on the data set.
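
    An analogous pitch-plus-MFCC pipeline can be sketched in Python (the paper's implementation is in MATLAB); librosa and scikit-learn stand in for the MATLAB toolboxes, and the k-NN classifier, pitch range, and file names below are assumptions for illustration, not the paper's actual code.

    # Per-file features from a pitch track and 13 MFCCs, then a generic classifier.
    import numpy as np
    import librosa
    from sklearn.neighbors import KNeighborsClassifier

    def extract_features(path):
        y, sr = librosa.load(path, sr=None)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, frames)
        f0 = librosa.yin(y, fmin=65.0, fmax=400.0, sr=sr)    # pitch track in Hz
        # Summarize frame-level features as per-file means and standard deviations.
        return np.concatenate([
            mfcc.mean(axis=1), mfcc.std(axis=1),
            [np.mean(f0), np.std(f0)],
        ])

    def train(paths, labels):
        X = np.vstack([extract_features(p) for p in paths])
        clf = KNeighborsClassifier(n_neighbors=3)
        return clf.fit(X, labels)

    # Usage (hypothetical files): train(["happy_01.wav", "sad_01.wav"], ["happy", "sad"])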