67,060 research outputs found

    Key-Sparse Transformer with Cascaded Cross-Attention Block for Multimodal Speech Emotion Recognition

    Full text link
    Speech emotion recognition is a challenging and important research topic that plays a critical role in human-computer interaction. Multimodal inputs can improve performance because more emotional information is available for recognition. However, existing studies learn from all the information in a sample, even though only a small portion of it relates to emotion. Moreover, under the multimodal framework, the interaction between different modalities is shallow and insufficient. In this paper, a key-sparse Transformer is proposed for efficient speech emotion recognition by focusing only on emotion-related information. Furthermore, a cascaded cross-attention block, specially designed for the multimodal framework, is introduced to achieve deep interaction between different modalities. The proposed method is evaluated on the IEMOCAP corpus and the experimental results show that it performs better than state-of-the-art approaches.
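
    The cascaded cross-attention described above lets one modality query another. As a rough illustration of the general idea (not the authors' key-sparse Transformer; the module names, dimensions and PyTorch framing are assumptions), a cross-modal attention block might look like this:

        # Sketch of cross-attention between two modalities (e.g. audio frames
        # and text tokens). Illustrative only; not the paper's architecture.
        import torch
        import torch.nn as nn

        class CrossModalAttention(nn.Module):
            def __init__(self, dim=256, heads=4):
                super().__init__()
                # audio queries attend to text keys/values, and vice versa
                self.audio_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.text_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)

            def forward(self, audio, text):
                # audio: (batch, T_audio, dim), text: (batch, T_text, dim)
                audio_fused, _ = self.audio_to_text(audio, text, text)
                text_fused, _ = self.text_to_audio(text, audio, audio)
                return audio_fused, text_fused

        # example usage with random features
        audio = torch.randn(2, 100, 256)   # audio frame embeddings
        text = torch.randn(2, 30, 256)     # token embeddings
        audio_fused, text_fused = CrossModalAttention()(audio, text)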

    Evaluating Metaphor Reification in Tangible Interfaces

    Get PDF
    Metaphors are a powerful conceptual device for reasoning about human actions. As such, they have been heavily used in designing and describing human-computer interaction. Since they can address scripted text, verbal expression, imaging, sound, and gestures, they can also be considered in the design and analysis of multimodal interfaces. In this paper we discuss the description and evaluation of the relations between metaphors and their implementation in human-computer interaction, with a focus on tangible user interfaces (TUIs), a form of multimodal interface. The objective of this paper is to define how metaphors appear in a tangible context in order to support their evaluation. Relying on matching entities and operations between the domain of interaction and the domain of the digital application, we propose a conceptual framework based on three components: a structured representation of the mappings holding between the metaphor source, the metaphor target, the interface and the digital system; a conceptual model for describing metaphorical TUIs; and three relevant properties, coherence, coverage and compliance, which define to what extent the implementation of a metaphorical tangible interface matches the metaphor. The conceptual framework is then validated and applied to a tangible prototype in an educational application.

    Vocabularies for description of accessibility issues in multimodal user interfaces

    Get PDF
    In previous work, we proposed a unified approach for describing multimodal human-computer interaction and interaction constraints in terms of sensory, motor, perceptual and cognitive functions of users. In this paper, we extend this work by providing formalised vocabularies that express the human functionalities and anatomical structures required by specific modalities. The central theme of our approach is to connect these modality representations with descriptions of user, device and environmental constraints that influence the interaction. These descriptions can then be used in a reasoning framework that exploits formal connections among interaction modalities and constraints. The focus of this paper is on specifying a comprehensive vocabulary of the necessary concepts. Within the context of an interaction framework, we describe a number of examples that use this formalised knowledge.
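
    As a minimal sketch of how such a vocabulary could support reasoning (the concept names below are illustrative assumptions, not the paper's actual formalisation), a modality can be linked to the human functions it requires and checked against a user's reported constraints:

        # Illustrative only: link modalities to required human functions and
        # filter out modalities that conflict with a user's impairments.
        MODALITY_REQUIREMENTS = {
            "speech_output": {"functions": ["hearing"], "structures": ["ear"]},
            "touchscreen_gesture": {"functions": ["fine_motor", "vision"],
                                    "structures": ["hand", "eye"]},
        }

        def accessible_modalities(impaired_functions):
            """Return modalities whose required functions are not impaired."""
            return [m for m, req in MODALITY_REQUIREMENTS.items()
                    if not set(req["functions"]) & set(impaired_functions)]

        print(accessible_modalities({"hearing"}))  # ['touchscreen_gesture']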

    A multimodal dataset for authoring and editing multimedia content:the MAMEM project

    Get PDF
    We present a dataset that combines multimodal biosignals and eye-tracking information gathered under a human-computer interaction framework. The dataset was developed within the MAMEM project, which aims to endow people with motor disabilities with the ability to edit and author multimedia content through mental commands and gaze activity. The dataset includes EEG, eye-tracking, and physiological (GSR and heart rate) signals collected from 34 individuals (18 able-bodied and 16 motor-impaired). Data were collected during interaction with a specifically designed interface for web browsing and multimedia content manipulation, and during imaginary movement tasks. The presented dataset will contribute towards the development and evaluation of modern human-computer interaction systems that would foster the reintegration of people with severe motor impairments into society.
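
    For illustration, a single recording session in a dataset of this kind might be represented as below; the field names and sampling rate are assumptions, not the dataset's actual schema:

        # Sketch of one multimodal recording session (EEG, gaze, GSR, heart rate).
        from dataclasses import dataclass
        import numpy as np

        @dataclass
        class SessionRecord:
            participant_id: str
            condition: str              # e.g. "web_browsing" or "imaginary_movement"
            eeg: np.ndarray             # (channels, samples)
            gaze: np.ndarray            # (samples, 2) screen coordinates
            gsr: np.ndarray             # (samples,) skin conductance
            heart_rate: np.ndarray      # (samples,) beats per minute

            def duration_s(self, eeg_rate=256):
                # assumed EEG sampling rate of 256 Hz
                return self.eeg.shape[1] / eeg_rate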

    Bimodal Emotion Classification Using Deep Learning

    Get PDF
    Multimodal emotion recognition is an emerging field at the intersection of human-computer interaction and sentiment analysis. It extracts information from each modality to predict emotions more accurately. In this research, a bimodal emotion recognition framework is developed using decision-level fusion of the audio and video modalities on the RAVDESS dataset. Designing such frameworks is computationally expensive and requires considerable time to train the networks, so a relatively small dataset has been used for the scope of this research. The work is inspired by the use of neural networks for emotion classification from multimodal data. The developed framework further confirmed that merging modalities can enhance accuracy in classifying emotions. Decision-level fusion was then explored further with changes to the architecture of the unimodal networks. The research showed that the bimodal framework formed by fusing unimodal networks with wider layers (more nodes) outperformed the framework formed by fusing narrower unimodal networks with fewer nodes.
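
    Decision-level (late) fusion means each unimodal network classifies independently and only the resulting class probabilities are combined. The weighted average below is a minimal sketch of that idea; the emotion labels, weights and fusion rule are assumptions, not the paper's exact setup:

        # Sketch of decision-level fusion of audio and video predictions.
        import numpy as np

        EMOTIONS = ["neutral", "happy", "sad", "angry"]

        def late_fusion(audio_probs, video_probs, w_audio=0.5):
            # weighted average of the two unimodal probability vectors
            fused = w_audio * np.asarray(audio_probs) \
                    + (1 - w_audio) * np.asarray(video_probs)
            return EMOTIONS[int(np.argmax(fused))]

        # example: combine one audio and one video probability vector
        print(late_fusion([0.1, 0.1, 0.5, 0.3], [0.1, 0.1, 0.3, 0.5], w_audio=0.4))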

    Optimizing The Design Of Multimodal User Interfaces

    Get PDF
    Due to a current lack of principle-driven multimodal user interface design guidelines, designers may encounter difficulties when choosing the most appropriate display modality for given users or specific tasks (e.g., verbal versus spatial tasks). The development of multimodal display guidelines from both a user and task-domain perspective is thus critical to achieving successful human-system interaction. Specifically, there is a need to determine how to design task information presentation (e.g., via which modalities) to capitalize on an individual operator's information processing capabilities and the inherent efficiencies associated with redundant sensory information, thereby alleviating information overload. The present effort addresses this issue by proposing a theoretical framework (Architecture for Multi-Modal Optimization, AMMO) from which multimodal display design guidelines and adaptive automation strategies may be derived. The foundation of the proposed framework is based on extending, at a functional working memory (WM) level, existing information processing theories and models with the latest findings in cognitive psychology, neuroscience, and other allied sciences. The utility of AMMO lies in its ability to provide designers with strategies for directing system design, as well as dynamic adaptation strategies (i.e., multimodal mitigation strategies) in support of real-time operations. In an effort to validate specific components of AMMO, a subset of AMMO-derived multimodal design guidelines was evaluated in a simulated weapons control system multitasking environment. The results of this study demonstrated significant improvements in user response time and accuracy when multimodal display cues were used (i.e., auditory and tactile, individually and in combination) to augment the visual display of information, thereby distributing human information processing across multiple sensory and WM resources. These results provide initial empirical support for the overall AMMO model and for a subset of the principle-driven multimodal design guidelines derived from it. The empirically validated multimodal design guidelines may be applicable to a wide range of information-intensive, computer-based multitasking environments.

    Combining heterogeneous inputs for the development of adaptive and multimodal interaction systems

    Get PDF
    In this paper we present a novel framework for the integration of visual sensor networks and speech-based interfaces. Our proposal follows the standard reference architecture for fusion systems (JDL) and combines techniques from artificial intelligence, natural language processing and user modeling to provide enhanced interaction with users. Firstly, the framework integrates a Cooperative Surveillance Multi-Agent System (CS-MAS), which includes several types of autonomous agents working in a coalition to track targets and make inferences about their positions. Secondly, enhanced conversational agents facilitate human-computer interaction by means of speech. Thirdly, a statistical methodology models the user's conversational behavior, which is learned from an initial corpus and improved with the knowledge acquired from successive interactions. A technique is proposed to facilitate the multimodal fusion of these information sources and to take the result into account when deciding the next system action. This work was supported in part by Projects MEyC TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS S2009/TIC-1485.
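
    As a rough sketch of how the fused information might drive the choice of the next system action (the intents, fields and decision rule below are illustrative assumptions, not the paper's JDL-based design):

        # Combine tracked-target positions from the sensor network with the
        # dialogue state from the speech interface to pick the next action.
        def decide_next_action(tracked_targets, dialogue_state):
            """tracked_targets: e.g. [{"id": "t1", "zone": "entrance"}]
            dialogue_state: dict holding the last recognised user intent."""
            intent = dialogue_state.get("intent")
            if intent == "locate" and tracked_targets:
                target = tracked_targets[0]
                return f"inform_position(target={target['id']}, zone={target['zone']})"
            if intent is None:
                return "ask_user_goal()"
            return "confirm_request()"

        print(decide_next_action([{"id": "t1", "zone": "entrance"}],
                                 {"intent": "locate"}))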