
    Multimodal Speech Emotion Recognition Using Audio and Text

    Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features in building well-performing classifiers. In this paper, we propose a novel deep dual recurrent encoder model that utilizes text data and audio signals simultaneously to obtain a better understanding of speech data. As emotional dialogue is composed of sound and spoken content, our model encodes the information from audio and text sequences using dual recurrent neural networks (RNNs) and then combines the information from these sources to predict the emotion class. This architecture analyzes speech data from the signal level to the language level, and it thus utilizes the information within the data more comprehensively than models that focus on audio features. Extensive experiments are conducted to investigate the efficacy and properties of the proposed model. Our proposed model outperforms previous state-of-the-art methods in assigning data to one of four emotion categories (i.e., angry, happy, sad and neutral) when the model is applied to the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%. Comment: 7 pages, Accepted as a conference paper at IEEE SLT 201
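The dual-encoder idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the plain tanh RNN cells, the fusion by concatenation, the random untrained weights, and every name here (`rnn_encode`, `init_params`, `predict`) are assumptions made for illustration. Only the four emotion labels and the overall shape (one recurrent encoder per modality, combined before classification) come from the abstract.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]

# The four emotion categories named in the abstract.
EMOTIONS = ["angry", "happy", "sad", "neutral"]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_encode(seq: List[Vector], w_in: Matrix, w_rec: Matrix) -> Vector:
    """Run a plain tanh RNN over the sequence and return the final hidden state."""
    h = [0.0] * len(w_rec)
    for x in seq:
        a = matvec(w_in, x)
        b = matvec(w_rec, h)
        h = [math.tanh(p + q) for p, q in zip(a, b)]
    return h


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_params(audio_dim: int, text_dim: int, hidden: int,
                rng: random.Random) -> Dict[str, Matrix]:
    return {
        "a_in": rand_matrix(hidden, audio_dim, rng),
        "a_rec": rand_matrix(hidden, hidden, rng),
        "t_in": rand_matrix(hidden, text_dim, rng),
        "t_rec": rand_matrix(hidden, hidden, rng),
        "out": rand_matrix(len(EMOTIONS), 2 * hidden, rng),
    }


def predict(audio_seq: List[Vector], text_seq: List[Vector],
            params: Dict[str, Matrix]) -> Vector:
    """Encode each modality separately, concatenate (assumed fusion), classify."""
    h_audio = rnn_encode(audio_seq, params["a_in"], params["a_rec"])
    h_text = rnn_encode(text_seq, params["t_in"], params["t_rec"])
    fused = h_audio + h_text  # list concatenation of the two final states
    return softmax(matvec(params["out"], fused))
```

Calling `predict` with a sequence of audio feature frames and a sequence of word vectors returns one probability per entry in `EMOTIONS`. With untrained weights the output is close to uniform; the sketch shows only the data flow, not the reported accuracy.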

    Copyright protection for the electronic distribution of text documents

    Each copy of a text document can be made different in a nearly invisible way by repositioning or modifying the appearance of different elements of text, i.e., lines, words, or characters. A unique copy can be registered with its recipient, so that subsequent unauthorized copies that are retrieved can be traced back to the original owner. In this paper we describe and compare several mechanisms for marking documents and several other mechanisms for decoding the marks after documents have been subjected to common types of distortion. The marks are intended to protect documents of limited value that are owned by individuals who would rather possess a legal than an illegal copy if they can be distinguished. We will describe attacks that remove the marks and countermeasures to those attacks. An architecture is described for distributing a large number of copies without burdening the publisher with creating and transmitting the unique documents. The architecture also allows the publisher to determine the identity of a recipient who has illegally redistributed the document, without compromising the privacy of individuals who are not operating illegally. Two experimental systems are described. One was used to distribute an issue of the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, and the second was used to mark copies of company private memoranda.
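One way to picture marking by repositioning lines is the sketch below. The abstract does not specify the encoding, so this is an assumed differential line-shift scheme, not the paper's mechanism: every second line carries one bit by moving slightly up or down, and its unshifted neighbours serve as references. The names `mark_lines` and `decode_lines` and the shift size are hypothetical, and the sketch assumes evenly spaced lines in the unmarked document.

```python
from typing import List


def mark_lines(baselines: List[float], bits: List[int],
               shift: float = 1.0) -> List[float]:
    """Return line positions with bits embedded in every second line.

    baselines: vertical positions of text lines, increasing down the page.
    A carrier line moves down by `shift` for bit 1 and up for bit 0.
    """
    marked = list(baselines)
    carriers = range(1, len(baselines) - 1, 2)
    for idx, bit in zip(carriers, bits):
        marked[idx] += shift if bit else -shift
    return marked


def decode_lines(marked: List[float], n_bits: int) -> List[int]:
    """Recover bits by comparing the gap above and below each carrier line."""
    bits = []
    carriers = list(range(1, len(marked) - 1, 2))[:n_bits]
    for idx in carriers:
        gap_above = marked[idx] - marked[idx - 1]
        gap_below = marked[idx + 1] - marked[idx]
        bits.append(1 if gap_above > gap_below else 0)
    return bits


# Seven evenly spaced lines can carry three bits (lines 1, 3 and 5).
original = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0, 120.0]
marked = mark_lines(original, [1, 0, 1])

# Simulate a uniform rescale and offset, as a copier or scanner might introduce.
distorted = [1.5 * y + 7.0 for y in marked]
recovered = decode_lines(distorted, 3)
```

Because the decoder compares neighbouring gaps and never reads absolute positions, a uniform translation or scaling of the page leaves the recovered bits unchanged. Noise, skew, and deliberate attacks on the marks are not modelled here.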

    Alarm initiated activities: Matching formats to tasks

    This paper addresses the selection of visual alarm formats for different 'alarm initiated activities'. The activities under examination were alarm handling tasks. Seven such tasks have been identified, namely: observe, accept, analyse, investigate, correct, monitor and reset. One of the most important stages is the initial analysis of the alarm information as this determines the subsequent manner in which the information is processed. It was hypothesised that the format in which the information is presented will determine the success of the alarm handling task, hence the proposal to match formats to tasks. The findings suggest that text-based formats are best suited to tasks requiring time-based reasoning, mimic formats are best suited to tasks requiring spatial location and annunciator formats are best suited to tasks requiring recognition of spatial patterns. The importance of considering both reaction time and accuracy of response in consideration of task match was also noted. In summary, it is suggested that care needs to be taken to determine the appropriateness of the medium for any given task and the demands it places on the human operator.