4,649 research outputs found
Multimodal Speech Emotion Recognition Using Audio and Text
Speech emotion recognition is a challenging task, and extensive reliance has
been placed on models that use audio features in building well-performing
classifiers. In this paper, we propose a novel deep dual recurrent encoder
model that utilizes text data and audio signals simultaneously to obtain a
better understanding of speech data. As emotional dialogue is composed of sound
and spoken content, our model encodes the information from audio and text
sequences using dual recurrent neural networks (RNNs) and then combines the
information from these sources to predict the emotion class. This architecture
analyzes speech data from the signal level to the language level, and it thus
utilizes the information within the data more comprehensively than models that
focus on audio features. Extensive experiments are conducted to investigate the
efficacy and properties of the proposed model. Our proposed model outperforms
previous state-of-the-art methods in assigning data to one of four emotion
categories (i.e., angry, happy, sad and neutral) when the model is applied to
the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%.Comment: 7 pages, Accepted as a conference paper at IEEE SLT 201
A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)
The alignment of heterogeneous sequential data (video to text) is an
important and challenging problem. Standard techniques for this task, including
Dynamic Time Warping (DTW) and Conditional Random Fields (CRFs), suffer from
inherent drawbacks. Mainly, the Markov assumption implies that, given the
immediate past, future alignment decisions are independent of further history.
The separation between similarity computation and alignment decision also
prevents end-to-end training. In this paper, we propose an end-to-end neural
architecture where alignment actions are implemented as moving data between
stacks of Long Short-term Memory (LSTM) blocks. This flexible architecture
supports a large variety of alignment tasks, including one-to-one, one-to-many,
skipping unmatched elements, and (with extensions) non-monotonic alignment.
Extensive experiments on semi-synthetic and real datasets show that our
algorithm outperforms state-of-the-art baselines.Comment: Accepted at CVPR 2018 (Spotlight). arXiv file includes the paper and
the supplemental materia
Explaining Aviation Safety Incidents Using Deep Temporal Multiple Instance Learning
Although aviation accidents are rare, safety incidents occur more frequently
and require a careful analysis to detect and mitigate risks in a timely manner.
Analyzing safety incidents using operational data and producing event-based
explanations is invaluable to airline companies as well as to governing
organizations such as the Federal Aviation Administration (FAA) in the United
States. However, this task is challenging because of the complexity involved in
mining multi-dimensional heterogeneous time series data, the lack of
time-step-wise annotation of events in a flight, and the lack of scalable tools
to perform analysis over a large number of events. In this work, we propose a
precursor mining algorithm that identifies events in the multidimensional time
series that are correlated with the safety incident. Precursors are valuable to
systems health and safety monitoring and in explaining and forecasting safety
incidents. Current methods suffer from poor scalability to high dimensional
time series data and are inefficient in capturing temporal behavior. We propose
an approach by combining multiple-instance learning (MIL) and deep recurrent
neural networks (DRNN) to take advantage of MIL's ability to learn using weakly
supervised data and DRNN's ability to model temporal behavior. We describe the
algorithm, the data, the intuition behind taking a MIL approach, and a
comparative analysis of the proposed algorithm with baseline models. We also
discuss the application to a real-world aviation safety problem using data from
a commercial airline company and discuss the model's abilities and
shortcomings, with some final remarks about possible deployment directions
- …