Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition
Long short-term memory (LSTM) is normally used in recurrent neural networks
(RNNs) as the basic recurrent unit. However, conventional LSTM assumes that the
state at the current time step depends on the previous time step. This
assumption constrains the time dependency modeling capability. In this study,
we propose a new variation of LSTM, advanced LSTM (A-LSTM), for better temporal
context modeling. We employ A-LSTM in a weighted pooling RNN for emotion
recognition. The A-LSTM outperforms the conventional LSTM by a relative 5.5%.
The A-LSTM-based weighted pooling RNN can also complement the state-of-the-art
emotion classification framework. This shows the advantage of A-LSTM.
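The dependency the abstract refers to can be illustrated with a minimal sketch of one conventional LSTM step followed by a weighted pooling over time. This is not the A-LSTM proposed in the paper, and it is not the authors' code: the function names, the gate ordering in the stacked weights, and the softmax form of the pooling weights are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(row, vec):
    return sum(r * v for r, v in zip(row, vec))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One conventional LSTM step (illustrative, hypothetical layout).

    W, U and b stack the four gates as [input, forget, output, candidate],
    each block having H rows, where H = len(h_prev).
    """
    H = len(h_prev)
    z = [dot(W[k], x_t) + dot(U[k], h_prev) + b[k] for k in range(4 * H)]
    h_t, c_t = [], []
    for j in range(H):
        i = sigmoid(z[j])              # input gate
        f = sigmoid(z[H + j])          # forget gate
        o = sigmoid(z[2 * H + j])      # output gate
        g = math.tanh(z[3 * H + j])    # candidate cell value
        # The new cell state sees the past only through c_prev (step t-1).
        c = f * c_prev[j] + i * g
        c_t.append(c)
        h_t.append(o * math.tanh(c))
    return h_t, c_t

def weighted_pooling(hs, u):
    """Softmax-weighted average of hidden states over time (assumed form)."""
    scores = [dot(h, u) for h in hs]
    m = max(scores)                    # subtract the max for numerical stability
    e = [math.exp(s - m) for s in scores]
    total = sum(e)
    w = [v / total for v in e]
    H = len(hs[0])
    pooled = [sum(w[t] * hs[t][j] for t in range(len(hs))) for j in range(H)]
    return pooled, w
```

In this sketch the only route by which earlier frames influence step t is through `h_prev` and `c_prev`, which is the assumption the abstract says A-LSTM relaxes.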
End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
Speech activity detection (SAD) plays an important role in current speech
processing systems, including automatic speech recognition (ASR). SAD is
particularly difficult in environments with acoustic noise. A practical
solution is to incorporate visual information, increasing the robustness of the
SAD approach. An audiovisual system has the advantage of being robust to
different speech modes (e.g., whisper speech) or background noise. Recent
advances in audiovisual speech processing using deep learning have opened
opportunities to capture in a principled way the temporal relationships between
acoustic and visual features. This study explores this idea by proposing a
bimodal recurrent neural network (BRNN) framework for SAD. The approach
models the temporal dynamics of the sequential audiovisual data, improving the
accuracy and robustness of the proposed SAD system. Instead of estimating
hand-crafted features, the study investigates an end-to-end training approach,
where acoustic and visual features are directly learned from the raw data
during training. The experimental evaluation considers a large audiovisual
corpus with over 60.8 hours of recordings, collected from 105 speakers. The
results demonstrate that the proposed framework leads to absolute improvements
of up to 1.2% under practical scenarios over an audio-only VAD baseline
implemented with a deep neural network (DNN). The proposed approach achieves
a 92.7% F1-score when evaluated using the sensors of a portable tablet
in a noisy acoustic environment, which is only 1.0% lower than the performance
obtained under ideal conditions (e.g., clean speech obtained with a
high-definition camera and a close-talking microphone).
Comment: Submitted to Speech Communicatio
Langer Modification, Quantization condition and Barrier Penetration in Quantum Mechanics
The WKB approximation plays an essential role in the development of quantum
mechanics and various important results have been obtained from it. In this
paper, we introduce another method, the so-called uniform asymptotic
approximation, an analytical approximation method for calculating the
wave functions of Schrödinger-like equations, which is applicable to
various problems, including cases with poles (singularities) and multiple
turning points. A distinguishing feature of the method is that in each order of
the approximations the upper bounds of the errors are given explicitly. By
properly choosing the freedom introduced in the method, the errors can be
minimized, which significantly improves the accuracy of the calculations. As a
byproduct, the method provides a very clear explanation of the Langer
modification encountered in the studies of the hydrogen atom and harmonic
oscillator. To further test our method, we calculate (analytically) the wave
functions for several exactly solvable potentials of the Schrödinger
equation, and then obtain the transmission coefficients of particles over
potential barriers, as well as the quantization conditions for bound states. We
find that the results thus obtained agree with the exact ones extremely well.
Possible applications of the method to other fields are also discussed.
Comment: revtex4-1, 1 figure, and 1 table. Published in Universe 6 (2020) 9
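For orientation, the Langer modification and the WKB quantization condition mentioned in this abstract are usually written in the following textbook form. The notation (radial quantum number $n_r$, turning points $r_1 < r_2$, mass $m$) is assumed here and is not taken from the paper, whose own derivation uses the uniform asymptotic approximation rather than this standard WKB statement.

```latex
% Radial effective potential for angular momentum l
V_{\mathrm{eff}}(r) = V(r) + \frac{\hbar^{2}\, l(l+1)}{2 m r^{2}}

% Langer modification: replace l(l+1) by (l + 1/2)^2 in the centrifugal term
l(l+1) \;\longrightarrow\; \left(l + \tfrac{1}{2}\right)^{2}

% WKB quantization condition for bound states between the turning points
\int_{r_{1}}^{r_{2}} \sqrt{2m\left[E - V_{\mathrm{eff}}(r)\right]}\; dr
  = \left(n_{r} + \tfrac{1}{2}\right)\pi\hbar ,
\qquad n_{r} = 0, 1, 2, \dots
```

With the modified centrifugal term, this condition is known to reproduce the exact bound-state energies of the Coulomb potential, which is why the modification appears in WKB treatments of the hydrogen atom.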