11,888 research outputs found
Semi-Supervised Speech Emotion Recognition with Ladder Networks
Speech emotion recognition (SER) systems find applications in various fields
such as healthcare, education, and security and defense. A major drawback of
these systems is their lack of generalization across different conditions. This
problem can be solved by training models on large amounts of labeled data from
the target domain, which is expensive and time-consuming. Another approach is
to increase the generalization of the models. An effective way to achieve this
goal is by regularizing the models through multitask learning (MTL), where
auxiliary tasks are learned along with the primary task. These methods often
require the use of labeled data which is computationally expensive to collect
for emotion recognition (gender, speaker identity, age or other emotional
descriptors). This study proposes the use of ladder networks for emotion
recognition, which utilizes an unsupervised auxiliary task. The primary task is
a regression problem to predict emotional attributes. The auxiliary task is the
reconstruction of intermediate feature representations using a denoising
autoencoder. This auxiliary task does not require labels so it is possible to
train the framework in a semi-supervised fashion with abundant unlabeled data
from the target domain. This study shows that the proposed approach creates a
powerful framework for SER, achieving superior performance than fully
supervised single-task learning (STL) and MTL baselines. The approach is
implemented with several acoustic features, showing that ladder networks
generalize significantly better in cross-corpus settings. Compared to the STL
baselines, the proposed approach achieves relative gains in concordance
correlation coefficient (CCC) between 3.0% and 3.5% for within corpus
evaluations, and between 16.1% and 74.1% for cross corpus evaluations,
highlighting the power of the architecture
Deep learning with convolutional neural networks for decoding and visualization of EEG pathology
We apply convolutional neural networks (ConvNets) to the task of
distinguishing pathological from normal EEG recordings in the Temple University
Hospital EEG Abnormal Corpus. We use two basic, shallow and deep ConvNet
architectures recently shown to decode task-related information from EEG at
least as well as established algorithms designed for this purpose. In decoding
EEG pathology, both ConvNets reached substantially better accuracies (about 6%
better, ~85% vs. ~79%) than the only published result for this dataset, and
were still better when using only 1 minute of each recording for training and
only six seconds of each recording for testing. We used automated methods to
optimize architectural hyperparameters and found intriguingly different ConvNet
architectures, e.g., with max pooling as the only nonlinearity. Visualizations
of the ConvNet decoding behavior showed that they used spectral power changes
in the delta (0-4 Hz) and theta (4-8 Hz) frequency range, possibly alongside
other features, consistent with expectations derived from spectral analysis of
the EEG data and from the textual medical reports. Analysis of the textual
medical reports also highlighted the potential for accuracy increases by
integrating contextual information, such as the age of subjects. In summary,
the ConvNets and visualization techniques used in this study constitute a next
step towards clinically useful automated EEG diagnosis and establish a new
baseline for future work on this topic.Comment: Published at IEEE SPMB 2017 https://www.ieeespmb.org/2017
- …