Survey of deep representation learning for speech emotion recognition
Traditionally, speech emotion recognition (SER) research has relied on manually handcrafted acoustic features using feature engineering. However, the design of handcrafted features for complex SER tasks requires significant manual effort, which impedes generalisability and slows the pace of innovation. This has motivated the adoption of representation learning techniques that can automatically learn an intermediate representation of the input signal without any manual feature engineering. Representation learning has led to improved SER performance and enabled rapid innovation. Its effectiveness has further increased with advances in deep learning (DL), which has facilitated deep representation learning, where hierarchical representations are automatically learned in a data-driven manner. This paper presents the first comprehensive survey on the important topic of deep representation learning for SER. We highlight various techniques and related challenges, and identify important future areas of research. Our survey bridges a gap in the literature, since existing surveys either focus on SER with hand-engineered features or on representation learning in the general setting without focusing on SER.
A new Stack Autoencoder: Neighbouring Sample Envelope Embedded Stack Autoencoder Ensemble Model
The stack autoencoder (SAE), as a representative deep network, has unique and
excellent performance in feature learning and has received extensive attention
from researchers. However, existing deep SAEs focus on the original samples
without considering the hierarchical structural information between samples.
To address this limitation, this paper proposes a new SAE model, the
neighbouring envelope embedded stack autoencoder ensemble (NE_ESAE). First,
the neighbouring sample envelope learning mechanism (NSELM) is proposed to
preprocess the input of the SAE. NSELM constructs sample pairs by combining
neighbouring samples and builds multilayer sample spaces through multilayer
iterative mean clustering, which accounts for similar samples and generates
layers of envelope samples with hierarchical structural information. Second,
an embedded stack autoencoder (ESAE) is proposed and trained in each layer of
the sample space; it considers the original samples both during training and
in the network structure, thereby better capturing the relationship between
original feature samples and deep feature samples. Third, feature reduction
and base classifiers are applied to each layer of envelope samples, producing
classification results for every layer. Finally, the classification results
of the layers of the envelope sample space are fused through an ensemble
mechanism. In the experimental section, the proposed algorithm is validated
on more than ten representative public datasets. The results show that our
method performs significantly better than existing traditional feature
learning methods and representative deep autoencoders.