Semi-supervised cross-lingual speech emotion recognition
Speech emotion recognition (SER) on a single language has achieved remarkable
results through deep learning approaches over the last decade. However,
cross-lingual SER remains a challenge in real-world applications due to (i) a
large difference between the source and target domain distributions, and (ii) the
availability of few labeled and many unlabeled utterances for the new language.
Taking these aspects into account, we propose a Semi-Supervised Learning
(SSL) method for cross-lingual emotion recognition when a few labels from the
new language are available. Based on a Convolutional Neural Network (CNN), our
method adapts to a new language by exploiting a pseudo-labeling strategy for
the unlabeled utterances. In particular, we investigate both hard and soft
pseudo-labels. We thoroughly evaluate the performance
of the method in a speaker-independent setup on both the source and the new
language and show its robustness across five languages belonging to different
linguistic strains.
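The difference between hard and soft pseudo-labels can be sketched as follows. This is a minimal illustration, not the authors' implementation: the confidence threshold, the temperature parameter, the four-class toy logits, and all function names are assumptions introduced here, and NumPy stands in for the CNN's output layer.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def hard_pseudo_labels(probs, threshold=0.8):
    # Hard labels: one-hot argmax, kept only for confident predictions.
    # The threshold value is an assumption, not taken from the abstract.
    keep = probs.max(axis=1) >= threshold
    onehot = np.eye(probs.shape[1])[probs.argmax(axis=1)]
    return onehot[keep], keep

def soft_pseudo_labels(probs, temperature=1.0):
    # Soft labels: the full predicted distribution, optionally sharpened
    # (temperature < 1) before renormalizing. Temperature is hypothetical.
    p = probs ** (1.0 / temperature)
    return p / p.sum(axis=1, keepdims=True)

def cross_entropy(targets, probs, eps=1e-12):
    # Mean cross-entropy between target distributions and predictions.
    return float(-(targets * np.log(probs + eps)).sum(axis=1).mean())

# Toy model outputs for two unlabeled utterances, four emotion classes.
logits = np.array([[4.0, 0.5, 0.2, 0.1],   # confident prediction
                   [1.0, 0.9, 0.8, 0.7]])  # ambiguous prediction
probs = softmax(logits)

hard_targets, keep = hard_pseudo_labels(probs)
soft_targets = soft_pseudo_labels(probs)

# The hard variant discards the ambiguous utterance; the soft variant
# keeps both utterances and preserves the model's uncertainty.
hard_loss = cross_entropy(hard_targets, probs[keep])
soft_loss = cross_entropy(soft_targets, probs)
```

In this sketch the hard variant trains only on utterances the model is already confident about, while the soft variant uses every unlabeled utterance but with a weaker training signal on ambiguous ones.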
Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example, in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and is traditionally studied under the name of `model adaptation'.
Recent advances in deep learning show that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the `transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research in this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field.
Comment: 13 pages, APSIPA 201