Transferable Positive/Negative Speech Emotion Recognition via Class-wise Adversarial Domain Adaptation
Speech emotion recognition plays an important role in building more
intelligent and human-like agents. Because emotional speech data are difficult
to collect, an increasingly popular solution is to leverage a related,
resource-rich source corpus to help with the target corpus. However, domain
shift between the corpora poses a serious challenge, making domain adaptation
difficult even for recognizing positive/negative emotions. In this work, we
propose class-wise adversarial domain adaptation to address this challenge by
reducing the shift for all classes between the corpora. Experiments on the
well-known EMODB and Aibo corpora demonstrate that our method is effective even
when only a very limited number of labeled target examples are provided.

Comment: 5 pages, 3 figures, accepted to ICASSP 201
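The class-wise idea can be illustrated without any adversarial machinery. Below is a minimal NumPy sketch that aligns class-conditional feature means between a source and a target corpus; the function name and the mean-matching shortcut are illustrative assumptions, not the paper's actual method, which trains adversarial domain discriminators per class.

```python
import numpy as np

def classwise_align(src_feats, src_labels, tgt_feats, tgt_labels):
    """Shift each target class so its feature mean matches the source class mean.

    A crude, non-adversarial stand-in for class-wise domain adaptation:
    instead of per-class domain discriminators, we align the
    class-conditional first moments of the two corpora directly.
    """
    aligned = tgt_feats.astype(float).copy()
    for c in np.unique(src_labels):
        src_mean = src_feats[src_labels == c].mean(axis=0)
        mask = tgt_labels == c
        aligned[mask] += src_mean - tgt_feats[mask].mean(axis=0)
    return aligned
```

After alignment, every class's target mean coincides with its source mean, which is the first-order effect a class-wise adversarial objective also pursues.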
Language Transfer of Audio Word2Vec: Learning Audio Segment Representations without Target Language Data
Audio Word2Vec offers vector representations of fixed dimensionality for
variable-length audio segments using Sequence-to-sequence Autoencoder (SA).
These vector representations are shown to describe the sequential phonetic
structures of the audio segments to a good degree, with real world applications
such as query-by-example Spoken Term Detection (STD). This paper examines the
capability of language transfer of Audio Word2Vec. We train SA from one
language (source language) and use it to extract the vector representation of
the audio segments of another language (target language). We find that the SA
can still capture phonetic structure in the audio segments of the target
language if the source and target languages are similar. In query-by-example
STD, we obtain vector representations from an SA learned from a large amount of
source-language data and find that they surpass the representations from a
naive encoder and from an SA trained directly on a small amount of
target-language data. These results show that it is possible to learn an Audio
Word2Vec model from high-resource languages and use it on low-resource
languages, further expanding the usability of Audio Word2Vec.

Comment: arXiv admin note: text overlap with arXiv:1603.0098
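The key interface property here is that a variable-length segment maps to one fixed-dimensional vector. The untrained toy RNN encoder below sketches only that property (the encoder half of the autoencoder); the weights, dimensions, and names are illustrative assumptions, and a real SA would learn them by reconstructing its input.

```python
import numpy as np

def rnn_encode(segment, W_in, W_rec):
    """Encode a variable-length segment (T x n_features) into one
    fixed-dimensional vector: the final hidden state of a tanh RNN."""
    h = np.zeros(W_rec.shape[0])
    for frame in segment:
        h = np.tanh(W_in @ frame + W_rec @ h)
    return h

rng = np.random.default_rng(0)
n_features, hidden_dim = 13, 8  # e.g. 13 MFCCs per frame (assumed)
W_in = rng.normal(0, 0.1, (hidden_dim, n_features))
W_rec = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
```

A 5-frame segment and a 50-frame segment both come out as vectors of length `hidden_dim`, which is what lets downstream STD compare segments of different durations directly.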
Transfer Learning for Personality Perception via Speech Emotion Recognition
Holistic perception of affective attributes is an important human perceptual
ability. However, this ability is far from being realized in current affective
computing, as not all of the attributes are well studied and their
interrelationships are poorly understood. In this work, we investigate the
relationship between two affective attributes: personality and emotion, from a
transfer learning perspective. Specifically, we transfer Transformer-based and
wav2vec2-based emotion recognition models to perceive personality from speech
across corpora. Compared with previous studies, our results show that
transferring emotion recognition is effective for personality perception.
Moreover, this allows for better use and exploration of small personality
corpora. We also provide novel findings on the relationship between personality
and emotion that will aid future research on holistic affect recognition.

Comment: Accepted to INTERSPEECH 202
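One lightweight way to realize such a transfer is to freeze the pretrained emotion model and fit only a new head on its features. The ridge-regression probe below is a hedged sketch of that recipe under synthetic data; the names and the linear-probe choice are assumptions for illustration, not the paper's fine-tuning setup.

```python
import numpy as np

def fit_linear_probe(feats, targets, l2=1e-3):
    """Fit a ridge-regression head on frozen features: the simplest form of
    transferring a pretrained representation to a new attribute (here,
    personality scores predicted from emotion-model embeddings)."""
    d = feats.shape[1]
    return np.linalg.solve(feats.T @ feats + l2 * np.eye(d), feats.T @ targets)

# Pretend these are embeddings from a frozen emotion recognizer (synthetic).
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 16))
scores = emb @ rng.normal(size=16)  # synthetic personality scores
w = fit_linear_probe(emb, scores)
```

Because only `w` is trained, this kind of probe is well suited to the small personality corpora the abstract mentions.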
Enhancing transferability of black-box adversarial attacks via lifelong learning for speech emotion recognition models
A paper in INTERSPEECH 2020