Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example, in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and has traditionally been studied under the name of 'model adaptation'.
Recent advances in deep learning show that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the `transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research in this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field. Comment: 13 pages, APSIPA 2015
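As a concrete illustration of the layer-sharing transfer this abstract describes, the following PyTorch sketch freezes the shared feature layers of a toy acoustic model and re-trains only a new language-specific output layer. The architecture, layer names, and phone-inventory sizes are illustrative assumptions, not the paper's actual setup.

```python
import torch
import torch.nn as nn

class AcousticModel(nn.Module):
    """Toy DNN acoustic model: shared feature layers plus a
    language-specific output layer over the phone inventory."""
    def __init__(self, n_feats=40, n_hidden=256, n_phones=120):
        super().__init__()
        self.shared = nn.Sequential(          # high-level abstract features
            nn.Linear(n_feats, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        )
        self.output = nn.Linear(n_hidden, n_phones)

    def forward(self, x):
        return self.output(self.shared(x))

# Assume `model` has already been trained on the source language.
model = AcousticModel(n_phones=120)

# Transfer to a target language: freeze the shared layers and swap in a
# fresh output layer sized to the new phone inventory (95 is made up here).
for p in model.shared.parameters():
    p.requires_grad = False
model.output = nn.Linear(256, 95)

# Only the new output layer is updated by the small re-training pass.
optimizer = torch.optim.Adam(model.output.parameters(), lr=1e-3)
```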
Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology
Every culture and language is unique. Our work expressly focuses on the
uniqueness of culture and language in relation to human affect, specifically
sentiment and emotion semantics, and how they manifest in social multimedia. We
develop sets of sentiment- and emotion-polarized visual concepts by adapting
semantic structures called adjective-noun pairs, originally introduced by Borth
et al. (2013), but in a multilingual context. We propose a new
language-dependent method for automatic discovery of these adjective-noun
constructs. We show how this pipeline can be applied to a social multimedia
platform for the creation of a large-scale multilingual visual sentiment
concept ontology (MVSO). Unlike the flat structure in Borth et al. (2013), our
unified ontology is organized hierarchically by multilingual clusters of
visually detectable nouns and subclusters of emotionally biased versions of
these nouns. In addition, we present an image-based prediction task to show how
generalizable language-specific models are in a multilingual context. A new,
publicly available dataset of >15.6K sentiment-biased visual concepts across 12
languages with language-specific detector banks, >7.36M images, and their
metadata is also released. Comment: 11 pages, to appear at ACM MM'15
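To make the adjective-noun pair construct concrete, here is a minimal sketch that mines adjective-noun bigrams from English captions with NLTK part-of-speech tags. The actual MVSO discovery method is language-dependent and considerably richer; the function name and example captions are hypothetical.

```python
from collections import Counter
import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def mine_adjective_noun_pairs(captions):
    """Count adjective-noun bigrams (a JJ* tag followed by an NN* tag)."""
    pairs = Counter()
    for caption in captions:
        tagged = nltk.pos_tag(nltk.word_tokenize(caption.lower()))
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
            if t1.startswith("JJ") and t2.startswith("NN"):
                pairs[(w1, w2)] += 1
    return pairs

captions = ["what a beautiful sunset over the calm sea",
            "a creepy house at the end of the dark street"]
print(mine_adjective_noun_pairs(captions).most_common(3))
```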
A LightGBM-Based EEG Analysis Method for Driver Mental States Classification
Fatigue driving can easily lead to road traffic accidents and bring great harm to individuals and families. Recently, electroencephalography (EEG)-based monitoring of physiological and brain activity for fatigue detection has been increasingly investigated.
However, how to find an effective method or model to detect the mental states of drivers in a timely and efficient manner remains a
challenge. In this paper, we combine common spatial pattern (CSP) features with a proposed lightweight classifier, LightFD, which is
based on the gradient boosting framework, for EEG mental state identification. Comparisons with traditional classifiers,
such as support vector machine (SVM), convolutional neural network (CNN), gated recurrent unit (GRU), and large margin
nearest neighbor (LMNN), show that the proposed model achieves better classification performance as well as higher decision
efficiency. Furthermore, we also test and validate that LightFD has better transfer learning performance in EEG classification of
driver mental states. In summary, the proposed LightFD classifier performs better in real-time EEG mental state
prediction and is expected to have broad application prospects in practical brain-computer interaction (BCI).
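LightFD itself is not published with this abstract, but its two named ingredients are standard: CSP feature extraction (available in MNE) and a gradient-boosted classifier (LightGBM). The sketch below wires them together on placeholder data; the shapes, labels, and hyperparameters are assumptions, not the paper's configuration.

```python
import numpy as np
from mne.decoding import CSP           # common spatial pattern features
from lightgbm import LGBMClassifier    # gradient boosting framework
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder EEG epochs: (n_trials, n_channels, n_samples) with binary
# labels (0 = alert, 1 = fatigued); real data would come from driving tasks.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 32, 256))
y = rng.integers(0, 2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Learn CSP spatial filters on the training epochs only, then reduce each
# epoch to log-variance features.
csp = CSP(n_components=6, log=True)
f_tr = csp.fit_transform(X_tr, y_tr)
f_te = csp.transform(X_te)

# Lightweight gradient-boosted trees in the spirit of the proposed LightFD.
clf = LGBMClassifier(n_estimators=200, learning_rate=0.05)
clf.fit(f_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(f_te)))
```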
Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages
In a conventional speech emotion recognition (SER) task, a classifier for a
given language is trained on a pre-existing dataset for that same language.
However, where training data for a language does not exist, data from other
languages can be used instead. We experiment with cross-lingual and
multilingual SER, working with Amharic, English, German, and Urdu. For Amharic,
we use our own publicly-available Amharic Speech Emotion Dataset (ASED). For
English, German and Urdu we use the existing RAVDESS, EMO-DB and URDU datasets.
We followed previous research in mapping labels for all datasets to just two
classes, positive and negative. Thus we can compare performance on different
languages directly, and combine languages for training and testing. In
Experiment 1, monolingual SER trials were carried out using three classifiers,
AlexNet, VGGE (a proposed variant of VGG), and ResNet50. Results averaged for
the three models were very similar for ASED and RAVDESS, suggesting that
Amharic and English SER are equally difficult. By contrast, German SER is more
difficult and Urdu SER easier. In Experiment 2, we trained on one language
and tested on another, in both directions for each pair: Amharic↔German,
Amharic↔English, and Amharic↔Urdu. Results with Amharic as the target suggested
that using English or German as the source gives the best result. In Experiment
3, we trained on several non-Amharic languages and then tested on Amharic. The
best accuracy obtained was several percent greater than the best accuracy in
Experiment 2, suggesting that a better result can be obtained when using two or
three non-Amharic languages for training than when using just one non-Amharic
language. Overall, the results suggest that cross-lingual and multilingual
training can be an effective strategy for training a SER classifier when
resources for a language are scarce. Comment: 16 pages, 9 tables, 5 figures
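The two-class setup can be reproduced with a simple label map plus corpus concatenation. A minimal sketch follows; the particular emotion-to-polarity mapping and the corpus variable names are hypothetical, since the abstract does not spell them out.

```python
import numpy as np

# Hypothetical mapping from dataset-specific emotion labels to the two
# shared classes; the actual mapping follows the prior work the authors cite.
LABEL_MAP = {
    "happy": "positive", "calm": "positive", "surprised": "positive",
    "angry": "negative", "sad": "negative", "fearful": "negative",
}

def to_binary(labels):
    """Map emotion labels onto positive/negative polarity."""
    return np.array([LABEL_MAP[label] for label in labels])

def combine_corpora(corpora):
    """Stack (features, labels) from several language corpora so one
    classifier can be trained cross-lingually or multilingually."""
    X = np.vstack([feats for feats, _ in corpora])
    y = np.concatenate([to_binary(labels) for _, labels in corpora])
    return X, y

# e.g. train on English + German and test on Amharic (variables hypothetical):
# X_train, y_train = combine_corpora([(ravdess_feats, ravdess_labels),
#                                     (emodb_feats, emodb_labels)])
# X_test, y_test = amharic_feats, to_binary(amharic_labels)
```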
The role of HG in the analysis of temporal iteration and interaural correlation