Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages
State-of-the-art spoken language identification (LID) systems, which are
based on end-to-end deep neural networks, have shown remarkable success not
only in discriminating between distant languages but also between
closely-related languages or even different spoken varieties of the same
language. However, it is still unclear to what extent neural LID models
generalize to speech samples with different acoustic conditions due to domain
shift. In this paper, we present a set of experiments to investigate the impact
of domain mismatch on the performance of neural LID systems for a subset of six
Slavic languages across two domains (read speech and radio broadcast) and
examine two low-level signal descriptors (spectral and cepstral features) for
this task. Our experiments show that (1) out-of-domain speech samples severely
hinder the performance of neural LID models, and (2) while both spectral and
cepstral features show comparable performance within-domain, spectral features
show more robustness under domain mismatch. Moreover, we apply unsupervised
domain adaptation to minimize the discrepancy between the two domains in our
study. We achieve relative accuracy improvements that range from 9% to 77%
depending on the diversity of acoustic conditions in the source domain.
Comment: To appear in INTERSPEECH 202
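As a reading aid for the "relative accuracy improvements" reported above, here is a minimal sketch of how such a figure is commonly computed. It assumes the usual definition (gain divided by the baseline accuracy); the abstract does not state which definition the authors use. The function name and the sample accuracies are hypothetical and not taken from the paper.

```python
def relative_improvement(baseline_acc: float, adapted_acc: float) -> float:
    """Relative accuracy improvement, as a percentage of the baseline accuracy.

    Assumed definition: 100 * (adapted - baseline) / baseline.
    """
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return 100.0 * (adapted_acc - baseline_acc) / baseline_acc


# Hypothetical accuracies for illustration only (not results from the paper):
# an out-of-domain baseline of 40% that rises to 50% after adaptation.
print(round(relative_improvement(0.40, 0.50), 1))
```

Under this definition, the same absolute gain yields a larger relative improvement when the baseline is lower, which is one reason a reported range can be wide.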
Computational modelling of segmental and prosodic levels of analysis for capturing variation across Arabic dialects
Dialect variation spans different linguistic levels of analysis. Two examples include the typical phonetic realisations produced and the typical range of intonational choices made by individuals belonging to a given dialect group. Taking the modelling principles of a specific automatic accent recognition system, the work here characterises and observes the variation that exists within these two specific levels of analysis among eight Arabic dialects. Using a method that has previously shown promising performance on English accent varieties, we first model the segmental level of analysis from recordings of Arabic speakers to capture the variation in the phonetic realisations of the vowels and consonants. In doing so, we show how powerful this model can be in distinguishing between Arabic dialects. This paper then shows how this modelling approach can be adapted to instead characterise prosodic variation among these same dialects from the same speech recordings. This allows us to inspect the relative power of the segmental and prosodic levels of analysis in separating the Arabic dialects. This work opens up the possibility of using these modelling frameworks to study the extent and nature of phonetic and prosodic variation across speech corpora.