66 research outputs found

Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling

Author: Feng Siyuan
Scharenborg Odette
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2020

This study addresses unsupervised subword modeling, i.e., learning feature representations that can distinguish subword units of a language. The proposed approach adopts a two-stage bottleneck feature (BNF) learning framework, consisting of autoregressive predictive coding (APC) as a front-end and a DNN-BNF model as a back-end. APC pretrained features are set as input features to a DNN-BNF model. A language-mismatched ASR system is used to provide cross-lingual phone labels for DNN-BNF model training. Finally, BNFs are extracted as the subword-discriminative feature representation. A second aim of this work is to investigate the robustness of our approach's effectiveness to different amounts of training data. The results on Libri-light and the ZeroSpeech 2017 databases show that APC is effective in front-end feature pretraining. Our whole system outperforms the state of the art on both databases. Cross-lingual phone labels for English data by a Dutch ASR outperform those by a Mandarin ASR, possibly linked to the larger similarity of Dutch compared to Mandarin with English. Our system is less sensitive to training data amount when the training data is over 50 hours. APC pretraining leads to a reduction of needed training material from over 5,000 hours to around 200 hours with little performance degradation. Comment: 5 pages, 3 figures. Accepted for publication in INTERSPEECH 2020, Shanghai, Chin

arXiv.org e-Print Archive

Crossref

TU Delft Repository

The Effects of Background Noise on Native and Non-native Spoken-word Recognition: A Computational Modelling Approach

Author: Karaminis Themis
Scharenborg Odette
Publication date: 01/01/2018

How does the presence of background noise affect the cognitive processes underlying spoken-word recognition? And how do these effects differ in native and non-native language listeners? We addressed these questions using artificial neural-network modelling. We trained a deep auto-encoder architecture on binary phonological and semantic representations of 121 English and Dutch translation equivalents. We also varied exposure to the two languages to generate ‘native English’ and ‘non-native English’ trained networks. These networks captured key effects in the performance (accuracy rates and the number of erroneous responses per word stimulus) of English and Dutch listeners in an offline English spoken-word identification experiment (Scharenborg et al., 2017), which considered clean and noisy listening conditions and three intensities of speech-shaped noise, applied word-initially or word-finally. Our simulations suggested that the effects of noise on native and non-native listening are comparable and can be accounted for within the same cognitive architecture for spoken-word recognition.

Edge Hill University Research Information Repository

eScholarship - University of California

The Presence of Background Noise Extends the Competitor Space in Native and Non‐Native Spoken‐Word Recognition: Insights from Computational Modeling

Author: Hintz Florian
Karaminis Themis
Scharenborg Odette
Publication venue: 'Wiley'
Publication date: 01/01/2022

Oral communication often takes place in noisy environments, which challenge spoken-word recognition. Previous research has suggested that the presence of background noise extends the number of candidate words competing with the target word for recognition and that this extension affects the time course and accuracy of spoken-word recognition. In this study, we further investigated the temporal dynamics of competition processes in the presence of background noise, and how these vary in listeners with different language proficiency (i.e., native and non-native) using computational modeling. We developed ListenIN (Listen-In-Noise), a neural-network model based on an autoencoder architecture, which learns to map phonological forms onto meanings in two languages and simulates native and non-native spoken-word comprehension. We also examined the model's activation states during online spoken-word recognition. These analyses demonstrated that the presence of background noise increases the number of competitor words that are engaged in phonological competition, and that this happens in similar ways intra- and interlinguistically and in native and non-native listening. Taken together, our results support accounts positing a “many-additional-competitors scenario” for the effects of noise on spoken-word recognition. Multimedia Computin

City Research Online

TU Delft Repository

Edge Hill University Research Information Repository

PubMed Central

MPG.PuRe

Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation

Author: Lin Zhaofeng
Patel Tanvina
Scharenborg Odette
Publication date: 09/11/2023

Whispering is a distinct form of speech known for its soft, breathy, and hushed characteristics, often used for private communication. The acoustic characteristics of whispered speech differ substantially from normally phonated speech and the scarcity of adequate training data leads to low automatic speech recognition (ASR) performance. To address the data scarcity issue, we use a signal processing-based technique that transforms the spectral characteristics of normal speech to those of pseudo-whispered speech. We augment an End-to-End ASR with pseudo-whispered speech and achieve an 18.2% relative reduction in word error rate for whispered speech compared to the baseline. Results for the individual speaker groups in the wTIMIT database show the best results for US English. Further investigation showed that the lack of glottal information in whispered speech has the largest impact on whispered speech ASR performance. Comment: Accepted to ASRU 202

arXiv.org e-Print Archive

Cross-linguistic Influences on Sentence Accent Detection in Background Noise.

Author: Kakouros Sofoklis
Meunier Fanny
Post Brechtje
Scharenborg Odette
Publication venue: Lang Speech
Publication date: 01/01/2019

This paper investigates whether sentence accent detection in a non-native language is dependent on (relative) similarity between prosodic cues to accent between the non-native and the native language, and whether cross-linguistic differences in the use of local and more widely distributed (i.e., non-local) cues to sentence accent detection lead to differential effects of the presence of background noise on sentence accent detection in a non-native language. We compared Dutch, Finnish, and French non-native listeners of English, whose cueing and use of prosodic prominence is gradually further removed from English, and compared their results on a phoneme monitoring task in different levels of noise and a quiet condition to those of native listeners. Overall phoneme detection performance was high for the native and the non-native listeners, but deteriorated to the same extent in the presence of background noise. Crucially, relative similarity between the prosodic cues to sentence accent of one's native language compared to that of a non-native language does not determine the ability to perceive and use sentence accent for speech perception in that non-native language. Moreover, proficiency in the non-native language is not a straightforward predictor of sentence accent perception performance, although high proficiency in a non-native language can seemingly overcome certain differences at the prosodic level between the native and non-native language. Instead, performance is determined by the extent to which listeners rely on local cues (English and Dutch) versus cues that are more distributed (Finnish and French), as more distributed cues survive the presence of background noise better.

Aaltodoc Publication Archive

Radboud Repository

Apollo (Cambridge)

Bayesian Models for Unit Discovery on a Very Low Resource Language

Author: Besacier Laurent
Burget Lukas
Dupoux Emmanuel
Godard Pierre
Hasegawa-Johnson Mark
Khudanpur Sanjeev
Larsen Elin
Ondel Lucas
Scharenborg Odette
Yvon François
Publication date: 20/02/2018

Developing speech technologies for low-resource languages has become a very active research field over the last decade. Among others, Bayesian models have shown some promising results on artificial examples but still lack in situ experiments. Our work applies state-of-the-art Bayesian models to unsupervised Acoustic Unit Discovery (AUD) in a real low-resource language scenario. We also show that Bayesian models can naturally integrate information from other resourceful languages by means of an informative prior, leading to more consistent discovered units. Finally, discovered acoustic units are used, either as the 1-best sequence or as a lattice, to perform word segmentation. Word segmentation results show that this Bayesian approach clearly outperforms a Segmental-DTW baseline on the same corpus. Comment: Accepted to ICASSP 201

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server