
    Talker identification is not improved by lexical access in the absence of familiar phonology

    Listeners identify talkers more accurately when they are familiar with both the sounds and words of the language being spoken. It is unknown whether lexical information alone can facilitate talker identification in the absence of familiar phonology. To dissociate the roles of familiar words and phonology, we developed English-Mandarin “hybrid” sentences, spoken in Mandarin, which can be convincingly coerced to sound like English when presented with corresponding subtitles (e.g., “wei4 gou3 chi1 kao3 li2 zhi1” becomes “we go to college”). Across two experiments, listeners learned to identify talkers in three conditions: the listeners' native language (English), an unfamiliar foreign language (Mandarin), and a foreign language paired with subtitles that primed native-language lexical access (subtitled Mandarin). In Experiment 1, listeners underwent a single session of talker identity training; in Experiment 2, listeners completed three days of training. Talkers speaking a foreign language were identified no better when native-language lexical representations were primed (subtitled Mandarin) than from foreign-language speech alone, regardless of whether listeners had received one or three days of talker identity training. These results suggest that the facilitatory effect of lexical access on talker identification depends on the availability of familiar phonological forms.

    Computational Approaches to Exploring Persian-Accented English

    Methods involving phonetic speech recognition for detecting Persian-accented English are discussed. These methods offer promise for both the identification and mitigation of L2 pronunciation errors. Segmental and suprasegmental pronunciation errors particular to Persian speakers of English are also discussed.
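    The abstract does not spell out an algorithm, but one plausible setup for phone-level error detection of this kind, sketched below under stated assumptions, is to run a phonetic recognizer over the learner's speech and align its output against the canonical dictionary phones for the prompt, flagging substitutions, deletions, and insertions as candidate pronunciation errors. The phone labels, example words, and the specific Persian-influenced error patterns shown (/w/ realized as /v/, vowel epenthesis before an initial /s/-cluster) are illustrative assumptions, not material from the paper.

    # Minimal sketch (not the paper's method): align recognized phones against
    # canonical phones and flag mismatches as candidate pronunciation errors.
    from difflib import SequenceMatcher

    def flag_pronunciation_errors(canonical, recognized):
        """Return (error_type, canonical_span, recognized_span) tuples."""
        errors = []
        sm = SequenceMatcher(a=canonical, b=recognized, autojunk=False)
        for op, i1, i2, j1, j2 in sm.get_opcodes():
            if op == "replace":
                errors.append(("substitution", canonical[i1:i2], recognized[j1:j2]))
            elif op == "delete":
                errors.append(("deletion", canonical[i1:i2], []))
            elif op == "insert":
                errors.append(("insertion", [], recognized[j1:j2]))
        return errors

    # Hypothetical examples of commonly reported Persian-influenced patterns:
    # "west" with /w/ -> /v/, and "street" with an epenthetic initial vowel.
    print(flag_pronunciation_errors(["w", "eh", "s", "t"], ["v", "eh", "s", "t"]))
    print(flag_pronunciation_errors(["s", "t", "r", "iy", "t"],
                                    ["eh", "s", "t", "r", "iy", "t"]))

    A sketch like this only covers segmental errors; the suprasegmental errors the abstract mentions (e.g., stress and intonation) would require prosodic features on top of the phone alignment.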

    Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

    We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder. During the beam search process, we combine the CTC predictions, the attention-based decoder predictions, and a separately trained LSTM language model. We achieve a 5-10% error reduction compared to prior systems on spontaneous Japanese and Chinese speech, and our end-to-end model outperforms traditional hybrid ASR systems. Comment: Accepted for INTERSPEECH 2017
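    As a concrete illustration of the two ideas in this abstract, the sketch below shows (1) a multi-task training objective that interpolates the CTC loss on the encoder output with the attention decoder's cross-entropy loss, and (2) a decoding-time score that mixes CTC, attention-decoder, and external LSTM language-model log-probabilities. The weighting hyperparameters (lambda_ctc, beta_lm), function names, and tensor layouts are assumptions for illustration, not values or code from the paper.

    # Hedged PyTorch sketch of joint CTC-attention training and score combination;
    # hyperparameter values and names are illustrative, not from the paper.
    import torch
    import torch.nn.functional as F

    def joint_loss(ctc_log_probs, enc_lengths, ctc_targets, target_lengths,
                   dec_logits, dec_targets, lambda_ctc=0.3, blank_id=0, pad_id=-100):
        """Multi-task objective: lambda * L_CTC + (1 - lambda) * L_attention."""
        # ctc_log_probs: (T, B, V) log-probabilities from the CTC branch on the encoder
        # ctc_targets:   (B, S) label ids, padded with blank_id beyond target_lengths
        # dec_logits:    (B, L, V) attention-decoder logits under teacher forcing
        # dec_targets:   (B, L) label ids, padded with pad_id where there is no label
        ctc = F.ctc_loss(ctc_log_probs, ctc_targets, enc_lengths, target_lengths,
                         blank=blank_id, zero_infinity=True)
        att = F.cross_entropy(dec_logits.transpose(1, 2), dec_targets, ignore_index=pad_id)
        return lambda_ctc * ctc + (1.0 - lambda_ctc) * att

    def hypothesis_score(logp_ctc, logp_att, logp_lm, lambda_ctc=0.3, beta_lm=0.5):
        """Score used to rank a partial hypothesis during beam search."""
        return lambda_ctc * logp_ctc + (1.0 - lambda_ctc) * logp_att + beta_lm * logp_lm

    The VGG-style CNN encoder itself is omitted here; the point of the sketch is how the CTC branch and the attention decoder share the encoder during training, and how their scores are combined with the language model during decoding.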