
    Non-native children speech recognition through transfer learning

    This work deals with non-native children's speech and investigates both multi-task and transfer learning approaches to adapt a multi-language Deep Neural Network (DNN) to speakers, specifically children, learning a foreign language. The application scenario is characterized by young students learning English and German and reading sentences in these second languages, as well as in their mother tongue. The paper analyzes and discusses techniques for training effective DNN-based acoustic models starting from children's native speech and performing adaptation with limited non-native audio material. A multilingual model is adopted as the baseline, in which a common phonetic lexicon, defined in terms of the units of the International Phonetic Alphabet (IPA), is shared across the three languages at hand (Italian, German and English); DNN adaptation methods based on transfer learning are evaluated on significant non-native evaluation sets. Results show that the resulting non-native models yield a significant improvement over a mono-lingual system adapted to speakers of the target language.
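    The adaptation recipe the abstract describes — start from a model trained on native speech, then fine-tune it on a small amount of non-native audio while keeping part of the network fixed — can be sketched as follows. This is a minimal illustrative sketch: the tiny two-layer classifier, the synthetic "acoustic" features, and the choice to freeze only the lower layer are assumptions for demonstration, not the paper's actual architecture or data.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    class TinyAcousticModel:
        """Toy 2-layer classifier standing in for a DNN acoustic model."""
        def __init__(self, n_in, n_hid, n_out):
            self.W1 = rng.normal(0, 0.1, (n_in, n_hid))
            self.W2 = rng.normal(0, 0.1, (n_hid, n_out))

        def forward(self, X):
            self.H = np.tanh(X @ self.W1)
            return softmax(self.H @ self.W2)

        def step(self, X, y, lr=0.1, freeze_lower=False):
            P = self.forward(X)
            P[np.arange(len(y)), y] -= 1.0            # dL/dlogits for cross-entropy
            dH = (P @ self.W2.T) * (1 - self.H**2)    # backprop through tanh
            self.W2 -= lr * self.H.T @ P / len(y)
            if not freeze_lower:                      # transfer learning: keep
                self.W1 -= lr * X.T @ dH / len(y)     # the lower layer fixed

    # Synthetic "native" data: 3 phone-like classes in a 5-d feature space.
    Xn = rng.normal(0, 1, (300, 5))
    yn = rng.integers(0, 3, 300)
    Xn += np.eye(3)[yn] @ rng.normal(0, 2, (3, 5))    # separable class means

    model = TinyAcousticModel(5, 16, 3)
    for _ in range(200):                              # pre-train on native speech
        model.step(Xn, yn)

    # Small "non-native" set: same classes, shifted acoustics (the accent).
    Xa, ya = Xn[:30] + 0.5, yn[:30]
    for _ in range(50):                               # adapt with frozen lower layer
        model.step(Xa, ya, lr=0.05, freeze_lower=True)

    acc = (model.forward(Xa).argmax(1) == ya).mean()
    print(f"adapted accuracy on non-native set: {acc:.2f}")
    ```

    The design point mirrors the abstract: the lower layer plays the role of the multilingually pre-trained representation, and only the output mapping is re-estimated from the limited non-native material.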

    Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed

    The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstructing the speech gestures of the speaker rather than those of the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and of extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sounds produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. The model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
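    The core tenet of the model — use the hypothesized word to strengthen the links between the speaker's accented sounds and native phonemes, so later words are recognized better — can be sketched as a count-and-normalize update. The tiny lexicon, the phoneme inventory, and the "a:" accent symbol below are invented for illustration; the paper's actual model is neutral about the representation, which this sketch does not attempt to capture.

    ```python
    from collections import defaultdict

    # Toy lexicon: word -> native phoneme sequence (illustrative entries).
    LEXICON = {"bat": ["b", "ae", "t"], "bet": ["b", "eh", "t"]}
    PHONEMES = {"b", "ae", "eh", "t"}   # the listener's native inventory

    # counts[sound][phoneme]: evidence that an accented sound maps to a
    # native phoneme; the default of 1.0 acts as add-one smoothing.
    counts = defaultdict(lambda: defaultdict(lambda: 1.0))

    def p_phoneme_given_sound(sound, phoneme):
        row = counts[sound]
        total = sum(row[ph] for ph in PHONEMES)
        return row[phoneme] / total

    def update_from_hypothesis(sounds, word):
        """The listener hypothesizes `word`: align its native phonemes to the
        sounds actually heard and strengthen each sound->phoneme link."""
        for s, ph in zip(sounds, LEXICON[word]):
            counts[s][ph] += 1.0

    # A speaker whose accent renders native /ae/ as the sound "a:".
    before = p_phoneme_given_sound("a:", "ae")
    for _ in range(5):                  # five confident recognitions of "bat"
        update_from_hypothesis(["b", "a:", "t"], "bat")
    after = p_phoneme_given_sound("a:", "ae")
    print(before, "->", after)          # the link to /ae/ has strengthened
    ```

    After a few word hypotheses, the probability that "a:" maps to /ae/ rises from its smoothed prior, which is exactly the mechanism by which recognition of later accented words improves on average.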

    Leveraging native language information for improved accented speech recognition

    Recognition of accented speech is a long-standing challenge for automatic speech recognition (ASR) systems, given the increasing worldwide population of bilingual speakers with English as their second language. If we consider foreign-accented speech as an interpolation of the native language (L1) and English (L2), a model that can simultaneously address both languages should perform better at the acoustic level for accented speech. In this study, we explore how an end-to-end recurrent neural network (RNN) system trained with English and native languages (Spanish and Indian languages) can leverage native-language data to improve performance on accented English speech. To this end, we examine pre-training with native languages, as well as multi-task learning (MTL) in which the main task is trained with native English and the secondary task is trained with Spanish or Indian languages. We show that the proposed MTL model performs better than the pre-training approach and outperforms a baseline model trained simply with English data. We suggest a new setting for MTL in which the secondary task is trained with both English and the native language, using the same output set. This proposed scenario yields better performance, with +11.95% and +17.55% character error rate gains over the baseline for Hispanic and Indian accents, respectively. Comment: Accepted at Interspeech 201
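    The proposed MTL setting — a shared model trained on interleaved batches from both languages over the same output set — can be sketched as follows. This is a toy sketch with a small feed-forward network and synthetic data standing in for the paper's end-to-end RNN and speech corpora; the network sizes, the 0.3 feature shift modelling the second language, and the alternating-batch schedule are all illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    # Fully shared parameters: one encoder and one head over a shared label
    # set, trained on batches from both "languages" (the proposed MTL setting).
    n_in, n_hid, n_out = 8, 12, 4
    W1 = rng.normal(0, 0.1, (n_in, n_hid))
    W2 = rng.normal(0, 0.1, (n_hid, n_out))

    def make_task(shift):
        """Synthetic classification task; `shift` mimics acoustic differences
        between the languages while the label set stays shared."""
        X = rng.normal(0, 1, (200, n_in))
        y = rng.integers(0, n_out, 200)
        X += np.eye(n_out)[y] @ (rng.normal(0, 2, (n_out, n_in)) + shift)
        return X, y

    english = make_task(0.0)   # "main task": English
    native = make_task(0.3)    # "secondary task": the native language

    def step(X, y, lr=0.1):
        global W1, W2
        H = np.tanh(X @ W1)
        P = softmax(H @ W2)
        P[np.arange(len(y)), y] -= 1.0          # cross-entropy gradient
        dH = (P @ W2.T) * (1 - H**2)
        W2 -= lr * H.T @ P / len(y)
        W1 -= lr * X.T @ dH / len(y)

    for _ in range(300):        # alternate main and secondary batches
        step(*english)
        step(*native)

    X, y = english
    acc = (softmax(np.tanh(X @ W1) @ W2).argmax(1) == y).mean()
    print(f"main-task accuracy: {acc:.2f}")
    ```

    The sketch shows only the training scheme, not the paper's reported gains: because both tasks share every parameter and the output set, gradients from the secondary language directly shape the representation used for the main English task.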

    Pronunciation variation modelling using accent features


    Agreeing to disagree: constant non-alignment of speech gestures in dialogue

    Numerous studies suggest that interlocutors in a dialogue align with each other in terms of their articulatory gestures. It is often suggested, first, that this is the consequence of an automatic tendency for imitation and, second, that it fosters mutual understanding. Using online archives of media, we tested whether alignment is indeed inevitable. The focus was on the pronunciation of a German word whose standard and Swabian realizations differ in their fricative, a contrast acoustically reflected in the fricative spectra. We measured the fricative spectra of interviewers while they interviewed either a prominent German politician using the Swabian variant or an interviewee using the standard variant. Results showed neither an overall influence of the interviewees' pronunciation on the interviewers' fricative realization nor a tendency to align over time in interviewer-interviewee pairs with different pronunciations. This shows that phonetic alignment in conversation is a more complex process than most current theories suggest. Moreover, failure to align may not impede mutual understanding.

    Non-native contrasts in Tongan loans

    We present three case studies of marginal contrasts in Tongan loans from English, working with data from three speakers. Although Tongan lacks contrasts in stress or in CC vs. CVC sequences, secondary stress in loans is contrastive and is sensitive to whether a vowel has a correspondent in the English source word; vowel deletion is also sensitive to whether a vowel is epenthetic relative to the English source; and final vowel length is sensitive to whether the penultimate vowel is epenthetic and, if not, whether it corresponds to a stressed or unstressed vowel in the English source. We provide an analysis in the multilevel model of Boersma (1998) and Boersma & Hamann (2009), and show that the loan patterns can be captured using only constraints that are plausibly needed for native-word phonology, including constraints that reflect perceptual strategies.