2,043 research outputs found

Leveraging native language information for improved accented speech recognition

Authors: Ghorbani Shahram; Hansen John H. L.
Publication venue: 'International Speech Communication Association'
Publication date: 18/04/2019

Recognition of accented speech is a long-standing challenge for automatic speech recognition (ASR) systems, given the increasing worldwide population of bilingual speakers with English as their second language. If we consider foreign-accented speech as an interpolation of the native language (L1) and English (L2), a model that can simultaneously address both languages should perform better at the acoustic level for accented speech. In this study, we explore how an end-to-end recurrent neural network (RNN) system trained with English and native languages (Spanish and Indian languages) could leverage native-language data to improve performance on accented English speech. To this end, we examine pre-training with native languages, as well as multi-task learning (MTL) in which the main task is trained with native English and the secondary task is trained with Spanish or Indian languages. We show that the proposed MTL model performs better than the pre-training approach and outperforms a baseline model trained only with English data. We suggest a new setting for MTL in which the secondary task is trained with both English and the native language, using the same output set. This proposed scenario yields better performance, with +11.95% and +17.55% character error rate gains over the baseline for Hispanic and Indian accents, respectively. Comment: Accepted at Interspeech 201
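
The abstract above reports character error rate (CER) gains over a baseline. As a minimal sketch, assuming CER is the character-level edit distance divided by the reference length and that the reported gains are relative improvements (the abstract does not say which), the computation might look like the following. All function names are hypothetical and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_gain(baseline_cer: float, new_cer: float) -> float:
    """Relative CER improvement over the baseline, in percent (assumed definition)."""
    return 100.0 * (baseline_cer - new_cer) / baseline_cer


if __name__ == "__main__":
    # Illustrative values only; these are not results from the paper.
    print(cer("accent", "acent"))
    print(relative_gain(0.20, 0.15))
```

Under this assumed definition, a positive value means the new system makes fewer character errors than the baseline.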

arXiv.org e-Print Archive

Crossref

The effects of English proficiency on the processing of Bulgarian-accented English by Bulgarian-English bilinguals

Author: Dokovova Marie
Publication venue: Queen Margaret University, Edinburgh
Publication date: 01/01/2019

This dissertation explores the potential benefit of listening to and with one’s first-language accent, as suggested by the Interlanguage Speech Intelligibility Benefit Hypothesis (ISIB). Previous studies have not consistently supported this hypothesis. According to major second language learning theories, the listener’s second language proficiency determines the extent to which the listener relies on their first language phonetics. Hence, this thesis provides a novel approach by focusing on the role of English proficiency in the understanding of Bulgarian-accented English for Bulgarian-English bilinguals. The first experiment investigated whether evoking the listeners’ L1 Bulgarian phonetics would improve the speed of processing Bulgarian-accented English words, compared to Standard British English (SBE) words, and vice versa. Listeners with lower English proficiency processed Bulgarian-accented English faster than SBE, while high proficiency listeners tended to have an advantage with SBE over the Bulgarian accent. The second experiment measured accuracy and reaction times (RT) in a lexical decision task with single-word stimuli produced by two L1 English speakers and two Bulgarian-English bilinguals. Listeners with high proficiency in English responded more slowly and less accurately to Bulgarian-accented speech compared to L1 English speech and compared to lower proficiency listeners. These accent preferences were also supported by the listeners’ RT adaptation across the first experimental block. A follow-up investigation compared the results of L1 UK English listeners to the bilingual listeners with the highest proficiency in English. The L1 English listeners and the bilinguals processed both accents with similar speed, accuracy and adaptation patterns, showing no advantage or disadvantage for the bilinguals. These studies support existing models of second language phonetics. Higher proficiency in L2 is associated with lesser reliance on L1 phonetics during speech processing. In addition, the listeners with the highest English proficiency had no advantage when understanding Bulgarian-accented English compared to L1 English listeners, contrary to ISIB.
Keywords: Bulgarian-English bilinguals, bilingual speech processing, L2 phonetic development, lexical decision, proficienc

Queen Margaret University eResearch

Challenges with Rapid Adaptation of Speech Translation Systems to New Language Pairs

Authors: Black Alan W.; Schultz Tanja
Publication date: 18/06/2008

KITopen

Listening to Accented Speech in a Second Language: First Language and Age of Acquisition Effects

Authors: Larraza Arnanz Saioa; Oñederra Olaizola Miren Lourdes; Samuel Arthur G.
Publication venue: 'American Psychological Association (APA)'
Publication date: 01/01/2016

Online First March 10, 2016. Bilingual speakers must acquire the phonemic inventory of 2 languages and need to recognize spoken words cross-linguistically; a demanding job potentially made even more difficult due to dialectal variation, an intrinsic property of speech. The present work examines how bilinguals perceive second language (L2) accented speech and where accommodation to dialectal variation takes place. Dialectal effects were analyzed at different levels: An AXB discrimination task tapped phonetic-phonological representations, an auditory lexical-decision task tested for effects in accessing the lexicon, and an auditory priming task looked for semantic processing effects. Within that central focus, the goal was to see whether perceptual adjustment at a given level is affected by 2 main linguistic factors: bilinguals’ first language and age of acquisition of the L2. Taking advantage of the cross-linguistic situation of the Basque language, bilinguals with different first languages (Spanish or French) and ages of acquisition of Basque (simultaneous, early, or late) were tested. Our use of multiple tasks with multiple types of bilinguals demonstrates that in spite of very similar discrimination capacity, French-Basque versus Spanish-Basque simultaneous bilinguals’ performance on lexical access significantly differed. Similarly, results of the early and late groups show that the mapping of phonetic-phonological information onto lexical representations is a more demanding process that accentuates non-native processing difficulties. L1 and AoA effects were more readily overcome in semantic processing; accented variants regularly created priming effects in the different groups of bilinguals. This study was conducted with the support of the PSI 2010–17781 Grant to the second author from the Spanish Government (MINECO).

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital para la Docencia y la Investigación

Co-activation in the bilingual lexicon: Evidence from Chinese-English bilinguals

Author: Gallant Jordan
Publication venue: 'Brock University Library'
Publication date: 09/03/2020

Investigation of the bilingual mental lexicon suggests that one of its defining characteristics is integration. Words across both languages are subject to parallel co-activation during language processing. An auditory stimulus typing task was used to assess connectivity on the basis of both morphology and phonology. English loanwords in Chinese and transparent English noun-noun compounds with Chinese translation equivalents with corresponding compound structure (corresponding compounds) were used as the critical stimuli. Accent was also manipulated to determine whether or not phonological cues may influence the degree of cross-linguistic co-activation. Results suggest cross-linguistic co-activation on the basis of phonological overlap in different-script bilinguals but only weakly support morphological integration in Chinese-English bilinguals. Accent led to greater co-activation of phonologically similar loanword pairs. Results are discussed in terms of inhibitory control, language acquisition, and the structure of the bilingual lexicon.

Brock University Digital Repository

Modelo acústico de língua inglesa falada por portugueses

Author: Simões Carla Alexandra Coelho
Publication date: 01/01/2007

Master's project in Informatics Engineering, presented to the Universidade de Lisboa through the Faculdade de Ciências, 2007. In the context of robust speech recognition based on Hidden Markov Models (HMMs), this work describes methodologies and experiments aimed at recognizing foreign speakers. Speech recognition necessarily involves acoustic models. Acoustic models reflect how a language is pronounced and articulated, modelling the sequence of sounds emitted during speech. This modelling relies on minimal speech segments, the phones, for which there are sets of symbols (alphabets) that represent their pronunciation. The representation, articulation, and pronunciation of these symbols are studied in articulatory and acoustic phonetics. Words can be described by analysing their constituent units, the phones. A speech recognizer interprets the input signal, speech, as a sequence of coded symbols. To do so, the signal is split into observations of roughly 10 milliseconds each, which reduces the unit of analysis to a time interval over which the characteristics of a sound segment do not vary. Acoustic models give the probability that a given observation corresponds to a given entity. It is therefore through models of the entities in the vocabulary to be recognized that these sound fragments can be put back together. The models developed in this work are based on HMMs. They are so named because they build on Markov chains (Markov, 1856-1922): sequences of states in which each state is conditioned by its predecessor. Applied to our domain, this approach requires building a set of models, one for each class of sounds to be recognized, which are trained on training data.
The data are audio files and their transcriptions (at the word level), so that each transcription can be decomposed into phones and aligned with each sound in the corresponding audio file. Using a state model, in which each state represents an observation or described speech segment, the data are progressively regrouped to create increasingly reliable statistical models that represent the speech entities of a given language. Recognition of foreign speakers whose pronunciation differs from that of the language for which the recognizer was designed can be a major problem for recognizer accuracy. This variation can be even more problematic than dialectal variation within a language, because it depends on each speaker's knowledge of the foreign language. Using a small amount of audio from foreign speakers to train new acoustic models, several experiments were carried out with corpora of Portuguese speakers speaking English, of European Portuguese, and of English. First, the behaviour of the native English and native Portuguese models was explored separately when tested on the test corpora (tests with native and with non-native speakers). Next, another model was trained using both the audio of Portuguese speakers speaking English and that of native English speakers as the training corpus. A further experiment used adaptation techniques such as Maximum Likelihood Linear Regression (MLLR). This technique adapts a given initial model to a particular speaker characteristic, in this case the foreign accent. From a small amount of data representing the characteristic to be modelled, it computes a set of transformations that are applied to the model being adapted.
Phonetic modelling was also explored, studying how a foreign speaker pronounces the foreign language, in this case a Portuguese speaker speaking English. This study was carried out with the help of a linguist, who defined a set of phones, obtained by mapping the English phone inventory onto the Portuguese one, that represents English as spoken by Portuguese speakers of a particular prestige group. Given the great variability of pronunciations, this group had to be defined taking the speakers' literacy level into account. The study was later used to create a new model trained with the corpora of Portuguese speakers speaking English and of native Portuguese speakers. This yields a native Portuguese recognizer in which English terms can also be recognized. Within the theme of speech recognition, this project also covered the collection of corpora for European Portuguese and the compilation of a European Portuguese lexicon. For corpus acquisition, the author was involved in extracting and preparing telephone speech data for later training of new European Portuguese acoustic models. The lexicon was compiled with an incremental, semi-automatic method: pronunciations were generated automatically for groups of 10 thousand words, and each group was reviewed and corrected by a linguist. Each reviewed group was then used to improve the rules for automatic pronunciation generation.
The tremendous growth of technology has increased the need to integrate spoken language technologies into everyday applications, providing easy and natural access to information. These applications are of different natures, with different user interfaces.
Besides voice-enabled Internet portals or tourist information systems, automatic speech recognition systems can be used in the home, where TVs and other appliances could be voice controlled, replacing keyboard or mouse interfaces, or in mobile phones and palm-sized computers for hands-free and eyes-free operation. The development of these systems raises several known difficulties. One of them concerns recognizer accuracy when dealing with non-native speakers whose pronunciation of a given language differs phonetically. A non-native accent can be more problematic than dialectal variation within the language. This mismatch depends on the individual’s speaking proficiency and the speaker’s mother tongue. Consequently, when the speaker’s native language is not the same as the one that was used to train the recognizer, there is a considerable loss in recognition performance. In this thesis, we examine the problem of non-native speech in a speaker-independent, large-vocabulary recognizer in which a small amount of non-native data was used for training. Several experiments were performed using Hidden Markov models trained with speech corpora containing European Portuguese native speakers, English native speakers, and English spoken by European Portuguese native speakers. Initially, the behaviour of a native English model and a non-native English speakers’ model was explored. Then, using different corpus weights for the native English speakers and the English spoken by Portuguese speakers, a model was trained as a pool of accents. The Maximum Likelihood Linear Regression method was used as an adaptation technique. We also explored how European Portuguese speakers pronounce English by studying the correspondences between the phone sets of the foreign and target languages. The result was a new phone set derived from the mapping between the English and Portuguese phone sets.
Then a new model was trained with data from English spoken by Portuguese speakers and native Portuguese data. Within the speech recognition theme, this work has two other purposes: collecting Portuguese corpora and supporting the compilation of a Portuguese lexicon, adopting methods and algorithms to generate phonetic pronunciations automatically. The collected corpora were processed in order to train acoustic models to be used in the Exchange 2007 domain, namely in Outlook Voice Access.
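
The abstract above mentions splitting the input signal into observations of roughly 10 milliseconds before acoustic modelling. A minimal sketch of that framing step follows, assuming non-overlapping frames and a 16 kHz sample rate. Neither assumption comes from the thesis abstract, practical recognizers commonly use overlapping windows instead, and the helper name is hypothetical.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float],
                 sample_rate_hz: int,
                 frame_ms: float = 10.0) -> List[Sequence[float]]:
    """Split a signal into consecutive, non-overlapping frames of frame_ms each.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    last_start = len(samples) - frame_len
    return [samples[start:start + frame_len]
            for start in range(0, last_start + 1, frame_len)]


if __name__ == "__main__":
    # One second of a dummy signal at an assumed 16 kHz sample rate.
    signal = [0.0] * 16000
    frames = frame_signal(signal, sample_rate_hz=16000)
    print(len(frames), len(frames[0]))
```

Each returned frame would then be converted to a feature vector and scored against the HMM states; that part is outside this sketch.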

Universidade de Lisboa: Repositório.UL

Loan Phonology

Publication venue: 'John Benjamins Publishing Company'

For many different reasons, speakers borrow words from other languages to fill gaps in their own lexical inventory. The past ten years have been characterized by a great interest among phonologists in the issue of how the nativization of loanwords occurs. The general feeling is that loanword nativization provides a direct window for observing how acoustic cues are categorized in terms of the distinctive features relevant to the L1 phonological system as well as for studying L1 phonological processes in action and thus to the true synchronic phonology of L1. The collection of essays presented in this volume provides an overview of the complex issues phonologists face when investigating this phenomenon and, more generally, the ways in which unfamiliar sounds and sound sequences are adapted to converge with the native language’s sound pattern. This book is of interest to theoretical phonologists as well as to linguists interested in language contact phenomena

OAPEN Library

Experience with foreign accent influences non-native (L2) word recognition: The case of th-substitutions [Abstract]

Authors: Hanulikova A.; Weber A.
Publication date: 01/04/2009

MPG.PuRe

Marginal contrast in loanword phonology: Production and perception

Authors: Kager René; Martin Alexander; Peperkamp Sharon; van Heugten Marieke
Publication venue: 'Open Library of the Humanities'
Publication date: 01/01/2022

Though Dutch is usually described as lacking a voicing contrast at the velar place of articulation, due to intense language contact and heavy lexical borrowing, a contrast between /k/ and /g/ has recently been emerging. We explored the status of this contrast in Dutch speakers in both production and perception. We asked participants to produce loanwords containing a /g/ in the source language (e.g., goal) and found a range of productions, including a great many unadapted [g] tokens. We also tested the same speakers on their perception of the emerging [k] ~ [g] contrast and found that our participants were able to discriminate the emerging contrast well. We additionally explored the possibility that those speakers who use the new contrast more in production are also better at perceiving it, but we did not observe strong evidence of such a link. Overall, our results indicate that the adoption of the new sound is well advanced in the population we tested, but is still modulated by individual-level factors. We hold that contrasts emerging through borrowing, like other phonological contrasts, are subject to perceptual and functional constraints, and that these and other ‘marginal contrasts’ must be considered as full-fledged parts of phonology

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Utrecht University Repository

Dissertations of the University of Groningen