Leveraging native language information for improved accented speech recognition
Recognition of accented speech is a long-standing challenge for automatic speech recognition (ASR) systems, given the growing worldwide population of bilingual speakers with English as their second language. If foreign-accented speech is viewed as an interpolation of the native language (L1) and English (L2), a model that can address both languages simultaneously should perform better at the acoustic level for accented speech. In this study, we explore how an end-to-end recurrent neural network (RNN) system trained on English and native languages (Spanish and Indian languages) can leverage native-language data to improve performance on accented English speech. To this end, we examine pre-training with native languages, as well as multi-task learning (MTL) in which the main task is trained with native English and the secondary task is trained with Spanish or Indian languages. We show that the proposed MTL model performs better than the pre-training approach and outperforms a baseline model trained only on English data. We suggest a new MTL setting in which the secondary task is trained with both English and the native language, using the same output set. This proposed scenario yields better performance, with character error rate gains of +11.95% and +17.55% over the baseline for Hispanic and Indian accents, respectively. Comment: Accepted at Interspeech 201
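The results above are reported as character error rate (CER) gains over the baseline. As a minimal sketch, assuming the usual definition of CER (character-level edit distance divided by the reference length) and assuming the reported percentages are relative reductions (the abstract does not say whether they are relative or absolute), the metric and the gain can be computed as follows. The function names are illustrative and do not come from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (free when characters match)
            ))
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of a hypothesis against a non-empty reference."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_gain(baseline_cer: float, new_cer: float) -> float:
    """Relative CER reduction in percent (an assumed reading of the reported gains)."""
    return 100.0 * (baseline_cer - new_cer) / baseline_cer


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
```

On a full test set, CER is usually computed by summing edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.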
The effects of English proficiency on the processing of Bulgarian-accented English by Bulgarian-English bilinguals
This dissertation explores the potential benefit of listening to and with one’s first-language accent, as suggested by the Interlanguage Speech Intelligibility Benefit (ISIB) hypothesis. Previous studies have not consistently supported this hypothesis. According to major second language learning theories, a listener’s second language proficiency determines the extent to which they rely on their first-language phonetics. Hence, this thesis takes a novel approach by focusing on the role of English proficiency in the understanding of Bulgarian-accented English by Bulgarian-English bilinguals.
The first experiment investigated whether evoking the listeners’ L1 Bulgarian phonetics would speed up the processing of Bulgarian-accented English words compared with Standard British English (SBE) words, and vice versa. Listeners with lower English proficiency processed Bulgarian-accented English faster than SBE, while high-proficiency listeners tended to have an advantage for SBE over the Bulgarian accent.
The second experiment measured accuracy and reaction times (RTs) in a lexical decision task with single-word stimuli produced by two L1 English speakers and two Bulgarian-English bilinguals. Listeners with high proficiency in English responded more slowly and less accurately to Bulgarian-accented speech than to L1 English speech, and than lower-proficiency listeners did. These accent preferences were also supported by the listeners’ RT adaptation across the first experimental block.
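The abstract does not say how RT adaptation across the first block was quantified. One simple, hypothetical way to express it is the least-squares slope of RT against trial number, computed separately for each accent: a negative slope means responses to that accent got faster over the block. The trial tuple layout, the use of correct trials only, and the function names are assumptions for illustration, not the dissertation's analysis.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical trial record: (trial_number, accent_label, rt_ms, correct)
Trial = Tuple[int, str, float, bool]


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    mx, my = mean(xs), mean(ys)
    den = sum((x - mx) ** 2 for x in xs)
    if den == 0:
        raise ValueError("trial numbers must not all be equal")
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / den


def adaptation_slopes(trials: Iterable[Trial]) -> Dict[str, float]:
    """RT change in ms per trial for each accent, using correct trials only (an assumption).

    Trial numbers keep their original positions in the block, so excluding
    errors does not shift the remaining trials along the x-axis.
    """
    points: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for trial_number, accent, rt_ms, correct in trials:
        if correct:
            points[accent].append((trial_number, rt_ms))
    return {
        accent: ols_slope([x for x, _ in pts], [y for _, y in pts])
        for accent, pts in points.items()
    }
```

A slope is a coarse summary; mixed-effects models with listener and item random effects are a common alternative for this kind of data.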
A follow-up investigation compared the results of L1 UK English listeners with those of the bilingual listeners with the highest English proficiency. The L1 English listeners and these bilinguals processed both accents with similar speed, accuracy and adaptation patterns, showing no advantage or disadvantage for the bilinguals.
These studies support existing models of second language phonetics: higher proficiency in the L2 is associated with less reliance on L1 phonetics during speech processing. In addition, contrary to the ISIB hypothesis, the listeners with the highest English proficiency had no advantage over L1 English listeners in understanding Bulgarian-accented English.
Keywords:
Bulgarian-English bilinguals, bilingual speech processing, L2 phonetic development, lexical decision, proficiency
Listening to Accented Speech in a Second Language: First Language and Age of Acquisition Effects
Online First March 10, 2016.
Bilingual speakers must acquire the phonemic inventories of two languages and need to recognize spoken words cross-linguistically, a demanding task potentially made even more difficult by dialectal variation, an intrinsic property of speech. The present work examines how bilinguals perceive second language (L2) accented speech and where accommodation to dialectal variation takes place. Dialectal effects were analyzed at different levels: an AXB discrimination task tapped phonetic-phonological representations, an auditory lexical-decision task tested for effects in accessing the lexicon, and an auditory priming task looked for semantic processing effects. Within that central focus, the goal was to see whether perceptual adjustment at a given level is affected by two main linguistic factors: bilinguals’ first language and age of acquisition (AoA) of the L2. Taking advantage of the cross-linguistic situation of the Basque language, bilinguals with different first languages (Spanish or French) and different ages of acquisition of Basque (simultaneous, early, or late) were tested. Our use of multiple tasks with multiple types of bilinguals shows that, despite very similar discrimination capacity, French-Basque and Spanish-Basque simultaneous bilinguals differed significantly in lexical access. Similarly, the results of the early and late groups show that mapping phonetic-phonological information onto lexical representations is a more demanding process that accentuates non-native processing difficulties. L1 and AoA effects were more readily overcome in semantic processing; accented variants regularly produced priming effects in the different groups of bilinguals.
This study was conducted with the support of the PSI 2010–17781 Grant to the second author from the Spanish Government (MINECO).
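In an AXB discrimination task, listeners hear three stimuli and decide whether the middle one (X) belongs with the first (A) or the last (B). As a minimal sketch, the helpers below build the four counterbalanced AXB orders for a pair of categories and score responses as proportion correct. Treating stimuli as category labels is a simplification (in practice X is usually a different recorded token from the matching flanker), and the function names are hypothetical rather than the study's materials.

```python
from typing import List, Sequence, Tuple

# (A, X, B, correct_answer): the listener reports whether X matches A or B.
Trial = Tuple[str, str, str, str]


def axb_trials(cat1: str, cat2: str) -> List[Trial]:
    """Four counterbalanced orders: X matches A or B, and each category takes the A slot."""
    if cat1 == cat2:
        raise ValueError("AXB needs two distinct categories")
    trials: List[Trial] = []
    for a, b in ((cat1, cat2), (cat2, cat1)):
        for x in (a, b):
            trials.append((a, x, b, "A" if x == a else "B"))
    return trials


def proportion_correct(trials: Sequence[Trial], responses: Sequence[str]) -> float:
    """Score 'A'/'B' responses against each trial's correct answer."""
    if not trials or len(trials) != len(responses):
        raise ValueError("need exactly one response per trial")
    hits = sum(response == trial[3] for trial, response in zip(trials, responses))
    return hits / len(trials)
```

Because the correct answer is "A" on half of the trials, a listener who always presses the same key scores 0.5, which is the chance level for this design.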
Co-activation in the bilingual lexicon: Evidence from Chinese-English bilinguals
Investigation of the bilingual mental lexicon suggests that one of its defining characteristics is integration: words from both languages are subject to parallel co-activation during language processing. An auditory stimulus typing task was used to assess connectivity on the basis of both morphology and phonology. The critical stimuli were English loanwords in Chinese and transparent English noun-noun compounds whose Chinese translation equivalents have the same compound structure (corresponding compounds). Accent was also manipulated to determine whether phonological cues influence the degree of cross-linguistic co-activation. The results suggest cross-linguistic co-activation on the basis of phonological overlap in different-script bilinguals, but they only weakly support morphological integration in Chinese-English bilinguals. Accent led to greater co-activation of phonologically similar loanword pairs. The results are discussed in terms of inhibitory control, language acquisition, and the structure of the bilingual lexicon.
Modelo acústico de língua inglesa falada por portugueses (Acoustic model of English spoken by Portuguese speakers)
Master's project in Computer Engineering (Engenharia Informática), presented to the Universidade de Lisboa through the Faculdade de Ciências, 2007. In the context of robust speech recognition based on hidden Markov models (HMMs), this work describes methodologies and experiments aimed at recognizing foreign speakers. Any discussion of speech recognition necessarily involves acoustic models. Acoustic models reflect how a language is pronounced and articulated, modelling the sequence of sounds produced during speech. This modelling is based on minimal speech segments, the phones, for which there are sets of symbols (alphabets) representing their pronunciation. Articulatory and acoustic phonetics study the representation of these symbols, their articulation and their pronunciation. Words can be described by analysing their constituent units, the phones. A speech recognizer interprets the input signal, speech, as a sequence of coded symbols. To do so, the signal is split into observations of roughly 10 milliseconds each, reducing the analysis to time intervals over which the characteristics of a sound segment do not vary. Acoustic models give the probability that a given observation corresponds to a given entity. It is therefore through models of the vocabulary entities to be recognized that these sound fragments can be put back together. The models developed in this work are based on HMMs. They are so named because they are built on Markov chains (after Andrey Markov, 1856-1922): sequences of states in which each state is conditioned on the one before it. Applied to our domain, this means building a set of models, one for each class of sounds to be recognized, which are then trained on training data.
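The thesis describes acoustic models as HMMs that give the probability of the observed speech frames under a model of each sound class. As a minimal sketch of that scoring step, the forward algorithm below computes P(observations | model) for a toy HMM whose observations are discrete symbols. This is a simplification: acoustic models of the kind described here typically use continuous (e.g., Gaussian mixture) emission densities over feature vectors and work in the log domain to avoid numerical underflow. The model parameters and function name are hypothetical.

```python
from typing import Dict, Hashable, Sequence

State = Hashable
Symbol = Hashable


def forward_probability(
    obs: Sequence[Symbol],
    states: Sequence[State],
    start_p: Dict[State, float],
    trans_p: Dict[State, Dict[State, float]],
    emit_p: Dict[State, Dict[Symbol, float]],
) -> float:
    """P(obs | model) for a discrete-observation HMM, computed with the forward algorithm."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    # alpha[s] = P(o_1 .. o_t, state at time t = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        # Each state sums over all predecessors (the Markov property), then emits o.
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
            for s in states
        }
    return sum(alpha.values())
```

A recognizer built along these lines would score the same frame sequence against each sound-class model and prefer the most probable one, in practice combined with a lexicon and a language model.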
The data consist of audio files and their word-level transcriptions, so that each transcription can be decomposed into phones and aligned with the sounds in the corresponding audio file. Using a state model, in which each state represents an observation or a described speech segment, the data are iteratively regrouped to build increasingly reliable statistical models that represent the speech units of a given language. Recognizing foreign speakers, whose pronunciation differs from that of the language the recognizer was designed for, can be a major problem for recognizer accuracy. This variation can be even more problematic than dialectal variation within a language, because it depends on each speaker's knowledge of the foreign language. Using a small amount of audio from foreign speakers to train new acoustic models, several experiments were carried out with corpora of Portuguese speakers speaking English, of European Portuguese, and of English. First, the models of native English speakers and of native Portuguese speakers were explored separately on the test corpora (native and non-native test sets). Next, another model was trained using as its training corpus both the audio of Portuguese speakers speaking English and that of native English speakers. Another experiment used adaptation techniques such as Maximum Likelihood Linear Regression (MLLR). MLLR adapts an initial model to a given speaker characteristic, in this case the foreign accent. From a small amount of data representing the characteristic to be modelled, it computes a set of transformations that are applied to the model being adapted.
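MLLR adapts an existing model by applying transformations estimated from a small amount of accented data. In the usual formulation of MLLR mean adaptation, each Gaussian mean μ is replaced by μ̂ = Aμ + b = Wξ, where ξ = [1, μ_1, ..., μ_n] is the extended mean vector and W = [b A] is an n × (n + 1) matrix. The sketch below only applies a given W to every mean, assuming a single global transform shared by all Gaussians (one regression class); estimating W by maximum likelihood from the adaptation data is not shown, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def mllr_adapt_mean(mean: Sequence[float], w: Matrix) -> Vector:
    """Return W @ [1, mu_1, ..., mu_n], i.e. A @ mu + b with the bias b in W's first column."""
    xi = [1.0, *mean]  # extended mean vector
    if len(w) != len(mean) or any(len(row) != len(xi) for row in w):
        raise ValueError("W must be n x (n + 1) for an n-dimensional mean")
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, xi)) for row in w]


def adapt_means(means: Sequence[Sequence[float]], w: Matrix) -> List[Vector]:
    """Apply one global MLLR mean transform to every Gaussian mean in the model."""
    return [mllr_adapt_mean(m, w) for m in means]
```

Sharing one transform across many Gaussians is what lets MLLR work with little adaptation data: even Gaussians that never appear in the accented data are moved.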
Phonetic modelling was also explored, studying how a foreign speaker pronounces the foreign language, in this case a Portuguese speaker speaking English. This study was carried out with the help of a linguist, who defined a set of phones, obtained by mapping the English phone inventory onto Portuguese, that represent the English spoken by Portuguese speakers of a given prestige group. Given the great variability of pronunciations, this group had to be defined according to the speakers' level of literacy. The study was then used to create a new model trained on the corpora of Portuguese speakers speaking English and of native Portuguese speakers. This yields a native Portuguese recognizer that can also recognize English terms. Within the broader theme of speech recognition, this project also covered the collection of European Portuguese corpora and the compilation of a European Portuguese lexicon. For corpus acquisition, the author was involved in extracting and preparing telephone speech data for later training of new European Portuguese acoustic models. The European Portuguese lexicon was compiled with an incremental, semi-automatic method: pronunciations were generated automatically for groups of 10,000 words, and each group was reviewed and corrected by a linguist. Each reviewed group was then used to improve the rules for automatic pronunciation generation.
The tremendous growth of technology has increased the need to integrate spoken language technologies into everyday applications, providing easy and natural access to information. These applications vary in nature and in their user interfaces.
Besides voice-enabled Internet portals and tourist information systems, automatic speech recognition systems can be used in the home, where TVs and other appliances could be voice-controlled without keyboards or mice, or in mobile phones and palm-sized computers for hands-free and eyes-free operation. The development of these systems poses several known difficulties. One of them is recognizer accuracy for non-native speakers, whose pronunciation of a given language differs phonetically. A non-native accent can be more problematic than dialectal variation within the language. This mismatch depends on the individual speaker's proficiency and mother tongue. Consequently, when the speaker's native language is not the one used to train the recognizer, there is a considerable loss in recognition performance. In this thesis, we examine the problem of non-native speech in a speaker-independent, large-vocabulary recognizer for which a small amount of non-native data was used in training. Several experiments were performed using hidden Markov models trained with speech corpora containing European Portuguese native speakers, English native speakers, and English spoken by European Portuguese native speakers. First, we explored the behaviour of a native English model and a non-native English model. Then, using different corpus weights for native English speakers and for English spoken by Portuguese speakers, we trained a model as a pool of accents. As an adaptation technique, we used Maximum Likelihood Linear Regression (MLLR). We also explored how European Portuguese speakers pronounce English by studying the correspondences between the phone sets of the foreign and target languages. The result was a new phone set derived from the mapping between the English and Portuguese phone sets.
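The pooled-accent model is trained with different weights for the native English corpus and the Portuguese-accented English corpus. The abstract does not say how the weights enter training; one simple approach, assumed here, is to scale each corpus's contribution to the accumulated statistics, so that an integer weight w behaves like replicating that corpus w times. The sketch shows this for the maximum-likelihood mean of a single Gaussian; the names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (corpus_weight, feature_frames): each frame is one feature vector
Corpus = Tuple[float, Sequence[Sequence[float]]]


def weighted_pooled_mean(corpora: Sequence[Corpus]) -> List[float]:
    """Maximum-likelihood Gaussian mean over all frames, each frame scaled by its corpus weight."""
    dim: Optional[int] = None
    total: List[float] = []
    weight_sum = 0.0
    for weight, frames in corpora:
        if weight < 0:
            raise ValueError("corpus weights must be non-negative")
        for frame in frames:
            if dim is None:
                dim = len(frame)
                total = [0.0] * dim
            elif len(frame) != dim:
                raise ValueError("all frames must have the same dimension")
            for d, value in enumerate(frame):
                total[d] += weight * value  # weighted first-order statistic
            weight_sum += weight            # weighted frame count
    if weight_sum == 0:
        raise ValueError("no frames with positive weight")
    return [t / weight_sum for t in total]
```

In a full HMM trainer the same scaling would apply to the occupancy-weighted statistics of every Gaussian in every state, not to a single mean.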
Then a new model was trained with data of English spoken by Portuguese speakers and native Portuguese data. Beyond the recognition experiments, this work has two further purposes: collecting Portuguese corpora and supporting the compilation of a Portuguese lexicon, adopting methods and algorithms to generate phonetic pronunciations automatically. The collected corpora were processed to train acoustic models for use in the Exchange 2007 domain, namely in Outlook Voice Access.
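The thesis compiles its European Portuguese lexicon semi-automatically: pronunciations are generated for groups of 10,000 words, a linguist reviews and corrects each group, and the corrections are used to improve the generation rules. The sketch below mimics that loop under strong simplifying assumptions: the hypothetical rules map single letters to phones, the linguist is modelled as a `review` callback, and a correction only updates the rules when it has exactly one phone per letter. Real Portuguese letter-to-sound rules are context-dependent, so this illustrates the workflow, not the thesis's rules.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Rules = Dict[str, str]              # hypothetical letter -> phone rules
Review = Callable[[str, str], str]  # (word, proposed) -> pronunciation approved by the linguist


def generate(word: str, rules: Rules) -> str:
    """Automatic pronunciation: map each letter through the rules; unknown letters pass through."""
    return " ".join(rules.get(letter, letter) for letter in word)


def learn_from_correction(word: str, corrected: str, rules: Rules) -> None:
    """Update letter rules when the correction has exactly one phone per letter (toy alignment)."""
    phones = corrected.split()
    if len(phones) == len(word):
        for letter, phone in zip(word, phones):
            rules[letter] = phone


def build_lexicon(
    words: Sequence[str], rules: Rules, review: Review, batch_size: int = 10_000
) -> Tuple[Dict[str, str], List[int]]:
    """Generate, review and learn group by group; return the lexicon and corrections per group."""
    rules = dict(rules)  # work on a copy so the caller's rules are not mutated
    lexicon: Dict[str, str] = {}
    corrections_per_batch: List[int] = []
    for start in range(0, len(words), batch_size):
        batch = words[start:start + batch_size]
        # The whole group is generated before review, so its corrections help later groups.
        proposals = {word: generate(word, rules) for word in batch}
        corrections = 0
        for word, proposed in proposals.items():
            approved = review(word, proposed)
            lexicon[word] = approved
            if approved != proposed:
                corrections += 1
                learn_from_correction(word, approved, rules)
        corrections_per_batch.append(corrections)
    return lexicon, corrections_per_batch
```

Tracking the number of corrections per group gives a rough measure of how much each reviewed group improves the automatic rules.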
Loan Phonology
For many different reasons, speakers borrow words from other languages to fill gaps in their own lexical inventory. The past ten years have been characterized by great interest among phonologists in how loanwords are nativized. The general feeling is that loanword nativization provides a direct window for observing how acoustic cues are categorized in terms of the distinctive features relevant to the L1 phonological system, as well as for studying L1 phonological processes in action, and thus onto the true synchronic phonology of L1. The collection of essays presented in this volume provides an overview of the complex issues phonologists face when investigating this phenomenon and, more generally, of the ways in which unfamiliar sounds and sound sequences are adapted to converge with the native language’s sound pattern. This book is of interest to theoretical phonologists as well as to linguists interested in language contact phenomena.
Marginal contrast in loanword phonology: Production and perception
Though Dutch is usually described as lacking a voicing contrast at the velar place of articulation, due to intense language contact and heavy lexical borrowing, a contrast between /k/ and /g/ has recently been emerging. We explored the status of this contrast in Dutch speakers in both production and perception. We asked participants to produce loanwords containing a /g/ in the source language (e.g., goal) and found a range of productions, including a great many unadapted [g] tokens. We also tested the same speakers on their perception of the emerging [k] ~ [g] contrast and found that our participants were able to discriminate the emerging contrast well. We additionally explored the possibility that those speakers who use the new contrast more in production are also better at perceiving it, but we did not observe strong evidence of such a link. Overall, our results indicate that the adoption of the new sound is well advanced in the population we tested, but is still modulated by individual-level factors. We hold that contrasts emerging through borrowing, like other phonological contrasts, are subject to perceptual and functional constraints, and that these and other ‘marginal contrasts’ must be considered as full-fledged parts of phonology.