9,558 research outputs found
The Role of Speaker Identification in Taiwanese Attitudes Towards Varieties of English
No abstract available
Leveraging native language information for improved accented speech recognition
Recognition of accented speech is a long-standing challenge for automatic
speech recognition (ASR) systems, given the increasing worldwide population of
bilingual speakers with English as their second language. If we consider
foreign-accented speech as an interpolation of the native language (L1) and
English (L2), using a model that can simultaneously address both languages
would perform better at the acoustic level for accented speech. In this study,
we explore how an end-to-end recurrent neural network (RNN) trained system with
English and native languages (Spanish and Indian languages) could leverage data
of native languages to improve performance for accented English speech. To this
end, we examine pre-training with native languages, as well as multi-task
learning (MTL) in which the main task is trained with native English and the
secondary task is trained with Spanish or Indian Languages. We show that the
proposed MTL model performs better than the pre-training approach and
outperforms a baseline model trained simply with English data. We suggest a new
setting for MTL in which the secondary task is trained with both English and
the native language, using the same output set. This proposed scenario yields
better performance with +11.95% and +17.55% character error rate gains over
baseline for Hispanic and Indian accents, respectively. Comment: Accepted at Interspeech 201
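As a rough illustration of the metric reported above, character error rate (CER) is commonly computed as the character-level edit distance between a reference transcript and the recognizer output, divided by the reference length. The sketch below is not taken from the paper: the function names are hypothetical, and treating the reported gains as relative improvements over the baseline is an assumption, since the abstract does not say whether they are relative or absolute.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_gain(baseline_cer, new_cer):
    """Relative CER reduction in percent (assumed reading of 'gain')."""
    return 100.0 * (baseline_cer - new_cer) / baseline_cer
```

Under these assumptions, a system that lowers CER from 0.20 to 0.15 would show a relative gain of about 25%.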
Rhythm in the speech of a person with right hemisphere damage: Applying the pairwise variability index
Although several aspects of prosody have been studied in speakers with right hemisphere damage (RHD), rhythm remains largely uninvestigated. This study compares the rhythm of an Australian English speaker with right hemisphere damage (due to a stroke, but with no concomitant dysarthria) to that of a neurologically unimpaired individual. The speakers' rhythm is compared using the pairwise variability index (PVI) which allows for an acoustic characterization of rhythm by comparing the duration of successive vocalic and intervocalic intervals. A sample of speech from a structured interview between a speech and language therapist and each participant was analysed. Previous research has shown that speakers with RHD may have difficulties with intonation production, and therefore it was hypothesized that there may also be rhythmic disturbance. Results show that the neurologically normal control uses a similar rhythm to that reported for British English (there are no previous studies available for Australian English), whilst the speaker with RHD produces speech with a less strongly stress-timed rhythm. This finding was statistically significant for the intervocalic intervals measured (t(8) = 4.7, p < .01), and suggests that some aspects of prosody may be right lateralized for this speaker. The findings are discussed in relation to previous findings of dysprosody in RHD populations, and in relation to syllable-timed speech of people with other neurological conditions
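The pairwise variability index described above can be sketched in a few lines. The abstract does not say which variant the study used, so both common forms are shown as an assumption: a raw PVI (mean absolute difference between successive interval durations) and a normalized PVI (each difference divided by the mean of the pair, scaled by 100). The function names and the example durations are hypothetical.

```python
def raw_pvi(durations):
    """Mean absolute difference between successive interval durations."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def normalized_pvi(durations):
    """Like raw_pvi, but each difference is divided by the pair's mean
    duration, which reduces the influence of overall speech rate."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2)
             for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical vocalic interval durations in milliseconds.
even_timing = [100, 100, 100, 100]
alternating = [150, 60, 140, 55]
print(normalized_pvi(even_timing))  # equal durations give 0.0
print(normalized_pvi(alternating) > normalized_pvi(even_timing))
```

Higher values indicate greater durational contrast between neighbouring intervals, which is the pattern usually associated with stress-timed rhythm.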
Speaker and accent variation are handled differently: evidence in native and non-native listeners
Listeners are able to cope with between-speaker variability in speech that stems from anatomical sources (i.e. individual and sex differences in vocal tract size) and sociolinguistic sources (i.e. accents). We hypothesized that listeners adapt to these two types of variation differently because prior work indicates that adapting to speaker/sex variability may occur pre-lexically while adapting to accent variability may require learning from attention to explicit cues (i.e. feedback). In Experiment 1, we tested our hypothesis by training native Dutch listeners and Australian-English (AusE) listeners without any experience with Dutch or Flemish to discriminate between the Dutch vowels /I/ and /ε/ from a single speaker. We then tested their ability to classify /I/ and /ε/ vowels of a novel Dutch speaker (i.e. speaker or sex change only), or vowels of a novel Flemish speaker (i.e. speaker or sex change plus accent change). We found that both Dutch and AusE listeners could successfully categorize vowels if the change involved a speaker/sex change, but not if the change involved an accent change. When AusE listeners were given feedback on their categorization responses to the novel speaker in Experiment 2, they were able to successfully categorize vowels involving an accent change. These results suggest that adapting to accents may be a two-step process, whereby the first step involves adapting to speaker differences at a pre-lexical level, and the second step involves adapting to accent differences at a contextual level, where listeners have access to word meaning or are given feedback that allows them to appropriately adjust their perceptual category boundaries.
Rhythm and Vowel Quality in Accents of English
In a sample of 27 speakers of Scottish Standard English two notoriously variable consonantal features are investigated: the contrast of /ʍ/ and /w/ and non-prevocalic /r/, the latter both in terms of its presence or absence and the phonetic form it takes, if present. The pattern of realisation of non-prevocalic /r/ largely confirms previously reported findings. But there are a number of surprising results regarding the merger of /ʍ/ and /w/ and the loss of non-prevocalic /r/: While the former is more likely to happen in younger speakers and females, the latter seems more likely in older speakers and males. This is suggestive of change in progress leading to a loss of the /ʍ/ - /w/ contrast, while the variation found in non-prevocalic /r/ follows an almost inverse sociolinguistic pattern that does not suggest any such change and is additionally largely explicable in language-internal terms. One phenomenon requiring further investigation is the curious effect direct contact with Southern English accents seems to have on non-prevocalic /r/: innovation on the structural level (i.e. loss) and conservatism on the realisational level (i.e. increased incidence of [r] and [r]) appear to be conditioned by the same sociolinguistic factors.