A criterial interlocutor tally for successful talker adaptation?
Part of the remarkable efficiency of listening is accommodation to unfamiliar talkers’ specific pronunciations by retuning of phonemic intercategory boundaries. Such retuning occurs in second (L2) as well as first language (L1); however, recent research with émigrés revealed successful adaptation in the environmental L2 but, unprecedentedly, not in L1, despite continuing L1 use. A possible explanation involving relative exposure to novel talkers is here tested in heritage language users with Mandarin as family L1 and English as environmental language. In English, exposure to an ambiguous sound in disambiguating word contexts prompted the expected adjustment of phonemic boundaries in subsequent categorisation. However, no adjustment occurred in Mandarin, again despite regular use. Participants reported highly asymmetric interlocutor counts in the two languages. We conclude that successful retuning ability requires regular exposure to novel talkers in the language in question, a criterion not met for the émigrés’ or these heritage users’ L1.
No L1 privilege in talker adaptation
As a rule, listening is easier in first (L1) than second languages (L2); difficult L2 listening can challenge even highly proficient users. We here examine one particular listening function, adaptation to novel talkers, in such a high-proficiency population: Dutch emigrants to Australia, predominantly using English outside the family, but all also retaining L1 proficiency. Using lexically-guided perceptual learning (Norris, McQueen & Cutler, 2003), we investigated these listeners’ adaptation to an ambiguous speech sound, in parallel experiments in both their L1 and their L2. A control study established that perceptual learning outcomes were unaffected by the procedural measures required for this double comparison. The emigrants showed equivalent proficiency in tests in both languages and robust perceptual adaptation in their L2, English, but no adaptation in L1. We propose that adaptation to novel talkers is a language-specific skill requiring regular practice with novel talkers; a limited set of known (family) interlocutors cannot meet this requirement.
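Lexically-guided perceptual learning effects such as those described above are conventionally quantified as a shift in the phoneme category boundary between exposure conditions. As a rough illustration only (not the authors' analysis; the continuum steps and response proportions below are invented), the 50% crossover point of a categorization curve can be estimated by linear interpolation:

```python
def boundary(steps, props):
    """Estimate the 50% category boundary of a categorization curve by
    linear interpolation between the two continuum steps straddling 0.5."""
    for (s0, p0), (s1, p1) in zip(zip(steps, props), zip(steps[1:], props[1:])):
        if (p0 - 0.5) * (p1 - 0.5) <= 0 and p0 != p1:
            return s0 + (0.5 - p0) * (s1 - s0) / (p1 - p0)
    return None  # curve never crosses 50%

steps = [1, 2, 3, 4, 5, 6]  # hypothetical /s/-to-/f/ continuum steps
# Hypothetical proportions of "/s/" responses per step, by exposure group:
s_biased = [0.95, 0.90, 0.80, 0.60, 0.30, 0.10]
f_biased = [0.90, 0.75, 0.50, 0.25, 0.10, 0.05]

shift = boundary(steps, s_biased) - boundary(steps, f_biased)
print(round(shift, 2))  # → 1.33
```

A positive shift indicates that /s/-biased listeners accept more of the continuum as /s/, which is the signature of retuning; the real studies fit full psychometric functions rather than interpolating, but the quantity of interest is the same.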
Auditory perceptual learning in autistic adults
The automatic retuning of phoneme categories to better adapt to the speech of a novel talker has been extensively documented across various (neurotypical) populations, including both adults and children. However, no studies have examined auditory perceptual learning effects in populations atypical in perceptual, social, and language processing for communication, such as populations with autism. Employing a classic lexically-guided perceptual learning paradigm, the present study investigated perceptual learning effects in Australian English autistic and non-autistic adults. The findings revealed that automatic attunement of existing phoneme categories was not activated in the autistic group in the same manner as for non-autistic control subjects. Specifically, autistic adults were able both to successfully discern lexical items and to categorize speech sounds; however, they did not show effects of perceptual retuning to talkers. These findings may have implications for the application of current sensory theories (e.g., Bayesian decision theory) to speech and language processing by autistic individuals.

Lay Summary: Lexically guided perceptual learning assists in the disambiguation of speech from a novel talker. The present study established that while Australian English autistic adult listeners were able to successfully discern lexical items and categorize speech sounds in their native language, perceptual flexibility in updating speaker-specific phonemic knowledge when exposed to a novel talker was not available. Implications for speech and language processing by autistic individuals, as well as for current sensory theories, are discussed.
The Relationship Between Phonemic Category Boundary Changes and Perceptual Adjustments to Natural Accents
Published Online First October 21, 2019.
People often experience difficulties when they first hear a novel accent. Prior research has shown that relatively fast natural accent accommodation can occur. However, there has been little investigation of the underlying perceptual mechanism that drives the learning. The current study examines whether phonemic boundary changes play a central role in natural accent accommodation. Two well-established boundary-shifting phenomena, recalibration and selective adaptation, were used here to index the flexibility of phonemic category boundaries. Natural accent accommodation was measured with a task in which listeners heard accented words and nonwords before and after listening to English sentences produced by one of two native Mandarin Chinese speakers with moderate accents. In two experiments, participants completed recalibration, selective adaptation, and natural accent accommodation tasks focusing on a consonant contrast that is difficult for native Chinese speakers to produce. We found that: (a) on the accent accommodation task, participants showed increased endorsement of accented/mispronounced words after exposure to a speaker’s accented speech, indicating a potential relaxation of criteria in the word recognition process; (b) there was no strong link between recalibrating phonemic boundaries and natural accent accommodation; and (c) there was no significant correlation between recalibration and selective adaptation. These results suggest that recalibration of phonemic boundaries does not play a central role in natural accent accommodation. Instead, there is some evidence suggesting that natural accent accommodation involves a relaxation of phonemic categorization criteria.
Support was provided by Ministerio de Ciencia e Innovación, Grant PSI2017-82563-P; Centro de Excelencia Severo Ochoa, Grant SEV-2015-0490; the Basque Government through the BERC 2018–2021 program; and the National Science Foundation under Grant IBSS-1519908.
Communicative focus on form and second language suprasegmental learning: teaching Cantonese learners to perceive Mandarin tones
The current study examined how form-focused instruction (FFI) with and without corrective feedback (CF) as output enhancement can facilitate L2 perception of Mandarin tones at both the phonetic and phonological levels in 41 Cantonese learners of Mandarin. Two experimental groups, FFI-only and FFI-CF, received a 90-minute FFI treatment designed to encourage them to notice and practice the categorical distinctions of Mandarin tones through a range of communicative input and output activities. During these activities, the instructors provided CF only to students in the FFI-CF group by recasting and pushing them to repair their mispronunciations of the target features (i.e., output enhancement). The control group received comparable meaning-oriented instruction without any FFI. The effectiveness of FFI was assessed via a forced-choice identification task with both trained and untrained items for a variety of tonal contrasts in Mandarin (high-level Tone 1 vs. mid-rising Tone 2 vs. high-falling Tone 4). According to statistical comparisons, the FFI-only group attained significant improvement in all lexical and tonal contexts, and such effectiveness was evident particularly in the acquisition of Tone 1 and Tone 4—supposedly the most difficult instances due to their identical phonological status in the learners’ L1, Cantonese. The FFI-CF group, however, demonstrated marginally significant gains only under the trained lexical conditions. The results in turn suggest that FFI promotes learners’ attentional shift from vocabulary to sound learning (generalizable gains in trained and untrained items) and facilitates their access to new phonetic and phonological categories. Yet the relative advantage of adding CF to FFI as output enhancement remains unclear, especially with respect to the less experienced L2 learners in the current study.
Modeling DNN as human learner
Previous experiments have shown that human listeners can adapt to unheard, ambiguous phonemes after relatively short initial exposure. At the same time, previous work in the speech community has shown that pre-trained deep neural network-based (DNN) ASR systems, like humans, can adapt to unseen, ambiguous phonemes after their parameters are retuned on a relatively small data set. In the first part of this thesis, the time course of phoneme category adaptation in a DNN is investigated in more detail. By retuning the DNNs on progressively more tokens containing ambiguous sounds and comparing classification accuracy on the ambiguous phonemes in a held-out test set across the time course, we found that DNNs, like human listeners, also demonstrate fast adaptation: the accuracy curves were step-like in almost all cases, showing little further adaptation after only one (out of ten) training bins.

However, unlike the setup just described, in a typical lexically guided perceptual learning experiment listeners are trained on whole words rather than individual phones; to truly model such a scenario, we require a model that can take the context of a whole utterance into account. Traditional speech recognition systems accomplish this through hidden Markov models (HMMs) and weighted finite-state transducer (WFST) decoding. In recent years, bidirectional long short-term memory (Bi-LSTM) networks trained under the connectionist temporal classification (CTC) criterion have also attracted much attention. In the second part of this thesis, the earlier experiments on ambiguous phoneme recognition were carried out again on a new Bi-LSTM model, with phonetic transcriptions of words ending in ambiguous phonemes used as training targets instead of isolated single-phoneme sounds. We found that despite the vastly different architecture, the new model showed highly similar behavior in classification rate over the time course of incremental retuning. This indicates that ambiguous phonemes in continuous context can also be quickly adapted to by neural network-based models.

In the last part of this thesis, our pre-trained Dutch Bi-LSTM from the previous part was treated as a second language learner of English and asked to transcribe English utterances in a self-adaptation scheme. In other words, we used the Dutch model to generate phonetic transcriptions directly and retuned the model on the transcriptions it generated, although ground-truth transcriptions were used to select a subset of the self-labeled transcriptions. Self-adaptation is of interest as a model of human second language learning, but it also has practical engineering value, e.g., for adapting speech recognition to a low-resource language. We investigated two ways to improve the adaptation scheme: first, multi-task learning with articulatory feature detection, applied both when training the model on Dutch and during self-labeled adaptation; and second, letting the model first adapt to isolated short words before feeding it longer utterances.
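The step-like accuracy curves reported in the first part can be illustrated with a deliberately minimal sketch. This is not the thesis's DNN: it is a one-dimensional nearest-prototype classifier with invented numbers, where exposure to ambiguous tokens that lexical context labels as category /A/ pulls the /A/ prototype toward the ambiguous region, so classification of the ambiguous token flips after the first retuning bin and then stays flat:

```python
def classify(x, mu_a, mu_b):
    # Nearest-prototype decision; ties go to /B/, so the untuned model
    # does not get the exactly ambiguous token right for free.
    return "A" if abs(x - mu_a) < abs(x - mu_b) else "B"

mu_a, n_a = -1.0, 100   # pretrained /A/ prototype: mean and token count
mu_b = 1.0              # pretrained /B/ prototype, held fixed
ambiguous = 0.0         # token acoustically halfway between /A/ and /B/

correct = [classify(ambiguous, mu_a, mu_b) == "A"]  # pre-exposure baseline
for _ in range(10):          # ten retuning bins of ten tokens each
    for _ in range(10):      # lexical context labels every token as /A/
        mu_a = (mu_a * n_a + ambiguous) / (n_a + 1)  # running-mean update
        n_a += 1
    correct.append(classify(ambiguous, mu_a, mu_b) == "A")

print(correct)  # step-like: False before exposure, True from bin 1 onward
```

The first ten ambiguous tokens already move the /A/ mean close enough to win the distance comparison, so the curve is a single step, mirroring (in caricature) the finding that the DNNs showed little further adaptation after the first training bin.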
Phonological abstraction in processing lexical-tone variation: Evidence from a learning paradigm
There is a growing consensus that the mental lexicon contains both abstract and word-specific acoustic information. To investigate their relative importance for word recognition, we tested to what extent perceptual learning is word-specific or generalizable to other words. In an exposure phase, participants were divided into two groups; each group was semantically biased to interpret an ambiguous Mandarin tone contour as either Tone 1 or Tone 2. In a subsequent test phase, the perception of ambiguous contours depended on the exposure phase: participants who had heard ambiguous contours as Tone 1 during exposure were more likely to perceive ambiguous contours as Tone 1 than participants who had heard them as Tone 2 during exposure. This learning effect was only slightly larger for previously encountered than for not previously encountered words. The results speak for an architecture with prelexical analysis of phonological categories that achieves both lexical access and episodic storage of exemplars.
L2 Perception and Production of Japanese Lexical Pitch: A Suprasegmental Similarity Account
Adults are known to have difficulties acquiring suprasegmental speech that involves pitch (f0) in a second language (L2) (Graham & Post, 2018; Hirata, 2015; Wang, Spence, Jongman & Sereno, 1999; Wong & Perrachione, 2007). Previous research has suggested that the perceived similarity between L1 and L2 phonology may influence how easily segmental speech is acquired, and this notion of ‘similarity’ may also apply to suprasegmental speech (So & Best, 2010; Wu, Munro & Wang, 2014). In this paper, the L2 acquisition of Japanese lexical pitch was assessed under a ‘Suprasegmental Similarity Account’, a theoretical framework inspired by previous models of segmental and suprasegmental speech (Best & Tyler, 2007; Flege, 1995; Mennen, 2015) to account for the L2 acquisition of word prosody. Eight adult native speakers of Japanese and eight adult English-native advanced learners of Japanese participated in a perception and production study of Japanese lexical pitch patterns. Both groups performed similarly in perception, but non-native speakers performed significantly worse in production, particularly for ‘unaccented’ Low–High–High patterns. These findings are discussed in light of the ‘Suprasegmental Similarity Account’.