
    Using probability distributions to account for recognition of canonical and reduced word forms

    The frequency of a word form influences how efficiently it is processed, but canonical forms often show an advantage over reduced forms even when the reduced form is more frequent. This paper addresses this paradox by considering a model in which the representation of a lexical item consists of a distribution over its forms. Optimal inference given these distributions accounts for item-based differences in the recognition of phonological variants and for the canonical-form advantage.
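
    To make the idea concrete, here is a minimal Python sketch of Bayesian recognition over a toy lexicon in which each word is stored as a distribution over pronunciation variants. The words, variant forms, and probabilities are invented for illustration and are not the paper's model or data; the point is only that a frequent reduced variant can remain ambiguous between words while the canonical form is not.

        # Toy lexicon: each word is a distribution over surface forms, P(form | word).
        # All words, forms, and probabilities are hypothetical.
        lexicon = {
            "support": {"support": 0.4, "sport": 0.6},  # reduced variant is the more frequent one
            "sport":   {"sport": 1.0},
        }
        word_prior = {"support": 0.5, "sport": 0.5}      # P(word), assumed uniform here

        def recognize(observed_form):
            """Return P(word | observed form) by Bayes' rule over the toy lexicon."""
            scores = {w: word_prior[w] * forms.get(observed_form, 0.0)
                      for w, forms in lexicon.items()}
            total = sum(scores.values())
            return {w: s / total for w, s in scores.items()} if total else scores

        print(recognize("support"))  # canonical form: consistent only with "support"
        print(recognize("sport"))    # more frequent reduced form: shared with "sport", so ambiguous

    Under these toy numbers the canonical form is recognized unambiguously, while the reduced form, despite being more probable within its own word's distribution, splits probability with a competitor, which is one way optimal inference can reproduce a canonical-form advantage.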

    High or low? Comparing high and low-variability phonetic training in adult and child second language learners

    Background: High talker variability (i.e., multiple voices in the input) has been found effective in training nonnative phonetic contrasts in adults. A small number of studies suggest that children also benefit from high-variability phonetic training, with some evidence that they show greater learning (more plasticity) than adults given matched input, although results are mixed. However, no study has directly compared the effectiveness of high versus low talker variability in children.
    Methods: Native Greek-speaking eight-year-olds (N = 52) and adults (N = 41) were exposed to the English /i/-/ɪ/ contrast in 10 training sessions through a computerized word-learning game. Pre- and post-training tests examined discrimination of the contrast as well as lexical learning. Participants were randomly assigned to high-variability (four talkers) or low-variability (one talker) training conditions.
    Results: Both age groups improved during training, and both improved more when trained with a single talker. Results of a three-interval oddity discrimination test did not show the predicted benefit of high-variability training in either age group. Instead, children showed an effect in the reverse direction: reliably greater improvements in discrimination following single-talker training, even for untrained generalization items, although this result is qualified by (accidental) differences between participant groups at pre-test. Adults showed a numerical advantage for high-variability training, but the pattern was inconsistent with respect to voice and word novelty. In addition, no effect of variability was found for lexical learning, and there was no evidence of greater plasticity for phonetic learning in child learners.
    Discussion: This paper adds to the handful of studies demonstrating that, like adults, child learners can improve their discrimination of a phonetic contrast via computerized training. There was no evidence of a benefit of training with multiple talkers, either for discrimination or for word learning. The results also do not support the greater plasticity in child learners reported in a previous paper (Giannakopoulou, Uther & Ylinen, 2013a). We discuss these results in terms of differences between the training and test tasks used in the current work and those in the previous literature.

    North American /l/ both darkens and lightens depending on morphological constituency and segmental context

    It is uncontroversial that, in many varieties of English, the realization of /l/ varies depending on whether /l/ occurs word-initially or word-finally. The nature of this effect, however, remains controversial. Previous analyses have variously treated the variation as darkening or as lightening, and have variously found evidence that it involves either a categorical distinction between allophones or a gradient scale conditioned by phonetic factors. We argue that these diverging conclusions result from the numerous factors influencing /l/ darkness and from differences between studies in which factors are considered. By controlling for a range of factors, our study demonstrates a pattern of variability that has not been shown in previous work. We find evidence of morpheme-final darkening and morpheme-initial lightening when compared to a baseline of morpheme-internal /l/. We also find segmental effects such that, in segmental contexts which independently darken /l/, one can observe /l/ lightening, while contexts which independently lighten /l/ can make lightening effects undetectable. Morphological and prosodic effects are hence sometimes overridden by segmental context. Once contextual effects are controlled for, there is evidence both for morphologically-conditioned /l/-darkening and for morphologically-conditioned /l/-lightening, both of which can be understood as a result of prosodic differences reflecting morphological junctures.

    Modelling Perceptual Effects of Phonology with ASR Systems

    This paper explores the minimal knowledge a listener needs to compensate for phonological assimilation, one kind of phonological process responsible for variation in speech. We used standard automatic speech recognition models to represent English and French listeners. We found that, first, some types of models show language-specific assimilation patterns comparable to those shown by human listeners. Like English listeners, when trained on English, the models compensate more for place assimilation than for voicing assimilation, and like French listeners, the models show the opposite pattern when trained on French. Second, the models which best predict the human pattern use contextually-sensitive acoustic models and language models, which capture allophony and phonotactics, but do not make use of higher-level knowledge of a lexicon or word boundaries. Finally, some models overcompensate for assimilation, showing a (super-human) ability to recover the underlying form even in the absence of the triggering phonological context, pointing to an incomplete neutralization not exploited by human listeners.
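
    As a rough illustration of the kind of sub-lexical knowledge involved, the Python sketch below combines a context-dependent acoustic score with a phone-level phonotactic score to decide whether an [m]-like segment reflects an underlying /n/. The context labels, scores, and probabilities are all invented; the paper's actual systems are full ASR models trained on English and French speech.

        # Toy illustration of compensation for place assimilation without a lexicon.
        # All symbols and probabilities are invented; they are not the paper's models or data.

        # Context-dependent acoustic score: P([m]-like acoustics | underlying phone, context).
        # Before /b/, an underlying /n/ is often produced with an [m]-like place (assimilation),
        # so the likelihood of /n/ given [m]-like input is higher in that context.
        acoustic_p = {
            ("n", "before_b"): 0.5,    # assimilated /n/ sounds [m]-like here
            ("m", "before_b"): 0.6,
            ("n", "elsewhere"): 0.05,  # unassimilated /n/ rarely sounds [m]-like
            ("m", "elsewhere"): 0.6,
        }

        # Phone-level language model: prior preference for /n/ over /m/ in this slot (phonotactics).
        lm_p = {"n": 0.12, "m": 0.03}

        def posterior_n(context):
            """P(underlying /n/ | [m]-like acoustics, context) for a two-way /n/ vs /m/ choice."""
            score = {ph: acoustic_p[(ph, context)] * lm_p[ph] for ph in ("n", "m")}
            return score["n"] / (score["n"] + score["m"])

        print(round(posterior_n("before_b"), 2))   # triggering context: /n/ becomes the likely parse
        print(round(posterior_n("elsewhere"), 2))  # no triggering context: the [m] stays /m/

    This is only meant to show how allophony-sensitive acoustic scores plus phonotactics, with no lexicon or word boundaries, can already yield context-dependent compensation.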

    Cue Integration in Categorical Tasks: Insights from Audio-Visual Speech Perception

    Previous cue integration studies have examined continuous perceptual dimensions (e.g., size) and have shown that human cue integration is well described by a normative model in which cues are weighted in proportion to their sensory reliability, as estimated from single-cue performance. However, this normative model may not be applicable to categorical perceptual dimensions (e.g., phonemes). In tasks defined over categorical perceptual dimensions, optimal cue weights should depend not only on the sensory variance affecting the perception of each cue but also on the environmental variance inherent in each task-relevant category. Here, we present a computational and experimental investigation of cue integration in a categorical audio-visual (articulatory) speech perception task. Our results show that human performance during audio-visual phonemic labeling is qualitatively consistent with the behavior of a Bayes-optimal observer. Specifically, we show that the participants in our task are sensitive, on a trial-by-trial basis, to the sensory uncertainty associated with the auditory and visual cues during phonemic categorization. In addition, we show that while sensory uncertainty is a significant factor in determining cue weights, it is not the only one: participants' performance is consistent with an optimal model in which environmental, within-category variability also plays a role in determining cue weights. Furthermore, we show that in our task, the sensory variability affecting the visual modality during cue combination is not well estimated from single-cue performance, but can be estimated from multi-cue performance. The findings and computational principles described here represent a principled first step towards characterizing the mechanisms underlying human cue integration in categorical tasks.
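
    The following Python sketch illustrates the normative idea under common Gaussian assumptions (not the paper's fitted model or stimuli): in a two-category labeling task, each cue's effective variance is the sum of its sensory variance and the category's environmental variance, so the optimal weight given to a cue depends on both.

        # Sketch of Bayes-optimal cue integration for a two-category (e.g. /b/ vs /d/) labeling task.
        # Gaussian assumptions and all numbers are illustrative, not the paper's fitted values.
        import math

        def gauss_logpdf(x, mean, var):
            return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

        # Category means on the auditory and visual cue dimensions.
        categories = {
            "b": {"aud": -1.0, "vis": -1.0},
            "d": {"aud": 1.0, "vis": 1.0},
        }
        env_var = {"aud": 0.5, "vis": 0.5}    # environmental (within-category) variability
        sens_var = {"aud": 0.3, "vis": 1.2}   # sensory noise; the visual cue is noisier here

        def posterior(aud_obs, vis_obs):
            """P(category | cues): each cue's effective variance = sensory + environmental variance."""
            logp = {}
            for cat, means in categories.items():
                logp[cat] = (gauss_logpdf(aud_obs, means["aud"], sens_var["aud"] + env_var["aud"])
                             + gauss_logpdf(vis_obs, means["vis"], sens_var["vis"] + env_var["vis"]))
            m = max(logp.values())
            unnorm = {c: math.exp(v - m) for c, v in logp.items()}
            total = sum(unnorm.values())
            return {c: p / total for c, p in unnorm.items()}

        # Conflicting cues: the auditory cue points to /d/, the visual cue to /b/;
        # the lower-variance auditory cue carries more weight in the posterior.
        print(posterior(aud_obs=0.8, vis_obs=-0.8))

    Raising either the sensory or the environmental variance of a cue lowers its effective weight in the same way, which is the key departure from weighting by single-cue sensory reliability alone.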

    The time course of auditory and language-specific mechanisms in compensation for sibilant assimilation

    Models of spoken-word recognition differ on whether compensation for assimilation is language-specific or depends on general auditory processing. English and French participants were taught words that began or ended with the sibilants /s/ and /ʃ/. Both languages exhibit some assimilation in sibilant sequences (e.g., /s/ becomes like [ʃ] in dress shop and classe chargée), but they differ in the strength and predominance of anticipatory versus carryover assimilation. After training, participants were presented with novel words embedded in sentences, some of which contained an assimilatory context either preceding or following. A continuum of target sounds ranging from [s] to [ʃ] was spliced into the novel words, representing a range of possible assimilation strengths. Listeners' perceptions were examined using a visual-world eyetracking paradigm in which the listener clicked on pictures matching the novel words. We found two distinct language-general context effects: a contrastive effect when the assimilating context preceded the target, and flattening of the sibilant categorization function (increased ambiguity) when the assimilating context followed. Furthermore, we found that English but not French listeners were able to resolve the ambiguity created by the following assimilatory context, consistent with their greater experience with assimilation in this context. The combination of these mechanisms allows listeners to deal flexibly with variability in speech forms.

    Interaction in spoken word recognition models:Feedback helps

    Human perception, cognition, and action require fast integration of bottom-up signals with top-down knowledge and context. A key theoretical perspective in cognitive science is the interactive activation hypothesis: forward and backward flow in bidirectionally connected neural networks allows humans and other biological systems to approximate optimal integration of bottom-up and top-down information under real-world constraints. An alternative view is that online feedback is neither necessary nor helpful; purely feedforward alternatives can be constructed for any feedback system, and online feedback could not improve processing and would preclude veridical perception. In the domain of spoken word recognition, the latter view was apparently supported by simulations using the interactive activation model, TRACE, with and without feedback: as many words were recognized more quickly without feedback as with it. However, these simulations used only a small set of words and did not address a primary motivation for interaction: making a model robust in noise. We conducted simulations using hundreds of words, and found that the majority were recognized more quickly with feedback than without. More importantly, as we added noise to inputs, accuracy and recognition times were better with feedback than without. We follow these simulations with a critical review of recent arguments that online feedback in interactive activation models like TRACE is distinct from other potentially helpful forms of feedback. We conclude that in addition to providing the benefits demonstrated in our simulations, online feedback provides a plausible means of implementing putatively distinct forms of feedback, supporting the interactive activation hypothesis.
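
    To illustrate the mechanism at stake, the Python sketch below is a minimal toy network (not TRACE, and with an invented lexicon, input values, and parameters) showing how top-down feedback from a word unit can restore a noise-degraded phoneme, which in turn raises the word's own activation.

        # Minimal interactive-activation sketch (not TRACE itself): a word unit feeds activation
        # back to its phonemes, restoring a noise-degraded /t/ and strengthening the word.
        # The toy lexicon, input values, and update parameters are all invented for illustration.
        import numpy as np

        phonemes = ["k", "ae", "t"]
        word_phoneme_idx = [0, 1, 2]              # the word "cat" spans /k ae t/

        # Noisy bottom-up input: /k/ and /ae/ are clear, the final /t/ is heavily degraded.
        bottom_up = np.array([0.8, 0.8, 0.1])

        def simulate(feedback_gain, cycles=30, decay=0.2, bu_gain=0.2, wd_gain=0.2):
            ph = np.zeros(len(phonemes))
            word = 0.0
            for _ in range(cycles):
                top_down = feedback_gain * word   # the word supports its constituent phonemes
                ph = np.clip((1 - decay) * ph + bu_gain * bottom_up + top_down, 0.0, 1.0)
                word = float(np.clip((1 - decay) * word
                                     + wd_gain * ph[word_phoneme_idx].mean(), 0.0, 1.0))
            return ph, word

        for gain in (0.0, 0.1):
            ph, word = simulate(feedback_gain=gain)
            print(f"feedback gain {gain}: /t/ activation = {ph[2]:.2f}, 'cat' activation = {word:.2f}")

    With feedback, the degraded /t/ ends up substantially more active than the bottom-up input alone supports, and the word unit benefits in turn; scaled up to a large lexicon with noisy input, this reciprocal support is the kind of robustness the simulations described above test.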

    Individual differences in speech perception cue weights

    The files here describe the stimuli used in this project. The stimulus files are also available. If you would like to use these files in your own project or teaching, please attribute them to Meghan Clayards and include a link to this site.
