
    What information is necessary for speech categorization? Harnessing variability in the speech signal by integrating cues computed relative to expectations

    This is the author's accepted manuscript. This article may not exactly replicate the final version published in the APA journal. It is not the copy of record. The original publication is available at http://psycnet.apa.org/index.cfm?fa=search.displayrecord&uid=2011-05323-001.

    Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the informational assumptions of several models of speech categorization, in particular, the number of cues that are the basis of categorization and whether these cues represent the input veridically or have undergone compensation. We collected a corpus of 2,880 fricative productions (Jongman, Wayland, & Wong, 2000) spanning many talker and vowel contexts and measured 24 cues for each. A subset was also presented to listeners in an 8AFC phoneme categorization task. We then trained a common classification model based on logistic regression to categorize the fricative from the cue values and manipulated the information in the training set to contrast (a) models based on a small number of invariant cues, (b) models using all cues without compensation, and (c) models in which cues underwent compensation for contextual factors. Compensation was modeled by computing cues relative to expectations (C-CuRE), a new approach to compensation that preserves fine-grained detail in the signal. Only the compensation model achieved accuracy similar to that of listeners and showed the same effects of context. Thus, even simple categorization metrics can overcome the variability in speech when sufficient information is available and compensation schemes like C-CuRE are employed.
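    As a rough illustration of the C-CuRE scheme described above, the sketch below regresses each cue on contextual factors and classifies from the residuals. The data shapes, factor codes, and scikit-learn pipeline are illustrative assumptions; only the overall regress-then-classify structure comes from the abstract.

```python
# Sketch of C-CuRE (computing cues relative to expectations): regress each
# acoustic cue on contextual factors (talker, vowel), then categorize from
# the residuals. Synthetic data; shapes mirror the abstract's corpus
# (2,880 tokens, 24 cues, 8 fricatives) but the values are random.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n, n_cues = 2880, 24
talker = rng.integers(0, 20, size=n)        # hypothetical talker codes
vowel = rng.integers(0, 6, size=n)          # hypothetical vowel contexts
fricative = rng.integers(0, 8, size=n)      # 8 fricative categories
cues = rng.normal(size=(n, n_cues))         # stand-in for measured cues

# One-hot encode the contextual factors.
context = np.column_stack(
    [talker == t for t in range(20)] + [vowel == v for v in range(6)]
).astype(float)

# C-CuRE step: expected cue values given context; the residuals keep the
# fine-grained detail that is not explained by context.
expected = LinearRegression().fit(context, cues).predict(context)
relative_cues = cues - expected

# Common classifier: multinomial logistic regression over the relative cues.
clf = LogisticRegression(max_iter=1000).fit(relative_cues, fricative)
print("training accuracy:", clf.score(relative_cues, fricative))
```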

    Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed

    The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than those of the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. The model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture, revisits claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
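    The updating scheme at the heart of the model can be pictured as a simple count-based learner. Everything concrete below (the discrete sound symbols, the two-word lexicon, the identity-biased priors) is a hypothetical stand-in for illustration, not the paper's implementation; like the model, the sketch is neutral about motor versus auditory representations.

```python
# Toy sketch: a listener maintains counts linking accented sounds to native
# phonemes, recognizes words by scoring lexical hypotheses against those
# counts, and strengthens the links implied by a contextually supported
# word hypothesis, improving recognition of later words on average.
PHONEMES = ["sh", "s", "i", "p"]

# counts[sound][phoneme], initialized with a prior favoring the identity
# mapping (a sound most likely signals the matching native phoneme).
counts = {s: {p: (2.0 if s == p else 1.0) for p in PHONEMES} for s in PHONEMES}

def p_phoneme(sound, phoneme):
    return counts[sound][phoneme] / sum(counts[sound].values())

def recognize(sounds, lexicon):
    """Best lexical hypothesis for a heard sound sequence."""
    def score(phonemes):
        result = 1.0
        for s, p in zip(sounds, phonemes):
            result *= p_phoneme(s, p)
        return result
    return max(lexicon, key=lambda w: score(lexicon[w]))

def adapt(sounds, phonemes):
    """Strengthen sound-phoneme links implied by a (contextually
    supported) hypothesis about the word just heard."""
    for s, p in zip(sounds, phonemes):
        counts[s][p] += 1.0

lexicon = {"ship": ["sh", "i", "p"], "sip": ["s", "i", "p"]}
accented_ship = ["s", "i", "p"]           # talker realizes /sh/ as [s]

print(recognize(accented_ship, lexicon))  # "sip": misrecognized at first
for _ in range(5):
    adapt(accented_ship, lexicon["ship"]) # context reveals the word
print(recognize(accented_ship, lexicon))  # "ship": mapping has adapted
```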

    Speech Communication

    Contains reports on eight research projects.

    C.J. LeBel Fellowship
    Systems Development Foundation
    National Institutes of Health (Grant 5 T32 NS 07040-08)
    National Institutes of Health (Grant 5 R01 NS 04332-20)
    National Science Foundation (Grant IST 80-1759)
    National Science Foundation (Grants IST 80-17599 and MCS-8112899)
    U.S. Navy - Office of Naval Research (Contract N00014-82-K-0727)

    Gradient Activation of Speech Categories Facilitates Listeners’ Recovery From Lexical Garden Paths, But Not Perception of Speech-in-Noise

    Published 2021 Apr.

    Listeners activate speech-sound categories in a gradient way, and this information is maintained and affects activation of items at higher levels of processing (McMurray et al., 2002; Toscano et al., 2010). Recent findings by Kapnoula et al. (2017) suggest that the degree to which listeners maintain within-category information varies across individuals. Here we assessed the consequences of this gradiency for speech perception. To test this, we collected a measure of gradiency for different listeners using the visual analogue scaling (VAS) task used by Kapnoula et al. (2017). We also collected 2 independent measures of performance in speech perception: a visual world paradigm (VWP) task measuring participants' ability to recover from lexical garden paths (McMurray et al., 2009) and a speech-perception task measuring participants' perception of isolated words in noise. Our results show that categorization gradiency does not predict participants' performance in the speech-in-noise task. However, higher gradiency predicted higher likelihood of recovery from temporarily misleading information presented in the VWP task. These results suggest that gradient activation of speech sound categories is helpful when listeners need to reconsider their initial interpretation of the input, making them more efficient in recovering from errors.

    This project was supported by National Institutes of Health Grant DC008089 awarded to Bob McMurray. This work was partially supported by the Basque Government through the BERC 2018-2021 Program and by the Spanish State Research Agency through BCBL Severo Ochoa excellence accreditation SEV-2015-0490. This project was partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) through the convocatoria 2016 Subprograma Estatal Ayudas para contratos para la Formación Posdoctoral 2016, Programa Estatal de Promoción del Talento y su Empleabilidad del Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016, reference FJCI-2016-28019 awarded to Efthymia C. Kapnoula. This project has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant 793919, awarded to Efthymia C. Kapnoula.
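    One way to picture the gradiency measure is as the inverse slope of a logistic function fit to VAS ratings across a continuum, which could then be correlated (e.g., with scipy.stats.pearsonr) against garden-path recovery. The fitting recipe below is an assumed illustration, not Kapnoula et al.'s exact procedure.

```python
# Sketch: quantify categorization gradiency as the inverse slope of a
# logistic fit to VAS ratings over a 7-step continuum (illustrative setup).
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

steps = np.arange(1, 8)                       # hypothetical continuum steps

def gradiency_index(vas_ratings):
    """Fit a logistic to VAS ratings; a shallower slope (smaller k)
    yields a larger gradiency index."""
    (x0, k), _ = curve_fit(logistic, steps, vas_ratings, p0=[4.0, 1.0])
    return 1.0 / abs(k)

# Toy data: one gradient and one categorical listener.
gradient_listener = np.array([.05, .15, .30, .50, .70, .85, .95])
categorical_listener = np.array([.01, .02, .05, .50, .95, .98, .99])

print(gradiency_index(gradient_listener))     # larger: more gradient
print(gradiency_index(categorical_listener))  # smaller: more categorical
```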

    Automatic formant labeling in continuous speech

    This thesis was developed out of a need to reduce the time required to correct Linear Predictive Coding (LPC) data used for training a formant tracker. A program was written to select peaks from LPC data and interpret them as F1, F2, and F3, using knowledge of the phonetic transcription, the sex of the speaker, the behavior of individual phonemes, and a few heuristics. The system was tested on a database of eight speakers, four male and four female, each of whom produced ten sentences. This data set comprised 1,011 resonant phonemes covering 17,363 5-msec frames. Overall, the system correctly matched F1 in 98.9% of the frames, F2 in 92.2% of the frames, and F3 in 88.8% of the frames.
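    A toy version of such a labeling rule might assign ascending LPC peaks to F1 through F3 using sex-dependent frequency ranges. The ranges and the greedy search below are invented heuristics for illustration; the actual system also exploited the phonetic transcription and per-phoneme knowledge.

```python
# Sketch: label LPC spectral peaks as F1-F3 using assumed, sex-dependent
# frequency ranges in Hz and a greedy lowest-first assignment.
RANGES = {
    "male":   [(200, 900), (800, 2300), (1700, 3000)],   # F1, F2, F3
    "female": [(250, 1100), (900, 2800), (2000, 3500)],
}

def label_formants(lpc_peaks_hz, sex):
    """Greedily assign ascending LPC peaks to F1, F2, F3."""
    labels, peaks = {}, sorted(lpc_peaks_hz)
    for name, (lo, hi) in zip(("F1", "F2", "F3"), RANGES[sex]):
        for p in peaks:
            if lo <= p <= hi:
                labels[name] = p
                peaks = [q for q in peaks if q > p]  # formants ascend
                break
    return labels

print(label_formants([310, 550, 1250, 2400, 3600], "male"))
# -> {'F1': 310, 'F2': 1250, 'F3': 2400}; the 550 Hz peak is left unlabeled
```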

    Compensation for complete assimilation in speech perception: The case of Korean labial-to-velar assimilation

    In connected speech, phonological assimilation to neighboring words can lead to pronunciation variants (e.g., "garden bench" → "gardem bench"). A large body of literature suggests that listeners use the phonetic context to reconstruct the intended word for assimilation types that often lead to incomplete assimilations (e.g., a pronunciation of "garden" that carries cues for both a labial [m] and an alveolar [n]). In the current paper, we show that a similar context effect is observed for an assimilation that is often complete, Korean labial-to-velar place assimilation. In contrast to the context effects for partial assimilations, however, the context effects seem to rely completely on listeners' experience with the assimilation pattern in their native language.

    The time course of auditory and language-specific mechanisms in compensation for sibilant assimilation

    Models of spoken-word recognition differ on whether compensation for assimilation is language-specific or depends on general auditory processing. English and French participants were taught words that began or ended with the sibilants /s/ and /ʃ/. Both languages exhibit some assimilation in sibilant sequences (e.g., /s/ becomes like [ʃ] in dress shop and classe chargée), but they differ in the strength and predominance of anticipatory versus carryover assimilation. After training, participants were presented with novel words embedded in sentences, some of which contained an assimilatory context either preceding or following. A continuum of target sounds ranging from [s] to [ʃ] was spliced into the novel words, representing a range of possible assimilation strengths. Listeners' perceptions were examined using a visual-world eyetracking paradigm in which the listener clicked on pictures matching the novel words. We found two distinct language-general context effects: a contrastive effect when the assimilating context preceded the target, and flattening of the sibilant categorization function (increased ambiguity) when the assimilating context followed. Furthermore, we found that English but not French listeners were able to resolve the ambiguity created by the following assimilatory context, consistent with their greater experience with assimilation in this context. The combination of these mechanisms allows listeners to deal flexibly with variability in speech forms.
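    The two context effects can be pictured as different deformations of a logistic categorization function: a boundary shift for a preceding context (contrastive effect) and a slope reduction for a following one (flattening). The parameter values below are invented purely to make the pattern concrete.

```python
# Toy illustration of the two reported context effects on a [s]-[sh]
# categorization function; numbers are invented for demonstration only.
import numpy as np

def p_sh(step, boundary=4.0, slope=1.5):
    """Probability of categorizing a continuum step as /sh/."""
    return 1.0 / (1.0 + np.exp(-slope * (step - boundary)))

steps = np.arange(1, 8)
baseline = p_sh(steps)
preceding_context = p_sh(steps, boundary=4.8)  # contrastive boundary shift
following_context = p_sh(steps, slope=0.6)     # flattened, more ambiguous

for row, name in [(baseline, "baseline"),
                  (preceding_context, "preceding context"),
                  (following_context, "following context")]:
    print(f"{name:>20}: " + " ".join(f"{p:.2f}" for p in row))
```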

    Incipient tonogenesis in Phnom Penh Khmer: Computational studies
