
    Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed

    The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than those of the listener. To better assess the motor theory in light of this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and on viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented here by consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. The model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
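
    The abstract describes this updating scheme only verbally; a minimal sketch of the core tenet (the one-to-one alignment, the count-based update rule, and all names below are assumptions for illustration, not the authors' model) might look as follows.

```python
# Minimal sketch (assumed, not the authors' code): a listener maintains probabilities
# linking accented sound tokens to native phonemes and updates them whenever a word
# hypothesis is accepted, which on average improves recognition of later words.
from collections import defaultdict

class AccentAdapter:
    def __init__(self, smoothing=1e-3):
        self.counts = defaultdict(lambda: defaultdict(float))
        self.smoothing = smoothing

    def prob(self, sound, phoneme):
        row = self.counts[sound]
        total = sum(row.values())
        # Back off toward a weak uniform prior while a sound is still unfamiliar.
        return (row[phoneme] + self.smoothing) / (total + 1.0)

    def update(self, heard_sounds, hypothesized_phonemes):
        # Accepting a word hypothesis aligns heard sounds with native phonemes
        # (one-to-one here for simplicity) and reinforces those links.
        for s, p in zip(heard_sounds, hypothesized_phonemes):
            self.counts[s][p] += 1.0

    def score_word(self, heard_sounds, candidate_phonemes):
        # Likelihood of a candidate word under the current sound-phoneme mapping.
        score = 1.0
        for s, p in zip(heard_sounds, candidate_phonemes):
            score *= self.prob(s, p)
        return score

lexicon = {"ship": ["sh", "i", "p"], "sheep": ["sh", "ii", "p"]}
adapter = AccentAdapter()
# Context tells the listener the speaker said "sheep", although the vowel sounded like /i/.
adapter.update(["sh", "i", "p"], lexicon["sheep"])
# The same accented pronunciation is now more likely to be recognized as "sheep".
scores = {w: adapter.score_word(["sh", "i", "p"], ph) for w, ph in lexicon.items()}
print(max(scores, key=scores.get))  # "sheep"
```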

    Learning An Invariant Speech Representation

    Recognition of speech, and in particular the ability to generalize and learn from small sets of labelled examples like humans do, depends on an appropriate representation of the acoustic input. We formulate the problem of finding robust speech features for supervised learning with small sample complexity as a problem of learning representations of the signal that are maximally invariant to intraclass transformations and deformations. We propose an extension of a theory for unsupervised learning of invariant visual representations to the auditory domain and empirically evaluate its validity for voiced speech sound classification. Our version of the theory requires the memory-based, unsupervised storage of acoustic templates -- such as specific phones or words -- together with all the transformations of each that normally occur. A quasi-invariant representation for a speech segment can be obtained by projecting it to each template orbit, i.e., the set of transformed signals, and computing the associated one-dimensional empirical probability distributions. The computations can be performed by modules of filtering and pooling, and extended to hierarchical architectures. In this paper, we apply a single-layer, multicomponent representation for phonemes and demonstrate improved accuracy and decreased sample complexity for vowel classification compared to standard spectral, cepstral and perceptual features. (Comment: CBMM Memo No. 022, 5 pages, 2 figures)
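
    As a rough sketch of the orbit-projection-and-pooling idea (the circular-shift transformations, histogram pooling, and all names here are assumptions for illustration, not the memo's actual features), one might compute:

```python
# Minimal sketch (an assumption, not the memo's code): an invariant signature for a
# signal is built by projecting it onto the orbit of each stored template (the template
# plus its stored transformations) and pooling the resulting one-dimensional
# distribution of projections, here via a fixed histogram.
import numpy as np

def orbit(template, shifts):
    # Toy transformation set: circular time-shifts of a stored acoustic template.
    return np.stack([np.roll(template, s) for s in shifts])

def signature(signal, template_orbits, bins=8):
    # One pooled histogram per template orbit; their concatenation is the signature.
    feats = []
    for orb in template_orbits:
        projections = orb @ signal          # dot product with each transformed template
        hist, _ = np.histogram(projections, bins=bins, range=(-1.0, 1.0), density=True)
        feats.append(hist)
    return np.concatenate(feats)

rng = np.random.default_rng(0)
templates = [rng.standard_normal(64) for _ in range(3)]
templates = [t / np.linalg.norm(t) for t in templates]
orbits = [orbit(t, shifts=range(0, 64, 4)) for t in templates]

x = rng.standard_normal(64)
x /= np.linalg.norm(x)
# A version of x transformed within the stored set gets the same pooled signature:
# quasi-invariance to the stored transformations.
print(np.allclose(signature(x, orbits), signature(np.roll(x, 8), orbits)))  # True
```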

    Resonant Neural Dynamics of Speech Perception

    What is the neural representation of a speech code as it evolves in time? How do listeners integrate temporally distributed phonemic information across hundreds of milliseconds, even backwards in time, into coherent representations of syllables and words? What sorts of brain mechanisms encode the correct temporal order, despite such backwards effects, during speech perception? How does the brain extract rate-invariant properties of variable-rate speech? This article describes an emerging neural model that suggests answers to these questions, while quantitatively simulating challenging data about audition, speech and word recognition. This model includes bottom-up filtering, horizontal competitive, and top-down attentional interactions between a working memory for short-term storage of phonetic items and a list categorization network for grouping sequences of items. The conscious speech and word recognition code is suggested to be a resonant wave of activation across such a network, and a percept of silence is proposed to be a temporal discontinuity in the rate with which such a resonant wave evolves. Properties of these resonant waves can be traced to the brain mechanisms whereby auditory, speech, and language representations are learned in a stable way through time. Because resonances are proposed to control stable learning, the model is called an Adaptive Resonance Theory, or ART, model. Funding: Air Force Office of Scientific Research (F49620-01-1-0397); National Science Foundation (IRI-97-20333); Office of Naval Research (N00014-01-1-0624).
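
    The resonance-and-reset principle behind ART can be illustrated loosely with the classic ART-1 clustering scheme; the following is a drastic simplification of the speech model described above, with illustrative names and parameters that are not taken from the article.

```python
# Minimal ART-1 style sketch (a simplification; the speech model in the article is far
# richer): bottom-up input activates a category, a top-down expectation is matched
# against the input, and learning ("resonance") occurs only when the match exceeds a
# vigilance threshold; otherwise the category is reset and the search continues.
import numpy as np

class ART1:
    def __init__(self, vigilance=0.7, alpha=0.5):
        self.vigilance = vigilance
        self.alpha = alpha          # tie-breaking parameter in the choice function
        self.weights = []           # one binary prototype per learned category

    def present(self, x):
        x = np.asarray(x, dtype=float)
        order = sorted(range(len(self.weights)),
                       key=lambda j: -np.sum(np.minimum(x, self.weights[j])) /
                                      (self.alpha + np.sum(self.weights[j])))
        for j in order:
            match = np.sum(np.minimum(x, self.weights[j])) / np.sum(x)
            if match >= self.vigilance:                        # resonance: expectation fits input
                self.weights[j] = np.minimum(x, self.weights[j])  # stable, intersective learning
                return j
            # otherwise: reset this category and search the next one
        self.weights.append(x.copy())                          # nothing resonates: recruit a new category
        return len(self.weights) - 1

net = ART1(vigilance=0.6)
print(net.present([1, 1, 0, 0, 1, 0]))   # 0: first pattern founds category 0
print(net.present([1, 1, 0, 0, 0, 0]))   # 0: close enough to resonate with category 0
print(net.present([0, 0, 1, 1, 0, 1]))   # 1: mismatch triggers reset and a new category
```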

    Reduced structural connectivity between left auditory thalamus and the motion-sensitive planum temporale in developmental dyslexia

    Developmental dyslexia is characterized by the inability to acquire typical reading and writing skills. Dyslexia has been frequently linked to cerebral cortex alterations; however, recent evidence also points towards sensory thalamus dysfunctions: dyslexics showed reduced responses in the left auditory thalamus (medial geniculate body, MGB) during speech processing in contrast to neurotypical readers. In addition, in the visual modality, dyslexics have reduced structural connectivity between the left visual thalamus (lateral geniculate nucleus, LGN) and V5/MT, a cerebral cortex region involved in visual movement processing. Higher LGN-V5/MT connectivity in dyslexics was associated with faster rapid naming of letters and numbers (RANln), a measure that is highly correlated with reading proficiency. We here tested two hypotheses that were directly derived from these previous findings. First, we tested the hypothesis that dyslexics have reduced structural connectivity between the left MGB and the auditory motion-sensitive part of the left planum temporale (mPT). Second, we hypothesized that the amount of left mPT-MGB connectivity correlates with dyslexics' RANln scores. Using probabilistic tracking based on diffusion tensor imaging, we show that male adults with developmental dyslexia have reduced structural connectivity between the left MGB and the left mPT, confirming the first hypothesis. Stronger left mPT-MGB connectivity was associated with faster RANln scores in neurotypical readers, but not in dyslexics. Our findings provide the first evidence that reduced cortico-thalamic connectivity in the auditory modality is a feature of developmental dyslexia, and that it may also impact reading-related cognitive abilities in neurotypical readers.

    Neural Dynamics of Autistic Behaviors: Cognitive, Emotional, and Timing Substrates

    What brain mechanisms underlie autism and how do they give rise to autistic behavioral symptoms? This article describes a neural model, called the iSTART model, which proposes how cognitive, emotional, timing, and motor processes may interact together to create and perpetuate autistic symptoms. These model processes were originally developed to explain data concerning how the brain controls normal behaviors. The iSTART model shows how autistic behavioral symptoms may arise from prescribed breakdowns in these brain processes. Funding: Air Force Office of Scientific Research (F49620-01-1-0397); Office of Naval Research (N00014-01-1-0624).

    A Upf3b-mutant mouse model with behavioral and neurogenesis defects.

    Nonsense-mediated RNA decay (NMD) is a highly conserved and selective RNA degradation pathway that acts on RNAs terminating their reading frames in specific contexts. NMD is regulated in a tissue-specific and developmentally controlled manner, raising the possibility that it influences developmental events. Indeed, loss or depletion of NMD factors has been shown to disrupt developmental events in organisms spanning the phylogenetic scale. In humans, mutations in the NMD factor gene UPF3B cause intellectual disability (ID) and are strongly associated with autism spectrum disorder (ASD), attention deficit hyperactivity disorder (ADHD) and schizophrenia (SCZ). Here, we report the generation and characterization of mice harboring a null Upf3b allele. These Upf3b-null mice exhibit deficits in fear-conditioned learning, but not spatial learning. Upf3b-null mice also have a profound defect in prepulse inhibition (PPI), a measure of sensorimotor gating commonly deficient in individuals with SCZ and other brain disorders. Consistent with both their PPI and learning defects, cortical pyramidal neurons from Upf3b-null mice display deficient dendritic spine maturation in vivo. In addition, neural stem cells from Upf3b-null mice have an impaired ability to undergo differentiation and require prolonged culture to give rise to functional neurons with electrical activity. RNA sequencing (RNA-seq) analysis of the frontal cortex identified UPF3B-regulated RNAs, including direct NMD target transcripts encoding proteins with known functions in neural differentiation, maturation and disease. We suggest Upf3b-null mice serve as a novel model system to decipher the cellular and molecular defects underlying ID and neurodevelopmental disorders.

    Task-dependent modulation of the visual sensory thalamus assists visual-speech recognition

    The cerebral cortex modulates early sensory processing via feedback connections to sensory pathway nuclei. The functions of this top-down modulation for human behavior are poorly understood. Here, we show that top-down modulation of the visual sensory thalamus (the lateral geniculate nucleus, LGN) is involved in visual-speech recognition. In two independent functional magnetic resonance imaging (fMRI) studies, the LGN response increased when participants processed the fast-varying features of articulatory movements required for visual-speech recognition, as compared to temporally more stable features required for face identification with the same stimulus material. The LGN response during the visual-speech task correlated positively with visual-speech recognition scores across participants. In addition, the task-dependent modulation was present for speech movements but did not occur for control conditions involving non-speech biological movements. In face-to-face communication, visual speech recognition is used to enhance or even enable understanding of what is said. Speech recognition is commonly explained in frameworks focusing on cerebral cortex areas. Our findings suggest that task-dependent modulation at subcortical sensory stages plays an important role in communication: together with similar findings in the auditory modality, they imply that task-dependent modulation of the sensory thalami is a general mechanism for optimizing speech recognition.

    Direct structural connections between voice- and face-recognition areas


    Location-independent and location-linked representations of sound objects.

    For the recognition of sounds to benefit perception and action, their neural representations should also encode their current spatial position and their changes in position over time. The dual-stream model of auditory processing postulates separate (albeit interacting) processing streams for sound meaning and for sound location. Using a repetition priming paradigm in conjunction with distributed source modeling of auditory evoked potentials, we determined how individual sound objects are represented within these streams. Changes in perceived location were induced by interaural intensity differences, and sound location was either held constant or shifted across initial and repeated presentations (from one hemispace to the other in the main experiment or between locations within the right hemispace in a follow-up experiment). Location-linked representations were characterized by differences in priming effects between pairs presented at the same vs. different simulated lateralizations. These effects were significant at 20-39 ms post-stimulus onset within a cluster on the posterior part of the left superior and middle temporal gyri, and at 143-162 ms within a cluster on the left inferior and middle frontal gyri. Location-independent representations were characterized by a difference between initial and repeated presentations, independently of whether or not their simulated lateralization was held constant across repetitions. This effect was significant at 42-63 ms within three clusters in the right temporo-frontal region, and at 165-215 ms in a large cluster on the left temporo-parietal convexity. Our results reveal two varieties of representations of sound objects within the ventral/What stream: one location-independent, as initially postulated in the dual-stream model, and the other location-linked.
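
    The lateralization manipulation itself is straightforward to illustrate; the following sketch (with assumed intensity values and names, not the study's stimulus parameters) simulates a sound object presented at the same vs. a shifted location via an interaural intensity difference.

```python
# Illustrative sketch (assumed parameters, not the study's stimuli): simulate perceived
# lateralization of a sound object by applying an interaural intensity difference (ILD)
# to an otherwise identical left/right signal.
import numpy as np

def lateralize(mono, ild_db):
    # Positive ild_db pushes the percept toward the right ear; the level
    # difference is split symmetrically across the two ears.
    left_gain = 10 ** (-ild_db / 40)
    right_gain = 10 ** (ild_db / 40)
    return np.stack([mono * left_gain, mono * right_gain], axis=1)

sr = 44100
t = np.arange(int(0.2 * sr)) / sr
tone = 0.1 * np.sin(2 * np.pi * 500 * t)      # 200 ms, 500 Hz sound object
first = lateralize(tone, ild_db=+10)          # initial presentation: right hemispace
repeat_same = lateralize(tone, ild_db=+10)    # repetition at the same simulated location
repeat_shift = lateralize(tone, ild_db=-10)   # repetition shifted to the other hemispace
print(first.shape, repeat_shift.shape)        # (8820, 2) stereo arrays
```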