
    Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed

    The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than those of the listener. To better assess the motor theory in light of this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and on viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented here by consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. The model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
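
    The abstract describes this updating scheme only verbally; a minimal sketch of the core tenet (the one-to-one alignment, the count-based update rule, and all names below are assumptions for illustration, not the authors' model) might look as follows.

```python
# Minimal sketch (assumed, not the authors' code): a listener maintains probabilities
# linking accented sound tokens to native phonemes and updates them whenever a word
# hypothesis is accepted, which on average improves recognition of later words.
from collections import defaultdict

class AccentAdapter:
    def __init__(self, smoothing=1e-3):
        self.counts = defaultdict(lambda: defaultdict(float))
        self.smoothing = smoothing

    def prob(self, sound, phoneme):
        row = self.counts[sound]
        total = sum(row.values())
        # Back off toward a weak uniform prior while a sound is still unfamiliar.
        return (row[phoneme] + self.smoothing) / (total + 1.0)

    def update(self, heard_sounds, hypothesized_phonemes):
        # Accepting a word hypothesis aligns heard sounds with native phonemes
        # (one-to-one here for simplicity) and reinforces those links.
        for s, p in zip(heard_sounds, hypothesized_phonemes):
            self.counts[s][p] += 1.0

    def score_word(self, heard_sounds, candidate_phonemes):
        # Likelihood of a candidate word under the current sound-phoneme mapping.
        score = 1.0
        for s, p in zip(heard_sounds, candidate_phonemes):
            score *= self.prob(s, p)
        return score

lexicon = {"ship": ["sh", "i", "p"], "sheep": ["sh", "ii", "p"]}
adapter = AccentAdapter()
# Context tells the listener the speaker said "sheep", although the vowel sounded like /i/.
adapter.update(["sh", "i", "p"], lexicon["sheep"])
# The same accented pronunciation is now more likely to be recognized as "sheep".
scores = {w: adapter.score_word(["sh", "i", "p"], ph) for w, ph in lexicon.items()}
print(max(scores, key=scores.get))  # "sheep"
```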

    Learning An Invariant Speech Representation

    Recognition of speech, and in particular the ability to generalize and learn from small sets of labelled examples like humans do, depends on an appropriate representation of the acoustic input. We formulate the problem of finding robust speech features for supervised learning with small sample complexity as a problem of learning representations of the signal that are maximally invariant to intraclass transformations and deformations. We propose an extension of a theory for unsupervised learning of invariant visual representations to the auditory domain and empirically evaluate its validity for voiced speech sound classification. Our version of the theory requires the memory-based, unsupervised storage of acoustic templates -- such as specific phones or words -- together with all the transformations of each that normally occur. A quasi-invariant representation for a speech segment can be obtained by projecting it to each template orbit, i.e., the set of transformed signals, and computing the associated one-dimensional empirical probability distributions. The computations can be performed by modules of filtering and pooling, and extended to hierarchical architectures. In this paper, we apply a single-layer, multicomponent representation for phonemes and demonstrate improved accuracy and decreased sample complexity for vowel classification compared to standard spectral, cepstral and perceptual features. (Comment: CBMM Memo No. 022, 5 pages, 2 figures)
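
    As a rough sketch of the orbit-projection-and-pooling idea (the circular-shift transformations, histogram pooling, and all names here are assumptions for illustration, not the memo's actual features), one might compute:

```python
# Minimal sketch (an assumption, not the memo's code): an invariant signature for a
# signal is built by projecting it onto the orbit of each stored template (the template
# plus its stored transformations) and pooling the resulting one-dimensional
# distribution of projections, here via a fixed histogram.
import numpy as np

def orbit(template, shifts):
    # Toy transformation set: circular time-shifts of a stored acoustic template.
    return np.stack([np.roll(template, s) for s in shifts])

def signature(signal, template_orbits, bins=8):
    # One pooled histogram per template orbit; their concatenation is the signature.
    feats = []
    for orb in template_orbits:
        projections = orb @ signal          # dot product with each transformed template
        hist, _ = np.histogram(projections, bins=bins, range=(-1.0, 1.0), density=True)
        feats.append(hist)
    return np.concatenate(feats)

rng = np.random.default_rng(0)
templates = [rng.standard_normal(64) for _ in range(3)]
templates = [t / np.linalg.norm(t) for t in templates]
orbits = [orbit(t, shifts=range(0, 64, 4)) for t in templates]

x = rng.standard_normal(64)
x /= np.linalg.norm(x)
# A version of x transformed within the stored set gets the same pooled signature:
# quasi-invariance to the stored transformations.
print(np.allclose(signature(x, orbits), signature(np.roll(x, 8), orbits)))  # True
```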

    Resonant Neural Dynamics of Speech Perception

    What is the neural representation of a speech code as it evolves in time? How do listeners integrate temporally distributed phonemic information across hundreds of milliseconds, even backwards in time, into coherent representations of syllables and words? What sorts of brain mechanisms encode the correct temporal order, despite such backwards effects, during speech perception? How does the brain extract rate-invariant properties of variable-rate speech? This article describes an emerging neural model that suggests answers to these questions, while quantitatively simulating challenging data about audition, speech and word recognition. This model includes bottom-up filtering, horizontal competitive, and top-down attentional interactions between a working memory for short-term storage of phonetic items and a list categorization network for grouping sequences of items. The conscious speech and word recognition code is suggested to be a resonant wave of activation across such a network, and a percept of silence is proposed to be a temporal discontinuity in the rate with which such a resonant wave evolves. Properties of these resonant waves can be traced to the brain mechanisms whereby auditory, speech, and language representations are learned in a stable way through time. Because resonances are proposed to control stable learning, the model is called an Adaptive Resonance Theory, or ART, model. Funding: Air Force Office of Scientific Research (F49620-01-1-0397); National Science Foundation (IRI-97-20333); Office of Naval Research (N00014-01-1-0624).
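
    The resonance-and-reset principle behind ART can be illustrated loosely with the classic ART-1 clustering scheme; the following is a drastic simplification of the speech model described above, with illustrative names and parameters that are not taken from the article.

```python
# Minimal ART-1 style sketch (a simplification; the speech model in the article is far
# richer): bottom-up input activates a category, a top-down expectation is matched
# against the input, and learning ("resonance") occurs only when the match exceeds a
# vigilance threshold; otherwise the category is reset and the search continues.
import numpy as np

class ART1:
    def __init__(self, vigilance=0.7, alpha=0.5):
        self.vigilance = vigilance
        self.alpha = alpha          # tie-breaking parameter in the choice function
        self.weights = []           # one binary prototype per learned category

    def present(self, x):
        x = np.asarray(x, dtype=float)
        order = sorted(range(len(self.weights)),
                       key=lambda j: -np.sum(np.minimum(x, self.weights[j])) /
                                      (self.alpha + np.sum(self.weights[j])))
        for j in order:
            match = np.sum(np.minimum(x, self.weights[j])) / np.sum(x)
            if match >= self.vigilance:                        # resonance: expectation fits input
                self.weights[j] = np.minimum(x, self.weights[j])  # stable, intersective learning
                return j
            # otherwise: reset this category and search the next one
        self.weights.append(x.copy())                          # nothing resonates: recruit a new category
        return len(self.weights) - 1

net = ART1(vigilance=0.6)
print(net.present([1, 1, 0, 0, 1, 0]))   # 0: first pattern founds category 0
print(net.present([1, 1, 0, 0, 0, 0]))   # 0: close enough to resonate with category 0
print(net.present([0, 0, 1, 1, 0, 1]))   # 1: mismatch triggers reset and a new category
```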

    Reduced structural connectivity between left auditory thalamus and the motion-sensitive planum temporale in developmental dyslexia

    Developmental dyslexia is characterized by the inability to acquire typical reading and writing skills. Dyslexia has been frequently linked to cerebral cortex alterations; however, recent evidence also points towards sensory thalamus dysfunctions: dyslexics showed reduced responses in the left auditory thalamus (medial geniculate body, MGB) during speech processing in contrast to neurotypical readers. In addition, in the visual modality, dyslexics have reduced structural connectivity between the left visual thalamus (lateral geniculate nucleus, LGN) and V5/MT, a cerebral cortex region involved in visual movement processing. Higher LGN-V5/MT connectivity in dyslexics was associated with faster rapid naming of letters and numbers (RANln), a measure that is highly correlated with reading proficiency. We here tested two hypotheses that were directly derived from these previous findings. First, we tested the hypothesis that dyslexics have reduced structural connectivity between the left MGB and the auditory motion-sensitive part of the left planum temporale (mPT). Second, we hypothesized that the amount of left mPT-MGB connectivity correlates with dyslexics' RANln scores. Using probabilistic tracking based on diffusion tensor imaging, we show that male adults with developmental dyslexia have reduced structural connectivity between the left MGB and the left mPT, confirming the first hypothesis. Stronger left mPT-MGB connectivity was associated with faster RANln scores in neurotypical readers, but not in dyslexics. Our findings provide the first evidence that reduced cortico-thalamic connectivity in the auditory modality is a feature of developmental dyslexia, and that it may also impact reading-related cognitive abilities in neurotypical readers.

    Neural Dynamics of Autistic Behaviors: Cognitive, Emotional, and Timing Substrates

    What brain mechanisms underlie autism and how do they give rise to autistic behavioral symptoms? This article describes a neural model, called the iSTART model, which proposes how cognitive, emotional, timing, and motor processes may interact together to create and perpetuate autistic symptoms. These model processes were originally developed to explain data concerning how the brain controls normal behaviors. The iSTART model shows how autistic behavioral symptoms may arise from prescribed breakdowns in these brain processes. Funding: Air Force Office of Scientific Research (F49620-01-1-0397); Office of Naval Research (N00014-01-1-0624).

    A Upf3b-mutant mouse model with behavioral and neurogenesis defects.

    Nonsense-mediated RNA decay (NMD) is a highly conserved and selective RNA degradation pathway that acts on RNAs terminating their reading frames in specific contexts. NMD is regulated in a tissue-specific and developmentally controlled manner, raising the possibility that it influences developmental events. Indeed, loss or depletion of NMD factors has been shown to disrupt developmental events in organisms spanning the phylogenetic scale. In humans, mutations in the NMD factor gene UPF3B cause intellectual disability (ID) and are strongly associated with autism spectrum disorder (ASD), attention deficit hyperactivity disorder (ADHD) and schizophrenia (SCZ). Here, we report the generation and characterization of mice harboring a null Upf3b allele. These Upf3b-null mice exhibit deficits in fear-conditioned learning, but not spatial learning. Upf3b-null mice also have a profound defect in prepulse inhibition (PPI), a measure of sensorimotor gating commonly deficient in individuals with SCZ and other brain disorders. Consistent with both their PPI and learning defects, cortical pyramidal neurons from Upf3b-null mice display deficient dendritic spine maturation in vivo. In addition, neural stem cells from Upf3b-null mice have an impaired ability to undergo differentiation and require prolonged culture to give rise to functional neurons with electrical activity. RNA sequencing (RNA-seq) analysis of the frontal cortex identified UPF3B-regulated RNAs, including direct NMD target transcripts encoding proteins with known functions in neural differentiation, maturation and disease. We suggest Upf3b-null mice serve as a novel model system to decipher the cellular and molecular defects underlying ID and neurodevelopmental disorders.

    Task-dependent modulation of the visual sensory thalamus assists visual-speech recognition

    The cerebral cortex modulates early sensory processing via feedback connections to sensory pathway nuclei. The functions of this top-down modulation for human behavior are poorly understood. Here, we show that top-down modulation of the visual sensory thalamus (the lateral geniculate nucleus, LGN) is involved in visual-speech recognition. In two independent functional magnetic resonance imaging (fMRI) studies, the LGN response increased when participants processed the fast-varying features of articulatory movements required for visual-speech recognition, as compared to temporally more stable features required for face identification with the same stimulus material. The LGN response during the visual-speech task correlated positively with visual-speech recognition scores across participants. In addition, the task-dependent modulation was present for speech movements but did not occur for control conditions involving non-speech biological movements. In face-to-face communication, visual speech recognition is used to enhance or even enable understanding of what is said. Speech recognition is commonly explained in frameworks focusing on cerebral cortex areas. Our findings suggest that task-dependent modulation at subcortical sensory stages plays an important role in communication: together with similar findings in the auditory modality, they imply that task-dependent modulation of the sensory thalami is a general mechanism for optimizing speech recognition.

    Direct structural connections between voice- and face-recognition areas


    Location-independent and location-linked representations of sound objects.

    For the recognition of sounds to benefit perception and action, their neural representations should also encode their current spatial position and their changes in position over time. The dual-stream model of auditory processing postulates separate (albeit interacting) processing streams for sound meaning and for sound location. Using a repetition priming paradigm in conjunction with distributed source modeling of auditory evoked potentials, we determined how individual sound objects are represented within these streams. Changes in perceived location were induced by interaural intensity differences, and sound location was either held constant or shifted across initial and repeated presentations (from one hemispace to the other in the main experiment or between locations within the right hemispace in a follow-up experiment). Location-linked representations were characterized by differences in priming effects between pairs presented at the same vs. different simulated lateralizations. These effects were significant at 20-39 ms post-stimulus onset within a cluster on the posterior part of the left superior and middle temporal gyri, and at 143-162 ms within a cluster on the left inferior and middle frontal gyri. Location-independent representations were characterized by a difference between initial and repeated presentations, independently of whether or not their simulated lateralization was held constant across repetitions. This effect was significant at 42-63 ms within three clusters in the right temporo-frontal region, and at 165-215 ms in a large cluster on the left temporo-parietal convexity. Our results reveal two varieties of representations of sound objects within the ventral/What stream: one location-independent, as initially postulated in the dual-stream model, and the other location-linked.
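
    The lateralization manipulation itself is straightforward to illustrate; the following sketch (with assumed intensity values and names, not the study's stimulus parameters) simulates a sound object presented at the same vs. a shifted location via an interaural intensity difference.

```python
# Illustrative sketch (assumed parameters, not the study's stimuli): simulate perceived
# lateralization of a sound object by applying an interaural intensity difference (ILD)
# to an otherwise identical left/right signal.
import numpy as np

def lateralize(mono, ild_db):
    # Positive ild_db pushes the percept toward the right ear; the level
    # difference is split symmetrically across the two ears.
    left_gain = 10 ** (-ild_db / 40)
    right_gain = 10 ** (ild_db / 40)
    return np.stack([mono * left_gain, mono * right_gain], axis=1)

sr = 44100
t = np.arange(int(0.2 * sr)) / sr
tone = 0.1 * np.sin(2 * np.pi * 500 * t)      # 200 ms, 500 Hz sound object
first = lateralize(tone, ild_db=+10)          # initial presentation: right hemispace
repeat_same = lateralize(tone, ild_db=+10)    # repetition at the same simulated location
repeat_shift = lateralize(tone, ild_db=-10)   # repetition shifted to the other hemispace
print(first.shape, repeat_shift.shape)        # (8820, 2) stereo arrays
```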