Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed
The motor theory of speech perception holds that we perceive the speech of
another in terms of a motor representation of that speech. However, when we
have learned to recognize a foreign accent, it seems plausible that recognition
of a word rarely involves reconstruction of the speech gestures of the speaker
rather than the listener. To better assess the motor theory and this
observation, we proceed in three stages. Part 1 places the motor theory of
speech perception in a larger framework based on our earlier models of the
adaptive formation of mirror neurons for grasping, and for viewing extensions
of that mirror system as part of a larger system for neuro-linguistic
processing, augmented by the present consideration of recognizing speech in a
novel accent. Part 2 then offers a novel computational model of how a listener
comes to understand the speech of someone speaking the listener's native
language with a foreign accent. The core tenet of the model is that the
listener uses hypotheses about the word the speaker is currently uttering to
update probabilities linking the sound produced by the speaker to phonemes in
the native language repertoire of the listener. This, on average, improves the
recognition of later words. This model is neutral regarding the nature of the
representations it uses (motor vs. auditory). It serves as a reference point for
the discussion in Part 3, which proposes a dual-stream neuro-linguistic
architecture to revisit claims for and against the motor theory of speech
perception and the relevance of mirror neurons, and extracts some implications
for the reframing of the motor theory.
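The core tenet of the Part 2 model can be illustrated with a minimal simulation (all names, data, and the specific update rule below are illustrative assumptions; the paper does not prescribe an implementation):

```python
import numpy as np

# Illustrative sketch: the listener maintains a mapping P(native phoneme |
# accented sound) and sharpens it each time an accepted word hypothesis
# reveals which native phoneme the speaker intended. The phoneme inventory,
# learning rule, and learning rate are invented for this sketch.

phonemes = ["i", "I"]      # native phoneme categories (assumed)
sounds = ["i~", "I~"]      # accented acoustic tokens (assumed)

# Start near-uniform: each accented sound could map to either phoneme.
P = np.full((len(sounds), len(phonemes)), 0.5)

def update(sound_idx, phoneme_idx, lr=0.2):
    """After a word hypothesis identifies the intended native phoneme,
    shift probability mass toward that phoneme for this accented sound."""
    target = np.zeros(len(phonemes))
    target[phoneme_idx] = 1.0
    P[sound_idx] = (1 - lr) * P[sound_idx] + lr * target

# Simulate repeatedly hearing the accented sound "i~" in words whose
# accepted hypotheses indicate the intended phoneme was "i".
for _ in range(10):
    update(sound_idx=0, phoneme_idx=0)

print(np.round(P, 3))
```

After these updates, the row for the accented sound has shifted strongly toward the intended phoneme, so later words containing that sound are, on average, recognized more reliably; this is the sense in which earlier word hypotheses improve later recognition.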
Learning An Invariant Speech Representation
Recognition of speech, and in particular the ability to generalize and learn
from small sets of labelled examples like humans do, depends on an appropriate
representation of the acoustic input. We formulate the problem of finding
robust speech features for supervised learning with small sample complexity as
a problem of learning representations of the signal that are maximally
invariant to intraclass transformations and deformations. We propose an
extension of a theory for unsupervised learning of invariant visual
representations to the auditory domain and empirically evaluate its validity
for voiced speech sound classification. Our version of the theory requires the
memory-based, unsupervised storage of acoustic templates -- such as specific
phones or words -- together with all the transformations of each that normally
occur. A quasi-invariant representation for a speech segment can be obtained by
projecting it to each template orbit, i.e., the set of transformed signals, and
computing the associated one-dimensional empirical probability distributions.
The computations can be performed by modules of filtering and pooling, and
extended to hierarchical architectures. In this paper, we apply a single-layer,
multicomponent representation for phonemes and demonstrate improved accuracy
and decreased sample complexity for vowel classification compared to standard
spectral, cepstral and perceptual features.
Comment: CBMM Memo No. 022, 5 pages, 2 figures
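The projection-and-pooling computation described above can be sketched in a few lines. This is a toy construction assuming a circular-shift transformation group; all names and parameter values are illustrative, not the paper's exact setup:

```python
import numpy as np

# Store a template together with all its transformations (its orbit),
# project an input onto every orbit element, and summarize the resulting
# 1-D projections with an empirical histogram. Transforming the input only
# permutes the set of projections, so the histogram is (quasi-)invariant.

rng = np.random.default_rng(1)

def orbit(template):
    """All circular shifts of a template: its orbit under the shift group."""
    return np.stack([np.roll(template, k) for k in range(len(template))])

def signature(signal, template, n_bins=8):
    """Histogram of the 1-D projections of `signal` onto the template orbit."""
    proj = orbit(template) @ signal
    hist, _ = np.histogram(proj, bins=n_bins, range=(-1, 1), density=True)
    return hist

template = rng.standard_normal(32)
template /= np.linalg.norm(template)      # unit norm keeps projections in [-1, 1]
x = rng.standard_normal(32)
x /= np.linalg.norm(x)
x_shifted = np.roll(x, 5)                 # same signal, transformed (shifted)

# The signature is unchanged by the transformation of the input.
assert np.allclose(signature(x, template), signature(x_shifted, template))
```

The histogram plays the role of the one-dimensional empirical probability distribution in the theory; replacing it with moments (mean, max) recovers the familiar filtering-and-pooling modules mentioned above.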
Resonant Neural Dynamics of Speech Perception
What is the neural representation of a speech code as it evolves in time? How do listeners integrate temporally distributed phonemic information across hundreds of milliseconds, even backwards in time, into coherent representations of syllables and words? What sorts of brain mechanisms encode the correct temporal order, despite such backwards effects, during speech perception? How does the brain extract rate-invariant properties of variable-rate speech? This article describes an emerging neural model that suggests answers to these questions, while quantitatively simulating challenging data about audition, speech and word recognition. This model includes bottom-up filtering, horizontal competitive, and top-down attentional interactions between a working memory for short-term storage of phonetic items and a list categorization network for grouping sequences of items. The conscious speech and word recognition code is suggested to be a resonant wave of activation across such a network, and a percept of silence is proposed to be a temporal discontinuity in the rate with which such a resonant wave evolves. Properties of these resonant waves can be traced to the brain mechanisms whereby auditory, speech, and language representations are learned in a stable way through time. Because resonances are proposed to control stable learning, the model is called an Adaptive Resonance
Theory, or ART, model.
Air Force Office of Scientific Research (F49620-01-1-0397); National Science Foundation (IRI-97-20333); Office of Naval Research (N00014-01-1-0624)
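The match-and-reset cycle at the heart of adaptive resonance can be illustrated with a deliberately simplified ART-1-style toy on binary inputs. This is not the article's dynamical model (which uses continuous-time resonant waves); the class name, vigilance value, and inputs are invented for the sketch:

```python
import numpy as np

# Toy ART-1-style categorizer: a bottom-up categorization step followed by a
# top-down match test. "Resonance" (and learning) occurs only when the
# learned top-down expectation matches the input well enough; otherwise the
# category is reset and the search continues, recruiting a new category if
# no existing one resonates.

class MiniART1:
    def __init__(self, vigilance=0.6):
        self.rho = vigilance
        self.templates = []              # learned top-down expectations

    def present(self, x):
        x = np.asarray(x, dtype=bool)
        # Try candidate categories in order of bottom-up match strength.
        order = sorted(range(len(self.templates)),
                       key=lambda j: -np.sum(self.templates[j] & x))
        for j in order:
            w = self.templates[j]
            match = np.sum(w & x) / max(np.sum(x), 1)
            if match >= self.rho:        # resonance: expectation fits input
                self.templates[j] = w & x  # refine the template (fast learning)
                return j
        # No resonance anywhere: mismatch resets end in a new category.
        self.templates.append(x.copy())
        return len(self.templates) - 1

net = MiniART1(vigilance=0.6)
a = net.present([1, 1, 0, 0])   # first input founds category 0
b = net.present([1, 1, 0, 1])   # close enough to resonate with category 0
c = net.present([0, 0, 1, 1])   # mismatch triggers reset -> new category
print(a, b, c)
```

The vigilance parameter controls how strict the top-down match test is; raising it forces finer categories, which is one way ART models trade generalization against discrimination.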
Reduced structural connectivity between left auditory thalamus and the motion-sensitive planum temporale in developmental dyslexia
Developmental dyslexia is characterized by the inability to acquire typical
reading and writing skills. Dyslexia has been frequently linked to cerebral
cortex alterations; however, recent evidence also points towards sensory
thalamus dysfunctions: dyslexics showed reduced responses in the left auditory
thalamus (medial geniculate body, MGB) during speech processing in contrast to
neurotypical readers. In addition, in the visual modality, dyslexics have
reduced structural connectivity between the left visual thalamus (lateral
geniculate nucleus, LGN) and V5/MT, a cerebral cortex region involved in visual
movement processing. Higher LGN-V5/MT connectivity in dyslexics was associated
with faster rapid naming of letters and numbers (RANln), a measure that is
highly correlated with reading proficiency. We here tested two hypotheses that
were directly derived from these previous findings. First, we tested the
hypothesis that dyslexics have reduced structural connectivity between the left
MGB and the auditory motion-sensitive part of the left planum temporale (mPT).
Second, we hypothesized that the amount of left mPT-MGB connectivity correlates
with dyslexics' RANln scores. Using diffusion tensor imaging-based probabilistic
tracking we show that male adults with developmental dyslexia have reduced
structural connectivity between the left MGB and the left mPT, confirming the
first hypothesis. Stronger left mPT-MGB connectivity was associated with
faster RANln scores in neurotypical readers, but not in dyslexics. Our findings
provide the first evidence that reduced cortico-thalamic connectivity in the
auditory modality is a feature of developmental dyslexia, and that it may also
impact reading-related cognitive abilities in neurotypical readers.
Neural Dynamics of Autistic Behaviors: Cognitive, Emotional, and Timing Substrates
What brain mechanisms underlie autism and how do they give rise to autistic behavioral symptoms? This article describes a neural model, called the iSTART model, which proposes how cognitive, emotional, timing, and motor processes may interact together to create and perpetuate autistic symptoms. These model processes were originally developed to explain data concerning how the brain controls normal behaviors. The iSTART model shows how autistic behavioral symptoms may arise from prescribed breakdowns in these brain processes.
Air Force Office of Scientific Research (F49620-01-1-0397); Office of Naval Research (N00014-01-1-0624)
A Upf3b-mutant mouse model with behavioral and neurogenesis defects.
Nonsense-mediated RNA decay (NMD) is a highly conserved and selective RNA degradation pathway that acts on RNAs terminating their reading frames in specific contexts. NMD is regulated in a tissue-specific and developmentally controlled manner, raising the possibility that it influences developmental events. Indeed, loss or depletion of NMD factors has been shown to disrupt developmental events in organisms spanning the phylogenetic scale. In humans, mutations in the NMD factor gene, UPF3B, cause intellectual disability (ID) and are strongly associated with autism spectrum disorder (ASD), attention deficit hyperactivity disorder (ADHD) and schizophrenia (SCZ). Here, we report the generation and characterization of mice harboring a null Upf3b allele. These Upf3b-null mice exhibit deficits in fear-conditioned learning, but not spatial learning. Upf3b-null mice also have a profound defect in prepulse inhibition (PPI), a measure of sensorimotor gating commonly deficient in individuals with SCZ and other brain disorders. Consistent with both their PPI and learning defects, cortical pyramidal neurons from Upf3b-null mice display deficient dendritic spine maturation in vivo. In addition, neural stem cells from Upf3b-null mice have an impaired ability to undergo differentiation and require prolonged culture to give rise to functional neurons with electrical activity. RNA sequencing (RNAseq) analysis of the frontal cortex identified UPF3B-regulated RNAs, including direct NMD target transcripts encoding proteins with known functions in neural differentiation, maturation and disease. We suggest that Upf3b-null mice serve as a novel model system to decipher cellular and molecular defects underlying ID and neurodevelopmental disorders.
Task-dependent modulation of the visual sensory thalamus assists visual-speech recognition
The cerebral cortex modulates early sensory processing via feed-back
connections to sensory pathway nuclei. The functions of this top-down
modulation for human behavior are poorly understood. Here, we show that
top-down modulation of the visual sensory thalamus (the lateral geniculate
body, LGN) is involved in visual-speech recognition. In two independent
functional magnetic resonance imaging (fMRI) studies, LGN response increased
when participants processed fast-varying features of articulatory movements
required for visual-speech recognition, as compared to temporally more stable
features required for face identification with the same stimulus material. The
LGN response during the visual-speech task correlated positively with the
visual-speech recognition scores across participants. In addition, the
task-dependent modulation was present for speech movements and did not occur
for control conditions involving non-speech biological movements. In
face-to-face communication, visual-speech recognition is used to enhance or
even enable understanding of what is said. Speech recognition is commonly
explained in frameworks focusing on cerebral cortex areas. Our findings suggest
that task-dependent modulation at subcortical sensory stages has an important
role for communication: together with similar findings in the auditory
modality, they imply that task-dependent modulation of the sensory thalami is a
general mechanism to optimize speech recognition.
Location-independent and location-linked representations of sound objects.
For the recognition of sounds to benefit perception and action, their neural representations should also encode their current spatial position and their changes in position over time. The dual-stream model of auditory processing postulates separate (albeit interacting) processing streams for sound meaning and for sound location. Using a repetition priming paradigm in conjunction with distributed source modeling of auditory evoked potentials, we determined how individual sound objects are represented within these streams. Changes in perceived location were induced by interaural intensity differences, and sound location was either held constant or shifted across initial and repeated presentations (from one hemispace to the other in the main experiment or between locations within the right hemispace in a follow-up experiment). Location-linked representations were characterized by differences in priming effects between pairs presented to the same vs. different simulated lateralizations. These effects were significant at 20-39 ms post-stimulus onset within a cluster on the posterior part of the left superior and middle temporal gyri; and at 143-162 ms within a cluster on the left inferior and middle frontal gyri. Location-independent representations were characterized by a difference between initial and repeated presentations, independently of whether or not their simulated lateralization was held constant across repetitions. This effect was significant at 42-63 ms within three clusters on the right temporo-frontal region; and at 165-215 ms in a large cluster on the left temporo-parietal convexity. Our results reveal two varieties of representations of sound objects within the ventral/What stream: one location-independent, as initially postulated in the dual-stream model, and the other location-linked.