403 research outputs found
Speaker-normalized sound representations in the human auditory cortex
The acoustic dimensions that distinguish speech sounds (like the vowel differences in “boot” and “boat”) also differentiate speakers’ voices. Therefore, listeners must normalize across speakers without losing linguistic information. Past behavioral work suggests an important role for auditory contrast enhancement in normalization: preceding context affects listeners’ perception of subsequent speech sounds. Here, using intracranial electrocorticography in humans, we investigate whether and how such context effects arise in auditory cortex. Participants identified speech sounds that were preceded by phrases from two different speakers whose voices differed along the same acoustic dimension as target words (the lowest resonance of the vocal tract). In every participant, target vowels evoke a speaker-dependent neural response that is consistent with the listener’s perception, and which follows from a contrast enhancement model. Auditory cortex processing thus displays a critical feature of normalization, allowing listeners to extract meaningful content from the voices of diverse speakers
Organization of Human Auditory Cortex: Responses to Frequency Modulated Sounds
Functional Magnetic Resonance Imaging (fMRI) was used to investigate the extent, magnitude and patterns of brain activity in response to frequency-modulated sounds. We examined this by manipulating the direction (rise vs. fall) and the rate (fast vs. slow) of a series of iterated rippled noise (IRN) bursts. Participants were presented with auditory stimuli while functional images of the cortex were obtained. Univariate analyses revealed more widespread activation within auditory cortex in response to frequency-modulated sweeps compared to steady-state sounds. Furthermore, multivoxel pattern analysis (MVPA) was used to determine whether regions within auditory cortex were involved in feature-specific encoding. The pattern of activity within auditory cortex showed a high degree of consistency for the rate dimension, suggesting that this pattern of activity carries representational information. In contrast, activity patterns for direction were not distinguishable, which suggests that direction is coded by a neural activity pattern that cannot be resolved at the level of the BOLD response.
Auditory Selective Attention to Speech Modulates Activity in the Visual Word Form Area
Selective attention to speech versus nonspeech signals in complex auditory input could produce top-down modulation of cortical regions previously linked to perception of spoken, and even visual, words. To isolate such top-down attentional effects, we contrasted 2 equally challenging active listening tasks, performed on the same complex auditory stimuli (words overlaid with a series of 3 tones). Instructions required selectively attending to either the speech signals (in service of rhyme judgment) or the melodic signals (tone-triplet matching). Selective attention to speech, relative to attention to melody, was associated with blood oxygenation level-dependent (BOLD) increases during functional magnetic resonance imaging (fMRI) in left inferior frontal gyrus, temporal regions, and the visual word form area (VWFA). Further investigation of the activity in visual regions revealed overall deactivation relative to baseline rest for both attention conditions. Topographic analysis demonstrated that while attending to melody drove deactivation equivalently across all fusiform regions of interest examined, attending to speech produced a regionally specific modulation: deactivation of all fusiform regions, except the VWFA. Results indicate that selective attention to speech can topographically tune extrastriate cortex, leading to increased activity in VWFA relative to surrounding regions, in line with the well-established connectivity between areas related to spoken and visual word perception in skilled readers.
Dimension-selective attention as a possible driver of dynamic, context-dependent re-weighting in speech processing
The contribution of acoustic dimensions to an auditory percept is dynamically adjusted and reweighted based on prior experience about how informative these dimensions are across the long-term and short-term environment. This is especially evident in speech perception, where listeners differentially weight information across multiple acoustic dimensions, and use this information selectively to update expectations about future sounds. The dynamic and selective adjustment of how acoustic input dimensions contribute to perception has made it tempting to conceive of this as a form of non-spatial auditory selective attention. Here, we review several human speech perception phenomena that might be consistent with auditory selective attention although, as of yet, the literature does not definitively support a mechanistic tie. We relate these human perceptual phenomena to illustrative nonhuman animal neurobiological findings that offer informative guideposts in how to test mechanistic connections. We next present a novel empirical approach that can serve as a methodological bridge from human research to animal neurobiological studies. Finally, we describe four preliminary results that demonstrate its utility in advancing understanding of human non-spatial dimension-based auditory selective attention
Neural mechanisms of attention and speech perception in complex, spatial acoustic environment
We can hold conversations in environments where other talkers are typically speaking at the same time in the background, or where there is noise such as vehicles on the street or music playing at a sidewalk café. This seemingly trivial everyday task is difficult for people with hearing deficits and is extremely hard to model in machines. This dissertation explores the neural mechanisms by which the human brain encodes such complex acoustic environments, and how cognitive processes like attention shape the processing of attended speech. My initial experiments explore the representation of acoustic features that help us localize single sound sources in the environment, namely the direction and spectrotemporal content of the sounds, and the interaction of these representations with each other. I play natural American English sentences coming from five azimuthal directions in space.
Using intracranial electrocorticography (ECoG) recordings from the listener’s auditory cortex, I show that the direction of sound and the spectrotemporal content are encoded in two distinct aspects of the neural response: the direction modulates the mean of the response, and the spectrotemporal features contribute to the modulation of the neural response around its mean. Furthermore, I show that these features are orthogonal to each other and do not interact. This representation enables successful decoding of both spatial and phonetic information. These findings contribute to defining the functional organization of responses in the human auditory cortex, with implications for more accurate neurophysiological models of spatial speech processing.
I then go a step further to investigate the role of attention in encoding the direction and phonetic features of speech. I play a mixture of spatialized male and female talkers, e.g., the male talker to the listener’s left and the female talker to the right (the talkers’ locations switch randomly after each sentence). I ask the listener to follow a given talker, e.g., to follow the male talker as the talkers switch locations after each sentence. While the listener performs this task, I collect intracranial EEG data from their auditory cortex. I investigate the bottom-up, stimulus-dependent and attention-independent encoding of such cocktail-party speech, as well as the top-down, attention-driven contribution to the encoding of location and speech features. I find a bottom-up, stimulus-driven contralateral preference in the encoding of the mixed speech, i.e., the left hemisphere automatically and predominantly encodes speech coming from the right, and vice versa. On top of this bottom-up representation, I find that the attended talker’s direction modulates the baseline of the neural response and the attended talker’s voice modulates the spectrotemporal tuning of the neural response. Moreover, the modulation by the attended talker’s location is present throughout the auditory cortex, whereas the modulation by the attended talker’s voice is present only in higher-order auditory cortical areas. My findings provide much-needed evidence on how bottom-up and top-down signals interact in the auditory cortex in crowded and complex acoustic scenes to enable robust speech perception. Furthermore, they shed light on the hierarchical encoding of attended speech, which has implications for improving auditory attention decoding models.
Finally, I describe a clinical case study in which we show that electrical stimulation of specific sites in the planum temporale (PT) of an epilepsy patient implanted with intracranial electrodes enhances speech-in-noise perception. When noisy speech is played together with such electrical stimulation, the patient reports that the noise disappears and that the speech sounds similar to clean speech heard without any noise. We performed a series of analyses to determine the functional organization of the three main subregions of the human auditory cortex: planum temporale (PT), Heschl’s gyrus (HG) and superior temporal gyrus (STG). Using cortico-cortical evoked potentials (CCEPs), we modeled the PT sites as lying between the sites in HG and STG. Furthermore, we find that the discriminability of speech from nonspeech sounds in population neural responses increases from the HG sites to the PT sites to the STG sites. These findings causally implicate the PT in background noise suppression and may point to a novel neuroprosthetic solution to assist in the challenging task of speech perception in noise.
Together, this dissertation presents new evidence for the neural encoding of spatial speech, for the interaction of stimulus-driven and attention-driven neural processes in spatial multi-talker speech perception, and for the enhancement of speech-in-noise perception by electrical brain stimulation.
Activation of the left planum temporale in pitch processing is shaped by language experience
Implicit, abstract knowledge acquired through language experience can alter cortical processing of complex auditory signals. To isolate prelexical processing of linguistic tones (i.e., pitch variations that convey part of word meaning), a novel design was used in which hybrid stimuli were created by superimposing Thai tones onto Chinese syllables (tonal chimeras) and Chinese tones onto the same syllables (Chinese words). Native speakers of tone languages (Chinese, Thai) underwent fMRI scans as they judged tones from both stimulus sets. In a comparison of native vs. non‐native tones, overlapping activity was identified in the left planum temporale (PT). In this area a double dissociation between language experience and neural representation of pitch occurred such that stronger activity was elicited in response to native as compared to non‐native tones. This finding suggests that cortical processing of pitch information can be shaped by language experience and, moreover, that lateralized PT activation can be driven by top‐down cognitive processing
Individual differences in the discrimination of novel speech sounds: effects of sex, temporal processing, musical and cognitive abilities
This study examined whether rapid temporal auditory processing, verbal working memory capacity, non-verbal intelligence, executive functioning, musical ability and prior foreign language experience predicted how well native English speakers (N = 120) discriminated Norwegian tonal and vowel contrasts as well as a non-speech analogue of the tonal contrast and a native vowel contrast presented over noise. Results confirmed a male advantage for temporal and tonal processing, and also revealed that temporal processing was associated with both non-verbal intelligence and speech processing. In contrast, effects of musical ability on non-native speech-sound processing and of inhibitory control on vowel discrimination were not mediated by temporal processing. These results suggest that individual differences in non-native speech-sound processing are to some extent determined by temporal auditory processing ability, in which males perform better, but are also determined by a host of other abilities that are deployed flexibly depending on the characteristics of the target sounds
Individual auditory categorization abilities are shaped by intrinsic and experience-driven neural factors