
    Speaker-normalized sound representations in the human auditory cortex

    The acoustic dimensions that distinguish speech sounds (like the vowel differences in “boot” and “boat”) also differentiate speakers’ voices. Therefore, listeners must normalize across speakers without losing linguistic information. Past behavioral work suggests an important role for auditory contrast enhancement in normalization: preceding context affects listeners’ perception of subsequent speech sounds. Here, using intracranial electrocorticography in humans, we investigate whether and how such context effects arise in auditory cortex. Participants identified speech sounds that were preceded by phrases from two different speakers whose voices differed along the same acoustic dimension as the target words (the lowest resonance of the vocal tract). In every participant, target vowels evoke a speaker-dependent neural response that is consistent with the listener’s perception and that follows from a contrast enhancement model. Auditory cortex processing thus displays a critical feature of normalization, allowing listeners to extract meaningful content from the voices of diverse speakers.
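    The contrast enhancement account described above lends itself to a toy illustration: the perceived first formant (F1) of a target vowel is pushed away from the average F1 of the preceding speaker's voice. The sketch below is a minimal illustration of that idea, not the paper's model; the gain parameter and all frequency values are invented.

```python
# Toy contrast-enhancement model of speaker normalization (illustrative
# only): the effective F1 of a target vowel is shifted away from the
# mean F1 of the preceding context, so the same acoustic token is heard
# differently after a low-F1 vs. a high-F1 speaker.

def perceived_f1(target_f1_hz: float, context_mean_f1_hz: float,
                 gain: float = 0.3) -> float:
    """Shift the target's F1 away from the context's average F1.

    `gain` is a hypothetical enhancement strength, not an estimate
    taken from the paper.
    """
    return target_f1_hz + gain * (target_f1_hz - context_mean_f1_hz)

ambiguous_vowel_f1 = 500.0  # Hz, midway between two vowel categories
print(perceived_f1(ambiguous_vowel_f1, context_mean_f1_hz=400.0))  # 530.0
print(perceived_f1(ambiguous_vowel_f1, context_mean_f1_hz=600.0))  # 470.0
```

    Under these assumed values, an ambiguous token sounds higher in F1 after a low-F1 speaker and lower after a high-F1 speaker, which is the contrastive direction of the perceptual shifts the abstract describes.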

    The cortical processing of speech sounds in the temporal lobe

    The uptake of spectral and temporal cues in vowel perception is rapidly influenced by context

    Speech perception depends on auditory information within phonemes, such as spectral and temporal cues. The perception of those cues, however, is affected by auditory information in the surrounding context (e.g., a fast context sentence can make a target vowel sound subjectively longer). In a two-by-two design, the current experiments investigated when these different factors influence vowel perception. Dutch listeners categorized minimal word pairs such as /tɑk/–/taːk/ (“branch”–“task”) embedded in a context sentence. Critically, the Dutch /ɑ/–/aː/ contrast is cued by both spectral and temporal information. We varied the second formant (F2) frequencies and durations of the target vowels. Independently, we also varied the F2 and duration of all segments in the context sentence. The time course of cue uptake on the targets was measured in a printed-word eye-tracking paradigm. Results show that the uptake of spectral cues slightly precedes the uptake of temporal cues. Furthermore, acoustic manipulations of the context sentences influenced the uptake of cues in the target vowel immediately. That is, listeners did not need additional time to integrate spectral or temporal cues of a target sound with auditory information in the context. These findings argue for an early locus of contextual influences in speech perception.
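    One common way to formalize the joint uptake of spectral and temporal cues is a weighted-logistic cue-integration model, sketched below for the /ɑ/–/aː/ decision. The weights, reference values, and the assumption that higher F2 and longer duration both favor /aː/ responses are illustrative; the study itself measured uptake with eye tracking rather than fitting this model.

```python
import math

# Illustrative cue-weighting model for the Dutch /ɑ/-/aː/ contrast:
# the probability of a long-/aː/ response rises with both the vowel's
# F2 and its duration. All weights and reference values are hypothetical.

def p_long_vowel(f2_hz: float, duration_ms: float,
                 w_spectral: float = 0.01, w_temporal: float = 0.05,
                 f2_ref: float = 1300.0, dur_ref: float = 120.0) -> float:
    """Probability of an /aː/ response from a weighted sum of two cues."""
    evidence = (w_spectral * (f2_hz - f2_ref)
                + w_temporal * (duration_ms - dur_ref))
    return 1.0 / (1.0 + math.exp(-evidence))

print(p_long_vowel(f2_hz=1400.0, duration_ms=150.0))  # both cues favor /aː/
print(p_long_vowel(f2_hz=1200.0, duration_ms=100.0))  # both cues favor /ɑ/
```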

    Perceptual restoration of masked speech in human cortex

    Humans are adept at understanding speech even though our natural listening environment is often filled with interference. An example of this capacity is phoneme restoration, in which part of a word is completely replaced by noise, yet listeners report hearing the whole word. The neurological basis of this unconscious fill-in phenomenon is unknown, even though it is a fundamental characteristic of human hearing. Here, using direct cortical recordings in humans, we demonstrate that missing speech is restored at the acoustic-phonetic level in bilateral auditory cortex, in real time. This restoration is preceded by specific neural activity patterns in a separate language area, left frontal cortex, which predict the word that participants later report hearing. These results demonstrate that during speech perception, missing acoustic content is synthesized online from the integration of incoming sensory cues and the internal neural dynamics that bias word-level expectation and prediction.

    Constraints on the processes responsible for the extrinsic normalization of vowels

    Listeners tune in to talkers’ vowels through extrinsic normalization. We asked here whether this process could be based on compensation for the long-term average spectrum (LTAS) of preceding sounds and whether the mechanisms responsible for normalization are indifferent to the nature of those sounds. If so, normalization should apply to nonspeech stimuli. Previous findings were replicated with first-formant (F1) manipulations of speech. Targets on a [pɪt]–[pɛt] (low–high F1) continuum were labeled as [pɪt] more often after high-F1 than after low-F1 precursors. Spectrally rotated nonspeech versions of these materials produced similar normalization. No normalization occurred, however, with nonspeech stimuli that were less speechlike, even though the precursor–target LTAS relations were equivalent to those used earlier. Additional experiments investigated the roles of pitch movement, amplitude variation, formant location, and the stimuli’s perceived similarity to speech. It appears that normalization is not restricted to speech but that the nature of the preceding sounds does matter. Extrinsic normalization of vowels is due, at least in part, to an auditory process that may require familiarity with the spectrotemporal characteristics of speech.
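    A compensation-for-LTAS account invites a simple computational sketch: estimate the long-term average spectrum of a precursor and compare its energy near the two ends of the F1 continuum. The band edges, sampling rate, and decision rule below are assumptions made for illustration, not the materials or analysis of the study.

```python
import numpy as np
from scipy.signal import welch

# Hypothetical LTAS-based compensation check: estimate the long-term
# average spectrum of a precursor sound and compare mean energy in a
# low-F1 band vs. a high-F1 band. On a compensation account, more
# precursor energy in the high-F1 band should bias an ambiguous target
# toward the low-F1 category ([pɪt] over [pɛt]). Band edges are invented.

FS = 16_000  # sampling rate in Hz, assumed

def ltas_bias(precursor: np.ndarray,
              low_band=(300.0, 500.0), high_band=(500.0, 800.0)) -> float:
    freqs, psd = welch(precursor, fs=FS, nperseg=1024)
    low = psd[(freqs >= low_band[0]) & (freqs < low_band[1])].mean()
    high = psd[(freqs >= high_band[0]) & (freqs < high_band[1])].mean()
    # > 0 suggests a high-F1-weighted precursor -> more [pɪt] responses
    return float(np.log(high / low))

rng = np.random.default_rng(0)
noise = rng.standard_normal(FS)  # 1 s of white noise as a stand-in precursor
print(ltas_bias(noise))          # near 0 for a spectrally flat input
```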

    At which processing level does extrinsic speaker information influence vowel perception?

    The interpretation of vowel sounds depends on perceived characteristics of the speaker (e.g., the speaker’s average first-formant (F1) frequency). A vowel between /I/ and /E/ is more likely to be perceived as /I/ if a precursor sentence indicates that the speaker has a relatively high average F1. Behavioral and electrophysiological experiments investigating the locus of this extrinsic vowel normalization are reported. The normalization effect was first replicated with a categorization task: more vowels on an /I/–/E/ continuum followed by a /papu/ context were categorized as /I/ with a high-F1 context than with a low-F1 context. Two experiments then examined this context effect in a 4I-oddity discrimination task. Ambiguous vowels were more difficult to distinguish from the /I/ endpoint if the context /papu/ had a high F1 than if it had a low F1 (and vice versa for discrimination of ambiguous vowels from the /E/ endpoint). Furthermore, between-category discriminations were no easier than within-category discriminations. Together, these results suggest that the normalization mechanism operates largely at an auditory processing level. The mismatch negativity (an automatically evoked brain potential) arising from the same stimuli is currently being measured to investigate whether extrinsic normalization takes place in the absence of an explicit decision task.

    Selective attention to a specific talker does not change the effect of surrounding acoustic context

    Spoken sentences contain considerable prosodic variation, for instance in their speech rate [1]. One mechanism by which the listener can overcome such variation is by interpreting the durations of speech sounds relative to the surrounding speech rate. Indeed, in a fast context, a durationally ambiguous sound is perceived as longer than in a slow context [2]. In abstractionist models of spoken word comprehension, this process, known as rate normalization, affects pre-lexical representations before abstract phonological representations are accessed [3]. A recent study [4] provided support for such an early perceptual locus of rate normalization. In that study, participants performed a visual search task that induced high (large grid) vs. low (small grid) cognitive load while listening to fast and slow context sentences. Context sentences were followed by durationally ambiguous targets. Fast sentences biased target perception towards more ‘long’ target segments than slow contexts did. Critically, changes in cognitive load did not modulate this rate effect. These findings support a model in which normalization processes arise early during perceptual processing, too early to be affected by attentional modulation.

    The present study further evaluated the cognitive locus of normalization processes by testing the influence of another form of attention: auditory stream segregation. Specifically, if listeners are presented with a fast and a slow talker at the same time but in different ears, does explicitly attending to one or the other stream influence target perception? The aforementioned model [4] predicts that selective attention should not influence target perception, since normalization processes should be robust against changes in attention allocation. Alternatively, if attention does modulate normalization processes, two participants, one attending to fast and the other to slow speech, should show different perception.

    Dutch participants (Expt 1: N=32; Expt 2: N=16; Expt 3: N=16) were presented with 200 fast and slow context sentences of various lengths, followed by a target duration continuum ambiguous between, e.g., the short target “geven” /ˈxevə/ (“give”) and the long target “gegeven” /xəˈxevə/ (“given”) (i.e., 20 target pairs differing in the presence/absence of the unstressed syllable /xə-/). Critically, in Experiment 1, participants heard two talkers simultaneously (talker and location counterbalanced across participants), one (relatively long) sentence at a fast rate and one (half as long) sentence at a slow rate (rate varied within participants). Context sentences were followed by ambiguous targets from yet another talker (Fig. 1). Half of the participants were instructed to attend to talker A, while the other half attended to talker B. Thus, participants heard identical auditory stimuli but varied in which talker they attended to. Debriefing questionnaires and transcriptions of attended talkers in filler trials confirmed that participants successfully attended to one talker and ignored the other.

    Nevertheless, no effect of attended rate was found (Fig. 2; p > .9), indicating that modulation of attention did not influence participants’ rate normalization. Control experiments showed that it was possible to obtain rate effects with single-talker contexts that were either talker-incongruent (Expt 2) or talker-congruent (Expt 3) with the following target (Fig. 1). In both of these experiments, there was a higher proportion of long target responses following a fast context (Fig. 2). This shows that contextual rate affected the perception of syllabic duration and that talker congruency with the target did not change the effect. Therefore, in line with [4], the current experiments suggest that normalization processes arise early in perception and are robust against changes in attention.
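    The rate normalization effect at issue can be captured in a small sketch that scores a target's duration relative to the mean syllable duration of the surrounding context. The durations, syllable counts, and the mapping to “gegeven” responses below are hypothetical numbers chosen only to show the direction of the effect.

```python
# Toy rate-normalization sketch: a stretch of speech is evaluated
# relative to the rate of the surrounding context, so the same
# duration counts as relatively long after a fast context and
# relatively short after a slow one. All values are illustrative.

def relative_duration(segment_ms: float, context_syllables: int,
                      context_ms: float) -> float:
    """Segment duration in units of the context's mean syllable length."""
    mean_syllable_ms = context_ms / context_syllables
    return segment_ms / mean_syllable_ms

ambiguous_schwa = 40.0  # ms; a potential /xə-/ syllable in "ge(ge)ven"

fast = relative_duration(ambiguous_schwa, context_syllables=12, context_ms=1800.0)
slow = relative_duration(ambiguous_schwa, context_syllables=12, context_ms=3600.0)
print(fast, slow)  # ~0.27 vs. ~0.13: the same 40 ms counts as longer
                   # after a fast context, favoring "gegeven" responses
```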