    Speaker-normalized sound representations in the human auditory cortex

    The acoustic dimensions that distinguish speech sounds (like the vowel differences in “boot” and “boat”) also differentiate speakers’ voices. Therefore, listeners must normalize across speakers without losing linguistic information. Past behavioral work suggests an important role for auditory contrast enhancement in normalization: preceding context affects listeners’ perception of subsequent speech sounds. Here, using intracranial electrocorticography in humans, we investigate whether and how such context effects arise in auditory cortex. Participants identified speech sounds that were preceded by phrases from two different speakers whose voices differed along the same acoustic dimension as the target words (the lowest resonance of the vocal tract). In every participant, target vowels evoke a speaker-dependent neural response that is consistent with the listener’s perception and that follows from a contrast enhancement model. Auditory cortex processing thus displays a critical feature of normalization, allowing listeners to extract meaningful content from the voices of diverse speakers.
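
    The contrast enhancement account can be made concrete with a toy calculation: the effective F1 of a target vowel is pushed away from the average F1 of the preceding speaker's voice, so an acoustically identical vowel can be categorized differently after a low-F1 versus a high-F1 speaker. The sketch below is only illustrative; the gain, boundary, and frequency values are assumptions, not the model fitted in the study.

```python
# Toy sketch of contrast enhancement (illustrative values, not the
# study's fitted model): the target's effective F1 is shifted away
# from the mean F1 of the preceding speaker's phrase.

def effective_f1(target_f1_hz, context_mean_f1_hz, gain=0.3):
    """Shift the target F1 away from the context average (contrastively)."""
    return target_f1_hz + gain * (target_f1_hz - context_mean_f1_hz)

def categorize(f1_hz, boundary_hz=500.0):
    """Toy decision along a low-F1 / high-F1 vowel continuum."""
    return "high-F1 vowel" if f1_hz > boundary_hz else "low-F1 vowel"

ambiguous_f1 = 500.0  # Hz, right at the nominal category boundary
for speaker, context_f1 in [("low-F1 speaker", 400.0), ("high-F1 speaker", 600.0)]:
    shifted = effective_f1(ambiguous_f1, context_f1)
    print(f"{speaker}: effective F1 = {shifted:.0f} Hz -> {categorize(shifted)}")
```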

    The cortical processing of speech sounds in the temporal lobe

    Speaking rate and spectral context affect the Dutch /a/ - /aa/ contrast

    Dutch minimal word pairs such as 'gaas'-'gas' ("gauze"-"gas") differ in durational and spectral aspects of their vowels. These cues, however, are interpreted relative to the context in which they are heard. In a fast context, an "a" sounds relatively longer and is more likely to be interpreted as "aa". Similarly, when low frequencies in a context are perceived as dominant, high frequencies in the "a" become more salient, again more often leading to perception of "aa". A categorization experiment in which durational and spectral cues to the vowels were varied confirmed that Dutch listeners use both dimensions to distinguish between "a" and "aa". In Experiment 2, words were presented in rate- and spectrally manipulated sentences. Listeners, as predicted, interpreted the vowels relative to the context. An eye-tracking experiment will investigate the time course of these context effects and thus inform theories of the role of context in speech recognition.

    The uptake of spectral and temporal cues in vowel perception is rapidly influenced by context

    Speech perception is dependent on auditory information within phonemes such as spectral or temporal cues. The perception of those cues, however, is affected by auditory information in surrounding context (e.g., a fast context sentence can make a target vowel sound subjectively longer). In a two-by-two design, the current experiments investigated when these different factors influence vowel perception. Dutch listeners categorized minimal word pairs such as /tɑk/–/taːk/ (“branch”–“task”) embedded in a context sentence. Critically, the Dutch /ɑ/–/aː/ contrast is cued by spectral and temporal information. We varied the second formant (F2) frequencies and durations of the target vowels. Independently, we also varied the F2 and duration of all segments in the context sentence. The time course of cue uptake on the targets was measured in a printed-word eye-tracking paradigm. Results show that the uptake of spectral cues slightly precedes the uptake of temporal cues. Furthermore, acoustic manipulations of the context sentences influenced the uptake of cues in the target vowel immediately. That is, listeners did not need additional time to integrate spectral or temporal cues of a target sound with auditory information in the context. These findings argue for an early locus of contextual influences in speech perception.
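
    As a rough illustration of how context-relative spectral and temporal cues could jointly drive categorization in this kind of design, the sketch below scores a target against the context's mean duration and F2 before combining the cues in a logistic rule. The weights, reference values, and the logistic form are assumptions for illustration, not the analysis used in these experiments.

```python
# Illustrative sketch (not the study's analysis): spectral (F2) and
# temporal (duration) cues are expressed relative to the context and
# combined in a logistic rule, so rate- or spectrally-shifted contexts
# move the /ɑ/–/aː/ boundary. All parameter values are assumptions.
import math

def p_long_vowel(target_dur_ms, target_f2_hz,
                 context_dur_ms=120.0, context_f2_hz=1300.0,
                 w_dur=0.05, w_f2=0.01):
    """Probability of reporting /aː/ given context-relative cues."""
    rel_dur = target_dur_ms - context_dur_ms  # longer than context -> more /aː/
    rel_f2 = target_f2_hz - context_f2_hz     # higher F2 than context -> more /aː/
    return 1.0 / (1.0 + math.exp(-(w_dur * rel_dur + w_f2 * rel_f2)))

# The same target token after a slow vs. a fast context sentence:
# the fast context makes the vowel count as relatively long.
for label, context_dur in [("slow context", 150.0), ("fast context", 90.0)]:
    print(label, round(p_long_vowel(120.0, 1300.0, context_dur_ms=context_dur), 2))
```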

    Transformation of a temporal speech cue to a spatial neural code in human auditory cortex

    In speech, listeners extract continuously varying spectrotemporal cues from the acoustic signal to perceive discrete phonetic categories. Spectral cues are spatially encoded in the amplitude of responses in phonetically tuned neural populations in auditory cortex. It remains unknown whether similar neurophysiological mechanisms encode temporal cues like voice-onset time (VOT), which distinguishes sounds like /b/ and /p/. We used direct brain recordings in humans to investigate the neural encoding of temporal speech cues with a VOT continuum from /ba/ to /pa/. We found that distinct neural populations respond preferentially to VOTs from one phonetic category, and are also sensitive to sub-phonetic VOT differences within a population’s preferred category. In a simple neural network model, simulated populations tuned to detect either temporal gaps or coincidences between spectral cues captured encoding patterns observed in real neural data. These results demonstrate that a spatial/amplitude neural code underlies the cortical representation of both spectral and temporal speech cues.
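
    The gap/coincidence idea can be caricatured with two detector units whose response amplitudes vary with VOT: a coincidence-like unit peaks when voicing begins with the burst (short VOT), and a gap-like unit responds once the burst-to-voicing delay grows. The tuning functions and time constants below are assumptions for illustration, not the network reported in the paper.

```python
# Caricature of a spatial/amplitude code for VOT (tuning functions and
# time constants are illustrative, not the paper's model): a
# coincidence-like unit prefers short VOTs (/b/-like), a gap-like unit
# prefers long VOTs (/p/-like), and both respond in a graded way
# within their preferred range.
import math

def coincidence_unit(vot_ms, tau=15.0):
    """Strongest response when voicing starts at (or just after) the burst."""
    return math.exp(-vot_ms / tau)

def gap_unit(vot_ms, threshold=20.0, slope=0.2):
    """Response grows once the burst-to-voicing gap exceeds a threshold."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - threshold)))

for vot in [0, 10, 20, 30, 40, 50]:  # /ba/ ... /pa/ continuum (ms)
    b_like, p_like = coincidence_unit(vot), gap_unit(vot)
    winner = "/b/-preferring" if b_like > p_like else "/p/-preferring"
    print(f"VOT {vot:2d} ms  coincidence={b_like:.2f}  gap={p_like:.2f}  -> {winner}")
```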

    Perceptual restoration of masked speech in human cortex

    Humans are adept at understanding speech despite the fact that our natural listening environment is often filled with interference. An example of this capacity is phoneme restoration, in which part of a word is completely replaced by noise, yet listeners report hearing the whole word. The neurological basis for this unconscious fill-in phenomenon is unknown, despite being a fundamental characteristic of human hearing. Here, using direct cortical recordings in humans, we demonstrate that missing speech is restored at the acoustic-phonetic level in bilateral auditory cortex, in real time. This restoration is preceded by specific neural activity patterns in a separate language area, left frontal cortex, which predict the word that participants later report hearing. These results demonstrate that during speech perception, missing acoustic content is synthesized online from the integration of incoming sensory cues and the internal neural dynamics that bias word-level expectation and prediction.

    Constraints on the processes responsible for the extrinsic normalization of vowels

    Listeners tune in to talkers’ vowels through extrinsic normalization. We asked here whether this process could be based on compensation for the long-term average spectrum (LTAS) of preceding sounds and whether the mechanisms responsible for normalization are indifferent to the nature of those sounds. If so, normalization should apply to nonspeech stimuli. Previous findings were replicated with first-formant (F1) manipulations of speech. Targets on a [pɪt]–[pɛt] (low–high F1) continuum were labeled as [pɪt] more after high-F1 than after low-F1 precursors. Spectrally rotated nonspeech versions of these materials produced similar normalization. None occurred, however, with nonspeech stimuli that were less speechlike, even though precursor–target LTAS relations were equivalent to those used earlier. Additional experiments investigated the roles of pitch movement, amplitude variation, formant location, and the stimuli's perceived similarity to speech. It appears that normalization is not restricted to speech but that the nature of the preceding sounds does matter. Extrinsic normalization of vowels is due, at least in part, to an auditory process that may require familiarity with the spectrotemporal characteristics of speech.
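
    One way to make the LTAS-compensation hypothesis concrete is to represent the precursor by the average of its frame spectra and evaluate the target spectrum relative to that average, so a precursor with extra energy in the high-F1 region contrastively lowers the target's apparent F1. The numbers and the simple subtraction below are illustrative assumptions, not the mechanism established by these experiments.

```python
# Schematic sketch of LTAS compensation (the arithmetic and values are
# illustrative assumptions): the target spectrum is expressed relative
# to the long-term average spectrum of the precursor.
import numpy as np

def ltas_db(precursor_frames_db):
    """Long-term average spectrum: mean of the precursor's frame spectra (dB)."""
    return np.mean(precursor_frames_db, axis=0)

def compensated_spectrum(target_db, precursor_frames_db):
    """Target spectrum expressed relative to the precursor's LTAS."""
    return target_db - ltas_db(precursor_frames_db)

rng = np.random.default_rng(0)
n_bins = 64
target = rng.normal(60, 5, n_bins)                   # one target spectrum (dB)
high_f1_precursor = rng.normal(60, 5, (50, n_bins))  # 50 precursor frames (dB)
high_f1_precursor[:, 10:20] += 10                    # extra energy in a nominal F1 region

relative = compensated_spectrum(target, high_f1_precursor)
# Energy in the precursor-boosted region is pushed down relative to the
# rest, mirroring a contrastive shift toward a lower perceived F1.
print(relative[10:20].mean() < relative[30:40].mean())  # True
```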

    At which processing level does extrinsic speaker information influence vowel perception?

    The interpretation of vowel sounds depends on perceived characteristics of the speaker (e.g., average first formant (F1) frequency). A vowel between /I/ and /E/ is more likely to be perceived as /I/ if a precursor sentence indicates that the speaker has a relatively high average F1. Behavioral and electrophysiological experiments investigating the locus of this extrinsic vowel normalization are reported. The normalization effect with a categorization task was first replicated. More vowels on an /I/-/E/ continuum followed by a /papu/ context were categorized as /I/ with a high-F1 context than with a low-F1 context. Two experiments then examined this context effect in a 4I-oddity discrimination task. Ambiguous vowels were more difficult to distinguish from the /I/-endpoint if the context /papu/ had a high F1 than if it had a low F1 (and vice versa for discrimination of ambiguous vowels from the /E/-endpoint). Furthermore, between-category discriminations were no easier than within-category discriminations. Together, these results suggest that the normalization mechanism operates largely at an auditory processing level. The mismatch negativity (an automatically evoked brain potential) arising from the same stimuli is being measured, to investigate whether extrinsic normalization takes place in the absence of an explicit decision task.