    Modulation-frequency acts as a primary cue for auditory stream segregation

    In our surrounding acoustic world, sounds are produced by different sources and interfere with each other before arriving at the ears. A key function of the auditory system is to provide consistent and robust descriptions of the coherent sound groupings and sequences (auditory objects), which likely correspond to the various sound sources in the environment. This function has been termed auditory stream segregation. In the current study we tested the effects of separation in the frequency of amplitude modulation on the segregation of concurrent sound sequences in the auditory stream-segregation paradigm (van Noorden, 1975). The aim of the study was to assess 1) whether differential amplitude modulation would help in separating concurrent sound sequences and 2) whether this cue would interact with previously studied static cues (carrier frequency and location difference) in segregating concurrent streams of sound. We found that amplitude-modulation difference is utilized as a primary cue for stream segregation and that it interacts with other primary cues, such as frequency and location difference.
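
    As a rough illustration of the stimulus paradigm described above, the sketch below synthesizes a van Noorden-style ABA- triplet sequence in which the A and B tones share a carrier frequency and differ only in amplitude-modulation rate. All parameter values (carrier, AM rates, tone duration) are illustrative assumptions, not the study's actual settings.

    import numpy as np

    FS = 44100  # sample rate (Hz)

    def am_tone(carrier_hz, am_hz, dur_s, depth=1.0):
        """Sinusoidal carrier with sinusoidal amplitude modulation."""
        t = np.arange(int(FS * dur_s)) / FS
        envelope = (1 + depth * np.sin(2 * np.pi * am_hz * t)) / 2
        return envelope * np.sin(2 * np.pi * carrier_hz * t)

    def aba_sequence(am_a=20.0, am_b=80.0, carrier_hz=1000.0,
                     tone_s=0.1, n_triplets=10):
        """ABA- triplets: A and B differ only in AM rate, so any
        segregation must rely on the modulation-frequency cue."""
        silence = np.zeros(int(FS * tone_s))
        a = am_tone(carrier_hz, am_a, tone_s)
        b = am_tone(carrier_hz, am_b, tone_s)
        return np.tile(np.concatenate([a, b, a, silence]), n_triplets)

    stimulus = aba_sequence()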

    The role of perceived source location in auditory stream segregation: separation affects sound organization, common fate does not

    The human auditory system is capable of grouping sounds originating from different sound sources into coherent auditory streams, a process termed auditory stream segregation. Several cues can influence auditory stream segregation, but the full set of cues and the way in which they are integrated is still unknown. In the current study, we tested whether auditory motion can serve as a cue for segregating sequences of tones. Our hypothesis was that, following the principle of common fate, sounds emitted by sources moving together in space along similar trajectories will be more likely to be grouped into a single auditory stream, while sounds emitted by independently moving sources will more often be heard as two streams. Stimuli were derived from sound recordings in which the sound-source motion was produced by walking humans. Although the results showed a clear effect of spatial separation, auditory motion had a negligible influence on stream segregation. Hence, auditory motion may not be used as a primitive cue in auditory stream segregation.

    ARSTREAM: A Neural Network Model of Auditory Scene Analysis and Source Segregation

    Multiple sound sources often contain harmonics that overlap and may be degraded by environmental noise. The auditory system is capable of teasing apart these sources into distinct mental objects, or streams. Such an "auditory scene analysis" enables the brain to solve the cocktail party problem. A neural network model of auditory scene analysis, called the ARSTREAM model, is presented to propose how the brain accomplishes this feat. The model clarifies how the frequency components that correspond to a given acoustic source may be coherently grouped together into distinct streams based on pitch and spatial cues. The model also clarifies how multiple streams may be distinguished and separated by the brain. Streams are formed as spectral-pitch resonances that emerge through feedback interactions between the frequency-specific spectral representation of a sound source and its pitch. First, the model transforms a sound into a spatial pattern of frequency-specific activation across a spectral stream layer. The sound has multiple parallel representations at this layer. A sound's spectral representation activates a bottom-up filter that is sensitive to harmonics of the sound's pitch. The filter activates a pitch category which, in turn, activates a top-down expectation that allows one voice or instrument to be tracked through a noisy multiple-source environment. Spectral components are suppressed if they do not match harmonics of the top-down expectation that is read out by the selected pitch, thereby allowing another stream to capture these components, as in the "old-plus-new" heuristic of Bregman. Multiple simultaneously occurring spectral-pitch resonances can thereby emerge. These resonance and matching mechanisms are specialized versions of Adaptive Resonance Theory, or ART, which clarifies how pitch representations can self-organize during learning of harmonic bottom-up filters and top-down expectations. The model also clarifies how spatial location cues can help to disambiguate two sources with similar spectral cues. Data are simulated from psychophysical grouping experiments, such as how a tone sweeping upwards in frequency creates a bounce percept by grouping with a downward-sweeping tone due to proximity in frequency, even if noise replaces the tones at their intersection point. Illusory auditory percepts are also simulated, such as the auditory continuity illusion of a tone continuing through a noise burst even if the tone is not present during the noise, and the scale illusion of Deutsch, whereby downward and upward scales presented alternately to the two ears are regrouped based on frequency proximity, leading to a bounce percept. Since related sorts of resonances have been used to quantitatively simulate psychophysical data about speech perception, the model strengthens the hypothesis that ART-like mechanisms are used at multiple levels of the auditory system. Proposals for developing the model to explain more complex streaming data are also provided.
    Funding: Air Force Office of Scientific Research (F49620-01-1-0397, F49620-92-J-0225); Office of Naval Research (N00014-01-1-0624); Advanced Research Projects Agency (N00014-92-J-4015); British Petroleum (89A-1204); National Science Foundation (IRI-90-00530); American Society of Engineering Education.
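
    The matching step described above lends itself to a compact sketch: components of a mixed spectrum that fall near harmonics of the selected pitch are retained by the top-down expectation, while the residue is freed to be captured by a second stream, as in the old-plus-new heuristic. This is a deliberately simplified stand-in, not the ARSTREAM resonance equations; the tolerance and the example fundamentals are assumptions.

    import numpy as np

    def harmonic_match_mask(freqs_hz, pitch_hz, tolerance=0.02):
        """1 where a component lies within `tolerance` (relative) of an
        integer harmonic of `pitch_hz`, else 0."""
        n = np.maximum(np.round(freqs_hz / pitch_hz), 1)
        deviation = np.abs(freqs_hz / (n * pitch_hz) - 1)
        return (deviation < tolerance).astype(float)

    def split_streams(freqs_hz, amps, pitch_hz):
        """Top-down expectation keeps matching components; the
        unmatched residue can seed a second stream."""
        mask = harmonic_match_mask(np.asarray(freqs_hz, dtype=float), pitch_hz)
        return amps * mask, amps * (1 - mask)

    # Two overlapping harmonic sources, 200 Hz and 310 Hz fundamentals.
    freqs = np.array([200.0, 310.0, 400.0, 600.0, 620.0, 800.0, 930.0])
    amps = np.ones_like(freqs)
    matched, residue = split_streams(freqs, amps, pitch_hz=200.0)
    # matched keeps 200/400/600/800; residue keeps 310/620/930.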

    Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

    This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation nor on monaural segregation. The method starts with a training stage that establishes a locally-linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, making it possible to discriminate between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allows quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods.
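
    The following is a hedged sketch of the locally-linear regression idea: partition the binaural-feature space into regions and fit one affine map per region from features to source direction. The paper's method is probabilistic (a Gaussian mixture of locally-linear experts); the k-means-plus-least-squares version below, with assumed feature and output dimensions, is only a simplified stand-in.

    import numpy as np
    from scipy.cluster.vq import kmeans2

    def train_local_linear(X, Y, k=8, seed=0):
        """X: (n, d) binaural features; Y: (n, 2) directions
        (e.g., azimuth/elevation). Assumes each region gets data."""
        centroids, labels = kmeans2(X, k, minit='++', seed=seed)
        maps = []
        for j in range(k):
            # Affine least-squares fit within region j.
            Xj = np.hstack([X[labels == j],
                            np.ones((np.sum(labels == j), 1))])
            W, *_ = np.linalg.lstsq(Xj, Y[labels == j], rcond=None)
            maps.append(W)
        return centroids, maps

    def localize(x, centroids, maps):
        """Pick the nearest region, then apply its affine map."""
        j = np.argmin(np.linalg.norm(centroids - x, axis=1))
        return np.append(x, 1.0) @ maps[j]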

    Amplitude and frequency-modulated stimuli activate common regions of human auditory cortex

    Hall et al. (2002, Cerebral Cortex 12:140–149) recently showed that pulsed frequency-modulated tones generate considerably higher activation than their unmodulated counterparts in non-primary auditory regions immediately posterior and lateral to Heschl’s gyrus (HG). Here, we use fMRI to explore the type of modulation necessary to evoke such differential activation. Carrier signals were a single tone and a harmonic-complex tone, with a 300 Hz fundamental, that were modulated at a rate of 5 Hz either in frequency or in amplitude, yielding six stimulus conditions (each carrier presented unmodulated, frequency modulated, or amplitude modulated). Relative to the silent baseline, the modulated tones in particular activated widespread regions of the auditory cortex bilaterally along the supratemporal plane. When compared with the unmodulated tones, both AM and FM tones generated significantly greater activation in lateral HG and the planum temporale, replicating the previous findings. These activation patterns were largely overlapping, indicating a common sensitivity to both AM and FM. Direct comparisons between AM and FM revealed a higher magnitude of activation in response to the variation in amplitude than in frequency, plus a small part of the posterolateral region in the right hemisphere whose response was specifically AM-, and not FM-, dependent. The dominant pattern of activation was that of co-localized activation by AM and FM, which is consistent with a common neural code for AM and FM within these brain regions.
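
    For concreteness, the sketch below synthesizes the three modulation conditions the study compares, using the parameters given in the abstract (300 Hz fundamental, 5 Hz modulation rate); the FM excursion, duration, and number of harmonics are assumptions, and this is not the authors' stimulus code.

    import numpy as np

    FS = 44100
    t = np.arange(FS * 2) / FS     # 2 s of signal (assumed duration)
    f0, f_mod = 300.0, 5.0         # fundamental and modulation rate
    n_harm = 10                    # assumed number of harmonics

    # Unmodulated harmonic-complex carrier.
    unmodulated = sum(np.sin(2 * np.pi * h * f0 * t)
                      for h in range(1, n_harm + 1))

    # AM: impose a 5 Hz sinusoidal envelope on the same carrier.
    am = (1 + np.sin(2 * np.pi * f_mod * t)) / 2 * unmodulated

    # FM: sweep the fundamental sinusoidally (+/-10% excursion, assumed)
    # and integrate the instantaneous frequency to obtain the phase.
    f0_t = f0 * (1 + 0.1 * np.sin(2 * np.pi * f_mod * t))
    phase = 2 * np.pi * np.cumsum(f0_t) / FS
    fm = sum(np.sin(h * phase) for h in range(1, n_harm + 1))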