104,823 research outputs found

    ARSTREAM: A Neural Network Model of Auditory Scene Analysis and Source Segregation

    Full text link
    Multiple sound sources often contain harmonics that overlap and may be degraded by environmental noise. The auditory system is capable of teasing apart these sources into distinct mental objects, or streams. Such an "auditory scene analysis" enables the brain to solve the cocktail party problem. A neural network model of auditory scene analysis, called the AIRSTREAM model, is presented to propose how the brain accomplishes this feat. The model clarifies how the frequency components that correspond to a give acoustic source may be coherently grouped together into distinct streams based on pitch and spatial cues. The model also clarifies how multiple streams may be distinguishes and seperated by the brain. Streams are formed as spectral-pitch resonances that emerge through feedback interactions between frequency-specific spectral representaion of a sound source and its pitch. First, the model transforms a sound into a spatial pattern of frequency-specific activation across a spectral stream layer. The sound has multiple parallel representations at this layer. A sound's spectral representation activates a bottom-up filter that is sensitive to harmonics of the sound's pitch. The filter activates a pitch category which, in turn, activate a top-down expectation that allows one voice or instrument to be tracked through a noisy multiple source environment. Spectral components are suppressed if they do not match harmonics of the top-down expectation that is read-out by the selected pitch, thereby allowing another stream to capture these components, as in the "old-plus-new-heuristic" of Bregman. Multiple simultaneously occuring spectral-pitch resonances can hereby emerge. These resonance and matching mechanisms are specialized versions of Adaptive Resonance Theory, or ART, which clarifies how pitch representations can self-organize durin learning of harmonic bottom-up filters and top-down expectations. The model also clarifies how spatial location cues can help to disambiguate two sources with similar spectral cures. Data are simulated from psychophysical grouping experiments, such as how a tone sweeping upwards in frequency creates a bounce percept by grouping with a downward sweeping tone due to proximity in frequency, even if noise replaces the tones at their interection point. Illusory auditory percepts are also simulated, such as the auditory continuity illusion of a tone continuing through a noise burst even if the tone is not present during the noise, and the scale illusion of Deutsch whereby downward and upward scales presented alternately to the two ears are regrouped based on frequency proximity, leading to a bounce percept. Since related sorts of resonances have been used to quantitatively simulate psychophysical data about speech perception, the model strengthens the hypothesis the ART-like mechanisms are used at multiple levels of the auditory system. Proposals for developing the model to explain more complex streaming data are also provided.Air Force Office of Scientific Research (F49620-01-1-0397, F49620-92-J-0225); Office of Naval Research (N00014-01-1-0624); Advanced Research Projects Agency (N00014-92-J-4015); British Petroleum (89A-1204); National Science Foundation (IRI-90-00530); American Society of Engineering Educatio

    Analysis of Radar Doppler Signature from Human Data

    Get PDF
    This paper presents the results of time (autocorrelation) and time-frequency (spectrogram) analyses of radar signals returned from the moving human targets. When a radar signal falls on the human target which is moving toward or away from the radar, the signals reflected from different parts of his body produce a Doppler shift that is proportional to the velocity of those parts. Moving parts of the body causes the characteristic Doppler signature. The main contribution comes from the torso which causes the central Doppler frequency of target. The motion of arms and legs induces modulation on the returned radar signal and generates sidebands around the central Doppler frequency, referred to as micro-Doppler signatures. Through analyses on experimental data it was demonstrated that the human motion signature extraction is better using spectrogram. While the central Doppler frequency can be determined using the autocorrelation and the spectrogram, the extraction of the fundamental cadence frequency using the autocorrelation is unreliable when the target is in the clutter presence. It was shown that the fundamental cadence frequency increases with increasing dynamic movement of people and simultaneously the possibility of its extraction is proportional to the degree of synchronization movements of persons in the group

    ARSTREAM: A Neural Network Model of Auditory Scene Analysis and Source Segregation

    Full text link
    Multiple sound sources often contain harmonics that overlap and may be degraded by environmental noise. The auditory system is capable of teasing apart these sources into distinct mental objects, or streams. Such an "auditory scene analysis" enables the brain to solve the cocktail party problem. A neural network model of auditory scene analysis, called the AIRSTREAM model, is presented to propose how the brain accomplishes this feat. The model clarifies how the frequency components that correspond to a give acoustic source may be coherently grouped together into distinct streams based on pitch and spatial cues. The model also clarifies how multiple streams may be distinguishes and seperated by the brain. Streams are formed as spectral-pitch resonances that emerge through feedback interactions between frequency-specific spectral representaion of a sound source and its pitch. First, the model transforms a sound into a spatial pattern of frequency-specific activation across a spectral stream layer. The sound has multiple parallel representations at this layer. A sound's spectral representation activates a bottom-up filter that is sensitive to harmonics of the sound's pitch. The filter activates a pitch category which, in turn, activate a top-down expectation that allows one voice or instrument to be tracked through a noisy multiple source environment. Spectral components are suppressed if they do not match harmonics of the top-down expectation that is read-out by the selected pitch, thereby allowing another stream to capture these components, as in the "old-plus-new-heuristic" of Bregman. Multiple simultaneously occuring spectral-pitch resonances can hereby emerge. These resonance and matching mechanisms are specialized versions of Adaptive Resonance Theory, or ART, which clarifies how pitch representations can self-organize durin learning of harmonic bottom-up filters and top-down expectations. The model also clarifies how spatial location cues can help to disambiguate two sources with similar spectral cures. Data are simulated from psychophysical grouping experiments, such as how a tone sweeping upwards in frequency creates a bounce percept by grouping with a downward sweeping tone due to proximity in frequency, even if noise replaces the tones at their interection point. Illusory auditory percepts are also simulated, such as the auditory continuity illusion of a tone continuing through a noise burst even if the tone is not present during the noise, and the scale illusion of Deutsch whereby downward and upward scales presented alternately to the two ears are regrouped based on frequency proximity, leading to a bounce percept. Since related sorts of resonances have been used to quantitatively simulate psychophysical data about speech perception, the model strengthens the hypothesis the ART-like mechanisms are used at multiple levels of the auditory system. Proposals for developing the model to explain more complex streaming data are also provided.Air Force Office of Scientific Research (F49620-01-1-0397, F49620-92-J-0225); Office of Naval Research (N00014-01-1-0624); Advanced Research Projects Agency (N00014-92-J-4015); British Petroleum (89A-1204); National Science Foundation (IRI-90-00530); American Society of Engineering Educatio

    Cortical spatio-temporal dimensionality reduction for visual grouping

    Full text link
    The visual systems of many mammals, including humans, is able to integrate the geometric information of visual stimuli and to perform cognitive tasks already at the first stages of the cortical processing. This is thought to be the result of a combination of mechanisms, which include feature extraction at single cell level and geometric processing by means of cells connectivity. We present a geometric model of such connectivities in the space of detected features associated to spatio-temporal visual stimuli, and show how they can be used to obtain low-level object segmentation. The main idea is that of defining a spectral clustering procedure with anisotropic affinities over datasets consisting of embeddings of the visual stimuli into higher dimensional spaces. Neural plausibility of the proposed arguments will be discussed

    Mapping and classification of ecologically sensitive marine habitats using unmanned aerial vehicle (UAV) imagery and object-based image analysis (OBIA)

    Get PDF
    Nowadays, emerging technologies, such as long-range transmitters, increasingly miniaturized components for positioning, and enhanced imaging sensors, have led to an upsurge in the availability of new ecological applications for remote sensing based on unmanned aerial vehicles (UAVs), sometimes referred to as “drones”. In fact, structure-from-motion (SfM) photogrammetry coupled with imagery acquired by UAVs offers a rapid and inexpensive tool to produce high-resolution orthomosaics, giving ecologists a new way for responsive, timely, and cost-effective monitoring of ecological processes. Here, we adopted a lightweight quadcopter as an aerial survey tool and object-based image analysis (OBIA) workflow to demonstrate the strength of such methods in producing very high spatial resolution maps of sensitive marine habitats. Therefore, three different coastal environments were mapped using the autonomous flight capability of a lightweight UAV equipped with a fully stabilized consumer-grade RGB digital camera. In particular we investigated a Posidonia oceanica seagrass meadow, a rocky coast with nurseries for juvenile fish, and two sandy areas showing biogenic reefs of Sabelleria alveolata. We adopted, for the first time, UAV-based raster thematic maps of these key coastal habitats, produced after OBIA classification, as a new method for fine-scale, low-cost, and time saving characterization of sensitive marine environments which may lead to a more effective and efficient monitoring and management of natural resource

    Efficient coding of spectrotemporal binaural sounds leads to emergence of the auditory space representation

    Full text link
    To date a number of studies have shown that receptive field shapes of early sensory neurons can be reproduced by optimizing coding efficiency of natural stimulus ensembles. A still unresolved question is whether the efficient coding hypothesis explains formation of neurons which explicitly represent environmental features of different functional importance. This paper proposes that the spatial selectivity of higher auditory neurons emerges as a direct consequence of learning efficient codes for natural binaural sounds. Firstly, it is demonstrated that a linear efficient coding transform - Independent Component Analysis (ICA) trained on spectrograms of naturalistic simulated binaural sounds extracts spatial information present in the signal. A simple hierarchical ICA extension allowing for decoding of sound position is proposed. Furthermore, it is shown that units revealing spatial selectivity can be learned from a binaural recording of a natural auditory scene. In both cases a relatively small subpopulation of learned spectrogram features suffices to perform accurate sound localization. Representation of the auditory space is therefore learned in a purely unsupervised way by maximizing the coding efficiency and without any task-specific constraints. This results imply that efficient coding is a useful strategy for learning structures which allow for making behaviorally vital inferences about the environment.Comment: 22 pages, 9 figure
    corecore