    ARTSTREAM: A Neural Network Model of Auditory Scene Analysis and Source Segregation

    Multiple sound sources often contain harmonics that overlap and may be degraded by environmental noise. The auditory system is capable of teasing apart these sources into distinct mental objects, or streams. Such an "auditory scene analysis" enables the brain to solve the cocktail party problem. A neural network model of auditory scene analysis, called the ARTSTREAM model, is presented to propose how the brain accomplishes this feat. The model clarifies how the frequency components that correspond to a given acoustic source may be coherently grouped together into distinct streams based on pitch and spatial cues. The model also clarifies how multiple streams may be distinguished and separated by the brain. Streams are formed as spectral-pitch resonances that emerge through feedback interactions between the frequency-specific spectral representation of a sound source and its pitch. First, the model transforms a sound into a spatial pattern of frequency-specific activation across a spectral stream layer. The sound has multiple parallel representations at this layer. A sound's spectral representation activates a bottom-up filter that is sensitive to harmonics of the sound's pitch. The filter activates a pitch category which, in turn, activates a top-down expectation that allows one voice or instrument to be tracked through a noisy multiple-source environment. Spectral components are suppressed if they do not match harmonics of the top-down expectation that is read out by the selected pitch, thereby allowing another stream to capture these components, as in the "old-plus-new heuristic" of Bregman. Multiple simultaneously occurring spectral-pitch resonances can thereby emerge. These resonance and matching mechanisms are specialized versions of Adaptive Resonance Theory, or ART, which clarifies how pitch representations can self-organize during learning of harmonic bottom-up filters and top-down expectations.
The model also clarifies how spatial location cues can help to disambiguate two sources with similar spectral cues. Data are simulated from psychophysical grouping experiments, such as how a tone sweeping upwards in frequency creates a bounce percept by grouping with a downward sweeping tone due to proximity in frequency, even if noise replaces the tones at their intersection point. Illusory auditory percepts are also simulated, such as the auditory continuity illusion of a tone continuing through a noise burst even if the tone is not present during the noise, and the scale illusion of Deutsch, whereby downward and upward scales presented alternately to the two ears are regrouped based on frequency proximity, leading to a bounce percept. Since related sorts of resonances have been used to quantitatively simulate psychophysical data about speech perception, the model strengthens the hypothesis that ART-like mechanisms are used at multiple levels of the auditory system. Proposals for developing the model to explain more complex streaming data are also provided.
Air Force Office of Scientific Research (F49620-01-1-0397, F49620-92-J-0225); Office of Naval Research (N00014-01-1-0624); Advanced Research Projects Agency (N00014-92-J-4015); British Petroleum (89A-1204); National Science Foundation (IRI-90-00530); American Society of Engineering Educatio
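
The matching step described in the abstract above (components that do not match harmonics of the selected pitch are suppressed and left for another stream) can be illustrated with a toy harmonic sieve. This is only a minimal sketch under stated assumptions, not the model from the paper: the function names `harmonic_match` and `segregate`, the 3% tolerance, and the example frequencies are all hypothetical, and the real model uses dynamic resonance rather than a one-shot mask.

```python
import numpy as np

def harmonic_match(freqs_hz, pitch_hz, tolerance=0.03):
    """Boolean mask: True where a component lies near a harmonic of pitch_hz.

    The relative tolerance is an illustrative assumption.
    """
    freqs = np.asarray(freqs_hz, dtype=float)
    # Nearest harmonic number for each component (at least the fundamental).
    harmonic_number = np.maximum(np.round(freqs / pitch_hz), 1.0)
    target = harmonic_number * pitch_hz
    # Relative deviation of each component from its nearest harmonic.
    deviation = np.abs(freqs - target) / target
    return deviation <= tolerance

def segregate(freqs_hz, amplitudes, pitch_hz, tolerance=0.03):
    """Split components into those captured by the pitch and the residual."""
    mask = harmonic_match(freqs_hz, pitch_hz, tolerance)
    amps = np.asarray(amplitudes, dtype=float)
    captured = np.where(mask, amps, 0.0)   # matched harmonics stay in this stream
    residual = np.where(mask, 0.0, amps)   # mismatches are freed for another stream
    return captured, residual

# Hypothetical mixture: harmonics of 200 Hz mixed with components at 300 and 900 Hz.
freqs = [200.0, 300.0, 400.0, 600.0, 900.0]
amps = [1.0, 1.0, 1.0, 1.0, 1.0]
captured, residual = segregate(freqs, amps, pitch_hz=200.0)
```

In this toy example the 200 Hz expectation keeps the 200, 400, and 600 Hz components, while the 300 and 900 Hz components are left over and could be grouped by a second pitch, loosely mirroring the old-plus-new heuristic.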

    Indirect Image Registration with Large Diffeomorphic Deformations

    The paper adapts the large deformation diffeomorphic metric mapping framework for image registration to the indirect setting, where a template is registered against a target that is given through indirect noisy observations. The registration uses diffeomorphisms that transform the template through a (group) action. These diffeomorphisms are generated by solving a flow equation that is defined by a velocity field with certain regularity. The theoretical analysis includes a proof that indirect image registration has solutions (existence) that are stable and that converge as the data error tends to zero, so it becomes a well-defined regularization method. The paper concludes with examples of indirect image registration in 2D tomography with very sparse and/or highly noisy data.
    Comment: 43 pages, 4 figures, 1 table; revise
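
For readers unfamiliar with the framework, the flow equation mentioned in the abstract above has the following standard form in large deformation diffeomorphic metric mapping. The variational problem below it is a commonly used way of writing indirect registration and is an assumption here, as are the symbols: $\nu$ for the velocity field, $V$ for its Hilbert space, $\mathcal{A}$ for the forward operator, $g$ for the observed data, $\mathcal{D}$ for the data discrepancy, and $\gamma$ for the regularization weight. The paper's exact formulation may differ.

```latex
% Flow equation: the diffeomorphism \phi(t, \cdot) is generated by the velocity field \nu
\partial_t \phi(t, x) = \nu\bigl(t, \phi(t, x)\bigr), \qquad \phi(0, x) = x, \qquad t \in [0, 1].

% Assumed variational form of indirect registration of a template I against data g
\min_{\nu} \; \gamma \int_0^1 \bigl\| \nu(t, \cdot) \bigr\|_V^2 \, dt
  \; + \; \mathcal{D}\bigl( \mathcal{A}\bigl( \phi(1, \cdot) \,.\, I \bigr), \, g \bigr)
```

Here $\phi(1, \cdot) \,.\, I$ denotes the group action of the final diffeomorphism on the template.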

    The Effelsberg-Bonn HI Survey: Data reduction

    Since winter 2008/2009, an L-band 7-feed array receiver has been used for a 21-cm line survey performed with the 100-m telescope, the Effelsberg-Bonn HI survey (EBHIS). The EBHIS will cover the whole northern hemisphere for declinations > -5 deg, comprising both the galactic and extragalactic sky out to a distance of about 230 Mpc. Using state-of-the-art FPGA-based digital fast Fourier transform spectrometers, superior in dynamic range and temporal resolution to conventional correlators, allows us to apply sophisticated radio frequency interference (RFI) mitigation schemes. In this paper, the EBHIS data reduction package and first results are presented. The reduction software consists of RFI detection schemes, flux and gain-curve calibration, stray-radiation removal, baseline fitting, and finally the gridding to produce data cubes. The whole software chain has been successfully tested using multi-feed data toward many smaller test fields (1--100 square degrees) and was recently applied for the first time to data of two large sky areas, each covering about 2000 square degrees. The first large area is toward the northern galactic pole and the second one toward the northern tip of the Magellanic Leading Arm. Here, we demonstrate the data quality of EBHIS Milky Way data and give a first impression of the first data release in 2011.
    Comment: 17 pages, 14 figures; to be published in ApJ
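
Of the reduction steps listed in the abstract above, baseline fitting is the easiest to illustrate in isolation. The following is a minimal sketch, assuming a low-order polynomial fitted to line-free channels; it is not the EBHIS package, and the function name `subtract_baseline`, the polynomial order, and the synthetic spectrum are hypothetical.

```python
import numpy as np

def subtract_baseline(spectrum, line_free_mask, order=2):
    """Fit a polynomial to line-free channels and subtract it from the spectrum."""
    spectrum = np.asarray(spectrum, dtype=float)
    channels = np.arange(spectrum.size, dtype=float)
    # Rescale channel numbers to [-1, 1] to keep the fit well conditioned.
    x = 2.0 * channels / (spectrum.size - 1) - 1.0
    coeffs = np.polyfit(x[line_free_mask], spectrum[line_free_mask], order)
    baseline = np.polyval(coeffs, x)
    return spectrum - baseline

# Synthetic spectrum (assumed values): a sloping baseline plus a Gaussian line.
n_chan = 256
chan = np.arange(n_chan)
true_baseline = 0.5 + 0.002 * chan
line = 3.0 * np.exp(-0.5 * ((chan - 128) / 5.0) ** 2)
spectrum = true_baseline + line

# Channels far from the line are treated as line-free.
line_free = np.abs(chan - 128) > 30
cleaned = subtract_baseline(spectrum, line_free, order=1)
```

After subtraction the line-free channels sit close to zero while the line peak is preserved. A real pipeline would also need to choose the line-free window automatically and handle RFI-flagged channels.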

    Video based vehicle detection for advance warning Intelligent Transportation System

    Video based vehicle detection and surveillance technologies are an integral part of Intelligent Transportation Systems (ITS), due to their non-intrusiveness and capability of capturing global and specific vehicle behavior data. The initial goal of this thesis is to develop an efficient advance warning ITS system for detection of congestion at work zones and special events based on video detection. The goals accomplished by this thesis are: (1) development of the advance warning ITS system using off-the-shelf components, and (2) development and evaluation of an improved vehicle detection and tracking algorithm. The advance warning ITS system includes off-the-shelf equipment such as Autoscope (a video based vehicle detector), digital video recorders, RF transceivers, high gain Yagi antennas, variable message signs, and interface processors. The video based detection system used requires calibration and fine tuning of configuration parameters for accurate results. Therefore, an in-house video based vehicle detection system was developed using the Harris corner detection algorithm to eliminate the need for complex calibration and contrast modifications. The algorithm was implemented using the OpenCV library on an Arcom's Olympus Windows XP Embedded development kit running the WinXPE operating system. The algorithm's performance is evaluated for accuracy in vehicle speed and count. The performance of the proposed algorithm is equivalent to or better than that of the Autoscope system, without any modifications to calibration and illumination adjustments.
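
The thesis reports using the Harris corner algorithm through OpenCV. To show what that detector computes, here is a minimal NumPy-only sketch of the Harris response, written without OpenCV so that it is self-contained. It is not the thesis implementation: the function names, the 3x3 box window, the constant `k = 0.04`, and the test image are illustrative assumptions.

```python
import numpy as np

def box_sum(a, radius):
    """Sum of `a` over a (2*radius+1)^2 window, zero-padded at the borders."""
    padded = np.pad(a, radius, mode="constant")
    out = np.zeros_like(a, dtype=float)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(image, radius=1, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 at every pixel."""
    img = np.asarray(image, dtype=float)
    # Image gradients along rows (iy) and columns (ix).
    iy, ix = np.gradient(img)
    # Entries of the structure tensor M, accumulated over a local window.
    sxx = box_sum(ix * ix, radius)
    syy = box_sum(iy * iy, radius)
    sxy = box_sum(ix * iy, radius)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

# Hypothetical test image: a bright square on a dark background.
image = np.zeros((20, 20))
image[5:15, 5:15] = 1.0
response = harris_response(image)
peak = np.unravel_index(np.argmax(response), response.shape)
```

The response is large and positive at the corners of the square, negative along its straight edges, and zero in flat regions, which is why thresholding it yields features that can be tracked from frame to frame.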

    Temporal Dynamics of Decision-Making during Motion Perception in the Visual Cortex

    How does the brain make decisions? Speed and accuracy of perceptual decisions covary with certainty in the input, and correlate with the rate of evidence accumulation in parietal and frontal cortical "decision neurons." A biophysically realistic model of interactions within and between Retina/LGN and cortical areas V1, MT, MST, and LIP, gated by basal ganglia, simulates dynamic properties of decision-making in response to ambiguous visual motion stimuli used by Newsome, Shadlen, and colleagues in their neurophysiological experiments. The model clarifies how brain circuits that solve the aperture problem interact with a recurrent competitive network with self-normalizing choice properties to carry out probabilistic decisions in real time. Some scientists claim that perception and decision-making can be described using Bayesian inference or related general statistical ideas that estimate the optimal interpretation of the stimulus given priors and likelihoods. However, such concepts do not propose the neocortical mechanisms that enable perception and decision-making. The present model explains behavioral and neurophysiological decision-making data without an appeal to Bayesian concepts and, unlike other existing models of these data, generates perceptual representations and choice dynamics in response to the experimental visual stimuli. Quantitative model simulations include the time course of LIP neuronal dynamics, as well as behavioral accuracy and reaction time properties, during both correct and error trials at different levels of input ambiguity in both fixed duration and reaction time tasks. Model MT/MST interactions compute the global direction of random dot motion stimuli, while model LIP computes the stochastic perceptual decision that leads to a saccadic eye movement.
    National Science Foundation (SBE-0354378, IIS-02-05271); Office of Naval Research (N00014-01-1-0624); National Institutes of Health (R01-DC-02852
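
The "recurrent competitive network with self-normalizing choice properties" mentioned in the abstract above can be illustrated with a generic shunting on-center off-surround network. This is a deterministic toy sketch, assuming a standard shunting form with a quadratic signal function; it is not the paper's LIP model, and the function name, parameter values, and inputs are hypothetical.

```python
import numpy as np

def shunting_competition(inputs, a=1.0, b=1.0, dt=0.01, steps=2000):
    """Euler-integrate a shunting recurrent competitive network.

    dx_i/dt = -a*x_i + (b - x_i)*(f(x_i) + I_i) - x_i * sum_{k != i}(f(x_k) + I_k)
    with the faster-than-linear signal function f(x) = x**2 (an assumption).
    """
    inputs = np.asarray(inputs, dtype=float)
    x = np.zeros_like(inputs)
    for _ in range(steps):
        drive = x ** 2 + inputs          # self-excitation plus external input
        total = drive.sum()
        # Excitation is gated by (b - x); inhibition from the other units by x.
        dx = -a * x + (b - x) * drive - x * (total - drive)
        x = x + dt * dx
    return x

# Two alternatives with slightly different evidence (assumed values).
activity = shunting_competition([0.55, 0.45])
```

The unit with the stronger input ends with the higher activity, and the summed activity stays below the ceiling `b`, which is the self-normalizing property. Modeling stochastic choices and reaction times, as the paper does, would additionally require noisy inputs and a decision threshold.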