
    A Bayesian computational basis for auditory selective attention using head rotation and the interaural time-difference cue

    The process of resolving mixtures of several sounds into their separate individual streams is known as auditory scene analysis, and it remains a challenging task for computational systems. It is well known that animals use binaural differences in arrival time and intensity at the two ears to find the arrival angle of sounds in the azimuthal plane, and this localization function has sometimes been considered sufficient to enable the un-mixing of complex scenes. However, the ability of such systems to resolve distinct sound sources in both space and frequency remains limited. The neural computations for detecting interaural time difference (ITD) have been well studied and have served as the inspiration for computational auditory scene analysis systems; however, a crucial limitation of ITD models is that they produce ambiguous or “phantom” images in the scene. This has been thought to limit their usefulness at frequencies above about 1 kHz in humans. We present a simple Bayesian model, and an implementation on a robot, that uses ITD information recursively. The model makes use of head rotations to show that ITD information is sufficient to unambiguously resolve sound sources in both space and frequency. Contrary to commonly held assumptions about sound localization, we show that the ITD cue used with high-frequency sound can provide accurate and unambiguous localization and resolution of competing sounds. Our findings suggest that an “active hearing” approach could be useful in robotic systems that operate in natural, noisy settings. We also suggest that neurophysiological models of sound localization in animals could benefit from revision to include the influence of top-down memory and sensorimotor integration across head rotations.
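    The abstract describes combining ITD evidence recursively across head rotations. As a rough, hypothetical illustration of that idea (not the authors' implementation), the Python sketch below maintains a posterior over allocentric azimuth and multiplies in a Gaussian ITD likelihood after compensating for the current head angle; the spherical-head ITD formula, the azimuth grid, and the noise level are all assumptions.

```python
import numpy as np

# Hypothetical sketch of recursive Bayesian azimuth estimation from ITD across
# head rotations. The spherical-head ITD formula, azimuth grid, and Gaussian
# noise level are illustrative assumptions, not the paper's implementation.

SPEED_OF_SOUND = 343.0  # m/s
HEAD_RADIUS = 0.09      # m, approximate half-distance between the ears

azimuths = np.linspace(-90.0, 90.0, 181)                 # allocentric grid, deg
posterior = np.full_like(azimuths, 1.0 / len(azimuths))  # flat prior

def expected_itd(angle_deg):
    """Woodworth-style spherical-head ITD approximation (seconds)."""
    theta = np.radians(angle_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + np.sin(theta))

def itd_likelihood(measured_itd, head_angle_deg, sigma=40e-6):
    """Gaussian likelihood of a measured ITD for each allocentric azimuth,
    given the current head orientation."""
    predicted = expected_itd(azimuths - head_angle_deg)  # egocentric prediction
    return np.exp(-0.5 * ((measured_itd - predicted) / sigma) ** 2)

def update(posterior, measured_itd, head_angle_deg):
    """One recursive step: multiply the prior by the new likelihood."""
    post = posterior * itd_likelihood(measured_itd, head_angle_deg)
    return post / post.sum()

# Example: the same source observed from three head orientations.
true_azimuth = 30.0
for head_angle in (0.0, 15.0, 30.0):
    measured = expected_itd(true_azimuth - head_angle)   # noiseless for brevity
    posterior = update(posterior, measured, head_angle)

print(f"MAP azimuth estimate: {azimuths[np.argmax(posterior)]:.1f} deg")
```

    Because each likelihood is computed in head-centred (egocentric) coordinates but accumulated on a fixed allocentric grid, only the true source direction is reinforced by every rotation in this sketch.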

    Resolution of two talkers separated by 45°.

    (A) Spectral-spatial map of acoustic scene containing a female and male speaker at 0° and 45° after 9 rotations. White dashed lines indicate the true location of sources. (B) Spatial map of the acoustic scene shows two distinct peaks corresponding to the two talkers. Filled regions under peaks indicate the extent of the beam patterns used to estimate spectra in panel C. (C) Plots of the difference between the average spectrum of the two talkers (left) and the difference between the activity in the spectral-spatial map at the peaks shown in panel B near 45° (S_B2) and 0° (S_B1). (D) Correlation between difference in the spectra of the target acoustic streams and the difference between the spectra of the two most prominent spatial peaks (blue) and the total localization error between targets and peaks (red) over successive rotations.
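    The panel D metric can be read as a correlation between two spectral-difference vectors; the brief sketch below assumes a Pearson correlation (the caption does not specify the exact correlation measure used).

```python
import numpy as np

# Hypothetical reading of the panel-D metric: correlate the difference between
# the two target spectra (S_T2 - S_T1) with the difference between the spectra
# recovered at the two spatial peaks (S_B2 - S_B1). Pearson correlation is an
# assumption about how the figure's correlation is computed.

def spectral_difference_correlation(s_t1, s_t2, s_b1, s_b2):
    """Correlation between target and beam-estimated spectral differences."""
    target_diff = np.asarray(s_t2) - np.asarray(s_t1)
    peak_diff = np.asarray(s_b2) - np.asarray(s_b1)
    return np.corrcoef(target_diff, peak_diff)[0, 1]
```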

    Resolution of two complex tones separated by 45°.

    (A) Spectral-spatial map of acoustic scene containing tone complexes at 0° and 45° after 9 rotations. White dashed lines indicate the true location of sources. (B) Spatial map of the acoustic scene shows two distinct peaks corresponding to the two tone complexes. Filled regions under peaks indicate the extent of the beam patterns used to estimate spectra in panel C. (C) Plots of the difference between the average spectrum of the two tone complexes (left) and the difference between the activity in the spectral-spatial map at the peaks shown in panel B near 45° (S_B2) and 0° (S_B1). (D) Correlation between difference in the spectra of the target acoustic streams and the difference between the spectra of the two most prominent spatial peaks (blue) and the total localization error between targets and peaks (red) over successive rotations.

    Resolution of broadband noise and a pure tone.

    (A) Spectral-spatial map of acoustic scene containing broadband noise at 0° and a 2400 Hz tone at 11° after 9 rotations. White dashed lines indicate the true location of sources. (B) Spatial map of the acoustic scene shows two distinct peaks corresponding to the noise and tone, respectively. Filled regions under peaks indicate the extent of the beam patterns used to estimate spectra in panel C. (C) Plots of the difference between the average spectrum of the tone (S_T2) and broadband noise (S_T1) (left) and the difference between the activity in the spectral-spatial map at the peaks shown in panel B near 11° (S_B2) and 0° (S_B1). (D) Correlation between difference in the spectra of the target acoustic streams and the difference between the spectra of the two most prominent spatial peaks (blue) and the total localization error between targets and peaks (red) over successive rotations.

    Modelled beamformer response to pure tones at varying azimuth angles in egocentric space.

    Response to a modelled sound source located at 0° in allocentric space from a set of narrow-band beamformers oriented at various angles in egocentric space, obtained from Eq 1 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0186104#pone.0186104.e001). At low frequencies, peak activation occurs over broad arcs; multiplying evidence distributions across successive rotations reduces localization uncertainty. At high frequencies, multiple peaks are eliminated by multiplying across rotations, effectively resolving the ambiguity.
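    The multiplicative combination noted in this caption can be sketched with a toy two-microphone beamformer. The code below is illustrative only and stands in for the paper's Eq 1: a delay-and-sum response to a 2400 Hz tone is evaluated on an allocentric grid and multiplied across head orientations, so phase-ambiguous (phantom) peaks, which shift with the head, are attenuated while the true direction stays at the maximum. The microphone spacing, frequency, and rotation angles are assumptions.

```python
import numpy as np

# Toy two-microphone delay-and-sum beamformer responding to a pure tone,
# evaluated on an allocentric grid and combined multiplicatively across head
# rotations. Stands in for the paper's Eq 1; spacing, frequency, and the set
# of head angles are assumptions.

C = 343.0      # speed of sound, m/s
D = 0.18       # microphone spacing, m
FREQ = 2400.0  # tone frequency, Hz

azimuths = np.linspace(-90.0, 90.0, 361)  # allocentric grid, 0.5 deg steps

def itd(angle_deg):
    """Far-field inter-microphone time difference."""
    return (D / C) * np.sin(np.radians(angle_deg))

def beam_power(source_deg, head_deg):
    """Delay-and-sum power over the allocentric grid for one head orientation;
    at 2400 Hz spatial aliasing produces extra ("phantom") peaks."""
    delay_err = itd(source_deg - head_deg) - itd(azimuths - head_deg)
    return np.cos(np.pi * FREQ * delay_err) ** 2

source = 11.0
evidence = np.ones_like(azimuths)
for head in np.linspace(-40.0, 40.0, 9):  # 9 head orientations
    evidence *= beam_power(source, head)  # multiply evidence across rotations
    evidence /= evidence.sum()

print(f"Surviving peak near {azimuths[np.argmax(evidence)]:.1f} deg")
```

    The true direction scores the maximum response at every head orientation, whereas the aliased peaks fall at different allocentric angles for different orientations, so the running product suppresses them.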

    Illustration of the EPSP transform and its effect on beamformer results.

    Example of the transformation of a sound signal filtered at 500 Hz (top panel) into an excitatory postsynaptic potential (EPSP) signal (middle panel). The EPSP transform results in a relatively narrower beamformer-array output that is stable with respect to the frequency of the filter pass-band (bottom panel).
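    The caption does not spell out the EPSP transform itself; a common modelling choice is half-wave rectification followed by convolution with an alpha-function kernel, which is what the sketch below assumes (the kernel shape and time constant are placeholders, not the paper's values).

```python
import numpy as np

# Rough sketch of an EPSP-style transform of a band-pass filtered channel:
# half-wave rectify, then smooth with an alpha-function kernel. The kernel
# shape and 1 ms time constant are assumptions; the paper's exact transform
# may differ.

FS = 44100.0                                 # sample rate, Hz
t = np.arange(0.0, 0.05, 1.0 / FS)           # 50 ms of signal
band_signal = np.sin(2 * np.pi * 500.0 * t)  # stand-in for a 500 Hz channel

def epsp_kernel(tau=0.001, fs=FS):
    """Alpha-function EPSP kernel normalized to unit area (tau in seconds)."""
    tk = np.arange(0.0, 10.0 * tau, 1.0 / fs)
    k = (tk / tau) * np.exp(1.0 - tk / tau)
    return k / k.sum()

def epsp_transform(x, fs=FS):
    """Half-wave rectify and convolve with the EPSP kernel."""
    rectified = np.maximum(x, 0.0)
    return np.convolve(rectified, epsp_kernel(fs=fs), mode="same")

epsp_signal = epsp_transform(band_signal)
```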

    Accuracy and precision of localization is better at high frequencies.

    Spectral-spatial maps for acoustic scenes with broadband noise at 0° and tones at 11° or 22° were obtained over 9 rotations. The localization error, illustrating the limits of the system’s accuracy, and the width of the spatial peak, illustrating the limits of the system’s precision, after 9 rotations are plotted against the frequency of the target tone.

    Resolution of two complex tones spatially separated by 22°.

    (A) Spectral-spatial map of acoustic scene containing tone complexes at 0° and 22° after 9 rotations. White dashed lines indicate the true location of sources. (B) Spatial map of the acoustic scene shows two distinct peaks corresponding to the two tone complexes. Filled regions under peaks indicate the extent of the beam patterns used to estimate spectra in panel C. (C) Plots of the difference between the average spectrum of the two tone complexes (left) and the difference between the activity in the spectral-spatial map at the peaks shown in panel B near 22° (S_B2) and 0° (S_B1). (D) Correlation between difference in the spectra of the target acoustic streams and the difference between the spectra of the two most prominent spatial peaks (blue) and the total localization error between targets and peaks (red) over successive rotations.

    Localization becomes unambiguous over several rotations.

    Response of a set of narrow-band beamformers to broadband noise at 0° and a 2400 Hz tone at 11° in allocentric space, with the responses combined multiplicatively (product-integrated) over several rotations.