Direction of Arrival with One Microphone, a few LEGOs, and Non-Negative Matrix Factorization
Conventional approaches to sound source localization require at least two
microphones. It is known, however, that people with unilateral hearing loss can
also localize sounds. Monaural localization is possible thanks to the
scattering by the head, though it hinges on learning the spectra of the various
sources. We take inspiration from this human ability to propose algorithms for
accurate sound source localization using a single microphone embedded in an
arbitrary scattering structure. The structure modifies the frequency response
of the microphone in a direction-dependent way, giving each direction a
signature. While knowing those signatures is sufficient to localize sources of
white noise, localizing speech is much more challenging: it is an ill-posed
inverse problem which we regularize by prior knowledge in the form of learned
non-negative dictionaries. We demonstrate a monaural speech localization
algorithm based on non-negative matrix factorization that does not depend on
sophisticated, designed scatterers. In fact, we show experimental results with
ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures
we can accurately localize arbitrary speakers; that is, we do not need to learn
the dictionary for the particular speaker to be localized. Finally, we discuss
multi-source localization and the related limitations of our approach.
Comment: This article has been accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
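The dictionary-based decoding step described above can be illustrated with a minimal Python sketch. Everything here is a synthetic stand-in: the dictionary `W`, the direction signatures, and the spectrogram are random toy data rather than the authors' learned models. The sketch only shows the core idea of scoring each candidate direction by fitting NMF activations against a direction-filtered dictionary and comparing reconstruction errors.

```python
import numpy as np

rng = np.random.default_rng(0)

F, T, K, D = 64, 40, 8, 4          # freq bins, frames, dictionary atoms, directions

# Learned non-negative speech dictionary (random stand-in here).
W = rng.random((F, K))

# Direction-dependent frequency signatures imparted by the scatterer.
signatures = rng.random((D, F)) + 0.1

def nmf_activations(V, W, n_iter=300):
    """Fit H >= 0 minimizing ||V - W H||_F with multiplicative updates."""
    H = np.full((W.shape[1], V.shape[1]), 0.5)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ (W @ H) + 1e-9)
    return H

def localize(V):
    """Return the candidate direction whose signature best explains V."""
    errs = []
    for d in range(D):
        Wd = signatures[d, :, None] * W    # dictionary filtered by direction d
        H = nmf_activations(V, Wd)
        errs.append(np.linalg.norm(V - Wd @ H))
    return int(np.argmin(errs))

# Simulate speech-like activations arriving from direction 2.
H_true = rng.random((K, T))
V = (signatures[2, :, None] * W) @ H_true

print(localize(V))
```

The comparison works because the observed spectrogram lies (up to noise) in the non-negative span of only the correctly filtered dictionary, so the wrong directions leave a larger residual.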
Efficient coding of spectrotemporal binaural sounds leads to emergence of the auditory space representation
To date a number of studies have shown that receptive field shapes of early
sensory neurons can be reproduced by optimizing coding efficiency of natural
stimulus ensembles. A still unresolved question is whether the efficient coding
hypothesis explains formation of neurons which explicitly represent
environmental features of different functional importance. This paper proposes
that the spatial selectivity of higher auditory neurons emerges as a direct
consequence of learning efficient codes for natural binaural sounds. Firstly,
it is demonstrated that a linear efficient coding transform, Independent
Component Analysis (ICA), trained on spectrograms of naturalistic simulated
binaural sounds extracts spatial information present in the signal. A simple
hierarchical ICA extension allowing for decoding of sound position is proposed.
Furthermore, it is shown that units revealing spatial selectivity can be
learned from a binaural recording of a natural auditory scene. In both cases a
relatively small subpopulation of learned spectrogram features suffices to
perform accurate sound localization. Representation of the auditory space is
therefore learned in a purely unsupervised way by maximizing the coding
efficiency and without any task-specific constraints. These results imply that
efficient coding is a useful strategy for learning structures which allow for
making behaviorally vital inferences about the environment.
Comment: 22 pages, 9 figures
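The first step, extracting spatial information with a linear ICA, can be sketched in Python with scikit-learn's FastICA. The two-channel mixture below is a toy stand-in for the spectrogram features used in the paper: interaural level differences are modeled as fixed per-source gains, and the estimated mixing matrix recovers those gains, i.e. the spatial cue.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
n = 5000

# Two statistically independent, non-Gaussian sources (Laplacian).
s = rng.laplace(size=(n, 2))

# Binaural mixing: each source reaches the left/right ears with an
# azimuth-dependent interaural level difference (illustrative gains).
A = np.array([[0.9, 0.3],    # left-ear gains for sources 1 and 2
              [0.2, 0.8]])   # right-ear gains
x = s @ A.T                  # simulated left/right channels

ica = FastICA(n_components=2, random_state=0)
s_hat = ica.fit_transform(x)

# Columns of the estimated mixing matrix carry the per-source level
# ratios, i.e. the spatial information encoded by the ILDs.
A_hat = ica.mixing_
print(np.round(np.abs(A_hat), 2))
```

Because ICA recovers sources only up to permutation and scaling, decoding position from the learned components requires a further step, which is the role of the hierarchical extension proposed in the paper.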
Spectral cues are necessary to encode azimuthal auditory space in the mouse superior colliculus.
Sound localization plays a critical role in animal survival. Three cues can be used to compute sound direction: interaural timing differences (ITDs), interaural level differences (ILDs), and the direction-dependent spectral filtering by the head and pinnae (spectral cues). Little is known about how spectral cues contribute to the neural encoding of auditory space. Here we report on auditory space encoding in the mouse superior colliculus (SC). We show that the mouse SC contains neurons with spatially restricted receptive fields (RFs) that form an azimuthal topographic map. We found that frontal RFs require spectral cues and lateral RFs require ILDs. The neurons with frontal RFs have frequency tunings that match the spectral structure of the specific head and pinna filter for sound coming from the front. These results demonstrate that patterned spectral cues in combination with ILDs give rise to the topographic map of azimuthal auditory space.
Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression
This paper addresses the problem of localizing audio sources using binaural
measurements. We propose a supervised formulation that simultaneously localizes
multiple sources at different locations. The approach is intrinsically
efficient because, contrary to prior work, it relies neither on source
separation, nor on monaural segregation. The method starts with a training
stage that establishes a locally-linear Gaussian regression model between the
directional coordinates of all the sources and the auditory features extracted
from binaural measurements. While fixed-length wide-spectrum sounds (white
noise) are used for training to reliably estimate the model parameters, we show
that the testing (localization) can be extended to variable-length
sparse-spectrum sounds (such as speech), thus enabling a wide range of
realistic applications. Indeed, we demonstrate that the method can be used for
audio-visual fusion, namely to map speech signals onto images and hence to
spatially align the audio and visual modalities, thus enabling discrimination
between speaking and non-speaking faces. We release a novel corpus of real-room
recordings that allow quantitative evaluation of the co-localization method in
the presence of one or two sound sources. Experiments demonstrate increased
accuracy and speed relative to several state-of-the-art methods.
Comment: 15 pages, 8 figures
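A locally-linear Gaussian regression from auditory features to source direction can be sketched, under strong simplifying assumptions, as conditional-mean prediction from a joint Gaussian mixture: each mixture component contributes one affine regressor, weighted by its responsibility for the observed features. The 2-D features and azimuth below are synthetic stand-ins; the paper's actual binaural features and model details are not reproduced.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Training data: 2-D feature stand-ins with a nonlinear, noisy
# dependence on source azimuth theta (degrees).
n = 2000
theta = rng.uniform(-90, 90, n)
X = np.column_stack([np.sin(np.radians(theta)),
                     np.cos(np.radians(theta))])
X += 0.02 * rng.standard_normal(X.shape)

# Fit a joint GMM over (features, azimuth).
joint = np.column_stack([X, theta])
gmm = GaussianMixture(n_components=8, covariance_type='full',
                      random_state=0).fit(joint)

def predict_azimuth(x):
    """E[theta | x] under the joint GMM: a locally-linear regression."""
    d = x.shape[0]
    preds, logws = [], []
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        mx, my = mu[:d], mu[d:]
        Sxx, Sxy = S[:d, :d], S[:d, d:]
        # Each component is an affine (locally linear) map x -> theta.
        preds.append(my + Sxy.T @ np.linalg.solve(Sxx, x - mx))
        # Log responsibility ~ log weight + log N(x; mx, Sxx) (constant dropped).
        diff = x - mx
        logws.append(np.log(gmm.weights_[k])
                     - 0.5 * diff @ np.linalg.solve(Sxx, diff)
                     - 0.5 * np.linalg.slogdet(Sxx)[1])
    logws = np.array(logws)
    w = np.exp(logws - logws.max())
    w /= w.sum()
    return float(w @ np.array(preds)[:, 0])

x_test = np.array([np.sin(np.radians(30.0)), np.cos(np.radians(30.0))])
print(predict_azimuth(x_test))
```

As in the paper's setup, the regression parameters are estimated from dense training data (white-noise-like coverage of directions), while at test time any feature vector, including one from a sparse-spectrum signal, can be mapped through the same affine experts.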
Perception of vertically separated sound sources in the median plane
The ability of human listeners to segregate two sound sources was examined in an experiment in which the sources were concurrently presented from different directions in the median plane. High-pass filtered pink noise was used as the stimulus in a free-field condition and presented either as a pair of incoherent sound sources or as a single source. Subjects reported whether they perceived sound from one or two directions. Listening tests were conducted with different source directions and separation angles, in two sessions: a monaural session, in which only the right ear was audible, and a binaural session, in which both ears were audible. The results indicated that the percentage of "two directions" responses for pairwise stimuli exceeded 50% above a 33.75 deg. separation angle and rose above 70% at 67.5 deg. separation in both sessions. However, the perceived separation correlated only weakly with the actual separation angle, although it increased in the binaural session. The ability to attribute pairwise stimuli to each of the two corresponding sound sources was highly statistically significant, while the difference between monaural and binaural hearing was not statistically significant for the segregation of sound sources in the median plane.