
Direction of Arrival with One Microphone, a few LEGOs, and Non-Negative Matrix Factorization

Authors: Badawy Dalia El; Dokmanić Ivan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 07/07/2018

Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to the scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound source localization using a single microphone embedded in an arbitrary scattering structure. The structure modifies the frequency response of the microphone in a direction-dependent way, giving each direction a signature. While knowing those signatures is sufficient to localize sources of white noise, localizing speech is much more challenging: it is an ill-posed inverse problem, which we regularize with prior knowledge in the form of learned non-negative dictionaries. We demonstrate a monaural speech localization algorithm based on non-negative matrix factorization that does not depend on sophisticated, designed scatterers. In fact, we show experimental results with ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures we can accurately localize arbitrary speakers; that is, we do not need to learn the dictionary for the particular speaker to be localized. Finally, we discuss multi-source localization and the related limitations of our approach.

Comment: This article has been accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne
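The abstract above describes the idea without spelling out an algorithm. A minimal sketch of how a learned non-negative dictionary can regularize single-microphone localization follows, under stated assumptions: each candidate direction d has a known magnitude response h_d (the scatterer's "signature"), a source dictionary W was learned beforehand, and the observed magnitude spectrogram is modeled as Y ≈ diag(h_d) W X with X ≥ 0. The activations X are fitted with standard Euclidean NMF multiplicative updates while the dictionary stays fixed, and the direction with the smallest residual is chosen. This illustrates the general technique only and is not the authors' implementation; the names `fit_activations` and `localize` are hypothetical.

```python
import numpy as np


def fit_activations(Y, A, n_iter=200, eps=1e-12, seed=0):
    """Fit non-negative activations X so that Y ≈ A @ X.

    Uses Euclidean NMF multiplicative updates with the dictionary A held
    fixed; only X is updated, so X stays non-negative if Y and A are.
    """
    rng = np.random.default_rng(seed)
    X = rng.random((A.shape[1], Y.shape[1])) + eps
    for _ in range(n_iter):
        X *= (A.T @ Y) / (A.T @ A @ X + eps)
    return X


def localize(Y, W, H, n_iter=200):
    """Pick the candidate direction whose signature best explains Y.

    Y: (F, T) magnitude spectrogram recorded by the single microphone
    W: (F, K) non-negative source dictionary (assumed learned beforehand)
    H: (D, F) magnitude frequency response for each of D candidate directions
    Returns (best direction index, array of residual norms per direction).
    """
    errors = []
    for h in H:
        A = h[:, None] * W  # direction-filtered dictionary, i.e. diag(h) @ W
        X = fit_activations(Y, A, n_iter)
        errors.append(np.linalg.norm(Y - A @ X))
    errors = np.array(errors)
    return int(np.argmin(errors)), errors
```

For a white-noise source the spectrum is roughly flat, so W could be replaced by a single constant column and the observation compared directly against each signature h_d, which is consistent with the abstract's remark that the signatures alone suffice in that case.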

Efficient coding of spectrotemporal binaural sounds leads to emergence of the auditory space representation

Author: Mlynarski Wiktor
Publication venue: Frontiers Media SA
Publication date: 01/01/2014

To date, a number of studies have shown that receptive field shapes of early sensory neurons can be reproduced by optimizing the coding efficiency of natural stimulus ensembles. A still unresolved question is whether the efficient coding hypothesis explains the formation of neurons that explicitly represent environmental features of different functional importance. This paper proposes that the spatial selectivity of higher auditory neurons emerges as a direct consequence of learning efficient codes for natural binaural sounds. First, it is demonstrated that a linear efficient coding transform (Independent Component Analysis, ICA), trained on spectrograms of naturalistic simulated binaural sounds, extracts spatial information present in the signal. A simple hierarchical ICA extension allowing for decoding of sound position is proposed. Furthermore, it is shown that units revealing spatial selectivity can be learned from a binaural recording of a natural auditory scene. In both cases a relatively small subpopulation of learned spectrogram features suffices to perform accurate sound localization. Representation of the auditory space is therefore learned in a purely unsupervised way by maximizing the coding efficiency and without any task-specific constraints. These results imply that efficient coding is a useful strategy for learning structures that allow for making behaviorally vital inferences about the environment.

Comment: 22 pages, 9 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

Frontiers - Publisher Connector

Spectral cues are necessary to encode azimuthal auditory space in the mouse superior colliculus.

Authors: Feldheim David A; Ito Shinya; Litke Alan M; Si Yufei
Publication venue: eScholarship, University of California
Publication date: 01/02/2020

Sound localization plays a critical role in animal survival. Three cues can be used to compute sound direction: interaural timing differences (ITDs), interaural level differences (ILDs), and the direction-dependent spectral filtering by the head and pinnae (spectral cues). Little is known about how spectral cues contribute to the neural encoding of auditory space. Here we report on auditory space encoding in the mouse superior colliculus (SC). We show that the mouse SC contains neurons with spatially restricted receptive fields (RFs) that form an azimuthal topographic map. We found that frontal RFs require spectral cues and lateral RFs require ILDs. The neurons with frontal RFs have frequency tunings that match the spectral structure of the specific head and pinna filter for sound coming from the front. These results demonstrate that patterned spectral cues in combination with ILDs give rise to the topographic map of azimuthal auditory space.

eScholarship - University of California
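The abstract names ITDs and ILDs as binaural cues without saying how they are measured. As a rough illustration (not the method of the study, which recorded neural responses in the mouse SC), the sketch below estimates both cues from a two-channel signal: the ITD as the lag of the cross-correlation peak and the ILD as the left/right energy ratio in decibels. The function name `itd_ild` and the `max_lag` search window are assumptions for illustration. Spectral cues, the third cue, would require a model of the head and pinna filtering and are not covered here.

```python
import numpy as np


def itd_ild(left, right, fs, max_lag):
    """Estimate interaural time and level differences from two ear signals.

    ITD (seconds): lag of the cross-correlation peak, searched over
    |lag| <= max_lag samples. A positive value means the right-ear signal
    lags the left, i.e. the sound reached the left ear first.
    ILD (dB): ratio of left to right signal energy.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = len(left)
    lags = np.arange(-max_lag, max_lag + 1)
    # Cross-correlation c[k] = sum_t left[t] * right[t + k] over overlapping samples.
    xcorr = np.array([
        np.dot(left[max(0, -k):n - max(0, k)], right[max(0, k):n - max(0, -k)])
        for k in lags
    ])
    itd = lags[np.argmax(xcorr)] / fs
    ild = 10.0 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
    return itd, ild
```

For example, if the right channel is the left channel delayed by 5 samples at 44.1 kHz and halved in amplitude, the estimate is an ITD of 5/44100 s (about 113 µs) and an ILD of about 6 dB.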

Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

Authors: Deleforge Antoine; Girin Laurent; Horaud Radu; Schechner Yoav
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/04/2015

This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation nor on monaural segregation. The method starts with a training stage that establishes a locally-linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, thus making it possible to discriminate between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allows quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods.

Comment: 15 pages, 8 figures

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Perception of vertically separated sound sources in the median plane

Author: Kim Tae
Publication date: 11/12/2017

The ability of human listeners to segregate two sound sources was examined in an experiment in which the sources were presented concurrently from different directions in the median plane. High-pass filtered pink noise was used as the stimulus in a free-field condition and was presented either as a pair of incoherent sound sources or as a single source. Subjects reported whether they perceived sound from one or two directions. Listening tests were conducted with different source directions and separation angles. The tests consisted of two sessions: a monaural session, in which only the right ear was left unoccluded, and a binaural session, in which both ears were unoccluded. The percentage of "two directions" responses to pairwise stimuli exceeded 50% for separation angles above 33.75° and exceeded 70% at a separation of 67.5° in both sessions. However, the perceived separation was only weakly correlated with the actual separation, although it increased in the binaural session. Listeners' ability to discriminate each pairwise stimulus from its two corresponding single sources was highly statistically significant. The difference between monaural and binaural hearing was not statistically significant for the segregation of sound sources in the median plane.

Aaltodoc Publication Archive