Effects of reverberation conditions and physical versus virtual source placement on localization in virtual sound environments
Sound field synthesis systems vary in the number and arrangement of loudspeakers and in the methods used to generate virtual sound environments for studying human hearing perception. While previous work has evaluated how accurately these systems physically reproduce room acoustic conditions, less is known about how faithfully they preserve subjective perception of those conditions, such as source localization. This work quantifies the accuracy and precision of perceived localization in a multi-channel sound field synthesis system at Boys Town National Research Hospital, which used 24 physical loudspeakers and vector-based amplitude panning to generate sound fields. Short bursts of broadband speech-shaped noise were presented from source locations (either coinciding with a physical loudspeaker location or panned between loudspeakers) under free-field and modeled reverberant-room conditions. Listeners used an HTC Vive remote laser tracking system to point to the perceived source location. Results show that the system synthesizes source locations accurately for both physical and panned sources, in both azimuth and elevation. Panned sources, however, are localized less precisely than physical sources. Reverberant condition also affects both the accuracy and precision of localization in the azimuthal plane, with dry conditions producing greater accuracy and better precision. In elevation, only the accuracy (not precision) of localization was affected by reverberant condition, with reverberant cases producing results closer to the target than dry cases. An interaction between reverberant condition and elevation indicates, however, that dry conditions yield better elevation localization than reverberant ones at elevations near head height, whereas at higher elevations subjects localized dry sources below the target height while reverberant sources were placed more accurately. 
Other laboratories with sound field synthesis systems are encouraged to gather similar data on the accuracy and precision of localization in azimuth and elevation, so that results from studies using these systems can be better interpreted in light of the capabilities of the system to generate accurate and precise reproductions of source locations. [Work supported by NIH GM109023.]
Advisor: Lily M. Wan
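The vector-based amplitude panning mentioned in the abstract can be sketched for the simplest case of a source panned between two loudspeakers in the horizontal plane. This is a minimal illustration of the general technique, not the 24-loudspeaker system's actual implementation; the function name and the power normalization below are assumptions for illustration.

```python
import numpy as np

def vbap_pair_gains(source_az, spk1_az, spk2_az):
    """2-D VBAP gains for a source panned between two loudspeakers.

    Angles are in degrees. Returns (g1, g2), power-normalized so that
    g1**2 + g2**2 = 1, which keeps loudness roughly constant while panning.
    """
    # Unit vector pointing toward a given azimuth
    to_vec = lambda az: np.array([np.cos(np.radians(az)), np.sin(np.radians(az))])
    # Loudspeaker base matrix L: source direction p satisfies p = L @ g
    L = np.column_stack([to_vec(spk1_az), to_vec(spk2_az)])
    g = np.linalg.solve(L, to_vec(source_az))
    return g / np.linalg.norm(g)

# A source panned midway between loudspeakers at 0 and 30 degrees
# receives equal gains on both channels.
g1, g2 = vbap_pair_gains(15.0, 0.0, 30.0)
```

Sources coinciding with a physical loudspeaker reduce to one gain of 1 and one of 0, which is consistent with the abstract's finding that panned (shared-gain) sources are localized less precisely than physical ones.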
Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks
We present a novel learning-based approach to estimate the
direction-of-arrival (DOA) of a sound source using a convolutional recurrent
neural network (CRNN) trained via regression on synthetic data and Cartesian
labels. We also describe an improved method to generate synthetic data to train
the neural network using state-of-the-art sound propagation algorithms that
model specular as well as diffuse reflections of sound. We compare our model
against three other CRNNs trained using different formulations of the same
problem: classification on categorical labels and regression on spherical
coordinate labels. In practice, our model achieves up to a 43% decrease in
angular error over prior methods. The use of diffuse reflection results in 34%
and 41% reduction in angular prediction errors on LOCATA and SOFA datasets,
respectively, over prior methods based on image-source methods. Our method
results in an additional 3% error reduction over prior schemes that use
classification-based networks, while using 36% fewer network parameters.
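The regression formulation on Cartesian labels and the angular-error metric above can be sketched as follows. The exact label encoding and loss used by the paper are not given in the abstract, so the helper names and the unit-vector convention below are illustrative assumptions.

```python
import numpy as np

def sph_to_cart(azimuth, elevation):
    """Unit direction vector from azimuth/elevation in radians.

    Cartesian labels avoid the wrap-around discontinuity at +/-180 degrees
    that makes direct regression on spherical coordinates awkward.
    """
    return np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])

def angular_error_deg(pred, target):
    """Great-circle angle in degrees between predicted and true directions."""
    pred = pred / np.linalg.norm(pred)
    target = target / np.linalg.norm(target)
    # Clip guards against floating-point values slightly outside [-1, 1]
    return np.degrees(np.arccos(np.clip(pred @ target, -1.0, 1.0)))

# Directions 90 degrees apart in azimuth yield a 90-degree angular error.
err = angular_error_deg(sph_to_cart(0.0, 0.0), sph_to_cart(np.pi / 2, 0.0))
```

A network trained to output such 3-D vectors can be evaluated with `angular_error_deg` regardless of which label formulation it was trained on, which is how the abstract's percentage comparisons across formulations are typically made.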
PSD Estimation of Multiple Sound Sources in a Reverberant Room Using a Spherical Microphone Array
We propose an efficient method to estimate source power spectral densities
(PSDs) in a multi-source reverberant environment using a spherical microphone
array. The proposed method utilizes the spatial correlation between the
spherical harmonics (SH) coefficients of a sound field to estimate source PSDs.
The use of the spatial cross-correlation of the SH coefficients allows us to
employ the method in an environment with a higher number of sources compared to
conventional methods. Furthermore, the orthogonality property of the SH basis
functions saves the effort of designing specific beampatterns of a conventional
beamformer-based method. We evaluate the performance of the algorithm with
different numbers of sources in practical reverberant and non-reverberant rooms.
We also demonstrate an application of the method by separating source signals
using a conventional beamformer and a Wiener post-filter designed from the
estimated PSDs. Comment: Accepted for WASPAA 201
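The final application step, a Wiener post-filter built from estimated per-source PSDs, can be sketched in a simplified single-frame form. The abstract does not give the exact filter design, so the function signature and the per-bin gain below are illustrative assumptions.

```python
import numpy as np

def wiener_postfilter(beamformer_output, source_psds, target_idx, noise_psd=0.0):
    """Apply a single-channel Wiener post-filter to one STFT frame.

    beamformer_output: complex spectrum (num_bins,) from a conventional
        beamformer steered at the target source.
    source_psds: estimated PSDs, shape (num_sources, num_bins).
    The per-bin Wiener gain is target PSD over total (all sources + noise).
    """
    target = source_psds[target_idx]
    total = source_psds.sum(axis=0) + noise_psd
    gain = target / total
    return gain * beamformer_output

# Two sources, two frequency bins: the target dominates bin 0 (PSD 4 vs 1),
# while both sources are equal in bin 1, so gains are 0.8 and 0.5.
psds = np.array([[4.0, 1.0],
                 [1.0, 1.0]])
y = wiener_postfilter(np.array([1.0 + 0j, 1.0 + 0j]), psds, target_idx=0)
```

The interferer PSDs suppress bins where competing sources are strong, which is what lets the beamformer-plus-post-filter chain separate signals from the estimated PSDs alone.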
Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks
The propagation of sound in a shallow water environment is characterized by
boundary reflections from the sea surface and sea floor. These reflections
result in multiple (indirect) sound propagation paths, which can degrade the
performance of passive sound source localization methods. This paper proposes
the use of convolutional neural networks (CNNs) for the localization of sources
of broadband acoustic radiated noise (such as motor vessels) in shallow water
multipath environments. It is shown that CNNs operating on cepstrogram and
generalized cross-correlogram inputs are able to more reliably estimate the
instantaneous range and bearing of transiting motor vessels when the source
localization performance of conventional passive ranging methods is degraded.
The ensuing improvement in source localization performance is demonstrated
using real data collected during an at-sea experiment. Comment: 5 pages, 5 figures, Final draft of paper submitted to 2018 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP)
15-20 April 2018 in Calgary, Alberta, Canada. arXiv admin note: text overlap
with arXiv:1612.0350
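The cepstrogram input used by these CNNs can be sketched as a frame-by-frame real cepstrum; multipath arrivals separated by a delay tau produce peaks near quefrency tau. The frame length, hop, and windowing below are illustrative assumptions, not the paper's exact front end.

```python
import numpy as np

def cepstrogram(signal, frame_len=1024, hop=512):
    """Real cepstrum of each windowed frame of a 1-D signal.

    Returns an array of shape (frame_len // 2, num_frames): quefrency
    along rows, time along columns. Boundary reflections in a multipath
    channel appear as ridges at the inter-path delay quefrencies.
    """
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))
        log_spec = np.log(spectrum + 1e-12)   # small floor avoids log(0)
        cepstrum = np.fft.irfft(log_spec)     # real cepstrum of the frame
        frames.append(cepstrum[:frame_len // 2])
    return np.array(frames).T

c = cepstrogram(np.random.randn(4096))
```

Because the cepstrum converts the channel's multiplicative spectral comb into additive peaks, it exposes the very multipath structure that degrades conventional passive ranging, which is why it is a natural CNN input here.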
Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments
We address the problem of online localization and tracking of multiple moving
speakers in reverberant environments. The paper has the following
contributions. We use the direct-path relative transfer function (DP-RTF), an
inter-channel feature that encodes acoustic information robust against
reverberation, and we propose an online algorithm well suited for estimating
DP-RTFs associated with moving audio sources. Another crucial ingredient of the
proposed method is its ability to properly assign DP-RTFs to audio-source
directions. Towards this goal, we adopt a maximum-likelihood formulation and we
propose to use an exponentiated gradient (EG) to efficiently update
source-direction estimates starting from their currently available values. The
problem of multiple speaker tracking is computationally intractable because the
number of possible associations between observed source directions and physical
speakers grows exponentially with time. We adopt a Bayesian framework and we
propose a variational approximation of the posterior filtering distribution
associated with multiple speaker tracking, as well as an efficient variational
expectation-maximization (VEM) solver. The proposed online localization and
tracking method is thoroughly evaluated using two datasets that contain
recordings performed in real environments. Comment: IEEE Journal of Selected Topics in Signal Processing, 201
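The exponentiated-gradient step used to update source-direction estimates can be sketched in its generic form: a multiplicative update followed by renormalization, which keeps a weight vector over candidate directions on the probability simplex. The learning rate and the interpretation of the weights below are illustrative assumptions; the paper's actual objective is the maximum-likelihood formulation described above.

```python
import numpy as np

def eg_update(weights, gradient, eta=0.1):
    """One exponentiated-gradient step on the probability simplex.

    weights: current nonnegative weights summing to 1 (e.g., posterior
        mass over candidate source directions).
    gradient: gradient of the negative log-likelihood w.r.t. the weights.
    The multiplicative form guarantees the result stays nonnegative,
    and the renormalization keeps it summing to 1.
    """
    w = weights * np.exp(-eta * gradient)
    return w / w.sum()

# A larger gradient on direction 0 shifts mass toward direction 1.
w = eg_update(np.array([0.5, 0.5]), np.array([1.0, 0.0]), eta=1.0)
```

Compared with a projected additive gradient step, this update never leaves the simplex, which suits the online setting where each frame's DP-RTF observations refine the current direction estimates in place.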