10,739 research outputs found
Effects of virtual acoustics on dynamic auditory distance perception
Sound propagation encompasses various acoustic phenomena including
reverberation. Current virtual acoustic methods, ranging from parametric
filters to physically-accurate solvers, can simulate reverberation with varying
degrees of fidelity. We investigate the effects of reverberant sounds generated
using different propagation algorithms on acoustic distance perception, i.e.,
how faraway humans perceive a sound source. In particular, we evaluate two
classes of methods for real-time sound propagation in dynamic scenes based on
parametric filters and ray tracing. Our study shows that the more accurate
method shows less distance compression as compared to the approximate,
filter-based method. This suggests that accurate reverberation in VR results in
a better reproduction of acoustic distances. We also quantify the levels of
distance compression introduced by different propagation methods in a virtual
environment.Comment: 8 Pages, 7 figure
Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings
We tackle the multi-party speech recovery problem through modeling the
acoustic of the reverberant chambers. Our approach exploits structured sparsity
models to perform room modeling and speech recovery. We propose a scheme for
characterizing the room acoustic from the unknown competing speech sources
relying on localization of the early images of the speakers by sparse
approximation of the spatial spectra of the virtual sources in a free-space
model. The images are then clustered exploiting the low-rank structure of the
spectro-temporal components belonging to each source. This enables us to
identify the early support of the room impulse response function and its unique
map to the room geometry. To further tackle the ambiguity of the reflection
ratios, we propose a novel formulation of the reverberation model and estimate
the absorption coefficients through a convex optimization exploiting joint
sparsity model formulated upon spatio-spectral sparsity of concurrent speech
representation. The acoustic parameters are then incorporated for separating
individual speech signals through either structured sparse recovery or inverse
filtering the acoustic channels. The experiments conducted on real data
recordings demonstrate the effectiveness of the proposed approach for
multi-party speech recovery and recognition.Comment: 31 page
Blind estimation of reverberation time in classrooms and hospital wards
This paper investigates blind Reverberation Time (RT) estimation in occupied classrooms and hospital wards. Measurements are usually made while these spaces are unoccupied for logistical reasons. However, occupancy can have a significant impact on the rate of reverberant decay.
Recent work has developed a Maximum Likelihood Estimation (MLE) method which utilises only passively recorded speech and music signals, this enables measurements to be made while the room is in use. In this paper the MLE method is applied to recordings made in classrooms during lessons.
Classroom occupancy levels differ for each lesson, therefore a model is developed using blind estimates to predict the RT for any occupancy level to within ±0.07s for the mid-frequency octave bands. The model is also able to predict the effective room and per person absorption area.
Ambient sound recordings were also carried out in a number of rooms in two hospitals for a week.
Hospital measurements are more challenging as the occurrence of free reverberant decay is rarer than in schools and the acoustic conditions may be non-stationary. However, by gaining recordings over a period of a week, estimates can be gained within ±0.07 s. These estimates are representative of the times when the room contains the highest acoustic absorption. In other words when curtains
are drawn, there are many visitors or perhaps a window may be open
Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function
This paper addresses the problems of blind channel identification and
multichannel equalization for speech dereverberation and noise reduction. The
time-domain cross-relation method is not suitable for blind room impulse
response identification, due to the near-common zeros of the long impulse
responses. We extend the cross-relation method to the short-time Fourier
transform (STFT) domain, in which the time-domain impulse responses are
approximately represented by the convolutive transfer functions (CTFs) with
much less coefficients. The CTFs suffer from the common zeros caused by the
oversampled STFT. We propose to identify CTFs based on the STFT with the
oversampled signals and the critical sampled CTFs, which is a good compromise
between the frequency aliasing of the signals and the common zeros problem of
CTFs. In addition, a normalization of the CTFs is proposed to remove the gain
ambiguity across sub-bands. In the STFT domain, the identified CTFs is used for
multichannel equalization, in which the sparsity of speech signals is
exploited. We propose to perform inverse filtering by minimizing the
-norm of the source signal with the relaxed -norm fitting error
between the micophone signals and the convolution of the estimated source
signal and the CTFs used as a constraint. This method is advantageous in that
the noise can be reduced by relaxing the -norm to a tolerance
corresponding to the noise power, and the tolerance can be automatically set.
The experiments confirm the efficiency of the proposed method even under
conditions with high reverberation levels and intense noise.Comment: 13 pages, 5 figures, 5 table
Raking the Cocktail Party
We present the concept of an acoustic rake receiver---a microphone beamformer
that uses echoes to improve the noise and interference suppression. The rake
idea is well-known in wireless communications; it involves constructively
combining different multipath components that arrive at the receiver antennas.
Unlike spread-spectrum signals used in wireless communications, speech signals
are not orthogonal to their shifts. Therefore, we focus on the spatial
structure, rather than temporal. Instead of explicitly estimating the channel,
we create correspondences between early echoes in time and image sources in
space. These multiple sources of the desired and the interfering signal offer
additional spatial diversity that we can exploit in the beamformer design.
We present several "intuitive" and optimal formulations of acoustic rake
receivers, and show theoretically and numerically that the rake formulation of
the maximum signal-to-interference-and-noise beamformer offers significant
performance boosts in terms of noise and interference suppression. Beyond
signal-to-noise ratio, we observe gains in terms of the \emph{perceptual
evaluation of speech quality} (PESQ) metric for the speech quality. We
accompany the paper by the complete simulation and processing chain written in
Python. The code and the sound samples are available online at
\url{http://lcav.github.io/AcousticRakeReceiver/}.Comment: 12 pages, 11 figures, Accepted for publication in IEEE Journal on
Selected Topics in Signal Processing (Special Issue on Spatial Audio
- …