Estimation of room acoustic parameters: the ACE challenge
Reverberation Time (T60) and Direct-to-Reverberant Ratio (DRR) are two parameters that together can characterize sound captured by microphones in non-anechoic rooms. They matter in speech processing applications such as speech recognition and dereverberation. The values of T60 and DRR can be estimated directly from the Acoustic Impulse Response (AIR) of the room. In practice, the AIR is not normally available, in which case these parameters must be estimated blindly from the observed speech in the microphone signal. The Acoustic Characterization of Environments (ACE) Challenge aimed to determine the state of the art in blind acoustic parameter estimation and to stimulate research in this area. A summary of the ACE Challenge and the corpus used in it is presented, together with an analysis of the results. Existing algorithms were submitted alongside novel contributions, and the comparative results for all of them are presented in this paper. The challenge showed that T60 estimation is a mature field in which analytical approaches dominate, whilst DRR estimation is a less mature field in which machine learning approaches are currently more successful.
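When the AIR is available, both parameters can be computed non-blindly. The sketch below is not taken from the ACE Challenge; it assumes two common conventions: T60 from Schroeder backward integration, with a line fitted to the energy decay curve between -5 and -25 dB and extrapolated to a 60 dB decay, and DRR taking the direct path as the samples within ±2.5 ms of the strongest AIR peak. The function names `schroeder_t60` and `drr_db`, and the synthetic AIR in the demo, are hypothetical.

```python
import numpy as np


def schroeder_t60(air, fs, lo_db=-5.0, hi_db=-25.0):
    """Estimate T60 (seconds) from an AIR by Schroeder backward integration.

    A line is fitted to the energy decay curve (EDC) between lo_db and hi_db,
    and its slope is extrapolated to a 60 dB decay.
    """
    energy = np.asarray(air, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]              # energy remaining from each sample on
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-300)
    idx = np.nonzero((edc_db <= lo_db) & (edc_db >= hi_db))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # decay slope in dB/s (negative)
    return -60.0 / slope


def drr_db(air, fs, window_ms=2.5):
    """DRR (dB): energy within +/- window_ms of the strongest peak vs. everything else."""
    air = np.asarray(air, dtype=float)
    peak = int(np.argmax(np.abs(air)))
    half = int(round(window_ms * 1e-3 * fs))
    start, stop = max(peak - half, 0), peak + half + 1
    direct = np.sum(air[start:stop] ** 2)
    reverberant = np.sum(air[:start] ** 2) + np.sum(air[stop:] ** 2)
    return 10.0 * np.log10(direct / reverberant)


if __name__ == "__main__":
    fs, t60_true = 16000, 0.5
    t = np.arange(2 * fs) / fs
    rng = np.random.default_rng(0)
    # Hypothetical AIR: noise whose amplitude falls by 60 dB (factor 1000) in t60_true.
    air = rng.standard_normal(t.size) * np.exp(-3.0 * np.log(10.0) * t / t60_true)
    air[0] = 10.0  # hypothetical strong direct-path impulse
    print(f"T60 estimate: {schroeder_t60(air, fs):.3f} s")
    print(f"DRR: {drr_db(air, fs):.1f} dB")
```

On such a synthetic AIR the T60 estimate should land close to the design value; measured AIRs additionally need handling of the noise floor and onset alignment, which this sketch ignores.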
Reverberation: models, estimation and application
The use of reverberation models is required in many applications such as acoustic measurements,
speech dereverberation and robust automatic speech recognition. The aim of this thesis is to
investigate different models and propose a perceptually-relevant reverberation model with suitable
parameter estimation techniques for different applications.
Reverberation can be modelled in both the time and frequency domains. The model parameters
give direct information about both physical and perceptual characteristics. These characteristics
create a multidimensional parameter space of reverberation, which can to a large extent be captured
by a time-frequency domain model. In this thesis, the relationship between physical and perceptual
model parameters will be discussed. In the first application, an intrusive technique is proposed to
measure reverberation (or reverberance, the perception of reverberation) and colouration. The
room decay rate parameter is of particular interest.
In practical applications, a blind estimate of the decay rate of acoustic energy in a room
is required. A statistical model for the distribution of the decay rate of the reverberant signal,
named the eagleMax distribution, is proposed. The eagleMax distribution describes the reverberant
speech decay rate as a random variable that is the maximum of the room decay rate and the anechoic
speech decay rate. Three methods were developed to estimate the mean room decay rate from
the eagleMax distributions alone. The estimated room decay rates form a reverberation model that
will be discussed separately for each application: room acoustic measurements, speech dereverberation
and robust automatic speech recognition.
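The max model can be illustrated numerically. This sketch assumes, purely for illustration, that the room and anechoic speech decay rates are independent Gaussian random variables with made-up parameters (in dB/s); the actual component distributions of eagleMax and the three estimation methods are not reproduced here. Under independence, the CDF of the maximum is the product of the two component CDFs, which the Monte Carlo simulation checks. The names `ROOM`, `SPEECH`, `max_model_cdf` and `simulate_reverberant_rates` are hypothetical.

```python
import math

import numpy as np

# Hypothetical (mean, std) of decay rates in dB/s; illustrative values only.
ROOM = (-120.0, 10.0)    # room decay rate; -120 dB/s corresponds to T60 = 0.5 s
SPEECH = (-150.0, 60.0)  # anechoic speech decay rate


def gaussian_cdf(x, mean, std):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def max_model_cdf(x, room=ROOM, speech=SPEECH):
    """P(max(room, speech) <= x) for independent rates: product of the two CDFs."""
    return gaussian_cdf(x, *room) * gaussian_cdf(x, *speech)


def simulate_reverberant_rates(n, room=ROOM, speech=SPEECH, seed=0):
    """Draw reverberant decay rates as the element-wise max of room and speech rates."""
    rng = np.random.default_rng(seed)
    room_rates = rng.normal(room[0], room[1], n)
    speech_rates = rng.normal(speech[0], speech[1], n)
    return np.maximum(room_rates, speech_rates)


if __name__ == "__main__":
    rates = simulate_reverberant_rates(100_000)
    for x in (-140.0, -120.0, -100.0):
        print(f"x={x:7.1f}  empirical={np.mean(rates <= x):.3f}  model={max_model_cdf(x):.3f}")
```

Because the observed rate is the larger (less negative) of the two, the reverberant decay can never be faster than the room decay in this model, which is why the room rate bounds the observed distribution from one side.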
A binaural grouping model for predicting speech intelligibility in multitalker environments
Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers, both for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal, and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires few computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception thresholds match the pattern of the measured data well, although the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured improvements, which may reflect the lack of internal noise in this initial version of the model.
R01 DC000100 - NIDCD NIH HH
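The grouping stage described above can be sketched as follows, under stated assumptions: the inputs are complex STFT coefficients of the left and right ear signals, the target's interaural transfer ratio is known (1.0 for a frontal target with no interaural difference), the EC input energy is taken as the mean of the two ear energies, and the 6 dB decision threshold is an arbitrary illustrative value. These choices and the name `ec_binary_mask` are hypothetical; the article's peripheral filtering, EC parameters and actual decision rule are not reproduced.

```python
import numpy as np


def ec_binary_mask(left, right, target_ratio=1.0, threshold_db=6.0, eps=1e-12):
    """Binary T-F mask from an equalization-cancellation (EC) energy-change feature.

    left, right: complex STFT coefficients of the two ears (same shape).
    target_ratio: assumed interaural transfer ratio of the target (R = ratio * L);
        1.0 corresponds to a target with no interaural difference.
    Returns (mask, reduction_db): mask is True where EC removes more than
    threshold_db of energy, i.e. where the target is taken to dominate.
    """
    left = np.asarray(left, dtype=complex)
    right = np.asarray(right, dtype=complex)
    ec_out = target_ratio * left - right  # equalize the left channel, then cancel the target
    e_in = 0.5 * (np.abs(left) ** 2 + np.abs(right) ** 2)
    e_out = np.abs(ec_out) ** 2
    reduction_db = 10.0 * np.log10((e_in + eps) / (e_out + eps))
    return reduction_db > threshold_db, reduction_db


if __name__ == "__main__":
    # Toy 2x2 "STFT": diagonal units are target-like (L close to R); off-diagonal
    # units carry a lateral masker with a 90-degree interaural phase shift.
    left = np.ones((2, 2), dtype=complex)
    right = np.array([[1.0, 1j], [1j, 1.01]])
    mask, reduction = ec_binary_mask(left, right)
    print(mask)
    print(np.round(reduction, 1))
```

Cancelling the target produces a large energy drop in target-dominated units and a small or even negative change where a masker from another direction dominates, which is what makes the energy change usable as a dominance feature.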
- …