255 research outputs found

    Estimation of room acoustic parameters: the ACE challenge

    No full text
    Reverberation Time (T60) and Direct-to-Reverberant Ratio (DRR) are important parameters which together can characterize sound captured by microphones in non-anechoic rooms. These parameters are important in speech processing applications such as speech recognition and dereverberation. The values of T60 and DRR can be estimated directly from the Acoustic Impulse Response (AIR) of the room. In practice, the AIR is not normally available, in which case these parameters must be estimated blindly from the observed speech in the microphone signal. The Acoustic Characterization of Environments (ACE) Challenge aimed to determine the state-of-the-art in blind acoustic parameter estimation and also to stimulate research in this area. A summary of the ACE Challenge, and the corpus used in the challenge is presented together with an analysis of the results. Existing algorithms were submitted alongside novel contributions, the comparative results for which are presented in this paper. The challenge showed that T60 estimation is a mature field where analytical approaches dominate whilst DRR estimation is a less mature field where machine learning approaches are currently more successful

    Reverberation: models, estimation and application

    No full text
    The use of reverberation models is required in many applications such as acoustic measurements, speech dereverberation and robust automatic speech recognition. The aim of this thesis is to investigate different models and propose a perceptually-relevant reverberation model with suitable parameter estimation techniques for different applications. Reverberation can be modelled in both the time and frequency domain. The model parameters give direct information of both physical and perceptual characteristics. These characteristics create a multidimensional parameter space of reverberation, which can be to a large extent captured by a time-frequency domain model. In this thesis, the relationship between physical and perceptual model parameters will be discussed. In the first application, an intrusive technique is proposed to measure the reverberation or reverberance, perception of reverberation and the colouration. The room decay rate parameter is of particular interest. In practical applications, a blind estimate of the decay rate of acoustic energy in a room is required. A statistical model for the distribution of the decay rate of the reverberant signal named the eagleMax distribution is proposed. The eagleMax distribution describes the reverberant speech decay rates as a random variable that is the maximum of the room decay rates and anechoic speech decay rates. Three methods were developed to estimate the mean room decay rate from the eagleMax distributions alone. The estimated room decay rates form a reverberation model that will be discussed in the context of room acoustic measurements, speech dereverberation and robust automatic speech recognition individually

    A binaural grouping model for predicting speech intelligibility in multitalker environments

    Get PDF
    Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model.R01 DC000100 - NIDCD NIH HH
    • …
    corecore