56 research outputs found

    Joint estimation of reverberation time and early-to-late reverberation ratio from single-channel speech signals

    Get PDF
    The reverberation time (RT) and the early-to-late reverberation ratio (ELR) are two key parameters commonly used to characterize acoustic room environments. In contrast to conventional blind estimation methods that process the two parameters separately, we propose a model for joint estimation to predict the RT and the ELR simultaneously from single-channel speech signals from either full-band or sub-band frequency data, which is referred to as joint room parameter estimator (jROPE). An artificial neural network is employed to learn the mapping from acoustic observations to the RT and the ELR classes. Auditory-inspired acoustic features obtained by temporal modulation filtering of the speech time-frequency representations are used as input for the neural network. Based on an in-depth analysis of the dependency between the RT and the ELR, a two-dimensional (RT, ELR) distribution with constrained boundaries is derived, which is then exploited to evaluate four different configurations for jROPE. Experimental results show that-in comparison to the single-task ROPE system which individually estimates the RT or the ELR-jROPE provides improved results for both tasks in various reverberant and (diffuse) noisy environments. Among the four proposed joint types, the one incorporating multi-task learning with shared input and hidden layers yields the best estimation accuracies on average. When encountering extreme reverberant conditions with RTs and ELRs lying beyond the derived (RT, ELR) distribution, the type considering RT and ELR as a joint parameter performs robustly, in particular. From state-of-the-art algorithms that were tested in the acoustic characterization of environments challenge, jROPE achieves comparable results among the best for all individual tasks (RT and ELR estimation from full-band and sub-band signals)

    Exploring auditory-inspired acoustic features for room acoustic parameter estimation from monaural speech

    Get PDF
    Room acoustic parameters that characterize acoustic environments can help to improve signal enhancement algorithms such as for dereverberation, or automatic speech recognition by adapting models to the current parameter set. The reverberation time (RT) and the early-to-late reverberation ratio (ELR) are two key parameters. In this paper, we propose a blind ROom Parameter Estimator (ROPE) based on an artificial neural network that learns the mapping to discrete ranges of the RT and the ELR from single-microphone speech signals. Auditory-inspired acoustic features are used as neural network input, which are generated by a temporal modulation filter bank applied to the speech time-frequency representation. ROPE performance is analyzed in various reverberant environments in both clean and noisy conditions for both fullband and subband RT and ELR estimations. The importance of specific temporal modulation frequencies is analyzed by evaluating the contribution of individual filters to the ROPE performance. Experimental results show that ROPE is robust against different variations caused by room impulse responses (measured versus simulated), mismatched noise levels, and speech variability reflected through different corpora. Compared to state-of-the-art algorithms that were tested in the acoustic characterisation of environments (ACE) challenge, the ROPE model is the only one that is among the best for all individual tasks (RT and ELR estimation from fullband and subband signals). Improved fullband estimations are even obtained by ROPE when integrating speech-related frequency subbands. Furthermore, the model requires the least computational resources with a real time factor that is at least two times faster than competing algorithms. Results are achieved with an average observation window of 3 s, which is important for real-time applications

    Estimation of room acoustic parameters: the ACE challenge

    No full text
    Reverberation Time (T60) and Direct-to-Reverberant Ratio (DRR) are important parameters which together can characterize sound captured by microphones in non-anechoic rooms. These parameters are important in speech processing applications such as speech recognition and dereverberation. The values of T60 and DRR can be estimated directly from the Acoustic Impulse Response (AIR) of the room. In practice, the AIR is not normally available, in which case these parameters must be estimated blindly from the observed speech in the microphone signal. The Acoustic Characterization of Environments (ACE) Challenge aimed to determine the state-of-the-art in blind acoustic parameter estimation and also to stimulate research in this area. A summary of the ACE Challenge, and the corpus used in the challenge is presented together with an analysis of the results. Existing algorithms were submitted alongside novel contributions, the comparative results for which are presented in this paper. The challenge showed that T60 estimation is a mature field where analytical approaches dominate whilst DRR estimation is a less mature field where machine learning approaches are currently more successful

    Spatial features of reverberant speech: estimation and application to recognition and diarization

    Get PDF
    Distant talking scenarios, such as hands-free calling or teleconference meetings, are essential for natural and comfortable human-machine interaction and they are being increasingly used in multiple contexts. The acquired speech signal in such scenarios is reverberant and affected by additive noise. This signal distortion degrades the performance of speech recognition and diarization systems creating troublesome human-machine interactions.This thesis proposes a method to non-intrusively estimate room acoustic parameters, paying special attention to a room acoustic parameter highly correlated with speech recognition degradation: clarity index. In addition, a method to provide information regarding the estimation accuracy is proposed. An analysis of the phoneme recognition performance for multiple reverberant environments is presented, from which a confusability metric for each phoneme is derived. This confusability metric is then employed to improve reverberant speech recognition performance. Additionally, room acoustic parameters can as well be used in speech recognition to provide robustness against reverberation. A method to exploit clarity index estimates in order to perform reverberant speech recognition is introduced. Finally, room acoustic parameters can also be used to diarize reverberant speech. A room acoustic parameter is proposed to be used as an additional source of information for single-channel diarization purposes in reverberant environments. In multi-channel environments, the time delay of arrival is a feature commonly used to diarize the input speech, however the computation of this feature is affected by reverberation. A method is presented to model the time delay of arrival in a robust manner so that speaker diarization is more accurately performed.Open Acces

    Estimating the Direct-to-Reverberant Energy Ratio Using a Spherical Harmonics-Based Spatial Correlation Model

    Get PDF
    The direct-to-reverberant ratio (DRR), which describes the energy ratio between the direct and reverberant component of a soundfield, is an important parameter in many audio applications. In this paper, we present a multichannel algorithm, which utilizes the blind recordings of a spherical microphone array to estimate the DRR of interest. The algorithm is developed based on a spatial correlation model formulated in the spherical harmonics domain. This model expresses the cross correlation matrix of the recorded soundfield coefficients in terms of two spatial correlation matrices, one for direct sound and the other for reverberation. While the direct path arrives from the source, the reverberant path is considered to be a nondiffuse soundfield with varying directional gains. The direct and reverberant sound energies are estimated from the aforementioned spatial correlation model, which then leads to the DRR estimation. The practical feasibility of the proposed algorithm was evaluated using the speech corpus of the acoustic characterization of environments challenge. The experimental results revealed that the proposed method was able to effectively estimate the DRR of a large collection of reverberant speech recordings including various environmental noise types, room types and speakers.DP14010341
    • …
    corecore