
    Blind estimation of room acoustic parameters from speech signals based on extended model of room impulse response

    The speech transmission index (STI) and room acoustic parameters (RAPs) derived from a room impulse response (RIR), such as reverberation time and early decay time, are essential for assessing speech transmission and predicting listening difficulty in a sound field. Since it is difficult to measure an RIR in occupied everyday spaces, the simultaneous blind estimation of the STI and RAPs remains an important and challenging problem. This paper proposes a deterministic method for blindly estimating the STI and five RAPs on the basis of a stochastic model that approximates the unknown RIR. The proposed method formulates the temporal power envelope of a reverberant speech signal to obtain the optimal parameters of the RIR model. Simulations were conducted to estimate the STI and RAPs from observed reverberant speech signals, and the root-mean-square errors between estimated and ground-truth values were used to compare the proposed method with a previous method. The results show that the proposed method can estimate the STI and RAPs effectively without any training. (Comment: 5 pages, 3 figures, 2 tables)
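    As a rough illustration of the idea of fitting a decaying power-envelope model to reverberant speech (a minimal sketch, not the paper's formulation): assume a Polack-style RIR whose power envelope decays exponentially, and estimate the reverberation time by a least-squares line fit to the log power envelope of a free-decay segment. The helpers `power_envelope` and `estimate_t60_from_decay` are hypothetical names introduced for this sketch.

        import numpy as np

        def power_envelope(x, fs, frame_ms=20.0):
            # Hypothetical helper: short-time power envelope of a mono signal.
            n = int(fs * frame_ms / 1000.0)
            n_frames = len(x) // n
            frames = x[:n_frames * n].reshape(n_frames, n)
            return np.mean(frames ** 2, axis=1), n / fs   # (envelope, frame period)

        def estimate_t60_from_decay(env, dt):
            # Fit a straight line to the log power envelope (in dB) of a free-decay
            # segment and convert its slope into the time needed for a 60 dB drop.
            t = np.arange(len(env)) * dt
            level_db = 10.0 * np.log10(env + 1e-12)
            slope, _ = np.polyfit(t, level_db, 1)          # dB per second
            return np.inf if slope >= 0 else -60.0 / slope

        # Usage (assumed input): `tail` is reverberant speech recorded just after
        # an utterance stops, sampled at `fs`.
        # decay_env, dt = power_envelope(tail, fs)
        # t60 = estimate_t60_from_decay(decay_env, dt)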

    Reverberation: models, estimation and application

    The use of reverberation models is required in many applications such as acoustic measurements, speech dereverberation and robust automatic speech recognition. The aim of this thesis is to investigate different models and to propose a perceptually relevant reverberation model with suitable parameter estimation techniques for different applications. Reverberation can be modelled in both the time and frequency domains. The model parameters give direct information about both physical and perceptual characteristics. These characteristics create a multidimensional parameter space of reverberation, which can to a large extent be captured by a time-frequency domain model. In this thesis, the relationship between physical and perceptual model parameters is discussed. In the first application, an intrusive technique is proposed to measure reverberance (the perception of reverberation) and colouration. The room decay rate parameter is of particular interest, since practical applications require a blind estimate of the decay rate of acoustic energy in a room. A statistical model for the distribution of the decay rate of the reverberant signal, named the eagleMax distribution, is proposed. The eagleMax distribution describes the reverberant speech decay rates as a random variable that is the maximum of the room decay rates and the anechoic speech decay rates. Three methods were developed to estimate the mean room decay rate from the eagleMax distributions alone. The estimated room decay rates form a reverberation model that is discussed individually in the context of room acoustic measurements, speech dereverberation and robust automatic speech recognition.
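    To make the eagleMax idea above concrete, here is a minimal Monte Carlo sketch of the "maximum of two decay rates" model: the decay rate observed in reverberant speech is taken as the maximum (i.e., the slower decay) of an anechoic speech decay rate and a room decay rate. The distributions and parameter values below are illustrative assumptions, not values from the thesis.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 100_000

        # Hypothetical anechoic-speech decay rates (dB/s, negative = decaying) and
        # room decay rates corresponding roughly to T60 = 0.6 s (-60/0.6 = -100 dB/s).
        speech_rates = rng.normal(loc=-300.0, scale=150.0, size=n)
        room_rates = rng.normal(loc=-100.0, scale=10.0, size=n)

        # Observed reverberant-speech decay rates: the larger (slower) of the two,
        # because the room tail masks any faster speech offset.
        observed = np.maximum(speech_rates, room_rates)

        print("mean observed decay rate:", observed.mean())
        print("mean room decay rate:    ", room_rates.mean())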

    A non-intrusive method for estimating binaural speech intelligibility from noise-corrupted signals captured by a pair of microphones

    A non-intrusive method is introduced to predict binaural speech intelligibility in noise directly from signals captured by a pair of microphones. The approach combines blind source separation and localisation techniques with an intrusive objective intelligibility measure (OIM). Unlike classic intrusive OIMs, the method therefore requires neither a clean reference speech signal nor knowledge of the source locations. The proposed approach can estimate intelligibility in stationary and fluctuating noise when the noise masker is presented as a point or diffuse source and is spatially separated from the target speech source in the horizontal plane. The performance of the proposed method was evaluated in two rooms. When predicting subjective intelligibility measured as word recognition rate, the method showed reasonable predictive accuracy, with correlation coefficients above 0.82, comparable to a reference intrusive OIM in most conditions. The proposed approach offers a solution for fast binaural intelligibility prediction and therefore has practical potential in situations where on-site speech intelligibility is a concern.
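    A crude pipeline sketch of the idea above, under strong assumptions: the two-microphone mixture is separated blindly (here with instantaneous FastICA from scikit-learn, a stand-in for the convolutive separation and localisation stages described in the abstract), and the separated target is then used as the reference for an off-the-shelf intrusive measure (STOI via the pystoi package, standing in for the reference OIM), so no external clean signal is needed.

        import numpy as np
        from sklearn.decomposition import FastICA
        from pystoi import stoi  # pip install pystoi

        def nonintrusive_binaural_score(left, right, fs):
            mix = np.stack([left, right], axis=1)            # (samples, 2)
            sources = FastICA(n_components=2, random_state=0).fit_transform(mix)

            # Take the more energetic separated source as the target estimate
            # (a very rough proxy for the localisation stage).
            target = sources[:, int(np.argmax(np.var(sources, axis=0)))]

            # Score the mixture at one ear against the separated target estimate.
            return stoi(target, left, fs, extended=False)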

    Modeling speech intelligibility based on the signal-to-noise envelope power ratio


    Single- and multi-microphone speech dereverberation using spectral enhancement

    In speech communication systems such as voice-controlled systems, hands-free mobile telephones, and hearing aids, the received microphone signals are degraded by room reverberation, background noise, and other interferences. This degradation can render the speech unintelligible and decreases the performance of automatic speech recognition systems. In the context of this work, reverberation is the process of multi-path propagation of an acoustic sound from its source to one or more microphones. The received microphone signal generally consists of the direct sound, reflections that arrive shortly after the direct sound (commonly called early reverberation), and reflections that arrive later (commonly called late reverberation). Reverberant speech can be described as sounding distant, with noticeable echo and colouration. These detrimental perceptual effects are primarily caused by late reverberation and generally increase with the distance between source and microphone. Early reflections, by contrast, tend to improve the intelligibility of speech; in combination with the direct sound they are sometimes referred to as the early speech component. Reducing the detrimental effects of reflections is evidently of considerable practical importance and is the focus of this dissertation, which deals with dereverberation techniques, i.e., signal processing techniques that reduce these detrimental effects.

    In the dissertation, novel single- and multi-microphone speech dereverberation algorithms are developed that aim at suppressing late reverberation, i.e., at estimating the early speech component. This is done via so-called spectral enhancement techniques that require a specific measure of the late reverberant signal. This measure, called the spectral variance, can be estimated directly from the received (possibly noisy) reverberant signal(s) using a statistical reverberation model and a limited amount of a priori knowledge about the acoustic channel(s) between the source and the microphone(s). An existing single-channel statistical reverberation model serves as a starting point; it is characterized by one parameter that depends on the acoustic characteristics of the environment. We show that the spectral variance estimator based on this model can only be used when the source-microphone distance is larger than the so-called critical distance, which is, roughly speaking, the distance at which the direct sound power equals the total reflective power. A generalization of the statistical reverberation model that incorporates the direct sound is therefore developed. This model requires one additional parameter, related to the ratio between the direct sound energy and the energy of all reflections, and is used to derive a novel spectral variance estimator. When the novel estimator is used for dereverberation instead of the existing one and the source-microphone distance is smaller than the critical distance, the dereverberation performance increases significantly.

    Single-microphone systems exploit only the temporal and spectral diversity of the received signal. Reverberation, of course, also induces spatial diversity. To exploit this additional diversity, multiple microphones must be used and their outputs combined by a suitable spatial processor, such as the delay-and-sum beamformer. It is not a priori evident whether spectral enhancement is best performed before or after the spatial processor, so both possibilities are investigated, as well as a combination of the spatial processor and the spectral enhancement technique. An advantage of the latter option is that the spectral variance estimator can be further improved. Our experiments show that the use of multiple microphones affords a significant improvement in perceptual speech quality.

    The applicability of the theory developed in this dissertation is demonstrated using a hands-free communication system. Since hands-free systems are often used in noisy and reverberant environments, the received microphone signal contains not only the desired signal but also interferences such as room reverberation caused by the desired source, background noise, and a far-end echo signal produced by the loudspeaker. Usually an acoustic echo canceller is used to cancel the far-end echo, and a post-processor is used to suppress background noise and residual echo, i.e., echo that could not be cancelled by the echo canceller. In this work, a novel structure and post-processor for an acoustic echo canceller are developed. The post-processor suppresses late reverberation caused by the desired source, residual echo, and background noise; the late reverberation and late residual echo are estimated using the generalized statistical reverberation model. Experimental results convincingly demonstrate the benefits of the proposed system for suppressing late reverberation, residual echo, and background noise. The proposed structure and post-processor have low computational complexity and a highly modular structure, can be seamlessly integrated into existing hands-free communication systems, and afford a significant increase in listening comfort and speech intelligibility.
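    To illustrate the kind of spectral enhancement described above, the following is a simplified single-channel sketch based on a Polack-style statistical model: the late reverberant spectral variance is predicted from earlier frames of the reverberant spectrogram, attenuated by the model's exponential energy decay, and removed with a spectral-subtraction gain. It follows the basic model only (the generalized estimator with the direct-path parameter is not reproduced here) and assumes the reverberation time `t60` is known or estimated separately.

        import numpy as np
        from scipy.signal import stft, istft

        def suppress_late_reverb(x, fs, t60, n_fft=512, hop=128, late_ms=50.0, g_min=0.1):
            f, t, X = stft(x, fs, nperseg=n_fft, noverlap=n_fft - hop)
            psd = np.abs(X) ** 2

            delta = 3.0 * np.log(10.0) / t60      # amplitude decay constant of the RIR model
            n_late = max(1, int(round(late_ms / 1000.0 * fs / hop)))  # frames until "late"
            t_late = n_late * hop / fs

            # Predicted late-reverberant spectral variance: the reverberant PSD from
            # n_late frames ago, attenuated by the exponential energy decay.
            lam_late = np.zeros_like(psd)
            lam_late[:, n_late:] = np.exp(-2.0 * delta * t_late) * psd[:, :-n_late]

            # Spectral-subtraction-style gain with a floor, applied to the STFT.
            gain = np.maximum(1.0 - lam_late / (psd + 1e-12), g_min)
            _, y = istft(gain * X, fs, nperseg=n_fft, noverlap=n_fft - hop)
            return y[:len(x)]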

    Blind estimation of room acoustic parameters from speech and music signals

    The acoustic character of a space is often quantified using objective room acoustic parameters. Measuring these parameters is difficult in occupied conditions, so measurements are usually performed when the space is unoccupied, despite the knowledge that occupancy can significantly affect the measured parameter values. In this thesis, new methods are developed by which naturalistic signals such as speech and music can be used to measure acoustic parameters. Naturalistic signals enable passive measurement during orchestral performances and spoken announcements, facilitating easy in-situ measurement. Two methods are described: (1) a method using artificial neural networks, in which a network is taught to recognise acoustic parameters from received, reverberated signals, and (2) a method based on maximum likelihood estimation of the room's decay curve, from which the parameters are then calculated.

    (1) The development of the neural network method focuses on a new pre-processor for use with music signals. The pre-processor uses a narrow-band filter bank with centre frequencies chosen according to the equal-temperament scale. Because the success of a machine learning method is linked to the quality of its training data, realistic acoustic simulation algorithms were used to generate a large database of room impulse responses: room models were defined with realistic, randomly generated geometries and surface properties, and these models were then used to predict the room impulse responses.

    (2) In the second approach, a statistical model of the decay of sound in a room was further developed. This model uses a maximum likelihood (ML) framework to yield a number of decay curve estimates from a received reverberant signal. The success of the method depends on several stages developed for the algorithm: (a) a pre-processor to select appropriate decay phases for estimation, (b) a rigorous optimisation algorithm to ensure the correct maximum likelihood estimate is found, and (c) a method to yield a single optimum decay curve estimate from which the parameters are calculated.

    The ANN and ML methods were tested using orchestral music and speech signals. The ANN method tended to perform well when estimating the early decay time (EDT); for speech and music signals the error was within the subjective difference limens. However, accuracy was reduced for the reverberation time (RT) and other parameters. By contrast, the ML method performed well for RT, with results for both speech and music within the difference limens for reasonable (< 4 s) reverberation times. In addition, reasonable accuracy was found for EDT, clarity (C80), centre time (Ts), and Deutlichkeit (D). The ML method is also capable of producing accurate estimates of the binaural parameters early lateral energy fraction (LEF) and late lateral strength (LG). A number of real-world measurements were carried out in concert halls, where the ML accuracy was shown to be sufficient for most parameters. The ML method has the advantage over the ANN method of being truly blind (the ANN method requires a period of learning and is therefore semi-blind). The ML method uses gaps of silence between notes or utterances; when no such silent regions are present, it does not produce an estimate. Accurate estimation therefore requires a long recording (hours of music or many minutes of speech) to ensure that at least some silent regions are present.

    This thesis shows that, given a sufficiently long recording, accurate estimates of many acoustic parameters can be obtained directly from speech and music. Further extensions to the ML method detailed in this thesis combine the ML-estimated decay curve with cepstral methods that detect the locations of early reflections, improving the accuracy of many of the parameter estimates.
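    As a minimal sketch of the maximum likelihood decay estimation described above (segment selection, the optimisation safeguards and the pooling of many decay-phase estimates into a single decay curve are all omitted): a single free decay phase is modelled as exponentially decaying Gaussian noise, x[n] = sigma * a**n * w[n], the pair (a, sigma) is found by minimising the negative log-likelihood, and the decay factor is then converted into a reverberation-time estimate.

        import numpy as np
        from scipy.optimize import minimize

        def ml_decay_time(x, fs):
            n = np.arange(len(x))

            def nll(params):
                log_a, log_sigma = params
                var = np.maximum(np.exp(2.0 * (log_sigma + n * log_a)), 1e-30)
                return 0.5 * np.sum(np.log(var) + x ** 2 / var)

            res = minimize(nll, x0=[-1e-4, np.log(np.std(x) + 1e-12)],
                           method="Nelder-Mead")
            a = np.exp(res.x[0])
            if a >= 1.0:
                return np.inf                       # no decay found in this segment
            # Time for the model envelope a**n to fall by 60 dB: a**n = 10**(-3).
            return (-3.0 / np.log10(a)) / fs

        # Usage (assumed input): `decay_segment` is a mono excerpt sampled at `fs`
        # containing only the reverberant tail after an abrupt note or word offset.
        # t60 = ml_decay_time(decay_segment, fs)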

    Non-intrusive intelligibility prediction for Mandarin speech in noise

    Most existing intelligibility indices require access to the clean (reference) input signal to predict speech intelligibility in noise. In some real-world applications, however, only the noise-masked speech is available, rendering existing indices of little use. The present study assessed the performance of an intelligibility measure that can predict speech intelligibility in noise non-intrusively (i.e., with no access to the clean input signal) using only information extracted from the noise-masked speech envelopes. The proposed intelligibility measure (denoted ModA) is computed by integrating the area of the modulation spectrum (between 0.5 and 10 Hz) of the noise-masked envelopes extracted in four acoustic bands. The ModA measure was evaluated against intelligibility scores obtained from normal-hearing listeners presented with Mandarin sentences corrupted by three types of maskers. A high correlation (r = 0.90) was obtained between ModA values and listeners' intelligibility scores, suggesting that the modulation-spectrum area could be used as a simple but efficient predictor of speech intelligibility in noisy conditions.
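    A sketch of a ModA-style computation as described above: extract temporal envelopes in four acoustic bands, take each envelope's modulation spectrum, and integrate its area between 0.5 and 10 Hz. The band edges, envelope extraction and normalisation below are illustrative assumptions, not necessarily those of the original study.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        BAND_EDGES_HZ = [(100, 500), (500, 1500), (1500, 3500), (3500, 7000)]  # assumed bands (needs fs > 14 kHz)

        def mod_a(x, fs, f_lo=0.5, f_hi=10.0):
            areas = []
            for lo, hi in BAND_EDGES_HZ:
                sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
                band = sosfiltfilt(sos, x)
                env = np.abs(hilbert(band))                  # temporal envelope
                env = env - np.mean(env)

                spec = np.abs(np.fft.rfft(env)) / len(env)   # modulation spectrum
                freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
                mask = (freqs >= f_lo) & (freqs <= f_hi)
                areas.append(np.trapz(spec[mask], freqs[mask]))
            return float(np.sum(areas))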