
    Single- and multi-microphone speech dereverberation using spectral enhancement

    In speech communication systems such as voice-controlled systems, hands-free mobile telephones, and hearing aids, the received microphone signals are degraded by room reverberation, background noise, and other interferences. This degradation can render the speech completely unintelligible and decreases the performance of automatic speech recognition systems. In the context of this work, reverberation is the process of multi-path propagation of a sound from its source to one or more microphones. The received microphone signal generally consists of a direct sound, reflections that arrive shortly after the direct sound (commonly called early reverberation), and reflections that arrive after the early reverberation (commonly called late reverberation). Reverberant speech can be described as sounding distant, with noticeable echo and colouration. These detrimental perceptual effects are primarily caused by late reverberation and generally increase with increasing distance between the source and microphone. Conversely, early reflections tend to improve the intelligibility of speech; together with the direct sound, they are sometimes referred to as the early speech component. Reducing the detrimental effects of reflections is evidently of considerable practical importance and is the focus of this dissertation. More specifically, the dissertation deals with dereverberation techniques, i.e., signal processing techniques that reduce the detrimental effects of reflections. In the dissertation, novel single- and multi-microphone speech dereverberation algorithms are developed that aim at the suppression of late reverberation, i.e., at estimation of the early speech component. This is done via so-called spectral enhancement techniques that require a specific measure of the late reverberant signal.
This measure, called the spectral variance, can be estimated directly from the received (possibly noisy) reverberant signal(s) using a statistical reverberation model and a limited amount of a priori knowledge about the acoustic channel(s) between the source and the microphone(s). In our work an existing single-channel statistical reverberation model serves as a starting point. The model is characterized by one parameter that depends on the acoustic characteristics of the environment. We show that the spectral variance estimator based on this model can only be used when the source-microphone distance is larger than the so-called critical distance, i.e., roughly speaking, the distance at which the direct sound power equals the total reflective power. A generalization of the statistical reverberation model that incorporates the direct sound is developed. This model requires one additional parameter, which is related to the ratio between the energy of the direct sound and the energy of all reflections. The generalized model is used to derive a novel spectral variance estimator. When the novel estimator is used for dereverberation instead of the existing estimator, and the source-microphone distance is smaller than the critical distance, the dereverberation performance is significantly increased. Single-microphone systems only exploit the temporal and spectral diversity of the received signal. Reverberation, of course, also induces spatial diversity. To additionally exploit this diversity, multiple microphones must be used, and their outputs must be combined by a suitable spatial processor, such as the so-called delay-and-sum beamformer. It is not a priori evident whether spectral enhancement is best done before or after the spatial processor. For this reason we investigate both possibilities, as well as merging the spatial processor and the spectral enhancement technique.
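The single-parameter statistical model can be sketched in the STFT domain as follows. This is a minimal illustration in the style of the exponential-decay (Lebart-type) model the abstract refers to as the "existing" estimator; the generalized estimator with the additional direct-to-reverberant parameter is derived in the dissertation and is not reproduced here. All function and parameter names are illustrative.

```python
import numpy as np

def late_reverb_variance(psd_z, t60, fs, hop, n_late):
    """Late-reverberant spectral variance from a single-parameter
    exponential-decay model. psd_z is the (frames x bins) PSD of the
    reverberant signal; t60 is the reverberation time in seconds."""
    delta = 3.0 * np.log(10.0) / t60            # decay rate implied by T60
    t_late = n_late * hop / fs                  # onset of "late" part [s]
    lam = np.zeros_like(psd_z)
    # late variance = attenuated, delayed copy of the reverberant PSD
    lam[n_late:] = np.exp(-2.0 * delta * t_late) * psd_z[:-n_late]
    return lam

def enhancement_gain(psd_z, lam_late, g_min=0.1):
    """Wiener-style spectral-enhancement gain suppressing late reverberation."""
    return np.maximum(1.0 - lam_late / np.maximum(psd_z, 1e-12), g_min)
```

The gain would be applied bin-by-bin to the STFT of the reverberant signal; the floor `g_min` limits musical-noise artifacts, a common choice in spectral enhancement.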
An advantage of the latter option is that the spectral variance estimator can be further improved. Our experiments show that the use of multiple microphones affords a significant improvement of the perceptual speech quality. The applicability of the theory developed in this dissertation is demonstrated using a hands-free communication system. Since hands-free systems are often used in noisy and reverberant environments, the received microphone signal contains not only the desired signal but also interferences such as room reverberation caused by the desired source, background noise, and a far-end echo signal that results from sound produced by the loudspeaker. Usually, an acoustic echo canceller is used to cancel the far-end echo, and a post-processor is used to suppress background noise and residual echo, i.e., echo that could not be cancelled by the echo canceller. In this work a novel structure and post-processor for an acoustic echo canceller are developed. The post-processor suppresses late reverberation caused by the desired source, residual echo, and background noise. The late reverberation and late residual echo are estimated using the generalized statistical reverberation model. Experimental results convincingly demonstrate the benefits of the proposed system for suppressing late reverberation, residual echo, and background noise. The proposed structure and post-processor have a low computational complexity and a highly modular structure, can be seamlessly integrated into existing hands-free communication systems, and afford a significant increase in listening comfort and speech intelligibility.
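A post-processor of this kind can be illustrated with a single Wiener-style spectral gain that jointly attenuates the undesired components. This is only a sketch under the assumption that PSD estimates of late reverberation, residual echo, and noise are available per time-frequency bin; the dissertation derives the actual estimators for each term.

```python
import numpy as np

def postprocessor_gain(psd_mic, psd_late, psd_res_echo, psd_noise, g_min=0.1):
    """One spectral gain that jointly suppresses late reverberation,
    residual echo, and background noise (illustrative Wiener-style
    sketch, not the dissertation's exact post-processor)."""
    psd_undesired = psd_late + psd_res_echo + psd_noise
    gain = 1.0 - psd_undesired / np.maximum(psd_mic, 1e-12)
    return np.maximum(gain, g_min)
```

Because all three interferences are handled by one gain, the post-processor adds little complexity on top of the echo canceller, in line with the modularity the abstract emphasizes.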

    Multi-Channel Speech Dereverberation based on a Statistical Model of Late Reverberation

    Single-Channel Speech Dereverberation based on Spectral Subtraction

    Speech signals recorded with a distant microphone usually contain reverberation, which degrades the fidelity and intelligibility of speech in devices such as 'hands-free' conference telephones, automatic speech recognizers, and hearing aids. One important effect of reverberation on speech is overlap-masking, i.e., the energy of previous phonemes is smeared over time and overlaps subsequent phonemes. In [1] a single-channel speech dereverberation method based on spectral subtraction was introduced to reduce this effect. The method estimates the power spectrum of the reverberation based on a statistical model of late reverberation. This model depends on one parameter, the reverberation time. However, the reverberation time is frequency dependent, due to the frequency-dependent reflection coefficients of walls and other objects and the frequency-dependent absorption coefficient of air. In this paper, we take this dependency into account and study its effect on reverberation reduction and distortion. The algorithm is tested using synthetically reverberated signals. Results for different room impulse responses, with reverberation times ranging from approximately 200 ms to 1200 ms, show significant reverberation reduction with little signal distortion.
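The frequency-dependent extension can be sketched as follows: each STFT band gets its own decay rate derived from a band-wise reverberation time, and the late-reverberant power is subtracted in the magnitude domain. Names and structure are illustrative, not the paper's implementation.

```python
import numpy as np

def band_decay_rates(t60_per_band):
    """Per-band decay rates from band-wise reverberation times; the
    frequency dependence of T60 is the point of the paper."""
    return 3.0 * np.log(10.0) / np.asarray(t60_per_band)

def spectral_subtraction(spec, t60_per_band, fs, hop, delay_frames, floor=0.05):
    """Spectral subtraction of late reverberation with a frequency-
    dependent reverberation time. spec is a (frames x bins) STFT."""
    delta = band_decay_rates(t60_per_band)                 # (n_bins,)
    atten = np.exp(-2.0 * delta * delay_frames * hop / fs)  # per-band attenuation
    power = np.abs(spec) ** 2
    late = np.zeros_like(power)
    late[delay_frames:] = atten * power[:-delay_frames]     # broadcast over bins
    clean_power = np.maximum(power - late, floor * power)   # spectral floor
    return np.sqrt(clean_power) * np.exp(1j * np.angle(spec))
```

A plausible usage is `t60_per_band = np.linspace(1.2, 0.2, n_bins)`, reflecting the typical decrease of T60 towards high frequencies caused by air absorption.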

    Insight into linear periodically time-varying coherence reduction methods for stereophonic acoustic echo cancellation

    The non-uniqueness problem of multi-channel acoustic echo cancellation can be solved by reducing the coherence between the loudspeaker signals. To this end, several coherence reduction methods have been proposed that aim at providing the best possible trade-off between signal decorrelation and degradation of the subjective audio quality. Among these, the periodically time-varying methods introduce, as the name indicates, a periodicity into the signals' statistical properties. These properties are analysed here to provide further insight into the effective coherence reduction achieved by these particular methods and, thereafter, to predict the performance enhancement of a stereophonic acoustic echo canceller.
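One way to see the effect of a linear periodically time-varying (LPTV) method is to pass a common source through two first-order all-pass filters whose coefficients vary periodically with opposite sign, and then measure the magnitude-squared coherence between the outputs. This is an illustrative sketch; the specific LPTV methods analysed in the paper may differ.

```python
import numpy as np

def lptv_allpass(x, a):
    """First-order all-pass filter whose coefficient a[n] varies
    periodically in time (a common LPTV decorrelation device)."""
    y = np.zeros_like(x)
    x_prev = y_prev = 0.0
    for n in range(len(x)):
        y[n] = -a[n] * x[n] + x_prev + a[n] * y_prev
        x_prev, y_prev = x[n], y[n]
    return y

def msc(x, y, nseg=512):
    """Magnitude-squared coherence via segment-averaged periodograms."""
    win = np.hanning(nseg)
    sxx = syy = 0.0
    sxy = 0.0 + 0.0j
    for b in range(len(x) // nseg):
        X = np.fft.rfft(x[b * nseg:(b + 1) * nseg] * win)
        Y = np.fft.rfft(y[b * nseg:(b + 1) * nseg] * win)
        sxx = sxx + np.abs(X) ** 2
        syy = syy + np.abs(Y) ** 2
        sxy = sxy + X * np.conj(Y)
    return np.abs(sxy) ** 2 / (sxx * syy + 1e-12)

fs = 8000
rng = np.random.default_rng(0)
s = rng.standard_normal(2 * fs)                            # common loudspeaker source
a = 0.3 * np.sin(2 * np.pi * 4 * np.arange(2 * fs) / fs)   # 4 Hz periodic coefficient
left, right = lptv_allpass(s, a), lptv_allpass(s, -a)
coh_before = msc(s, s)        # identical channels: coherence ~ 1 everywhere
coh_after = msc(left, right)  # LPTV processing lowers the average coherence
```

The periodic phase modulation makes the inter-channel transfer function vary across analysis segments, which is exactly what reduces the segment-averaged coherence that the adaptive echo canceller sees.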

    Low-Complexity Multi-Microphone Acoustic Echo Control in the Short-Time Fourier Transform Domain

    Many modern communication and smart devices are equipped with several microphones, in addition to one or more loudspeakers. Each microphone acquires not only sounds produced in the near-end room, i.e., desired near-end speech, background noise, and other interferences, but also a far-end signal that is reproduced by the loudspeaker(s). This particular type of acoustic coupling, commonly denoted as acoustic echo, can be reduced in a distortionless manner by employing multi-microphone acoustic echo cancellation (MM-AEC) techniques. However, under noisy conditions, the performance of AEC is limited by the echo-to-noise ratio, and additional echo reduction is needed. Further, to ensure high-quality end-to-end communication in noisy environments, background noise has to be reduced as well. To achieve the latter, multi-microphone speech enhancement techniques, such as beamforming (BF), are often used, as they are capable of reducing undesired signal components while causing little distortion to the desired near-end speech. In spite of its high computational cost, the most effective solution for reducing acoustic echoes and background noise is to cascade MM-AEC and BF. In this work, a low-complexity multi-microphone echo controller is introduced that not only combines low-complexity MM-AEC with BF, but also integrates residual echo reduction into the beamformer design.
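As a reference point for the AEC stage, a basic time-domain NLMS echo canceller for one microphone looks like the sketch below. This is a standard textbook building block, not the paper's low-complexity STFT-domain multi-microphone scheme; names and parameters are illustrative.

```python
import numpy as np

def nlms_aec(far_end, mic, filt_len=64, mu=0.5, eps=1e-6):
    """NLMS adaptive echo canceller: identifies the loudspeaker-to-
    microphone echo path and subtracts the echo estimate from the
    microphone signal, returning the error signal and filter."""
    w = np.zeros(filt_len)          # echo-path estimate
    e = np.zeros_like(mic)          # error (echo-reduced) signal
    buf = np.zeros(filt_len)        # most recent far-end samples
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        echo_hat = w @ buf
        e[n] = mic[n] - echo_hat
        # normalized LMS update, regularized by eps
        w += mu * e[n] * buf / (buf @ buf + eps)
    return e, w
```

Cascading one such canceller per microphone before a beamformer is the conventional (but costly) structure the paper improves upon.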

    On the Numerical Instability of an LCMV Beamformer for a Uniform Linear Array

    We analyze the conditions for numerical instability in the solution of a linearly constrained minimum variance (LCMV) beamformer with multiple directional constraints for a uniform linear array. An analytic expression is presented to determine the frequencies (for broadband signals such as speech) at which the inverse term in the solution of the LCMV beamformer does not exist. Simulation results and power patterns are provided to further illustrate the problem. In addition, we investigate and discuss possible solutions to the problem.
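The instability can be illustrated numerically. The LCMV weights involve an inverse of the form (C^H R^-1 C)^-1, where the columns of C are steering vectors of the directional constraints. For a uniform linear array, two steering-vector constraints coincide whenever f d (sin th1 - sin th2)/c is an integer, so the inverse term ceases to exist at those frequencies. The array geometry and angles below are illustrative, not those of the paper.

```python
import numpy as np

def steering(f, theta, n_mics, d=0.08, c=343.0):
    """ULA steering vector at frequency f (Hz) and DOA theta (rad),
    for n_mics sensors with spacing d (m) and sound speed c (m/s)."""
    return np.exp(-2j * np.pi * f * d * np.arange(n_mics) * np.sin(theta) / c)

def constraint_cond(f, thetas, n_mics, d=0.08, c=343.0):
    """Condition number of C^H C, where C stacks the steering vectors of
    the directional constraints; it blows up at the frequencies where
    the inverse term in the LCMV solution does not exist."""
    C = np.column_stack([steering(f, t, n_mics, d, c) for t in thetas])
    return np.linalg.cond(C.conj().T @ C)

thetas = [0.0, np.pi / 6]                      # constraints at 0 and 30 degrees
d, c = 0.08, 343.0
# predicted unstable frequency: f d |sin th1 - sin th2| / c = 1
f_bad = c / (d * abs(np.sin(thetas[0]) - np.sin(thetas[1])))
```

With these values the predicted breakdown lies near 8.6 kHz: the two steering vectors become identical there, C loses rank, and the condition number of C^H C explodes, while at ordinary speech frequencies it stays small.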