993 research outputs found

    Spatial, Spectral, and Perceptual Nonlinear Noise Reduction for Hands-free Microphones in a Car

    Speech enhancement in an automobile is a challenging problem because interference can come from engine noise, fans, music, wind, road noise, reverberation, echo, and passengers engaging in other conversations. Hands-free microphones make the situation worse because the strength of the desired speech signal decreases as the distance between the microphone and the talker increases. Automobile safety is improved when the driver can use a hands-free interface to phones and other devices instead of taking their eyes off the road. The demand for high-quality hands-free communication in the automobile requires the introduction of more powerful algorithms. This thesis shows that a unique combination of five algorithms can achieve superior speech enhancement for a hands-free system when compared to beamforming or spectral subtraction alone. Several different designs were analyzed and tested before converging on the configuration that achieved the best results. Beamforming, voice activity detection, spectral subtraction, perceptual nonlinear weighting, and talker isolation via pitch tracking all work together in a complementary, iterative manner to create a speech enhancement system capable of significantly enhancing real-world speech signals. The following conclusions are supported by simulation results using data recorded in a car and are in strong agreement with theory. Adaptive beamforming, such as the Generalized Side-lobe Canceller (GSC), can be used effectively if the filters adapt only during silent data frames, because otherwise too much of the desired speech is cancelled. Spectral subtraction removes stationary noise, while perceptual weighting prevents the introduction of offensive audible noise artifacts. Talker isolation via pitch tracking can perform better when used after beamforming and spectral subtraction because of the higher accuracy obtained after initial noise removal. Iterating the algorithm once increases the accuracy of the Voice Activity Detection (VAD), which improves the overall performance of the algorithm. Placing the microphone(s) on the ceiling above the head and slightly forward of the desired talker appears to be the best location in an automobile, based on the experiments performed in this thesis. Objective speech quality measures show that the algorithm removes a majority of the stationary noise in the hands-free environment of an automobile with relatively minimal speech distortion.
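
The abstract above notes that adaptive filters should adapt only during silent frames so that the desired speech is not cancelled. The following is a minimal sketch of that gating idea, not the thesis's implementation: it uses a single-reference NLMS noise canceller rather than a full GSC, and the function name `nlms_gated`, the per-sample `speech_active` flags, and the parameters `taps`, `mu`, and `eps` are all illustrative assumptions.

```python
def nlms_gated(primary, reference, speech_active, taps=4, mu=0.5, eps=1e-8):
    """NLMS noise canceller whose weights update only when no speech is flagged.

    primary       -- samples containing speech plus noise
    reference     -- noise-only reference samples (assumed available)
    speech_active -- one boolean per sample, e.g. from a VAD (assumed input)
    Returns (enhanced_samples, final_weights).
    """
    w = [0.0] * taps
    out = []
    for n in range(len(primary)):
        # Most recent `taps` reference samples, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))  # current noise estimate
        e = primary[n] - y                        # enhanced output sample
        out.append(e)
        if not speech_active[n]:
            # Adapt only in silence; during speech the weights stay frozen,
            # so the filter cannot learn to cancel the talker.
            norm = sum(xk * xk for xk in x) + eps
            w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
    return out, w
```

With every flag set to `True` the weights never move and the output equals the input; with every flag `False` the filter is free to track the noise path.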

    A New Speech Enhancement Technique Using Perceptual Constrained Spectral Weighting Factors

    This paper deals with the musical noise that results from perceptual speech enhancement algorithms, especially Wiener filtering. Although perceptual speech enhancement methods perform better than non-perceptual methods, most of them still produce annoying residual musical noise. This is because, if only noise above the noise masking threshold is filtered, noise below the threshold can become audible once its maskers are filtered. This can degrade the performance of perceptual speech enhancement methods that process audible noise only. To overcome this drawback, a new speech enhancement technique is proposed. It aims to improve the quality of the enhanced speech signal produced by perceptual Wiener filtering by controlling the latter with a second filter that acts as a psychoacoustically motivated weighting factor. Simulation results show that performance is improved compared to other perceptual speech enhancement methods.

    A Novel Scheme of Speech Enhancement using Power Spectral Subtraction - Multi-Layer Perceptron Network

    This paper presents a novel method for removing noise from a noisy speech signal to improve its quality, using a combination of power spectral subtraction and a multi-layer perceptron network. First, the contaminated speech signal was processed by spectral subtraction to enhance the clean speech signal. Then, the signal was processed by a neural network, using the spectral subtraction parameters and the estimated speech signal, to improve its quality and intelligibility. The artificial neural network was a multi-layer perceptron consisting of three layers, with six inputs and one output. The network was trained with three speech signals contaminated with white Gaussian noise at two SNR levels, 0 dB and 30 dB. The resulting speech enhancement system was tested on ten noisy speech signals. In the experiments, the SNR improved by up to 7 dB when the input SNR was 0 dB. In terms of PESQ, the proposed method improved the score by up to 0.4 over its original value. These experimental results show that the proposed method improves both signal quality and intelligibility more than the original power spectral subtraction.
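
The power spectral subtraction stage mentioned in the abstract above can be illustrated with a minimal per-bin sketch. This is a generic textbook form and not the paper's exact configuration: the function name, the over-subtraction factor `alpha`, and the spectral floor `floor` are assumptions introduced here for illustration, and the noise power estimate is assumed to be supplied by the caller.

```python
def power_spectral_subtraction(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Subtract an estimated noise power spectrum from a noisy power spectrum.

    noisy_power -- per-bin power |Y(k)|^2 of one noisy frame
    noise_power -- per-bin noise power estimate (assumed given)
    alpha       -- over-subtraction factor (hypothetical default)
    floor       -- fraction of the noisy power kept as a lower bound, so that
                   no bin goes negative (hypothetical default)
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    return [
        max(p - alpha * n, floor * p)
        for p, n in zip(noisy_power, noise_power)
    ]
```

For example, `power_spectral_subtraction([4.0, 1.0], [1.0, 2.0])` subtracts normally in the first bin and falls back to the floor in the second, where the noise estimate exceeds the observed power.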

    Psychoacoustic Improvement of Wiener Filtering: Some Recent Approaches and a New Method

    Musical noise, distortion, Wiener filter, psychoacoustics, speech signal

    Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

    This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted by estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime in which recordings are treated as one block (batch processing). The experimental evaluation of the proposed method is performed on the large CHiME-4 datasets and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria (such as signal-to-interference ratio (SIR) and perceptual evaluation of speech quality (PESQ), respectively). Moreover, the word error rate (WER) achieved by a baseline automatic speech recognition system, for which the enhancement method serves as a front-end solution, is evaluated. The results indicate that the proposed method is robust with respect to the short length of the processed block. Significant improvements in terms of the criteria and WER are observed even for a block length of 250 ms.
    Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusion.

    A Study into Speech Enhancement Techniques in Adverse Environment

    This dissertation developed speech enhancement techniques that improve speech quality in applications such as mobile communications, teleconferencing, and smart loudspeakers. These applications require the suppression of noise and reverberation. The contribution of this dissertation is therefore twofold: a single-channel speech enhancement system that exploits the temporal and spectral diversity of the received microphone signal for noise suppression, and a multi-channel speech enhancement method that employs spatial diversity to reduce reverberation.