298 research outputs found

    Algorithm and architecture for simultaneous diagonalization of matrices applied to subspace-based speech enhancement

    Get PDF
    This thesis presents algorithm and architecture for simultaneous diagonalization of matrices. As an example, a subspace-based speech enhancement problem is considered, where in the covariance matrices of the speech and noise are diagonalized simultaneously. In order to compare the system performance of the proposed algorithm, objective measurements of speech enhancement is shown in terms of the signal to noise ratio and mean bark spectral distortion at various noise levels. In addition, an innovative subband analysis technique for subspace-based time-domain constrained speech enhancement technique is proposed. The proposed technique analyses the signal in its subbands to build accurate estimates of the covariance matrices of speech and noise, exploiting the inherent low varying characteristics of speech and noise signals in narrow bands. The subband approach also decreases the computation time by reducing the order of the matrices to be simultaneously diagonalized. Simulation results indicate that the proposed technique performs well under extreme low signal-to-noise-ratio conditions. Further, an architecture is proposed to implement the simultaneous diagonalization scheme. The architecture is implemented on an FPGA primarily to compare the performance measures on hardware and the feasibility of the speech enhancement algorithm in terms of resource utilization, throughput, etc. A Xilinx FPGA is targeted for implementation. FPGA resource utilization re-enforces on the practicability of the design. Also a projection of the design feasibility for an ASIC implementation in terms of transistor count only is include

    Multi-channel dereverberation for speech intelligibility improvement in hearing aid applications

    Get PDF

    <strong>Non-Gaussian, Non-stationary and Nonlinear Signal Processing Methods - with Applications to Speech Processing and Channel Estimation</strong>

    Get PDF

    Blind Subband Beamforming With Time-Delay Constraints for Moving Source Speech Enhancement

    Full text link

    Spatial, Spectral, and Perceptual Nonlinear Noise Reduction for Hands-free Microphones in a Car

    Get PDF
    Speech enhancement in an automobile is a challenging problem because interference can come from engine noise, fans, music, wind, road noise, reverberation, echo, and passengers engaging in other conversations. Hands-free microphones make the situation worse because the strength of the desired speech signal reduces with increased distance between the microphone and talker. Automobile safety is improved when the driver can use a hands-free interface to phones and other devices instead of taking his eyes off the road. The demand for high quality hands-free communication in the automobile requires the introduction of more powerful algorithms. This thesis shows that a unique combination of five algorithms can achieve superior speech enhancement for a hands-free system when compared to beamforming or spectral subtraction alone. Several different designs were analyzed and tested before converging on the configuration that achieved the best results. Beamforming, voice activity detection, spectral subtraction, perceptual nonlinear weighting, and talker isolation via pitch tracking all work together in a complementary iterative manner to create a speech enhancement system capable of significantly enhancing real world speech signals. The following conclusions are supported by the simulation results using data recorded in a car and are in strong agreement with theory. Adaptive beamforming, like the Generalized Side-lobe Canceller (GSC), can be effectively used if the filters only adapt during silent data frames because too much of the desired speech is cancelled otherwise. Spectral subtraction removes stationary noise while perceptual weighting prevents the introduction of offensive audible noise artifacts. Talker isolation via pitch tracking can perform better when used after beamforming and spectral subtraction because of the higher accuracy obtained after initial noise removal. Iterating the algorithm once increases the accuracy of the Voice Activity Detection (VAD), which improves the overall performance of the algorithm. Placing the microphone(s) on the ceiling above the head and slightly forward of the desired talker appears to be the best location in an automobile based on the experiments performed in this thesis. Objective speech quality measures show that the algorithm removes a majority of the stationary noise in a hands-free environment of an automobile with relatively minimal speech distortion

    SkipConvGAN: Monaural Speech Dereverberation using Generative Adversarial Networks via Complex Time-Frequency Masking

    Full text link
    With the advancements in deep learning approaches, the performance of speech enhancing systems in the presence of background noise have shown significant improvements. However, improving the system's robustness against reverberation is still a work in progress, as reverberation tends to cause loss of formant structure due to smearing effects in time and frequency. A wide range of deep learning-based systems either enhance the magnitude response and reuse the distorted phase or enhance complex spectrogram using a complex time-frequency mask. Though these approaches have demonstrated satisfactory performance, they do not directly address the lost formant structure caused by reverberation. We believe that retrieving the formant structure can help improve the efficiency of existing systems. In this study, we propose SkipConvGAN - an extension of our prior work SkipConvNet. The proposed system's generator network tries to estimate an efficient complex time-frequency mask, while the discriminator network aids in driving the generator to restore the lost formant structure. We evaluate the performance of our proposed system on simulated and real recordings of reverberant speech from the single-channel task of the REVERB challenge corpus. The proposed system shows a consistent improvement across multiple room configurations over other deep learning-based generative adversarial frameworks.Comment: Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing ( Volume: 30

    Single-Microphone Speech Enhancement and Separation Using Deep Learning

    Get PDF
    • …
    corecore