
    Spatial, Spectral, and Perceptual Nonlinear Noise Reduction for Hands-free Microphones in a Car

    Speech enhancement in an automobile is a challenging problem because interference can come from engine noise, fans, music, wind, road noise, reverberation, echo, and passengers holding other conversations. Hands-free microphones make the situation worse because the level of the desired speech signal decreases as the distance between the microphone and the talker increases. Automobile safety improves when the driver can use a hands-free interface to phones and other devices instead of taking their eyes off the road. The demand for high-quality hands-free communication in the automobile therefore calls for more powerful algorithms. This thesis shows that a unique combination of five algorithms achieves superior speech enhancement for a hands-free system compared with beamforming or spectral subtraction alone. Several designs were analyzed and tested before converging on the configuration that achieved the best results. Beamforming, voice activity detection, spectral subtraction, perceptual nonlinear weighting, and talker isolation via pitch tracking work together in a complementary, iterative manner to significantly enhance real-world speech signals. The following conclusions are supported by simulation results on data recorded in a car and agree strongly with theory. Adaptive beamforming, such as the Generalized Sidelobe Canceller (GSC), can be used effectively if its filters adapt only during silent frames; otherwise, too much of the desired speech is cancelled. Spectral subtraction removes stationary noise, while perceptual weighting prevents the introduction of objectionable audible noise artifacts. Talker isolation via pitch tracking can perform better when applied after beamforming and spectral subtraction, because pitch estimates are more accurate once the initial noise has been removed.
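The rule that the GSC should adapt only during silent frames can be illustrated with a minimal Python/NumPy sketch. It assumes the channels are already time-aligned to the talker and that a per-sample VAD decision is supplied; the adjacent-difference blocking matrix, the NLMS update, and the function name `gsc_vad_gated` are common textbook choices assumed here for illustration, not details taken from the thesis.

```python
import numpy as np

def gsc_vad_gated(x, vad, n_taps=16, mu=0.5, eps=1e-8):
    """Hypothetical GSC whose adaptive noise cancellers update only in silent samples.

    x   : (M, T) microphone signals, assumed already time-aligned to the talker
    vad : (T,) booleans, True where the desired talker is active (assumed given)
    """
    x = np.asarray(x, dtype=float)
    vad = np.asarray(vad, dtype=bool)
    n_mics, n_samples = x.shape
    fbf = x.mean(axis=0)                   # fixed delay-and-sum beamformer (delays already applied)
    refs = x[:-1] - x[1:]                  # blocking matrix: adjacent differences cancel the aligned target
    w = np.zeros((n_mics - 1, n_taps))     # one FIR noise canceller per noise reference
    buf = np.zeros((n_mics - 1, n_taps))   # most recent n_taps samples of each reference (newest first)
    y = np.empty(n_samples)
    for t in range(n_samples):
        buf = np.roll(buf, 1, axis=1)
        buf[:, 0] = refs[:, t]
        y[t] = fbf[t] - np.sum(w * buf)    # subtract the noise estimate from the beamformer output
        if not vad[t]:                     # adapt only while the talker is silent
            w += mu * y[t] * buf / (np.sum(buf * buf) + eps)  # NLMS update
    return y
```

Passing an all-True `vad` freezes the filters, so the output reduces to the fixed beamformer. With imperfect alignment some speech leaks into `refs`, which is why adapting during speech would cancel part of the talker.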
Iterating the algorithm once increases the accuracy of the Voice Activity Detection (VAD), which improves the overall performance of the algorithm. Based on the experiments performed in this thesis, the ceiling, above and slightly forward of the desired talker's head, appears to be the best microphone location in an automobile. Objective speech quality measures show that the algorithm removes most of the stationary noise in the hands-free environment of an automobile with relatively little speech distortion.
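The interplay between VAD and spectral subtraction, including the single extra pass, can be sketched as follows. This is a hypothetical illustration: the energy VAD with a percentile noise floor, the over-subtraction factor `alpha`, the spectral floor `beta`, and all function names are assumptions, and the plain spectral floor only stands in for the thesis's perceptual nonlinear weighting, which is not implemented here.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Frames x with a periodic sqrt-Hann window; returns (spectrum, window)."""
    win = np.sqrt(np.hanning(n_fft + 1)[:-1])
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.fft.rfft(x[idx] * win, axis=1), win

def istft(spec, win, hop, length):
    """Overlap-adds windowed frames; at 50% overlap the squared window sums to 1."""
    n_fft = len(win)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
    return out

def energy_vad(spec, threshold_db=6.0, noise_percentile=10):
    """Flags frames whose mean power exceeds an assumed noise floor by threshold_db."""
    frame_power = np.mean(np.abs(spec) ** 2, axis=1)
    floor = np.percentile(frame_power, noise_percentile) + 1e-12  # quietest frames assumed noise-only
    return 10 * np.log10(frame_power / floor + 1e-12) > threshold_db

def spectral_subtract(spec, speech_frames, alpha=2.0, beta=0.02):
    """Power spectral subtraction with over-subtraction alpha and spectral floor beta."""
    if speech_frames.all():
        raise ValueError("no silent frames available for the noise estimate")
    noise_psd = np.mean(np.abs(spec[~speech_frames]) ** 2, axis=0)  # noise from silent frames only
    power = np.abs(spec) ** 2
    gain = np.sqrt(np.maximum(1.0 - alpha * noise_psd / (power + 1e-12), beta))
    return spec * gain                     # the noisy phase is kept

def enhance(x, n_fft=256, hop=128, n_passes=2):
    """VAD then subtraction, repeated with a VAD computed on the cleaner output."""
    spec, win = stft(x, n_fft, hop)
    speech = energy_vad(spec)              # pass 1: VAD on the noisy input
    for p in range(n_passes):
        enhanced = spectral_subtract(spec, speech)
        if p < n_passes - 1:
            speech = energy_vad(enhanced)  # refine the VAD on the partially cleaned spectrum
    return istft(enhanced, win, hop, len(x)), speech
```

The second pass recomputes the VAD on the partially cleaned spectrum, where speech frames stand out more clearly from the residual noise, and then re-estimates the noise from the original spectrum using the refined decisions.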

    Impact of Microphone Positional Errors on Speech Intelligibility

    The speech of a person talking in a noisy environment can be enhanced by electronic beamforming with spatially distributed microphones. Because this approach requires precise knowledge of the microphone locations, its use is limited in settings where microphones must be placed quickly or moved regularly. A highly precise calibration or measurement process can be tedious and time-consuming. To understand tolerable limits on the calibration process, this work examines the impact of microphone position errors on intelligibility. Analytical expressions are derived by modeling the microphone position errors as zero-mean, uniformly distributed random variables. Experiments and simulations were performed to relate the precision of the microphone location measurements to the loss in intelligibility. A variety of microphone array configurations and distracting sources (interfering speech from other talkers and white noise) are considered. For speech near the threshold of intelligibility, the results show that microphone position errors with standard deviations below 1.5 cm can limit intelligibility losses to within 10% of the maximum (perfect microphone placement) for all microphone distributions examined. Of the array distributions tested, the linear array tends to be more vulnerable, whereas the non-uniform 3D array was robust to positional errors.
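A small numerical illustration of why uniform position errors degrade a beamformer, under assumptions that are not the paper's intelligibility analysis: a far-field, narrowband plane wave from unit direction u, a delay-and-sum beamformer steered with the nominal positions, and i.i.d. per-coordinate errors uniform on [-a, a], so that the standard deviation is sigma = a / sqrt(3). Under these assumptions the expected complex response toward the talker is the product over coordinates of sin(k a u_i) / (k a u_i), with wavenumber k = 2 pi f / c. The speed of sound and the function names are assumptions, and the quantity is coherent gain only, not intelligibility.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def _wavenumber_and_halfwidth(freq_hz, sigma_m):
    k = 2 * np.pi * freq_hz / SPEED_OF_SOUND
    a = np.sqrt(3.0) * sigma_m            # U[-a, a] has standard deviation a / sqrt(3)
    return k, a

def expected_gain_analytic(freq_hz, sigma_m, look_dir):
    """Expected delay-and-sum response for i.i.d. uniform per-coordinate position errors."""
    k, a = _wavenumber_and_halfwidth(freq_hz, sigma_m)
    u = np.asarray(look_dir, dtype=float)
    u = u / np.linalg.norm(u)
    # np.sinc is the normalized sinc sin(pi x) / (pi x), so the argument is divided by pi
    return float(np.prod(np.sinc(k * a * u / np.pi)))

def expected_gain_monte_carlo(freq_hz, sigma_m, look_dir, n_mics=8, n_trials=20000, seed=0):
    """Monte Carlo estimate of the same expectation over random array perturbations."""
    rng = np.random.default_rng(seed)
    k, a = _wavenumber_and_halfwidth(freq_hz, sigma_m)
    u = np.asarray(look_dir, dtype=float)
    u = u / np.linalg.norm(u)
    eps = rng.uniform(-a, a, size=(n_trials, n_mics, 3))
    phase = k * (eps @ u)                 # residual phase after steering with nominal positions
    out = np.exp(1j * phase).mean(axis=1) # normalized beamformer output per trial
    return float(out.mean().real)
```

Because the loss depends on k a u_i, the same positional tolerance costs more at higher frequencies and along axes aligned with the look direction, which is one intuition for why array geometry matters.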

    UniX-Encoder: A Universal XX-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing

    The speech field is evolving to tackle more challenging scenarios, such as multi-channel recordings with multiple simultaneous talkers. Given the many types of microphone setups in use, we present the UniX-Encoder, a universal encoder designed for multiple tasks that works with any microphone array, in both single-talker and multi-talker environments. Our research improves on previous multi-channel speech processing efforts in four key areas: 1) Adaptability: unlike traditional models constrained to specific microphone array configurations, our encoder is universally compatible. 2) Multi-Task Capability: beyond the single-task focus of previous systems, UniX-Encoder acts as a robust upstream model, extracting features for diverse tasks including ASR and speaker recognition. 3) Self-Supervised Training: the encoder is trained without labeled multi-channel data. 4) End-to-End Integration: in contrast to models that first beamform and then process single channels, our encoder offers an end-to-end solution that bypasses explicit beamforming or separation. To validate its effectiveness, we tested the UniX-Encoder on a synthetic multi-channel dataset built from the LibriSpeech corpus. Across tasks such as speech recognition and speaker diarization, our encoder consistently outperformed combinations such as the WavLM model with the BeamformIt front end.
    Comment: Submitted to ICASSP 202
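The abstract does not describe the encoder's internals. As a hypothetical illustration of what working with any microphone array requires, the sketch below uses one generic technique: shared per-channel weights followed by mean pooling across channels, which makes the output independent of both the number and the order of channels. The weights are random and untrained, and `ChannelAgnosticEncoder` and `log_spectrogram` are invented names; this is not the UniX-Encoder architecture.

```python
import numpy as np

def log_spectrogram(x, frame_len=400, hop=160):
    """Log-magnitude STFT of one channel: (n_frames, frame_len // 2 + 1)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.log1p(np.abs(np.fft.rfft(x[idx] * np.hanning(frame_len), axis=1)))

class ChannelAgnosticEncoder:
    """Toy encoder that accepts any number of channels in any order (hypothetical, untrained)."""

    def __init__(self, frame_len=400, dim=64, seed=0):
        rng = np.random.default_rng(seed)
        n_bins = frame_len // 2 + 1
        self.frame_len = frame_len
        # the same weights are applied to every channel, so no channel count is baked in
        self.w_in = rng.standard_normal((n_bins, dim)) / np.sqrt(n_bins)
        self.w_ctx = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, signals):
        """signals: (n_channels, n_samples) -> features: (n_frames, dim)."""
        h = np.stack([np.tanh(log_spectrogram(ch, self.frame_len) @ self.w_in) for ch in signals])
        context = h.mean(axis=0)               # cross-channel summary, invariant to channel order
        h = np.tanh(h + context @ self.w_ctx)  # each channel is updated with the shared context
        return h.mean(axis=0)                  # pool channels into one feature sequence
```

The same encoder object can process a 2-microphone and a 6-microphone recording, and shuffling the channels leaves the features unchanged up to floating-point rounding.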

    The ACE Challenge - corpus description and performance evaluation
