
    Studies on noise robust automatic speech recognition

    Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both classic and novel approaches proposed for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.

    Spatial, Spectral, and Perceptual Nonlinear Noise Reduction for Hands-free Microphones in a Car

    Speech enhancement in an automobile is a challenging problem because interference can come from engine noise, fans, music, wind, road noise, reverberation, echo, and passengers engaged in other conversations. Hands-free microphones make the situation worse because the strength of the desired speech signal decreases as the distance between the microphone and the talker increases. Automobile safety improves when the driver can use a hands-free interface to phones and other devices instead of taking their eyes off the road. The demand for high-quality hands-free communication in the automobile calls for more powerful algorithms. This thesis shows that a unique combination of five algorithms achieves better speech enhancement for a hands-free system than beamforming or spectral subtraction alone. Several designs were analyzed and tested before converging on the configuration that achieved the best results. Beamforming, voice activity detection, spectral subtraction, perceptual nonlinear weighting, and talker isolation via pitch tracking work together in a complementary, iterative manner to form a speech enhancement system capable of significantly enhancing real-world speech signals. The following conclusions are supported by simulation results on data recorded in a car and agree strongly with theory. Adaptive beamforming, such as the Generalized Sidelobe Canceller (GSC), can be used effectively if the filters adapt only during silent frames, because otherwise too much of the desired speech is cancelled. Spectral subtraction removes stationary noise, while perceptual weighting prevents the introduction of objectionable audible noise artifacts. Talker isolation via pitch tracking can perform better when applied after beamforming and spectral subtraction, because the initial noise removal improves its accuracy.
Iterating the algorithm once increases the accuracy of the Voice Activity Detection (VAD), which improves the overall performance of the algorithm. Based on the experiments performed in this thesis, the best microphone placement in an automobile appears to be on the ceiling, above the head and slightly forward of the desired talker. Objective speech quality measures show that the algorithm removes most of the stationary noise in the hands-free environment of an automobile with relatively little speech distortion.
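
To make the spectral subtraction stage concrete, here is a minimal Python/NumPy sketch of power spectral subtraction with an over-subtraction factor and a spectral floor, using a noise estimate averaged over frames that a VAD has marked as silent. The function names, the parameters `alpha` and `beta`, and their defaults are illustrative assumptions; the thesis's beamforming, perceptual nonlinear weighting, and pitch-tracking stages are not reproduced here.

```python
import numpy as np


def estimate_noise_psd(frames, is_silent):
    """Average the power spectrum over frames that a VAD flagged as silent.

    Assumes at least one frame is flagged silent.
    """
    silent = [np.abs(np.fft.rfft(f)) ** 2 for f, s in zip(frames, is_silent) if s]
    return np.mean(silent, axis=0)


def spectral_subtraction(frame, noise_psd, alpha=2.0, beta=0.01):
    """Power spectral subtraction on one frame, reusing the noisy phase.

    alpha: over-subtraction factor; beta: spectral floor as a fraction of the
    noise PSD. Both defaults are illustrative assumptions.
    """
    spec = np.fft.rfft(frame)
    power = np.abs(spec) ** 2
    # Subtract the scaled noise estimate; the floor avoids negative power.
    clean_power = np.maximum(power - alpha * noise_psd, beta * noise_psd)
    # Apply a real-valued gain so that only the magnitude is modified.
    gain = np.sqrt(clean_power / np.maximum(power, 1e-12))
    return np.fft.irfft(gain * spec, n=len(frame))


# Usage: estimate noise from silent frames, then enhance two frames.
rng = np.random.default_rng(0)
frame_len = 256
noise_frames = rng.standard_normal((50, frame_len))
noise_psd = estimate_noise_psd(noise_frames, [True] * 50)
t = np.arange(frame_len)
tone = 10.0 * np.sin(2 * np.pi * 20 * t / frame_len)  # stand-in for voiced speech
enhanced_noise = spectral_subtraction(noise_frames[0], noise_psd)
enhanced_tone = spectral_subtraction(tone + noise_frames[1], noise_psd)
```

In the thesis, perceptual weighting is what keeps residual artifacts from becoming audible; the simple floor `beta * noise_psd` here is only a stand-in for that role.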

    A Robust Noise Spectral Estimation Algorithm for Speech Enhancement in Voice Devices

    In this thesis, a new robust noise spectral estimation algorithm is proposed for single-microphone speech enhancement. The algorithm generates noise spectral estimates that are optimal in the Minimum Mean Square Error (MMSE) sense, based on speech statistics in noisy environments. Compared with the widely adopted conventional noise spectral estimation method based on single-pole recursion, the proposed scheme is more reliable because its recursion coefficients are adaptive and MMSE-optimal. We also propose a new, accurate Resulting Signal-to-Noise Ratio (R-SNR) estimator as a quality measure for benchmarking existing noise spectral estimation techniques. The R-SNR estimator quantifies not only the residual noise but also the speech distortion, so it can serve as an overall speech quality measure after noise suppression. We conduct experiments to evaluate noise suppression performance using the proposed algorithm and compare it with that of two major existing noise spectral estimation methods. Through numerous simulations, we show that our noise suppression technique significantly outperforms the conventional methods in both stationary and nonstationary noise environments.
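
The conventional baseline mentioned above, single-pole recursive smoothing of the noise power spectrum, can be sketched as follows. This is a minimal sketch with a fixed coefficient `alpha` and updates frozen in frames that a VAD flags as speech; both choices are assumptions, and it does not implement the thesis's MMSE-optimal adaptive coefficients. The function name and default value are hypothetical.

```python
import numpy as np


def recursive_noise_psd(power_frames, speech_present, alpha=0.98):
    """Conventional single-pole recursive noise PSD estimate (fixed alpha).

    power_frames:   array (num_frames, num_bins) of noisy power spectra |Y|^2
    speech_present: one boolean per frame (e.g. from a VAD); True freezes updates
    alpha:          fixed recursion coefficient (hypothetical default)
    Returns the noise estimate after every frame, shape (num_frames, num_bins).
    """
    # Assumption: the first frame contains noise only.
    noise = np.array(power_frames[0], dtype=float)
    history = [noise.copy()]
    for frame, speech in zip(power_frames[1:], speech_present[1:]):
        if not speech:
            # N_k = alpha * N_{k-1} + (1 - alpha) * |Y_k|^2
            noise = alpha * noise + (1.0 - alpha) * frame
        history.append(noise.copy())
    return np.array(history)
```

Because `alpha` is fixed, the estimate tracks changing noise at a single, predetermined speed; making that coefficient adaptive is exactly the point the thesis addresses.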

    User-Symbiotic Speech Enhancement for Hearing Aids


    A New Voice Controlled Noise Cancellation Approach

    This paper presents a new approach to controlling the operation of adaptive noise cancellers (ANCs). The technique uses the residual output of the noise canceller to control the decisions made by a voice activity detector (VAD): the threshold on the full-band energy feature is adjusted according to the residual output of the noise canceller. In environments with variable background noise, the threshold-controlled VAD prevents components of the actual speech signal from entering the reference input during adaptation periods. The convergence behavior of the adaptive filter is greatly improved, since the reference input is then highly correlated with the primary input. In addition, the computational load is reduced, since the output of the adaptive filter is calculated only during non-speech periods. The threshold-controlled noise canceller achieves a cleaner output in about 50% of the time required by a non-controlled noise canceller.
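
A minimal Python/NumPy sketch of the underlying idea, adapting a two-input LMS noise canceller only when a VAD reports non-speech, is shown below. The per-sample `speech_flags` input stands in for the paper's threshold-controlled VAD, whose residual-driven threshold rule is not reproduced; the function name, filter order, and step size `mu` are hypothetical choices.

```python
import numpy as np


def vad_gated_lms_anc(primary, reference, speech_flags, order=8, mu=0.01):
    """Two-input LMS adaptive noise canceller with VAD-gated adaptation.

    primary:      main microphone signal (speech + noise)
    reference:    noise reference signal
    speech_flags: per-sample booleans from a VAD; True freezes adaptation
    Returns the canceller output (residual) and the final filter weights.
    """
    w = np.zeros(order)
    buf = np.zeros(order)  # reference taps, newest sample first
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        e = primary[n] - w @ buf  # residual = noise-cancelled output
        out[n] = e
        if not speech_flags[n]:
            # LMS update only while the VAD reports non-speech, so the
            # desired speech does not drive the filter weights.
            w = w + 2 * mu * e * buf
    return out, w
```

In this sketch the filter output is computed on every sample so that noise is also cancelled while speech is present; only the weight update is gated by the VAD.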

    A study on different linear and non-linear filtering techniques of speech and speech recognition

    In any signal, noise is an undesired quantity; however, most signals become mixed with noise at various stages of their processing and application, which distorts the information they carry and renders the whole signal useless. Speech signals are especially prone to acoustic noises such as babble noise, car noise, and street noise. To remove such noise, researchers have developed various techniques collectively called filtering. Not every filtering technique suits every application, so depending on the application some techniques are better than others. Broadly, filtering techniques can be classified into two categories: linear filtering and nonlinear filtering. This paper presents a study of several filtering techniques based on linear and nonlinear approaches. These techniques include adaptive filtering based on algorithms such as LMS, NLMS, and RLS; the Kalman filter; ARMA and NARMA time-series models applied to filtering; and neural networks combined with fuzzy logic, i.e., ANFIS. The paper also covers the use of various features, i.e., MFCC, LPC, PLP, and gamma, for filtering and recognition.
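
Of the adaptive algorithms listed above, RLS is the least obvious to write down, so here is a minimal Python/NumPy sketch of the standard exponentially weighted RLS update for an FIR filter. The function name, filter order, forgetting factor `lam`, and initialization constant `delta` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def rls_filter(x, d, order=4, lam=0.99, delta=100.0):
    """Exponentially weighted recursive least squares (RLS) adaptive FIR filter.

    x: input signal; d: desired signal; lam: forgetting factor (0 < lam <= 1);
    delta: initial diagonal value of the inverse correlation matrix P.
    Returns the a priori error signal and the final weight vector.
    """
    w = np.zeros(order)
    P = delta * np.eye(order)
    buf = np.zeros(order)  # input taps, newest sample first
    err = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        Pu = P @ buf
        k = Pu / (lam + buf @ Pu)  # gain vector
        e = d[n] - w @ buf  # a priori estimation error
        w = w + k * e
        P = (P - np.outer(k, Pu)) / lam  # update the inverse correlation matrix
        err[n] = e
    return err, w
```

Compared with LMS or NLMS, each RLS step costs O(order^2) operations for the update of `P`, in exchange for much faster convergence.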

    Speech Enhancement for Automatic Analysis of Child-Centered Audio Recordings

    Analysis of child-centred daylong naturalistic audio recordings has become a de facto research protocol in the scientific study of child language development. Researchers increasingly use these recordings to understand the linguistic environment a child encounters in her routine interactions with the world. The recordings are captured by a microphone that the child wears throughout the day. Being naturalistic, they contain many unwanted everyday sounds that degrade the performance of speech analysis tasks. The purpose of this thesis is to investigate the utility of speech enhancement (SE) algorithms in the automatic analysis of such recordings. To this end, several classical signal processing and modern machine learning-based SE methods were employed 1) as denoisers for speech corrupted with additive noise sampled from real-life child-centred daylong recordings and 2) as front-ends for the downstream speech processing tasks of addressee classification (infant- vs. adult-directed speech) and automatic syllable count estimation from speech. The downstream tasks were conducted on data derived from a set of geographically, culturally, and linguistically diverse child-centred daylong audio recordings. Denoising performance was evaluated through objective quality metrics (spectral distortion and instrumental intelligibility) and through downstream task performance. Finally, the objective evaluation results were compared with the downstream task results to determine whether objective metrics can serve as a reasonable proxy for selecting an SE front-end for a downstream task. The results show that a recently proposed Long Short-Term Memory (LSTM)-based progressive learning architecture provides the largest performance gains in the downstream tasks compared with the other SE methods and the baseline. Classical signal processing-based SE methods also lead to competitive performance.
From the comparison of objective assessment and downstream task performance results, no predictive relationship was found between task-independent objective metrics and downstream task performance.
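
The abstract does not define its spectral distortion metric. One common choice is the log-spectral distance (LSD), sketched below in Python/NumPy as the frame-wise RMS difference of log power spectra in dB, averaged over frames. The function name, frame length, hop, Hann window, and `eps` are assumptions, and the thesis may use a different definition.

```python
import numpy as np


def log_spectral_distance(ref, est, frame_len=512, hop=256, eps=1e-10):
    """Mean frame-wise log-spectral distance (dB) between reference and estimate.

    Assumes both signals are at least frame_len samples long.
    """
    window = np.hanning(frame_len)
    dists = []
    for start in range(0, min(len(ref), len(est)) - frame_len + 1, hop):
        r = np.abs(np.fft.rfft(window * ref[start:start + frame_len])) ** 2
        e = np.abs(np.fft.rfft(window * est[start:start + frame_len])) ** 2
        # Per-bin difference of log power spectra in dB (eps avoids log(0)).
        diff = 10 * np.log10(r + eps) - 10 * np.log10(e + eps)
        dists.append(np.sqrt(np.mean(diff ** 2)))
    return float(np.mean(dists))
```

As a sanity check, scaling a signal by 2 raises every bin's power by a factor of 4, so its LSD from the original is 20·log10(2), about 6.02 dB.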

    A Study into Speech Enhancement Techniques in Adverse Environment

    This dissertation develops speech enhancement techniques that improve speech quality in applications such as mobile communications, teleconferencing, and smart loudspeakers. These applications require suppressing both noise and reverberation. The contribution of this dissertation is therefore twofold: a single-channel speech enhancement system that exploits the temporal and spectral diversity of the received microphone signal for noise suppression, and a multi-channel speech enhancement method that exploits spatial diversity to reduce reverberation.