11 research outputs found

    Evaluation of PNCC and extended spectral subtraction methods for robust speech recognition

    This paper evaluates the robustness of different approaches to speech recognition with respect to signal-to-noise ratio (SNR), signal level, and the presence of non-speech data before and after the utterances to be recognized. Three types of noise-robust features are considered: Power-Normalized Cepstral Coefficients (PNCC), Mel-Frequency Cepstral Coefficients (MFCC) after applying an extended spectral subtraction method, and the embedded denoising features of recent Sphinx versions. Although removing C0 from MFCC-based features leads to a slight decrease in speech recognition performance, it makes the speech recognition system independent of the speech signal level. With multi-condition training, the three sets of noise-robust features behave rather similarly with respect to SNR and the presence of non-speech data. Overall, the best performance is achieved with the extended spectral subtraction approach. The performance of the PNCC features also appears to depend on the initialization of the normalization factor.
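    The level-independence obtained by dropping C0 can be seen directly: scaling a signal shifts its log spectrum by a constant, which the DCT concentrates almost entirely into the zeroth coefficient. A minimal numpy/scipy sketch, using a plain log-spectrum DCT as a simplified stand-in for the mel filterbank stage of real MFCC extraction:

```python
import numpy as np
from scipy.fft import dct

def cepstrum(frame):
    # Log-magnitude spectrum followed by a DCT: a simplified stand-in
    # for the mel filterbank + DCT stages of MFCC extraction.
    log_spec = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
    return dct(log_spec, norm='ortho')

rng = np.random.default_rng(0)
frame = rng.standard_normal(512)

c_orig = cepstrum(frame)
c_loud = cepstrum(10.0 * frame)  # same signal, 20 dB louder

# The level change moves the whole log spectrum by a constant, which the
# DCT maps into coefficient 0 -- the C0 term dropped in the paper.
print(np.max(np.abs(c_orig[1:] - c_loud[1:])))  # ~0: higher coefficients unchanged
```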

    ASR Systems in Noisy Environment: Analysis and Solutions for Increasing Noise Robustness

    This paper deals with the analysis of Automatic Speech Recognition (ASR) suitable for use in noisy environments and suggests optimum configurations under various noisy conditions. The behavior of standard parameterization techniques was analyzed from the viewpoint of robustness against background noise, for Mel-frequency cepstral coefficients (MFCC), Perceptual linear predictive (PLP) coefficients, and their modified forms combining the main blocks of PLP and MFCC. The second part is devoted to the analysis and contribution of modified techniques containing frequency-domain noise suppression and voice activity detection. The above-mentioned techniques were tested on signals from real noisy environments within a Czech digit recognition task and the AURORA databases. Finally, the contributions of special VAD-selective training and MLLR adaptation of acoustic models were studied for various signal features.
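    The voice activity detection mentioned above can be illustrated with a toy frame-energy detector. Real VAD front ends for ASR are considerably more elaborate; this sketch only shows the principle, and the frame length and threshold are illustrative choices:

```python
import numpy as np

def energy_vad(signal, frame_len=256, threshold_db=-30.0):
    """Toy energy-based voice activity detector: a frame counts as
    speech when its energy is within threshold_db of the loudest frame."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return energy_db > energy_db.max() + threshold_db

rng = np.random.default_rng(1)
silence = 0.001 * rng.standard_normal(2048)   # ~ -60 dB frames
speech = rng.standard_normal(2048)            # ~ 0 dB frames
flags = energy_vad(np.concatenate([silence, speech]))
print(flags)  # first 8 frames False, last 8 frames True
```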

    Multi-Condition Training for Unknown Environment Adaptation in Robust ASR Under Real Conditions

    Automatic speech recognition (ASR) systems frequently operate in noisy environments. As they are often trained on clean speech data, noise reduction or adaptation techniques are applied to decrease the influence of background disturbance, even under unknown conditions. Speech data mixed with noise recordings from a particular environment are often used for model adaptation. This paper analyses the improvement in recognition performance within such adaptation when multi-condition training data from a real environment are used to train the initial models. Although the quality of such models can decrease with the presence of noise in the training material, they are assumed to include initial information about the noise and consequently to support the adaptation procedure. Experimental results show a significant improvement from the proposed training method in a robust ASR task under unknown noisy conditions. Word error rate decreased by 29% and 14% in comparison with clean-speech training data for the non-adapted and adapted systems, respectively.
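    Multi-condition data of the kind described here is typically synthesized by mixing clean utterances with noise recordings at controlled SNRs. A minimal sketch; the function name and signals are illustrative, not taken from the paper:

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals snr_db,
    then add it to the clean signal."""
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(2)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s tone at 16 kHz
noise = rng.standard_normal(16000)
noisy = mix_at_snr(clean, noise, snr_db=10.0)

# The realized SNR matches the target exactly by construction.
realized = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(realized, 1))  # 10.0
```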

    Speech Enhancement Strategy for Speech Recognition Microcontroller under Noisy Environments

    Industrial automation with speech control functions is generally equipped with a speech recognition sensor, which serves as an interface through which users articulate speech commands. However, recognition errors are likely when background noise surrounds the commands spoken into the speech recognition microcontroller. In this paper, a speech enhancement strategy is proposed to develop noise suppression filters that improve the accuracy of speech recognition microcontrollers. It uses a universal estimator, namely a neural network, to enhance recognition accuracy by integrating the better signals produced by various noise suppression filters, where a global optimization algorithm, namely an intelligent particle swarm optimization, optimizes the inbuilt parameters of the neural network so as to maximize the accuracy of speech recognition microcontrollers working within noisy environments. The proposed approach overcomes the limitations of existing noise suppression filters intended to improve recognition accuracy. Its performance was evaluated on a speech recognition microcontroller used in electronic products with speech control functions. Results show that the accuracy of the speech recognition microcontroller can be improved by the proposed approach when working under low signal-to-noise-ratio conditions in the industrial environments of automobile engines and factory machines.
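    Particle swarm optimization of the general kind used here can be sketched in a few lines. The example below is a generic global-best PSO applied to a toy version of the task, choosing combination weights for two noisy "filter outputs" so that their combination best matches a clean target; it is not the paper's intelligent PSO variant or its neural network:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-in for the setup: two "filter outputs" with different noise
# levels, to be combined so as to best reconstruct the clean target.
clean = np.sin(np.linspace(0, 8 * np.pi, 400))
outputs = np.stack([clean + 0.3 * rng.standard_normal(400),
                    clean + 0.5 * rng.standard_normal(400)])

def cost(w):
    return np.mean((w @ outputs - clean) ** 2)

# Minimal global-best PSO: each particle is a candidate weight vector.
n_particles, dim, iters = 20, 2, 100
pos = rng.uniform(-1, 1, (n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_cost = pos.copy(), np.array([cost(p) for p in pos])
gbest = pbest[pbest_cost.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, dim))
    # Inertia plus pulls toward personal-best and global-best positions.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    costs = np.array([cost(p) for p in pos])
    improved = costs < pbest_cost
    pbest[improved], pbest_cost[improved] = pos[improved], costs[improved]
    gbest = pbest[pbest_cost.argmin()].copy()

print(cost(gbest) < cost(np.array([1.0, 0.0])))  # beats using filter 1 alone
```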

    Composite Subband Adaptive Speech Enhancement

    The thesis deals with single-channel and multi-channel algorithms for speech enhancement. The goal of this work is to perform a thorough analysis of both single-channel and multi-channel algorithms in terms of their behaviour in the noisy environment of combat vehicles and platforms. Based on this analysis, a new composite speech enhancement algorithm is designed. This new approach is expected to increase the quality of the processed speech in military communication systems. Such systems are characterised by operation under very noisy conditions, where background noise is very high, the signal-to-noise ratio extremely low, and the noise concentrated mainly at low frequencies; these conditions are typical for a range of military and combat platforms and vehicles.

    New Approaches for Speech Enhancement in the Short-Time Fourier Transform Domain

    Speech enhancement aims at the improvement of speech quality by using various algorithms. A speech enhancement technique can be implemented as either a time domain or a transform domain method. In transform domain speech enhancement, the spectrum of the clean speech signal is estimated through the modification of the noisy speech spectrum and then used to obtain the enhanced speech signal in the time domain. Among the existing transform domain methods in the literature, short-time Fourier transform (STFT) processing has particularly served as the basis for most of the frequency domain methods. In general, speech enhancement methods in the STFT domain can be categorized into estimators of complex discrete Fourier transform (DFT) coefficients and estimators of real-valued short-time spectral amplitude (STSA). Due to the computational efficiency of the STSA estimation method and also its superior performance in most cases, as compared to the estimators of complex DFT coefficients, we focus mostly on the estimation of speech STSA throughout this work and aim at developing algorithms for noise reduction and reverberation suppression. First, we tackle the problem of additive noise reduction using the single-channel Bayesian STSA estimation method. In this respect, we present new schemes for the selection of Bayesian cost function parameters for a parametric STSA estimator, namely the Wβ-SA estimator, based on an initial estimate of the speech and also the properties of the human auditory system. We further use the latter information to design an efficient flooring scheme for the gain function of the STSA estimator. Next, we apply the generalized Gaussian distribution (GGD) to the Wβ-SA estimator as the speech STSA prior and propose to choose its parameters according to the noise spectral variance and the a priori signal-to-noise ratio (SNR).
The suggested STSA estimation schemes are able to provide further noise reduction as well as less speech distortion, as compared to the previous methods. Quality and noise reduction performance evaluations indicated the superiority of the proposed speech STSA estimation with respect to the previous estimators. Regarding the multi-channel counterpart of the STSA estimation method, first we generalize the proposed single-channel Wβ-SA estimator to the multi-channel case for spatially uncorrelated noise. It is shown that under the Bayesian framework, a straightforward extension from the single-channel to the multi-channel case can be performed by generalizing the STSA estimator parameters, i.e. α and β. Next, we develop Bayesian STSA estimators by taking advantage of the speech spectral phase rather than only relying on the spectral amplitude of observations, in contrast to conventional methods. This contribution is presented for the multi-channel scenario, with single-channel as a special case. Next, we aim at developing multi-channel STSA estimation under spatially correlated noise and derive a generic structure for the extension of a single-channel estimator to its multi-channel counterpart. It is shown that the derived multi-channel extension requires a proper estimate of the spatial correlation matrix of the noise. Subsequently, we focus on the estimation of the noise correlation matrix, which is not only important in the multi-channel STSA estimation scheme but also highly useful in different beamforming methods. Next, we aim at speech reverberation suppression in the STFT domain using the weighted prediction error (WPE) method. The original WPE method requires an estimate of the desired speech spectral variance along with reverberation prediction weights, leading to a sub-optimal strategy that alternately estimates each of these two quantities.
Also, similar to most other STFT based speech enhancement methods, the desired speech coefficients are assumed to be temporally independent, while this assumption is inaccurate. Taking these into account, first, we employ a suitable estimator for the speech spectral variance and integrate it into the estimation of the reverberation prediction weights. In addition to the performance advantage with respect to the previous versions of the WPE method, the presented approach provides a good reduction in implementation complexity. Next, we take into account the temporal correlation present in the STFT of the desired speech, namely the inter-frame correlation (IFC), and consider an approximate model where only the frames within each segment of speech are considered as correlated. Furthermore, an efficient method for the estimation of the underlying IFC matrix is developed based on the extension of the speech variance estimator proposed previously. The performance results reveal lower residual reverberation and higher overall quality provided by the proposed method. Finally, we focus on the problem of late reverberation suppression using the classic speech spectral enhancement method originally developed for additive noise reduction. As our main contribution, we propose a novel late reverberant spectral variance (LRSV) estimator which replaces the noise spectral variance in order to modify the gain function for reverberation suppression. The suggested approach employs a modified version of the WPE method in a model based smoothing scheme used for the estimation of the LRSV. According to the experiments, the proposed LRSV estimator outperforms the previous major methods considerably and scores the closest results to the theoretically true LRSV estimator. 
Particularly, in the case of changing room impulse responses (RIRs), where other methods cannot follow the true LRSV estimator accurately, the suggested estimator is able to track true LRSV values and results in a smaller tracking error. We also target a few other aspects of the spectral enhancement method for reverberation suppression, which were previously explored only for the purpose of noise reduction. These contributions include the estimation of the signal-to-reverberant ratio (SRR) and the development of new schemes for the speech presence probability (SPP) and spectral gain flooring in the context of late reverberation suppression.
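    As a point of reference for the gain functions discussed above, the classical Wiener gain with a spectral floor can be written in a few lines. This is a simplified stand-in for the Bayesian STSA gains of the thesis, using a maximum-likelihood a priori SNR estimate rather than decision-directed smoothing:

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, floor=0.1):
    """Per-bin Wiener gain G = xi / (1 + xi) with a spectral floor,
    where xi is a maximum-likelihood a priori SNR estimate."""
    snr_post = noisy_power / np.maximum(noise_power, 1e-12)
    snr_prio = np.maximum(snr_post - 1.0, 0.0)   # ML a priori SNR
    gain = snr_prio / (1.0 + snr_prio)
    return np.maximum(gain, floor)               # gain flooring

noise_power = np.full(5, 1.0)
noisy_power = np.array([0.5, 1.0, 2.0, 5.0, 50.0])
gains = wiener_gain(noisy_power, noise_power)
print(gains)  # low-SNR bins hit the floor; high-SNR bins approach 1
```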

    Elastic image registration using parametric deformation models

    The main topic of this thesis is elastic image registration for biomedical applications. We start with an overview and classification of existing registration techniques. We revisit the landmark interpolation that appears in landmark-based registration techniques and add some generalizations. We develop a general elastic image registration algorithm. It uses a grid of uniform B-splines to describe the deformation, and also uses B-splines for image interpolation. Multiresolution in both the image and deformation model spaces yields robustness and speed. First we describe a version of this algorithm targeted at finding unidirectional deformation in EPI magnetic resonance images. Then we present an enhanced and generalized version of this algorithm which is significantly faster and capable of treating multidimensional deformations. We apply this algorithm to the registration of SPECT data and to motion estimation in ultrasound image sequences. A semi-automatic version of the registration algorithm is capable of accepting expert hints in the form of soft landmark constraints. Far fewer landmarks are needed, and the results are far superior to pure landmark registration. In the second part of this thesis, we deal with the problem of generalized sampling and variational reconstruction. We explain how to reconstruct an object starting from several measurements using arbitrary linear operators. This comprises the case of traditional as well as generalized sampling. Among all possible reconstructions, we choose the one minimizing an a priori given quadratic variational criterion. We give an overview of the method and present several examples of applications. We also provide the mathematical details of the theory and discuss the choice of the variational criterion to be used.
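    The B-spline deformation model can be sketched in one dimension: displacements defined on a coarse control grid are interpolated with cubic splines into a dense field, which then warps the signal. The grid size and test signal below are illustrative, not the thesis's configuration:

```python
import numpy as np
from scipy import ndimage

# Coarse grid of control-point displacements, upsampled with cubic
# spline interpolation to a dense deformation field (a 1-D analogue
# of a uniform B-spline deformation grid).
control_disp = np.array([0.0, 2.0, -1.0, 0.0])       # displacements in samples
dense_disp = ndimage.zoom(control_disp, 64 / 4, order=3)

signal = np.sin(np.linspace(0, 2 * np.pi, 64))
coords = np.arange(64) + dense_disp                  # sample at x + u(x)
warped = ndimage.map_coordinates(signal, [coords], order=3, mode='nearest')
print(warped.shape)  # (64,)
```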

    Extended Spectral Subtraction

    This paper describes a new method for a single-channel noise suppression system which overcomes the typical disadvantage of single-channel noise suppression algorithms: the impossibility of estimating the noise during speech segments. Our method is a combination of Wiener filtering and spectral subtraction. The noise estimate can be successfully updated even during speech segments, so no voice activity detector is needed.
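    The idea of updating the noise estimate without a voice activity detector can be sketched as follows. This is a loose illustration of the principle, not the paper's exact algorithm: a Wiener-style gain marks noise-dominated bins, and the noise power estimate is updated recursively there in every frame:

```python
import numpy as np

def extended_spectral_subtraction(frames_power, alpha=0.95):
    """Sketch of continuous noise tracking: in every frame the noise
    power estimate is updated with a weight derived from a Wiener-style
    gain, so bins judged noise-dominated drive the update and no
    explicit voice activity detector is required."""
    noise = frames_power[0].copy()          # bootstrap from the first frame
    enhanced = []
    for p in frames_power:
        # Spectral-subtraction gain with a small floor to avoid negatives.
        gain = np.maximum(1.0 - noise / np.maximum(p, 1e-12), 0.05)
        # Low gain = noise-dominated bin: recursively update noise there.
        noise = alpha * noise + (1 - alpha) * (1.0 - gain) * p
        enhanced.append(gain * p)
    return np.array(enhanced)

rng = np.random.default_rng(4)
frames = 1.0 + 0.1 * rng.random((50, 8))    # stationary noise-like frames
out = extended_spectral_subtraction(frames)
print(out[-1].mean() < frames[-1].mean())   # noise power is reduced
```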