2,087 research outputs found

    Third-order cumulant-based wiener filtering algorithm applied to robust speech recognition

    Get PDF
    In previous works [5], [6], we studied some speech enhancement algorithms based on the iterative Wiener filtering method due to Lim-Oppenheim [2], where the AR spectral estimation of the speech is carried out using a second-order analysis. But in our algorithms we consider an AR estimation by means of cumulant analysis. This work extends some preceding papers due to the authors: a cumulant-based Wiener Filtering (AR3_IF) is applied to Robust Speech Recognition. A low complexity approach of this algorithm is tested in presence of bathroom water noise and its performance is compared to classical Spectral Subtraction method. Some results are presented when training task of the speech recognition system (HTK-MFCC) is executed under clean and noisy conditions. These results show a lower sensitivity to the presence of water noise when applying AR3_IF algorithm inside of a speech recognition task.Peer ReviewedPostprint (published version

    Detection of Mines in Acoustic Images using Higher Order Spectral Features

    Get PDF
    A new pattern-recognition algorithm detects approximately 90% of the mines hidden in the Coastal Systems Station Sonar0, 1, and 3 databases of cluttered acoustic images, with about 10% false alarms. Similar to other approaches, the algorithm presented here includes processing the images with an adaptive Wiener filter (the degree of smoothing depends on the signal strength in a local neighborhood) to remove noise without destroying the structural information in the mine shapes, followed by a two-dimensional FIR filter designed to suppress noise and clutter, while enhancing the target signature. A double peak pattern is produced as the FIR filter passes over mine highlight and shadow regions. Although the location, size, and orientation of this pattern within a region of the image can vary, features derived from higher order spectra (HOS) are invariant to translation, rotation, and scaling, while capturing the spatial correlations of mine-like objects. Classification accuracy is improved by combining features based on geometrical properties of the filter output with features based on HOS. The highest accuracy is obtained by fusing classification based on bispectral features with classification based on trispectral features

    <strong>Non-Gaussian, Non-stationary and Nonlinear Signal Processing Methods - with Applications to Speech Processing and Channel Estimation</strong>

    Get PDF

    System approach to robust acoustic echo cancellation through semi-blind source separation based on independent component analysis

    Get PDF
    We live in a dynamic world full of noises and interferences. The conventional acoustic echo cancellation (AEC) framework based on the least mean square (LMS) algorithm by itself lacks the ability to handle many secondary signals that interfere with the adaptive filtering process, e.g., local speech and background noise. In this dissertation, we build a foundation for what we refer to as the system approach to signal enhancement as we focus on the AEC problem. We first propose the residual echo enhancement (REE) technique that utilizes the error recovery nonlinearity (ERN) to "enhances" the filter estimation error prior to the filter adaptation. The single-channel AEC problem can be viewed as a special case of semi-blind source separation (SBSS) where one of the source signals is partially known, i.e., the far-end microphone signal that generates the near-end acoustic echo. SBSS optimized via independent component analysis (ICA) leads to the system combination of the LMS algorithm with the ERN that allows for continuous and stable adaptation even during double talk. Second, we extend the system perspective to the decorrelation problem for AEC, where we show that the REE procedure can be applied effectively in a multi-channel AEC (MCAEC) setting to indirectly assist the recovery of lost AEC performance due to inter-channel correlation, known generally as the "non-uniqueness" problem. We develop a novel, computationally efficient technique of frequency-domain resampling (FDR) that effectively alleviates the non-uniqueness problem directly while introducing minimal distortion to signal quality and statistics. We also apply the system approach to the multi-delay filter (MDF) that suffers from the inter-block correlation problem. Finally, we generalize the MCAEC problem in the SBSS framework and discuss many issues related to the implementation of an SBSS system. We propose a constrained batch-online implementation of SBSS that stabilizes the convergence behavior even in the worst case scenario of a single far-end talker along with the non-uniqueness condition on the far-end mixing system. The proposed techniques are developed from a pragmatic standpoint, motivated by real-world problems in acoustic and audio signal processing. Generalization of the orthogonality principle to the system level of an AEC problem allows us to relate AEC to source separation that seeks to maximize the independence, hence implicitly the orthogonality, not only between the error signal and the far-end signal, but rather, among all signals involved. The system approach, for which the REE paradigm is just one realization, enables the encompassing of many traditional signal enhancement techniques in analytically consistent yet practically effective manner for solving the enhancement problem in a very noisy and disruptive acoustic mixing environment.PhDCommittee Chair: Biing-Hwang Juang; Committee Member: Brani Vidakovic; Committee Member: David V. Anderson; Committee Member: Jeff S. Shamma; Committee Member: Xiaoli M

    Design of automatic speech recognition in noisy environments enhancement and modification

    Get PDF
    Recurrent neural networks (RNN) and feed-forward multi-layer perceptron’s have been proposed for determining the absence and presence of speech in continuous voice signals when there is a variety of background noise levels present. The Aurora2 and Aurora3 were used to conduct detailed performance evaluations on vocal activity detection. When a Recurrent neural network feeds on automatic speech recognition particular features and acoustic features, the best outcomes can be achieved, according to this study. Aurora2 and the French, Romanian and Norway portions of the Aurora3 corpus are also proposed for detailed studies of ASR. When noise presence probability is utilized to change for encoding speech, phone subsequent probabilities are employed; the WER is reduced by 10.3 percent

    Robust variational Bayesian clustering for underdetermined speech separation

    Get PDF
    The main focus of this thesis is the enhancement of the statistical framework employed for underdetermined T-F masking blind separation of speech. While humans are capable of extracting a speech signal of interest in the presence of other interference and noise; actual speech recognition systems and hearing aids cannot match this psychoacoustic ability. They perform well in noise and reverberant free environments but suffer in realistic environments. Time-frequency masking algorithms based on computational auditory scene analysis attempt to separate multiple sound sources from only two reverberant stereo mixtures. They essentially rely on the sparsity that binaural cues exhibit in the time-frequency domain to generate masks which extract individual sources from their corresponding spectrogram points to solve the problem of underdetermined convolutive speech separation. Statistically, this can be interpreted as a classical clustering problem. Due to analytical simplicity, a finite mixture of Gaussian distributions is commonly used in T-F masking algorithms for modelling interaural cues. Such a model is however sensitive to outliers, therefore, a robust probabilistic model based on the Student's t-distribution is first proposed to improve the robustness of the statistical framework. This heavy tailed distribution, as compared to the Gaussian distribution, can potentially better capture outlier values and thereby lead to more accurate probabilistic masks for source separation. This non-Gaussian approach is applied to the state-of the-art MESSL algorithm and comparative studies are undertaken to confirm the improved separation quality. A Bayesian clustering framework that can better model uncertainties in reverberant environments is then exploited to replace the conventional expectation-maximization (EM) algorithm within a maximum likelihood estimation (MLE) framework. A variational Bayesian (VB) approach is then applied to the MESSL algorithm to cluster interaural phase differences thereby avoiding the drawbacks of MLE; specifically the probable presence of singularities and experimental results confirm an improvement in the separation performance. Finally, the joint modelling of the interaural phase and level differences and the integration of their non-Gaussian modelling within a variational Bayesian framework, is proposed. This approach combines the advantages of the robust estimation provided by the Student's t-distribution and the robust clustering inherent in the Bayesian approach. In other words, this general framework avoids the difficulties associated with MLE and makes use of the heavy tailed Student's t-distribution to improve the estimation of the soft probabilistic masks at various reverberation times particularly for sources in close proximity. Through an extensive set of simulation studies which compares the proposed approach with other T-F masking algorithms under different scenarios, a significant improvement in terms of objective and subjective performance measures is achieved
    • …
    corecore