
    Likelihood-Maximizing-Based Multiband Spectral Subtraction for Robust Speech Recognition

    Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. A major challenge today is to achieve good robustness in adverse noisy conditions so that automatic speech recognizers can be used in real situations. Spectral subtraction (SS) is a well-known and effective approach, originally designed to improve the quality of the speech signal as judged by human listeners. SS techniques usually improve the quality and intelligibility of the speech signal, whereas speech recognition systems need compensation techniques that reduce the mismatch between noisy speech features and acoustic models trained on clean speech. Nevertheless, some correlation can be expected between speech quality improvement and increased recognition accuracy. This paper proposes a novel approach to this problem that treats SS and the speech recognizer not as two independent entities cascaded together, but as two interconnected components of a single system that share the common goal of improved recognition accuracy. The architecture feeds information from the statistical models of the recognition engine back as feedback for tuning the SS parameters. With this architecture, we overcome the drawbacks of previously proposed methods and achieve better recognition accuracy. Experimental evaluations show that the proposed method significantly improves recognition rates across a wide range of signal-to-noise ratios.
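The abstract gives neither the subtraction rule nor the tuning procedure. As a rough illustration only, the sketch below assumes the common power-domain multiband rule (one over-subtraction factor per band plus a spectral floor) and stands in for the recognizer's feedback with a hypothetical `score_fn` callback. `multiband_spectral_subtraction` and `tune_alphas` are illustrative names, not the paper's likelihood-maximizing method.

```python
import numpy as np


def multiband_spectral_subtraction(noisy_mag, noise_mag, band_edges, alphas, beta=0.01):
    """Power-domain spectral subtraction with one over-subtraction factor per band.

    noisy_mag, noise_mag: magnitude spectra of a noisy frame and of the noise estimate.
    band_edges: bin indices [0, e1, ..., n_bins] splitting the spectrum into bands.
    alphas: over-subtraction factor for each band. beta: spectral floor factor.
    """
    noisy_pow = np.asarray(noisy_mag, dtype=float) ** 2
    noise_pow = np.asarray(noise_mag, dtype=float) ** 2
    if band_edges[0] != 0 or band_edges[-1] != len(noisy_pow):
        raise ValueError("band_edges must cover the whole spectrum")
    if len(alphas) != len(band_edges) - 1:
        raise ValueError("need one alpha per band")
    clean_pow = np.empty_like(noisy_pow)
    for lo, hi, alpha in zip(band_edges[:-1], band_edges[1:], alphas):
        est = noisy_pow[lo:hi] - alpha * noise_pow[lo:hi]
        # Floor the estimate so no bin goes negative (limits musical noise).
        clean_pow[lo:hi] = np.maximum(est, beta * noisy_pow[lo:hi])
    return np.sqrt(clean_pow)


def tune_alphas(noisy_mag, noise_mag, band_edges, candidates, score_fn):
    """Greedy per-band grid search for the alphas that maximise score_fn.

    score_fn is a hypothetical stand-in for recognizer feedback
    (e.g. an acoustic-model log-likelihood of the enhanced features).
    """
    alphas = [candidates[0]] * (len(band_edges) - 1)
    for b in range(len(alphas)):
        def score(a):
            trial = alphas[:b] + [a] + alphas[b + 1:]
            return score_fn(multiband_spectral_subtraction(
                noisy_mag, noise_mag, band_edges, trial))
        alphas[b] = max(candidates, key=score)
    return alphas
```

The greedy, band-by-band search is only the simplest way to close the loop between enhancement and a scoring model; a real system would score features through the recognizer's acoustic models rather than a magnitude target.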

    Noise invariant frame selection: a simple method to address the background noise problem for text-independent speaker verification

    The performance of speaker-related systems usually degrades heavily in practical applications, largely because of background noise. To improve the robustness of such systems in unknown noisy environments, this paper proposes a simple pre-processing method called Noise Invariant Frame Selection (NIFS). Based on several noise-related constraints, it selects noise-invariant frames from utterances to represent speakers. Experiments conducted on the TIMIT database showed that NIFS can significantly improve the performance of Vector Quantization (VQ), Gaussian Mixture Model-Universal Background Model (GMM-UBM) and i-vector-based speaker verification systems, relative to their baselines, in different unknown noisy environments with different SNRs. Moreover, the proposed NIFS-based speaker systems achieve similar performance when the constraints (hyper-parameters) or features are changed, which indicates that the method is easy to reproduce. Since NIFS is designed as a general algorithm, it could be applied to other similar tasks.

    Analysis of and techniques for adaptive equalization for underwater acoustic communication

    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, September 2011. Underwater wireless communication is quickly becoming a necessity for applications in ocean science, defense, and homeland security. Acoustics remains the only practical means of accomplishing long-range communication in the ocean. The acoustic communication channel is fraught with difficulties, including limited available bandwidth, long delay spread, time variability, and Doppler spreading. These difficulties reduce the reliability of the communication system and make high-data-rate communication challenging. Adaptive decision feedback equalization is a common method to compensate for distortions introduced by the underwater acoustic channel. Little work has been done so far on using the physics of the underwater channel to improve, and better understand, the operation of a decision feedback equalizer. This thesis examines how to use physical models to improve the reliability and reduce the computational complexity of the decision feedback equalizer. The specific topics covered by this work are: how to handle channel estimation errors for the time-varying channel, how to incorporate angular constraints imposed by the environment into an array receiver, what happens when there is a mismatch between the true channel order and the estimated channel order, and why there is a performance difference between the direct adaptation and channel-estimation-based methods for computing the equalizer coefficients.
For each of these topics, algorithms are provided that help create a more robust equalizer with lower computational complexity for the underwater channel. This work would not have been possible without support from the Office of Naval Research, through a Special Research Award in Acoustics Graduate Fellowship (ONR Grant #N00014-09-1-0540), with additional support from ONR Grant #N00014-05-10085 and ONR Grant #N00014-07-10184.
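The thesis's physics-based refinements are not reproduced here. For orientation, the sketch below shows the baseline they build on: a direct-adaptation decision feedback equalizer with LMS updates. It assumes symbol-spaced BPSK, a known training prefix followed by decision-directed operation, and illustrative tap counts and step size; `dfe_lms` is a hypothetical name.

```python
import numpy as np


def dfe_lms(received, training, n_ff=3, n_fb=2, mu=0.05):
    """LMS-adapted decision feedback equalizer for symbol-spaced BPSK.

    The first len(training) symbols are known and used as decisions
    (training mode); afterwards a hard BPSK slicer supplies the decisions
    (decision-directed mode). Requires n_ff >= 1 and n_fb >= 1.
    Returns (decisions, feedforward taps, feedback taps).
    """
    received = np.asarray(received, dtype=float)
    ff = np.zeros(n_ff)
    ff[0] = 1.0                          # start as a pass-through of the newest sample
    fb = np.zeros(n_fb)
    past = np.zeros(n_fb)                # previous decisions, newest first
    padded = np.concatenate([np.zeros(n_ff - 1), received])
    decisions = np.empty(len(received))
    for k in range(len(received)):
        x = padded[k:k + n_ff][::-1]     # received[k], received[k-1], ...
        y = ff @ x - fb @ past           # subtract estimated post-cursor ISI
        if k < len(training):
            d = float(training[k])       # training mode: known symbol
        else:
            d = 1.0 if y >= 0 else -1.0  # decision-directed mode: BPSK slicer
        e = d - y
        ff += mu * e * x                 # LMS step: dy/dff = x
        fb -= mu * e * past              # LMS step: dy/dfb = -past
        past = np.roll(past, 1)
        past[0] = d
        decisions[k] = d
    return decisions, ff, fb
```

For a noise-free two-tap channel r[k] = s[k] + 0.5 s[k-1], one feedforward and one feedback tap suffice, and the feedback tap should converge to the post-cursor value 0.5. A channel-estimation-based equalizer would instead estimate the channel response and compute `ff` and `fb` from it; the thesis compares these two routes.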

    Increasing the robustness of autonomous systems to hardware degradation using machine learning

    Autonomous systems perform predetermined tasks (missions) with minimal supervision. In most applications, the state of the world changes with time. Sensors are employed to measure part or all of the world's state. However, sensors often fail during operation, thereby feeding wrong information about the world into decision-making. Moreover, hardware degradation may alter the dynamic behaviour, and consequently the capabilities, of an autonomous system, rendering the original mission infeasible. This thesis applies machine learning to yield powerful and robust tools that can facilitate autonomy in modern systems. Incremental kernel regression is used for dynamic modelling. Algorithms of this sort are easy to train and highly adaptive; adaptivity allows the model to be adjusted whenever the operating environment changes. Bayesian reasoning provides a rigorous framework for addressing uncertainty, and Bayesian Networks allow complex inference about hardware degradation. Specifically, adaptive modelling is combined with Bayesian reasoning to yield recursive estimation algorithms that are robust to sensor failures. Two solutions are presented by extending existing recursive estimation algorithms from the robotics literature. The algorithms are deployed on an underwater vehicle and their performance is assessed in real-world experiments, including a comparison against standard filters. Next, the previous algorithms are extended to consider sensor and actuator failures jointly. An algorithm has been developed that detects thruster failures in an Autonomous Underwater Vehicle and adapts the dynamic model online to compensate for the detected fault; its performance was also tested in a real-world application. Going one step beyond hardware fault detection, prognostics predict how much longer a particular hardware component can operate normally.
Ubiquitous sensors in modern systems make data-driven prognostics a viable solution. However, training is based on skewed datasets, in which samples from the faulty region of operation are far fewer than those from the healthy region. This thesis presents a prognostic algorithm that tackles the problem of imbalanced (skewed) datasets.
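The abstract does not say which kernel regressor is used. As a sketch of what "incremental" and "adaptive" can mean in practice, the class below assumes a Nadaraya-Watson estimator with a Gaussian kernel over a bounded window of recent samples, so old data is forgotten when the dynamics change. `IncrementalKernelRegressor` and its parameters are hypothetical.

```python
from collections import deque

import numpy as np


class IncrementalKernelRegressor:
    """Nadaraya-Watson regression over a bounded window of recent samples.

    Each update appends one (input, target) pair; once max_samples is reached
    the oldest pair is dropped, so the model tracks changes in the dynamics.
    """

    def __init__(self, bandwidth=0.1, max_samples=200):
        self.bandwidth = bandwidth
        self.xs = deque(maxlen=max_samples)
        self.ys = deque(maxlen=max_samples)

    def update(self, x, y):
        self.xs.append(np.atleast_1d(np.asarray(x, dtype=float)))
        self.ys.append(float(y))

    def predict(self, x):
        if not self.xs:
            raise ValueError("no samples seen yet")
        X = np.stack(list(self.xs))                      # (n, d)
        d2 = np.sum((X - np.atleast_1d(x)) ** 2, axis=1)
        # Shift by the minimum distance before exponentiating so the largest
        # weight is 1 and the sum cannot underflow to zero.
        w = np.exp(-0.5 * (d2 - d2.min()) / self.bandwidth ** 2)
        return float(w @ np.array(list(self.ys)) / w.sum())
```

Training is just appending samples, which matches the "easy to train" property; the window length trades responsiveness to new dynamics against estimate noise.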

    Subspace Gaussian mixture models for automatic speech recognition

    In most state-of-the-art speech recognition systems, Gaussian mixture models (GMMs) are used to model the density of the emitting states in hidden Markov models (HMMs). In a conventional system, the parameters of each GMM are estimated directly and independently given the alignment. This results in a large number of model parameters to be estimated and, consequently, a large amount of training data is required to fit the model. In addition, different sources of acoustic variability that affect recognition accuracy, such as pronunciation variation, accent, speaker factors and environmental noise, are only weakly modelled and factorized by adaptation techniques such as maximum likelihood linear regression (MLLR), maximum a posteriori (MAP) adaptation and vocal tract length normalisation (VTLN). In this thesis, we discuss an alternative acoustic modelling approach, the subspace Gaussian mixture model (SGMM), which is expected to deal with both of these issues better. In an SGMM, the model parameters are derived from low-dimensional model and speaker subspaces that can capture phonetic and speaker correlations. Given these subspaces, only a small number of state-dependent parameters are required to derive the corresponding GMMs. Hence, the total number of model parameters can be reduced, which allows acoustic modelling with a limited amount of training data. In addition, the SGMM-based acoustic model factorizes the phonetic and speaker factors, and other sources of acoustic variability may also be explored within this framework. We propose a regularised model estimation for SGMMs that avoids overtraining when the training data are sparse. We also take advantage of the structure of SGMMs to explore cross-lingual acoustic modelling for low-resource speech recognition. Here, the model subspace is estimated from out-of-domain data and ported to the target language system.
In this case, only the state-dependent parameters need to be estimated, which relaxes the requirement on the amount of training data. To improve the robustness of SGMMs against environmental noise, we propose applying the joint uncertainty decoding (JUD) technique, which is shown to be efficient and effective. We report experimental results on the Wall Street Journal (WSJ) database and the GlobalPhone corpora to evaluate the regularisation and cross-lingual modelling of SGMMs. Noise compensation using JUD for SGMM acoustic models is evaluated on the Aurora 4 database.
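To make the parameter sharing concrete, the sketch below derives one state's GMM from shared subspaces, assuming the usual SGMM parameterisation: component means mu_ji = M_i v_j (plus a speaker term N_i v_s when speaker subspaces are used) and softmax weights w_ji proportional to exp(w_i^T v_j). Covariances are shared per component and omitted. The function and argument names are illustrative; regularised estimation and JUD compensation are not shown.

```python
import numpy as np


def sgmm_state_gmm(v_j, M, w, v_s=None, N=None):
    """Derive one HMM state's GMM means and weights from shared SGMM subspaces.

    v_j: (S,) state vector, the only state-specific parameter here.
    M:   (I, D, S) shared mean projections, one per Gaussian component i.
    w:   (I, S) shared weight projections.
    v_s: optional (T,) speaker vector; N: (I, D, T) speaker projections.
    Returns means (I, D) and mixture weights (I,).
    """
    means = M @ v_j                      # mu_ji = M_i v_j for every component i
    if v_s is not None:
        means = means + N @ v_s          # speaker offset N_i v_s
    logits = w @ v_j                     # w_i^T v_j for every component i
    logits = logits - logits.max()       # stabilise the softmax
    weights = np.exp(logits) / np.exp(logits).sum()
    return means, weights
```

Each state contributes only an S-dimensional vector instead of I full means, covariances and weights, which is where the reduction in state-dependent parameters comes from.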

    Teaching old sensors New tricks: archetypes of intelligence

    This paper describes a generic intelligent sensor software architecture that builds on the basic requirements of related industry standards (IEEE 1451 and SEVA BS-7986). It incorporates specific functionalities such as real-time fault detection, drift compensation, adaptation to environmental changes and autonomous reconfiguration. The modular structure of the architecture provides enhanced flexibility in the choice of specific algorithmic realizations. In this context, the particular aspects of fault detection and drift estimation are discussed. A mixed indicative/corrective fault detection approach is proposed, and it is demonstrated that reversible/irreversible state-dependent drift can be estimated using generic algorithms such as the EKF or on-line density estimators. Finally, a parsimonious density estimator is presented and validated on simulated and real data for use in an operating-regime-dependent fault detection framework.
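The abstract names the EKF for drift estimation. For a linear random-walk drift model the EKF reduces to an ordinary Kalman filter, which the sketch below uses. It assumes, hypothetically, that the sensor observes a known reference signal plus an additive drift; the paper's state-dependent drift models are not reproduced, and `estimate_drift` and its noise parameters are illustrative.

```python
import numpy as np


def estimate_drift(readings, reference, q=1e-5, r=0.04, b0=0.0, p0=1.0):
    """Track an additive sensor drift b_k with a scalar Kalman filter.

    Hypothetical model: b_k = b_{k-1} + w_k, w_k ~ N(0, q)    (random-walk drift)
                        z_k = x_k + b_k + v_k, v_k ~ N(0, r)  (reading of known x_k)
    Returns the filtered drift estimate after each reading.
    """
    b, p = b0, p0
    estimates = np.empty(len(readings))
    for k, (z, x) in enumerate(zip(readings, reference)):
        p = p + q                        # predict: drift variance grows
        gain = p / (p + r)               # Kalman gain
        b = b + gain * ((z - x) - b)     # correct with the innovation
        p = (1.0 - gain) * p             # posterior variance
        estimates[k] = b
    return estimates
```

The ratio q/r sets how quickly the estimate follows a changing drift versus how much measurement noise it averages out; a compensated reading is then simply z_k minus the current estimate.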

    Statistical models for noise-robust speech recognition

    A standard way of improving the robustness of speech recognition systems to noise is model compensation. This replaces a speech recogniser's distributions over clean speech with ones over noise-corrupted speech. For each clean speech component, model compensation techniques usually approximate the corrupted speech distribution with a diagonal-covariance Gaussian distribution. This thesis looks into improving on this approximation in two ways: first, by estimating full-covariance Gaussian distributions; second, by approximating corrupted-speech likelihoods without any parameterised distribution. The first part of this work is about compensating for within-component feature correlations under noise. For this, the covariance matrices of the computed Gaussians should be full instead of diagonal. The estimation of off-diagonal covariance elements turns out to be sensitive to approximations. A popular approximation is the one that state-of-the-art compensation schemes, such as VTS compensation, use for dynamic coefficients: the continuous-time approximation. Standard speech recognisers use both per-time-slice (static) coefficients and dynamic coefficients, which represent signal changes over time and are normally computed from a window of static coefficients. To remove the need for the continuous-time approximation, this thesis introduces a new technique. It first compensates a distribution over the window of statics, and then applies the same linear projection that extracts dynamic coefficients. It introduces a number of methods that address the correlation changes that occur in noise within this framework. The next problem is decoding speed with full covariances. This thesis re-analyses the previously introduced predictive linear transformations, and shows how they can model feature correlations at low and tunable computational cost. The second part of this work removes the Gaussian assumption completely.
It introduces a sampling method that, given speech and noise distributions and a mismatch function, computes the corrupted speech likelihood exactly in the limit. For this, it transforms the integral in the likelihood expression, and then applies sequential importance resampling. Though it is too slow to use for recognition, it enables a more fine-grained assessment of compensation techniques, based on the KL divergence to the ideal compensation for one component. The KL divergence turns out to predict the word error rate well. This technique also makes it possible to evaluate the impact of the approximations that standard compensation schemes make. This work was supported by Toshiba Research Europe Ltd., Cambridge Research Laboratory.
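The idea of compensating a distribution over a window of statics and then applying the dynamic-coefficient projection can be sketched for a single cepstral dimension. The sketch assumes the common regression formula for delta coefficients, d_t = sum over theta of theta * (c_{t+theta} - c_{t-theta}) / (2 * sum of theta^2). The compensation step itself is not shown: `cov_w` stands for whatever full window covariance a compensation scheme would produce, and all function names are illustrative.

```python
import numpy as np


def delta_weights(window=2):
    """Regression weights so that delta_t = sum_k w[k] * c[t - window + k]."""
    thetas = np.arange(-window, window + 1, dtype=float)
    return thetas / (2.0 * np.sum(np.arange(1, window + 1) ** 2))


def static_delta_matrix(window=2):
    """2 x (2*window+1) projection from a window of one static coefficient to [static, delta]."""
    A = np.zeros((2, 2 * window + 1))
    A[0, window] = 1.0          # static coefficient at the centre frame
    A[1] = delta_weights(window)
    return A


def project_gaussian(mean_w, cov_w, A):
    """Push a Gaussian over the static window through the linear map A."""
    return A @ mean_w, A @ cov_w @ A.T
```

Because the projection is linear, a Gaussian over the window maps exactly to a Gaussian over [static, delta], so no continuous-time approximation is needed; any inter-frame correlations in the compensated window carry through to the static and delta covariances.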

    Temporally Varying Weight Regression for Speech Recognition

    Ph.D. (Doctor of Philosophy)

    Six Noise Type Military Sound Classifier

    Blast noise from military installations often has a negative impact on the quality of life of residents living in nearby communities. This in turn limits the military's testing and training capabilities through restrictions, curfews, or range closures enacted to address noise complaints. To manage noise around military installations more directly, accurate noise monitoring has become a necessity. Although most noise monitors are simple sound level meters, more recent ones can discern blasts from ambient noise with some success. Investigators at the University of Pittsburgh previously developed a more advanced noise classifier that can discern between wind, aircraft, and blast noise while simultaneously lowering the measurement threshold. Recent work will be presented on the development of a more advanced classifier that identifies additional classes of noise such as machine gun fire, vehicles, and thunder. Additional signal metrics were explored given the increased complexity of the classifier. By broadening the types of noise the system can accurately classify and increasing the number of metrics, a new system was developed with increased blast noise accuracy, fewer missed events, and significantly fewer false positives.