Likelihood-Maximizing-Based Multiband Spectral Subtraction for Robust Speech Recognition
Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. The major challenge today is to achieve good robustness in adverse noisy conditions so that automatic speech recognizers can be used in real situations. Spectral subtraction (SS) is a well-known and effective approach; it was originally designed to improve the quality of speech signals as judged by human listeners. SS techniques usually improve the quality and intelligibility of the speech signal, whereas speech recognition systems need compensation techniques that reduce the mismatch between noisy speech features and acoustic models trained on clean speech. Nevertheless, a correlation can be expected between speech quality improvement and increased recognition accuracy. This paper proposes a novel approach to this problem that treats SS and the speech recognizer not as two independent entities cascaded together, but as two interconnected components of a single system that share the common goal of improved recognition accuracy. In this architecture, information from the statistical models of the recognition engine is fed back to tune the SS parameters. By using this architecture, we overcome the drawbacks of previously proposed methods and achieve better recognition accuracy. Experimental evaluations show that the proposed method achieves a significant improvement in recognition rates across a wide range of signal-to-noise ratios.
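Multiband spectral subtraction splits the spectrum into frequency bands and applies a separate over-subtraction factor to each band. The minimal Python sketch below shows only that subtraction step for one frame of power spectra. The function name, the per-band factors `alphas` and the spectral floor `beta` are illustrative assumptions; in the approach described above the factors would be tuned using feedback from the recognizer's statistical models rather than set by hand, and that tuning loop is not shown.

```python
from typing import List, Sequence


def multiband_spectral_subtraction(
    noisy_power: Sequence[float],
    noise_power: Sequence[float],
    band_edges: Sequence[int],
    alphas: Sequence[float],
    beta: float = 0.01,
) -> List[float]:
    """Subtract a scaled noise estimate from one frame's power spectrum.

    Band b covers bins band_edges[b] .. band_edges[b + 1] - 1 and uses its own
    over-subtraction factor alphas[b] (hypothetical parameterisation).
    Bins not covered by any band are returned unchanged.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("noisy and noise spectra must have the same length")
    if len(band_edges) != len(alphas) + 1:
        raise ValueError("expected one over-subtraction factor per band")
    enhanced = list(noisy_power)
    for b, alpha in enumerate(alphas):
        for k in range(band_edges[b], band_edges[b + 1]):
            estimate = noisy_power[k] - alpha * noise_power[k]
            # Spectral floor: keep a small fraction of the noisy power instead of
            # letting over-subtraction produce negative or zero values.
            enhanced[k] = max(estimate, beta * noisy_power[k])
    return enhanced
```

Using a larger factor in noisier bands removes more noise there, at the cost of more distortion; picking those factors to favour recognition rather than perceived quality is exactly the tuning problem the abstract addresses.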
Noise invariant frame selection: a simple method to address the background noise problem for text-independent speaker verification
The performance of speaker-related systems usually degrades heavily in practical applications, largely because of background noise. To improve the robustness of such systems in unknown noisy environments, this paper proposes a simple pre-processing method called Noise Invariant Frame Selection (NIFS). Based on several noise-related constraints, it selects noise-invariant frames from utterances to represent speakers. Experiments conducted on the TIMIT database showed that NIFS can significantly improve the performance of Vector Quantization (VQ), Gaussian Mixture Model-Universal Background Model (GMM-UBM) and i-vector-based speaker verification systems, relative to their baselines, in different unknown noisy environments at different SNRs. Meanwhile, the proposed NIFS-based speaker systems achieve similar performance when the constraints (hyper-parameters) or features are changed, which indicates that the method is easy to reproduce. Since NIFS is designed as a general algorithm, it could be applied to other similar tasks.
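The abstract does not spell out the NIFS constraints, so the Python sketch below illustrates only the general idea of selecting frames whose features stay stable under noise. The criterion is a hypothetical one: features are assumed to have been extracted from the clean utterance and from several artificially noised copies of it, and a frame is kept only if its feature vector moves by at most `threshold` (Euclidean distance) in every copy. The function names and the threshold are assumptions, not the paper's method.

```python
from typing import List, Sequence

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def select_invariant_frames(
    clean: List[Frame],
    noisy_versions: List[List[Frame]],
    threshold: float,
) -> List[int]:
    """Return indices of frames whose features stay within `threshold`
    of the clean features in every noisy version (hypothetical criterion)."""
    keep = []
    for t, frame in enumerate(clean):
        if all(euclidean(frame, noisy[t]) <= threshold for noisy in noisy_versions):
            keep.append(t)
    return keep
```

The selected indices would then be used to pick the frames that feed a VQ, GMM-UBM or i-vector back end; the selection step itself is independent of the back end, which matches the abstract's description of NIFS as a pre-processing method.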
Analysis of and techniques for adaptive equalization for underwater acoustic communication
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, September 2011
Underwater wireless communication is quickly becoming a necessity for applications in ocean science, defense, and homeland security. Acoustics remains the only practical means of accomplishing long-range communication in the ocean. The acoustic communication channel is fraught with difficulties, including limited available bandwidth, long delay spread, time variability, and Doppler spreading. These difficulties reduce the reliability of the communication system and make high-data-rate communication challenging. Adaptive decision feedback equalization is a common method of compensating for distortions introduced by the underwater acoustic channel. Limited work has been done thus far to bring the physics of the underwater channel to bear on improving and better understanding the operation of a decision feedback equalizer. This thesis examines how to use physical models to improve the reliability and reduce the computational complexity of the decision feedback equalizer. The specific topics covered by this work are: how to handle channel estimation errors for the time-varying channel, how to incorporate angular constraints imposed by the environment into an array receiver, what happens when there is a mismatch between the true channel order and the estimated channel order, and why there is a performance difference between the direct adaptation and channel-estimation-based methods for computing the equalizer coefficients. For each of these topics, algorithms are provided that help create a more robust equalizer with lower computational complexity for the underwater channel.
This work would not have been possible without support from the Office of Naval Research, through a Special Research Award in Acoustics Graduate Fellowship (ONR Grant #N00014-09-1-0540), with additional support from ONR Grant #N00014-05-10085 and ONR Grant #N00014-07-10184.
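A decision feedback equalizer combines a feedforward filter on the received samples with a feedback filter on past symbol decisions; the feedback branch cancels intersymbol interference from symbols already decided. The minimal Python sketch below shows a real-valued BPSK (±1) DFE whose taps are adapted directly with LMS, i.e. a direct-adaptation scheme rather than a channel-estimation-based one. The function name, the symbol alphabet and the step size `mu` are illustrative assumptions, and none of the physics-based constraints studied in the thesis (Doppler, arrays, channel order) are modelled.

```python
from typing import List, Optional, Sequence


def lms_dfe(
    received: Sequence[float],
    n_ff: int,
    n_fb: int,
    mu: float,
    training: Optional[Sequence[float]] = None,
) -> List[float]:
    """Equalize a real-valued BPSK sequence with an LMS-adapted DFE.

    While known `training` symbols are available they drive adaptation and
    feedback (training mode); afterwards the equalizer's own hard decisions
    are used (decision-directed mode). Returns the hard decisions.
    """
    ff = [0.0] * n_ff        # feedforward taps on recent received samples
    fb = [0.0] * n_fb        # feedback taps on past decisions
    if n_ff > 0:
        ff[0] = 1.0          # start as a pass-through filter
    past = [0.0] * n_fb      # past decisions, most recent first
    decisions: List[float] = []
    for k in range(len(received)):
        # Window of received samples: received[k], received[k-1], ...
        x = [received[k - i] if k - i >= 0 else 0.0 for i in range(n_ff)]
        # Equalizer output: feedforward part minus estimated trailing ISI.
        y = sum(w * xi for w, xi in zip(ff, x)) - sum(b * d for b, d in zip(fb, past))
        d_hat = 1.0 if y >= 0 else -1.0
        use_training = training is not None and k < len(training)
        ref = training[k] if use_training else d_hat
        e = ref - y
        # LMS (stochastic-gradient) updates for y = ff.x - fb.past.
        ff = [w + mu * e * xi for w, xi in zip(ff, x)]
        fb = [b - mu * e * d for b, d in zip(fb, past)]
        decisions.append(d_hat)
        past = [ref] + past[:-1] if n_fb > 0 else past
    return decisions
```

After training, a wrong decision in decision-directed mode is fed back into later outputs (error propagation), which is one reason the reliability of the feedback path matters on difficult channels.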
Increasing the robustness of autonomous systems to hardware degradation using machine learning
Autonomous systems perform predetermined tasks (missions) with minimum supervision. In most applications, the state of the world changes with time. Sensors are employed to measure part or all of the world's state. However, sensors often fail during operation, feeding decision-making with wrong information about the world. Moreover, hardware degradation may alter the dynamic behaviour, and subsequently the capabilities, of an autonomous system, rendering the original mission infeasible. This thesis applies machine learning to yield powerful and robust tools that can facilitate autonomy in modern systems. Incremental kernel regression is used for dynamic modelling. Algorithms of this sort are easy to train and highly adaptive. Adaptivity allows the model to be adjusted whenever the operating environment changes. Bayesian reasoning provides a rigorous framework for addressing uncertainty. Moreover, using Bayesian Networks, complex queries regarding hardware degradation can be answered. Specifically, adaptive modelling is combined with Bayesian reasoning to yield recursive estimation algorithms that are robust to sensor failures. Two solutions are presented by extending existing recursive estimation algorithms from the robotics literature. The algorithms are deployed on an underwater vehicle, and their performance is assessed in real-world experiments. A comparison against standard filters is also provided. Next, the previous algorithms are extended to consider sensor and actuator failures jointly. An algorithm that can detect thruster failures in an Autonomous Underwater Vehicle has been developed. Moreover, the algorithm adapts the dynamic model online to compensate for the detected fault. The performance of this algorithm was also tested in a real-world application. Going one step further than hardware fault detection, prognostics predict how much longer a particular hardware component can operate normally.
Ubiquitous sensors in modern systems render data-driven prognostics a viable solution. However, training is based on skewed datasets, that is, datasets in which the samples from the faulty region of operation are far fewer than those from the healthy region of operation. This thesis presents a prognostic algorithm that tackles the problem of imbalanced (skewed) datasets.
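To illustrate what "incremental" and "adaptive" mean for kernel regression, the Python sketch below uses Nadaraya-Watson regression with a Gaussian kernel over a sliding window of samples: each new observation is added online, and the oldest one is dropped so the model can follow changed dynamics. The class name, the scalar input, the bandwidth and the sliding-window forgetting rule are all illustrative assumptions; the abstract does not say which incremental kernel method the thesis uses.

```python
import math
from collections import deque
from typing import Deque, Tuple


class IncrementalKernelRegressor:
    """Nadaraya-Watson kernel regression over a sliding window of samples
    (an illustrative choice, not necessarily the thesis's method)."""

    def __init__(self, bandwidth: float, max_samples: int = 200) -> None:
        self.bandwidth = bandwidth
        # deque with maxlen silently discards the oldest sample when full.
        self.samples: Deque[Tuple[float, float]] = deque(maxlen=max_samples)

    def update(self, x: float, y: float) -> None:
        """Add one (input, output) observation; no retraining step is needed."""
        self.samples.append((x, y))

    def predict(self, x: float) -> float:
        """Kernel-weighted average of the stored outputs."""
        if not self.samples:
            raise ValueError("no samples yet")
        weights = [
            math.exp(-0.5 * ((x - xi) / self.bandwidth) ** 2) for xi, _ in self.samples
        ]
        total = sum(weights)
        if total == 0.0:
            # Query far from every stored input: fall back to the nearest sample.
            return min(self.samples, key=lambda s: abs(s[0] - x))[1]
        return sum(w * yi for w, (_, yi) in zip(weights, self.samples)) / total
```

Because the model is just the stored samples, "training" reduces to appending data, which is what makes this family of methods cheap to adapt online, for example after a thruster fault changes the vehicle's response.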
Subspace Gaussian mixture models for automatic speech recognition
In most state-of-the-art speech recognition systems, Gaussian mixture models (GMMs) are used to model the density of the emitting states in the hidden Markov models (HMMs). In a conventional system, the model parameters of each GMM are estimated directly and independently given the alignment. This results in a large number of model parameters to be estimated, and consequently, a large amount of training data is required to fit the model. In addition, different sources of acoustic variability that affect the accuracy of a recogniser, such as pronunciation variation, accent, speaker factors and environmental noise, are only weakly modelled and factorized by adaptation techniques such as maximum likelihood linear regression (MLLR), maximum a posteriori adaptation (MAP) and vocal tract length normalisation (VTLN). In this thesis, we will discuss an alternative acoustic modelling approach, the subspace Gaussian mixture model (SGMM), which is expected to deal with these two issues better. In an SGMM, the model parameters are derived from low-dimensional model and speaker subspaces that can capture phonetic and speaker correlations. Given these subspaces, only a small number of state-dependent parameters are required to derive the corresponding GMMs. Hence, the total number of model parameters can be reduced, which allows acoustic modelling with a limited amount of training data. In addition, the SGMM-based acoustic model factorizes the phonetic and speaker factors, and within this framework, other sources of acoustic variability may also be explored.
In this thesis, we propose a regularised model estimation for SGMMs, which avoids overtraining when the training data are sparse. We will also take advantage of the structure of SGMMs to explore cross-lingual acoustic modelling for low-resource speech recognition. Here, the model subspace is estimated from out-of-domain data and ported to the target language system. In this case, only the state-dependent parameters need to be estimated, which relaxes the requirement on the amount of training data. To improve the robustness of SGMMs against environmental noise, we propose to apply the joint uncertainty decoding (JUD) technique, which is shown to be efficient and effective. We will report experimental results on the Wall Street Journal (WSJ) database and GlobalPhone corpora to evaluate the regularisation and cross-lingual modelling of SGMMs. Noise compensation using JUD for SGMM acoustic models is evaluated on the Aurora 4 database.
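In the commonly used SGMM formulation, each state j has only a low-dimensional vector v_j; the means and weights of its GMM are derived from globally shared parameters as μ_ji = M_i v_j (plus N_i v^(s) when a speaker subspace with speaker vector v^(s) is used) and w_ji = exp(w_i·v_j) / Σ_k exp(w_k·v_j), with covariances Σ_i shared across states. The plain-Python sketch below computes one state's means and weights this way. Sub-states and covariances are omitted, and the function and variable names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sgmm_state_gmm(
    M: Sequence[Matrix],         # shared mean projections M_i, each D x S
    w: Sequence[Vector],         # shared weight projections w_i, each length S
    v_j: Sequence[float],        # state vector v_j, length S
    N: Sequence[Matrix] = (),    # optional speaker projections N_i, each D x T
    v_s: Sequence[float] = (),   # optional speaker vector v^(s), length T
) -> Tuple[List[Vector], Vector]:
    """Derive the Gaussian means and mixture weights of state j."""
    logits = [sum(a * b for a, b in zip(w_i, v_j)) for w_i in w]
    top = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(l - top) for l in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    means = []
    for i, M_i in enumerate(M):
        mu = matvec(M_i, v_j)
        if N:
            # Speaker-dependent offset from the speaker subspace.
            offset = matvec(N[i], v_s)
            mu = [a + b for a, b in zip(mu, offset)]
        means.append(mu)
    return means, weights
```

Only v_j is state-specific here, so adding a state costs S parameters instead of a full set of means and weights; this is the parameter reduction, and the reason the shared M_i and w_i can be ported across languages while only the v_j are re-estimated.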
Teaching old sensors new tricks: archetypes of intelligence
In this paper, a generic intelligent sensor software architecture is described that builds upon the basic requirements of related industry standards (IEEE 1451 and SEVA BS-7986). It incorporates specific functionalities such as real-time fault detection, drift compensation, adaptation to environmental changes and autonomous reconfiguration. The modular structure of the intelligent sensor architecture provides enhanced flexibility in the choice of specific algorithmic realizations. In this context, the particular aspects of fault detection and drift estimation are discussed. A mixed indicative/corrective fault detection approach is proposed, and it is demonstrated that reversible/irreversible state-dependent drift can be estimated using generic algorithms such as the EKF or online density estimators. Finally, a parsimonious density estimator is presented and validated on simulated and real data for use in an operating-regime-dependent fault detection framework.
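As a small illustration of estimating drift with a Kalman-type filter, the Python sketch below tracks an additive sensor drift modelled as a random walk. It assumes, hypothetically, that a known reference value is available for each reading (for example during a calibration phase), so the measurement model is linear: reading = reference + drift + noise. Under that linear model the EKF mentioned above reduces to an ordinary scalar Kalman filter. The function name and the noise variances `q` and `r` are assumptions, not values from the paper.

```python
from typing import List, Sequence, Tuple


def estimate_drift(
    readings: Sequence[float],
    reference: Sequence[float],
    q: float,          # process-noise variance of the random-walk drift
    r: float,          # measurement-noise variance of the sensor
    d0: float = 0.0,   # initial drift estimate
    p0: float = 1.0,   # initial drift variance
) -> List[Tuple[float, float]]:
    """Scalar Kalman filter for additive drift; returns (estimate, variance) per step."""
    d, p = d0, p0
    history = []
    for z, ref in zip(readings, reference):
        # Predict: the drift is a random walk, so only its uncertainty grows.
        p = p + q
        # Update with the residual between the reading and the known reference.
        gain = p / (p + r)
        d = d + gain * ((z - ref) - d)
        p = (1.0 - gain) * p
        history.append((d, p))
    return history
```

The estimated drift can then be subtracted from subsequent readings; a small `q` makes the filter trust its accumulated estimate, while a larger `q` lets it follow faster drift at the cost of noisier estimates.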
Statistical models for noise-robust speech recognition
A standard way of improving the robustness of speech recognition systems to noise is model compensation. This replaces a speech recogniser's distributions over clean speech by ones over noise-corrupted speech. For each clean speech component, model compensation techniques usually approximate the corrupted speech distribution with a diagonal-covariance Gaussian distribution. This thesis looks into improving on this approximation in two ways: firstly, by estimating full-covariance Gaussian distributions; secondly, by approximating corrupted-speech likelihoods without any parameterised distribution.
The first part of this work is about compensating for within-component feature correlations under noise. For this, the covariance matrices of the computed Gaussians should be full instead of diagonal. The estimation of off-diagonal covariance elements turns out to be sensitive to approximations. A popular approximation is the one that state-of-the-art compensation schemes, like VTS compensation, use for dynamic coefficients: the continuous-time approximation. Standard speech recognisers use both per-time-slice (static) coefficients and dynamic coefficients, which represent signal changes over time and are normally computed from a window of static coefficients. To remove the need for the continuous-time approximation, this thesis introduces a new technique. It first compensates a distribution over the window of statics, and then applies the same linear projection that extracts the dynamic coefficients. Within this framework, it introduces a number of methods that address the correlation changes that occur in noise. The next problem is decoding speed with full covariances. This thesis re-analyses the previously introduced predictive linear transformations, and shows how they can model feature correlations at low and tunable computational cost.
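The projection step can be illustrated for a single cepstral coefficient. If the statics in a window are jointly Gaussian, x ~ N(μ, Σ), and the static and delta features are y = D x for a fixed matrix D, then y ~ N(Dμ, DΣDᵀ) exactly, including the static-delta cross-covariances. The Python sketch below builds D from the standard regression formula Δ_t = Σ_θ θ(c_{t+θ} − c_{t−θ}) / (2 Σ_θ θ²); the window size and the function names are assumptions, and the noise compensation of the window distribution itself is not shown.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of list-of-rows matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def delta_projection(window: int) -> Matrix:
    """Rows extracting the static and delta value of the centre frame
    from a window of 2 * window + 1 static values."""
    width = 2 * window + 1
    norm = 2 * sum(th * th for th in range(1, window + 1))
    static = [0.0] * width
    static[window] = 1.0
    delta = [0.0] * width
    for th in range(1, window + 1):
        delta[window + th] = th / norm
        delta[window - th] = -th / norm
    return [static, delta]


def project_gaussian(
    mean: Sequence[float], cov: Matrix, D: Matrix
) -> Tuple[List[float], Matrix]:
    """If x ~ N(mean, cov), then D x ~ N(D mean, D cov D^T)."""
    new_mean = [sum(d * m for d, m in zip(row, mean)) for row in D]
    new_cov = matmul(matmul(D, cov), transpose(D))
    return new_mean, new_cov
```

For window = 1, correlated neighbouring statics (covariance 0.8 between the first two frames) yield a nonzero static-delta covariance of −0.4, an off-diagonal term that a diagonal-covariance compensation would discard.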
The second part of this work removes the Gaussian assumption completely. It introduces a sampling method that, given speech and noise distributions and a mismatch function, calculates the corrupted-speech likelihood exactly in the limit. For this, it transforms the integral in the likelihood expression, and then applies sequential importance resampling. Though it is too slow to use for recognition, it enables a more fine-grained assessment of compensation techniques, based on the KL divergence to the ideal compensation for one component. The KL divergence proves to be a good predictor of the word error rate. This technique also makes it possible to evaluate the impact of the approximations that standard compensation schemes make.
This work was supported by Toshiba Research Europe Ltd., Cambridge Research Laboratory.
Six Noise Type Military Sound Classifier
Blast noise from military installations often has a negative impact on the quality of life of residents in nearby communities. This in turn limits the military's testing and training capabilities through restrictions, curfews, or range closures enacted to address noise complaints. To manage noise around military installations more directly, accurate noise monitoring has become a necessity. Although most noise monitors are simple sound level meters, more recent ones can discern blasts from ambient noise with some success. Investigators at the University of Pittsburgh previously developed a more advanced noise classifier that can discern between wind, aircraft, and blast noise while simultaneously lowering the measurement threshold. This work presents the development of a more advanced classifier that identifies additional classes of noise, such as machine gun fire, vehicles, and thunder. Additional signal metrics were explored given the increased complexity of the classifier. By broadening the types of noise the system can accurately classify and increasing the number of metrics, a new system was developed with increased blast noise accuracy, fewer missed events, and significantly fewer false positives.
- …