523 research outputs found
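Linear prediction of the causal part of the autocorrelation for speaker identification in noisy environments (original title: Predicción lineal de la parte causal de la autocorrelación para la identificación del locutor en ambientes ruidosos)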
Recently, a new parametrization technique based on AR modelling of the one-sided autocorrelation sequence (OSALPC) has been shown to be attractive for speech recognition because of its simplicity and its high recognition performance in noisy conditions. In this paper, that parametrization technique is proposed for speaker identification in noisy environments. Experimental results obtained with a new speaker identification system based on the statistics of the cepstral vectors show that OSALPC also achieves much better results than standard parametrization techniques. Peer reviewed. Postprint (published version).
AR modeling of the speech autocorrelation to improve noisy speech recognition
Speech recognition in noisy environments remains an unsolved problem even in the case of isolated word recognition with small vocabularies. Recently, several techniques have been proposed to alleviate this problem. In particular, two closely related parameterization techniques based on AR modelling in the autocorrelation domain, called SMC [1] and OSALPC [2], have shown good results on speech contaminated by additive white noise. The aim of this paper is twofold: to compare several techniques based on AR modelling in the autocorrelation domain, including SMC and OSALPC, and to find the optimum model order and cepstral liftering for noisy conditions. Peer reviewed. Postprint (published version).
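The idea shared by the two abstracts above, AR modelling applied in the autocorrelation domain, can be sketched roughly as follows. This is a simplified illustration and not the authors' implementation: the function names (`autocorr`, `levinson_durbin`, `osalpc`), the biased autocorrelation estimator, and the `osa_len` parameter are assumptions, and details such as windowing, scaling of the zero lag, and cepstral liftering are omitted.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation estimate r[k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    Returns (a, err) where a[0] == 1 and the prediction error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= (1.0 - k * k)
    return a, err


def osalpc(frame, order, osa_len):
    """Illustrative OSALPC-style analysis (assumed, simplified form)."""
    # Step 1: keep only the causal (one-sided) part of the autocorrelation,
    # lags 0..osa_len-1, and treat it as a new time sequence.
    osa = autocorr(frame, osa_len - 1)
    # Step 2: apply ordinary autocorrelation-method LPC to that sequence.
    r = autocorr(osa, order)
    return levinson_durbin(r, order)
```

For an autocorrelation of the form r[k] = 0.5^k, `levinson_durbin` recovers a first-order model with coefficient -0.5, which is a quick sanity check of the recursion. The model order and the length of the one-sided sequence are left as free parameters here, since the second abstract states that finding the optimum order is part of the study.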
Some fast higher order AR estimation techniques applied to parametric Wiener filtering
Some speech enhancement algorithms based on the iterative Wiener filtering method due to Lim-Oppenheim [2] are presented. In the original Lim-Oppenheim algorithm, speech AR estimation is carried out using classic second-order analysis, but our algorithms consider a more robust AR modelling. Two different strategies for speech AR estimation are presented, and both estimators try to see as little noise as possible. The first computes the One-Sided Autocorrelation beforehand, which is a pole-preserving function, so that the actual SNR in the second-order LPC analysis is increased. The second combines the advantages of Higher-Order Statistics [1] with a linear combination of AR coefficients, belonging to two consecutive overlapped frames, to obtain a less disturbed speech estimate. Peer reviewed. Postprint (published version).
A robust feature extraction for automatic speech recognition in noisy environments
This paper presents a method for extracting robust speech features when the external noise is additive and has white-noise characteristics. The process consists of a short-time power normalisation whose goal is to preserve the speech features against noise as much as possible. The proposed normalisation is optimal if the corrupted process, like the noise process, has white-noise characteristics. By optimal normalisation we mean that the corrupting noise does not change the means of the observed vectors of the corrupted process at all. Because most of the speech energy is contained in a relatively small frequency band, and most of the band consists of noise or noise-like power, this normalisation process can still capture most of the noise distortions.
For signal-to-noise ratios greater than 5 dB, the results show that, for stationary white noise, the normalisation process, in which the noise characteristics are ignored in the test phase, outperforms the conventional Markov model composition in which the noise is known. If the noise is known, a reasonable approximation of the inverted system can easily be obtained by performing noise compensation, which further increases the recogniser performance.
New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition
This paper presents a novel noise-robust feature extraction method for speech recognition using the robust perceptual minimum variance distortionless response (MVDR) spectrum of the temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effects of the residue of nonstationary additive noise that remains after filtering the autocorrelation. To achieve a more robust front-end, we also modify the robust distortionless constraint of the MVDR spectral estimation method via revised weighting of the subband power spectrum values based on the subband signal-to-noise ratios (SNRs), which adjusts it to the new proposed approach. This new function allows the components of the input signal at the frequencies least affected by noise to pass with larger weights and attenuates the noisy and undesired components more effectively. This modification reduces the noise residuals of the spectrum estimated from the filtered autocorrelation sequence, thereby leading to a more robust algorithm. Our proposed method, when evaluated on the Aurora 2 task for recognition purposes, outperformed the Mel frequency cepstral coefficients (MFCC) baseline, relative autocorrelation sequence MFCC (RAS-MFCC), and the MVDR-based features in several different noisy conditions.
Investigation of the impact of high frequency transmitted speech on speaker recognition
Thesis (MScEng)--Stellenbosch University, 2002. Some digitised pages may appear illegible due to the condition of the original hard copy. ENGLISH ABSTRACT: Speaker recognition systems have evolved to a point where near perfect performance can be
obtained under ideal conditions, even if the system must distinguish between a large number
of speakers. Under adverse conditions, such as when high noise levels are present or when the
transmission channel deforms the speech, the performance is often less than satisfying.
This project investigated the performance of a popular speaker recognition system, which uses
Gaussian mixture models, on speech transmitted over a high frequency channel. Initial experiments
demonstrated very unsatisfactory results for the base line system.
We investigated a number of robust techniques. We implemented and applied some of them in
an attempt to improve the performance of the speaker recognition systems. The techniques we
tested showed only slight improvements.
We also investigated the effects of a high frequency channel and single sideband modulation on
the speech features of speech processing systems. The effects that can deform the features, and
therefore reduce the performance of speech systems, were identified.
One of the effects that can greatly affect the performance of a speech processing system is
noise. We investigated some speech enhancement techniques and as a result we developed a
new statistically based speech enhancement technique that employs hidden Markov models to
represent the clean speech process.
Wavelet-based techniques for speech recognition
In this thesis, new wavelet-based techniques have been developed for the
extraction of features from speech signals for the purpose of automatic speech
recognition (ASR). One of the advantages of the wavelet transform over the short-time
Fourier transform (STFT) is its capability to process non-stationary signals.
Since speech signals are not strictly stationary, the wavelet transform is a better
choice for time-frequency transformation of these signals. In addition, it has
compactly supported basis functions, thereby reducing the amount of
computation as opposed to STFT where an overlapping window is needed. [Continues.
A Method for Compressing Parameters in Bayesian Models with Application to Logistic Sequence Prediction Models
Bayesian classification and regression with high order interactions is
largely infeasible because Markov chain Monte Carlo (MCMC) would need to be
applied with a great many parameters, whose number increases rapidly with the
order. In this paper we show how to make it feasible by effectively reducing
the number of parameters, exploiting the fact that many interactions have the
same values for all training cases. Our method uses a single "compressed"
parameter to represent the sum of all parameters associated with a set of
patterns that have the same value for all training cases. Using symmetric
stable distributions as the priors of the original parameters, we can easily
find the priors of these compressed parameters. We therefore need to deal only
with a much smaller number of compressed parameters when training the model
with MCMC. The number of compressed parameters may have converged before
considering the highest possible order. After training the model, we can split
these compressed parameters into the original ones as needed to make
predictions for test cases. We show in detail how to compress parameters for
logistic sequence prediction models. Experiments on both simulated and real
data demonstrate that a huge number of parameters can indeed be reduced by our
compression method.
Comment: 29 pages
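The grouping step described in the abstract above (patterns that take the same value on every training case share one compressed parameter) can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: patterns are taken to be the last k symbols of a binary history, and the function name `compress_patterns` and its arguments are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compress_patterns(cases, max_order, alphabet=(0, 1)):
    """Group context patterns by the set of training cases they match.

    cases: list of equal-length symbol tuples (the histories).
    A pattern of order k is assumed here to match a case when the
    last k symbols of the case equal the pattern.
    Returns a dict mapping frozenset(case indices) -> list of patterns;
    each key would correspond to one compressed parameter.
    """
    groups = defaultdict(list)
    for k in range(1, max_order + 1):
        for pattern in product(alphabet, repeat=k):
            matches = frozenset(
                i for i, case in enumerate(cases)
                if tuple(case[-k:]) == pattern
            )
            if matches:  # patterns matching no case are skipped in this sketch
                groups[matches].append(pattern)
    return groups


cases = [(0, 1, 1), (1, 1, 1), (0, 0, 1)]
groups = compress_patterns(cases, max_order=3)
```

In this toy data set, the patterns `(0, 1)` and `(0, 0, 1)` match exactly the same training case, so they fall into one group and would be represented by a single parameter. With longer histories and higher orders, far more patterns collapse into each group, which is the effect the abstract relies on.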
Modelling and extraction of fundamental frequency in speech signals
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. One of the most important parameters of speech is the fundamental frequency of vibration of voiced sounds. The audio sensation of the fundamental frequency is known as the pitch. Depending on the tonal/non-tonal category of language, the fundamental frequency conveys intonation, pragmatics and meaning. In addition, the fundamental frequency and intonation carry speaker gender, age, identity, speaking style and emotional state. Accurate estimation of the fundamental frequency is critically important for the functioning of speech processing applications such as speech coding, speech recognition, speech synthesis and voice morphing. This thesis contributes to research on accurate pitch estimation in three distinct ways: (1) an investigation of the impact of the window length on pitch estimation error, (2) an investigation of the use of higher order moments and (3) an investigation of an analysis-synthesis method for selecting the best pitch value among N proposed candidates. Experimental evaluations show that the length of the speech window has a major impact on the accuracy of pitch estimation. Depending on the similarity criteria and the order of the statistical moment, a window length of 37 to 80 ms gives the least error. In order to avoid excessive delay as a consequence of using a longer window, a method is proposed where the current short window is concatenated with the previous frames to form a longer signal window for pitch extraction. The use of second order and higher order moments, and of the magnitude difference function, as the similarity criteria was explored and compared. A novel method of calculating moments is introduced where the signal is split, i.e. rectified, into positive and negative valued samples. The moments for the positive and negative parts of the signal are computed separately and combined. The new method of calculating moments from positive and negative parts and the higher order criteria provide competitive results. A challenging issue in pitch estimation is the determination of the best candidate from N extrema of the similarity criteria. The analysis-synthesis method proposed in this thesis selects the pitch candidate that provides the best reproduction (synthesis) of the harmonic spectrum of the original speech. The synthesis method must be such that the distortion increases with increasing error in the estimate of the fundamental frequency. To this end, a new method of spectral synthesis is proposed using an estimate of the spectral envelope and harmonically spaced asymmetric Gaussian pulses as excitation. The N-best method provides a consistent reduction in pitch estimation error. The methods described in this thesis result in a significant improvement in pitch accuracy and outperform the benchmark YIN method.
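One of the similarity criteria named in the abstract above, the magnitude difference function, can be illustrated with a minimal sketch. This is a textbook-style average magnitude difference estimator, not the thesis's method: the function names `amdf` and `estimate_f0`, the search range `f_min`/`f_max`, and the choice of simply taking the single deepest minimum (rather than ranking N candidates) are assumptions made for illustration.

```python
import math


def amdf(frame, lag):
    """Average magnitude difference between the frame and a lagged copy."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def estimate_f0(frame, fs, f_min=60.0, f_max=400.0):
    """Pick the lag with the smallest AMDF inside an assumed pitch range."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    if len(frame) <= lag_max:
        raise ValueError("frame too short for the requested f_min")
    best_lag = min(range(lag_min, lag_max + 1),
                   key=lambda lag: amdf(frame, lag))
    return fs / best_lag


# Usage: an 80 ms frame of a 100 Hz sine sampled at 8 kHz.
fs = 8000
frame = [math.sin(2 * math.pi * 100 * i / fs) for i in range(640)]
f0 = estimate_f0(frame, fs)
```

The 80 ms frame in the usage example sits at the upper end of the 37 to 80 ms range reported in the abstract. A shorter frame leaves fewer samples in the lagged comparison, which is one way to see why window length affects the estimate.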