472 research outputs found
A Reinvestigation of the Extended Kalman Filter applied to Formant Tracking
This paper examines the application of the Extended Kalman Filter to formant tracking. The derivation of the Jacobian matrix for the Extended Kalman Filter procedure is given. Additionally, it demonstrates how robustness can be incorporated into the procedure. Results are presented to illustrate the formant tracking ability of the non-robust and robust Extended Kalman Filter algorithms.
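The abstract does not reproduce the derivation, but the EKF recursion it builds on can be sketched generically. Below is a minimal predict/update step in NumPy; the state model, Jacobians, and noise covariances (`f`, `F_jac`, `h`, `H_jac`, `Q`, `R`) are caller-supplied placeholders, not the paper's formant-tracking formulation.

```python
import numpy as np

def ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R):
    """One predict/update cycle of a generic Extended Kalman Filter.

    x, P         : prior state estimate and covariance
    z            : new measurement
    f, h         : state-transition and measurement functions (may be nonlinear)
    F_jac, H_jac : their Jacobians, evaluated at the current estimate
    Q, R         : process- and measurement-noise covariances
    """
    # Predict: propagate the state and covariance through the model.
    x_pred = f(x)
    F = F_jac(x)
    P_pred = F @ P @ F.T + Q

    # Update: correct with the linearized measurement model.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

For formant tracking the state would hold formant frequencies (and bandwidths) and `h` would map them to observed LPC or cepstral features, which is where the paper's Jacobian derivation comes in.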
Formant paths tracking using Linear Prediction based methods
This paper focuses on formants as basic parameters for vowel recognition. Two different algorithms for formant finding, both based on LP analysis, are used: spectral peak picking and root extraction, each yielding very good path estimates. The methods are compared graphically in our application 'WaveBlaster'.
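The root-extraction variant mentioned above can be sketched in a few lines: fit LPC coefficients by the autocorrelation method, then read formant candidates off the angles of the roots of A(z) that lie near the unit circle. This is a generic textbook sketch, not the authors' 'WaveBlaster' implementation; the model order and radius threshold are illustrative.

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients by the autocorrelation method (normal equations)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))          # A(z) = 1 - sum_k a_k z^-k

def formants_from_roots(a, fs, min_radius=0.9):
    """Formant candidates from angles of A(z) roots near the unit circle."""
    roots = np.roots(a)
    roots = roots[(roots.imag > 0) & (np.abs(roots) > min_radius)]
    return np.sort(np.angle(roots)) * fs / (2 * np.pi)
```

Peak picking would instead locate maxima of the LPC spectral envelope `1/|A(e^jw)|`; root extraction avoids the frequency-grid resolution limit of that approach.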
Multirate Frequency Transformations: Wideband AM-FM Demodulation with Applications to Signal Processing and Communications
The AM-FM (amplitude and frequency modulation) signal model finds numerous applications in image processing, communications, and speech processing. The traditional approaches to demodulation of signals in this category are the analytic-signal approach, frequency tracking, and the energy-operator approach. These approaches, however, assume that the amplitude and frequency components are slowly time-varying, i.e., narrowband, and incur significant demodulation error in wideband scenarios. In this thesis, we extend a two-stage approach to wideband AM-FM demodulation that combines multirate frequency transformations (MFT), enacted through a combination of multirate systems, with traditional demodulation techniques, e.g., the Teager-Kaiser energy separation algorithm (ESA), at large wideband-to-narrowband conversion factors.
The MFT module comprises multirate interpolation and heterodyning and converts the wideband AM-FM signal into a narrowband signal, while the demodulation module, such as the ESA, demodulates the narrowband signal into its constituent amplitude and frequency components, which are then transformed back to yield estimates for the wideband signal.
This MFT-ESA approach is then applied to several problems: (a) wideband image and fingerprint demodulation, where multidimensional energy separation is employed; (b) wideband first-formant demodulation in vowels; and (c) wideband CPM demodulation with partial-response signaling, to demonstrate its validity in both monocomponent and multicomponent scenarios as an effective AM-FM signal demodulation and analysis technique for image-processing, speech-processing, and communications applications.
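The narrowband demodulation stage referred to as ESA is the standard energy separation algorithm built on the Teager-Kaiser operator. A minimal DESA-1 sketch (the multirate MFT front end of interpolation and heterodyning is omitted):

```python
import numpy as np

def teager(s, n):
    """Discrete Teager-Kaiser energy operator: Psi[s](n) = s[n]^2 - s[n-1]*s[n+1]."""
    return s[n] ** 2 - s[n - 1] * s[n + 1]

def desa1(x):
    """DESA-1 energy separation: per-sample amplitude and frequency (rad/sample)."""
    N = len(x)
    amp = np.full(N, np.nan)
    omega = np.full(N, np.nan)
    y = np.zeros(N)
    y[1:] = np.diff(x)                      # backward difference y[n] = x[n] - x[n-1]
    for n in range(2, N - 2):
        px = teager(x, n)
        if px <= 0:
            continue                        # operator output unusable at this sample
        c = 1.0 - (teager(y, n) + teager(y, n + 1)) / (4.0 * px)
        omega[n] = np.arccos(np.clip(c, -1.0, 1.0))
        s = np.sin(omega[n])
        amp[n] = np.sqrt(px) / s if s > 1e-12 else np.nan
    return amp, omega
```

For a pure sinusoid the estimates are essentially exact; the thesis's point is that for genuinely wideband signals this stage only works well after the MFT module has compressed the signal's bandwidth.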
Power-Weighted LPC Formant Estimation
A power-weighted formant frequency estimation procedure based on Linear Predictive Coding (LPC) is presented. It works by pre-emphasizing the dominant spectral components of an input signal, which allows a subsequent estimation step to extract formant frequencies with greater accuracy. This new power-weighted formant estimator improves on the accuracy of traditional LPC formant estimation for different classes of synthetic signals and for speech. Power-weighted LPC significantly and reliably outperforms LPC and its variants at the task of formant estimation on the VTR formants dataset, a database of Vocal Tract Resonance (VTR) frequency trajectories obtained by human experts for the first three formant frequencies. This performance gain is evident over a range of filter orders.
Speech Communication
Contains reports on three research projects. U.S. Air Force Cambridge Research Laboratories under Contract F19628-72-C-0181; National Institutes of Health (Grant 5 RO1 NS04332-09); Joint Services Electronics Programs (U.S. Army, U.S. Navy, and U.S. Air Force) under Contract DAAB07-71-C-0300; M.I.T. Lincoln Laboratory Purchase Order CC-57
Idealized computational models for auditory receptive fields
This paper presents a theory by which idealized models of auditory receptive
fields can be derived in a principled axiomatic manner, from a set of
structural properties that enable invariance of receptive field responses under
natural sound transformations and ensure internal consistency between
spectro-temporal receptive fields at different temporal and spectral scales.
For defining a time-frequency transformation of a purely temporal sound
signal, it is shown that the framework allows for a new way of deriving the
Gabor and Gammatone filters as well as a novel family of generalized Gammatone
filters, with additional degrees of freedom to obtain different trade-offs
between the spectral selectivity and the temporal delay of time-causal temporal
window functions.
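As a point of reference, the classic (non-generalized) Gammatone filter that the framework rederives has impulse response t^(order-1) exp(-2*pi*b*t) cos(2*pi*fc*t). A small sampled sketch, with illustrative parameter values rather than ones taken from the paper:

```python
import numpy as np

def gammatone_ir(fc, fs, order=4, b=125.0, duration=0.05):
    """Sampled impulse response of a classic Gammatone filter,
    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t),
    normalized so that the peak magnitude response is 1."""
    t = np.arange(int(duration * fs)) / fs
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(np.fft.rfft(g)))
```

The generalized family described in the paper adds further degrees of freedom to this envelope, trading spectral selectivity against temporal delay while remaining time-causal.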
When applied to the definition of a second layer of receptive fields from a
spectrogram, it is shown that the framework leads to two canonical families of
spectro-temporal receptive fields, in terms of spectro-temporal derivatives of
either spectro-temporal Gaussian kernels, for non-causal time, or the combination
of a time-causal generalized Gammatone filter over the temporal domain and a
Gaussian filter over the log-spectral domain. For each filter family, the
spectro-temporal receptive fields can be either separable over the
time-frequency domain or be adapted to local glissando transformations that
represent variations in logarithmic frequencies over time. Within each domain
of either non-causal or time-causal time, these receptive field families are
derived by uniqueness from the assumptions.
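A separable spectro-temporal receptive field of the time-causal kind described above can be sketched as the outer product of a Gammatone temporal envelope and a Gaussian-derivative spectral profile; all parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def strf_kernel(fs, mod_fc, n_chan, spec_sigma=2.0, order=4, b=60.0, dur=0.1):
    """Separable spectro-temporal receptive field: a time-causal Gammatone
    envelope over time, outer-multiplied with a first-order Gaussian
    derivative across (log-)frequency channels."""
    t = np.arange(int(dur * fs)) / fs
    temporal = (t ** (order - 1) * np.exp(-2 * np.pi * b * t)
                * np.cos(2 * np.pi * mod_fc * t))
    c = np.arange(n_chan) - (n_chan - 1) / 2.0       # spectral channel offsets
    gauss = np.exp(-c ** 2 / (2 * spec_sigma ** 2))
    spectral = -c / spec_sigma ** 2 * gauss          # first Gaussian derivative
    return np.outer(spectral, temporal)              # shape (n_chan, n_time)
```

The first spectral derivative makes each kernel row integrate to zero across frequency, giving the edge-detector-like profile typical of measured STRFs; the non-separable, glissando-adapted variants in the paper would additionally shear this kernel over time.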
It is demonstrated how the presented framework allows for computation of
basic auditory features for audio processing and that it leads to predictions
about auditory receptive fields with good qualitative similarity to biological
receptive fields measured in the inferior colliculus (ICC) and primary auditory
cortex (A1) of mammals. (55 pages, 22 figures, 3 tables)
A new recursive algorithm for time-varying autoregressive (TVAR) model estimation and its application to speech analysis
This paper proposes a new state-regularized (SR), QR-decomposition-based recursive least-squares (QRRLS) algorithm with a variable forgetting factor (VFF) for recursive coefficient estimation of time-varying autoregressive (AR) models. It employs the estimated coefficients as prior information to minimize the exponentially weighted observation error, which leads to reduced variance and bias relative to the traditional regularized RLS algorithm. It also increases the tracking speed by introducing a new measure of convergence status to control the forgetting factor. Simulations using synthetic and real speech signals show that the proposed method has improved tracking performance and reduced estimation-error variance compared with conventional TVAR modeling methods during rapid changes of the AR coefficients. © 2012 IEEE. The 2012 IEEE International Symposium on Circuits and Systems (ISCAS), Seoul, Korea, 20-23 May 2012. In IEEE International Symposium on Circuits and Systems Proceedings, 2012, p. 1026-102
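For orientation, the baseline the proposed algorithm improves on is plain exponentially weighted RLS with a fixed forgetting factor, which can be sketched as follows (the paper's state regularization, QR decomposition, and variable forgetting factor are not reproduced here):

```python
import numpy as np

def rls_tvar(x, order, lam=0.99, delta=100.0):
    """Track time-varying AR coefficients with exponentially weighted RLS.

    Standard RLS with a fixed forgetting factor `lam`; smaller `lam`
    forgets faster and tracks quicker at the cost of higher variance.
    """
    p = order
    w = np.zeros(p)                      # AR coefficient estimates
    P = delta * np.eye(p)                # inverse (weighted) correlation matrix
    coeffs = np.zeros((len(x), p))
    for n in range(p, len(x)):
        u = x[n - p:n][::-1]             # regressor [x[n-1], ..., x[n-p]]
        k = P @ u / (lam + u @ P @ u)    # gain vector
        e = x[n] - w @ u                 # a priori prediction error
        w = w + k * e
        P = (P - np.outer(k, u @ P)) / lam
        coeffs[n] = w
    return coeffs
```

The fixed-`lam` trade-off visible here, tracking speed versus estimation variance, is exactly what the paper's variable forgetting factor and state regularization are designed to relax.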
Speech Recognition in noisy environment using Deep Learning Neural Network
Recent research in the field of automatic speaker recognition has shown that methods based
on deep learning neural networks provide better performance than other statistical classifiers. On
the other hand, these methods usually require adjustment of a significant number of parameters.
The goal of this thesis is to show that selecting appropriate parameter values can significantly
improve the speaker recognition performance of methods based on deep learning neural networks.
The reported study introduces an approach to automatic speaker recognition based on deep
neural networks and the stochastic gradient descent algorithm. It particularly focuses on three
parameters of the stochastic gradient descent algorithm: the learning rate, and the hidden and
input layer dropout rates. Additional attention was devoted to the research question of speaker
recognition under noisy conditions.
Thus, two experiments were conducted in the scope of this thesis. The first experiment was
intended to demonstrate that optimizing the observed parameters of the stochastic gradient
descent algorithm can improve speaker recognition performance in the absence of noise. This
experiment was conducted in two phases. In the first phase, the recognition rate was observed
while the hidden layer dropout rate and the learning rate were varied and the input layer
dropout rate was held constant. In the second phase, the recognition rate was observed while
the input layer dropout rate and the learning rate were varied and the hidden layer dropout
rate was held constant. The second experiment was intended to show that optimizing these
parameters can improve speaker recognition performance even under noisy conditions; thus,
different noise levels were artificially applied to the original speech signal.
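The two hyperparameters studied, the learning rate and an input-layer dropout rate, can be isolated in a toy stand-in for the thesis's network: a logistic-regression classifier trained by SGD with inverted input dropout. This is purely illustrative and far simpler than the deep networks used in the thesis.

```python
import numpy as np

def train_sgd_dropout(X, y, lr=0.1, input_dropout=0.2, epochs=20, seed=0):
    """Logistic-regression classifier trained by SGD with inverted input dropout.

    `lr` and `input_dropout` play the role of the learning rate and the
    input-layer dropout rate studied in the thesis; the model itself is a
    deliberately tiny stand-in for a deep network.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            xi = X[i]
            mask = rng.random(xi.shape) >= input_dropout  # keep with prob 1 - p
            xi = xi * mask / (1.0 - input_dropout)        # inverted-dropout scaling
            p = 1.0 / (1.0 + np.exp(-(xi @ w + b)))       # sigmoid output
            g = p - y[i]                                  # gradient of the log loss
            w -= lr * g * xi
            b -= lr * g
    return w, b
```

A grid over `lr` and `input_dropout` values, scoring held-out accuracy for each pair, mirrors the two-phase parameter sweep described in the experiments.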