472 research outputs found

    A Reinvestigation of the Extended Kalman Filter applied to Formant Tracking

    This paper examines the application of the Extended Kalman Filter (EKF) to formant tracking. The derivation of the Jacobian matrix for the EKF procedure is given, and it is demonstrated how robustness can be incorporated into the procedure. Results are presented to illustrate the formant-tracking ability of the non-robust and robust Extended Kalman Filter algorithms.
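    As a rough illustration of the machinery the abstract refers to (not the paper's actual model), a single EKF predict/update cycle can be sketched with a random-walk state model and a numerically linearized observation function. The observation model `h` here is a placeholder; in formant tracking it would map formant frequencies/bandwidths to spectral observations.

    ```python
    import numpy as np

    def numerical_jacobian(h, x, eps=1e-6):
        """Finite-difference Jacobian of observation model h at state x."""
        y0 = h(x)
        J = np.zeros((y0.size, x.size))
        for i in range(x.size):
            dx = np.zeros_like(x)
            dx[i] = eps
            J[:, i] = (h(x + dx) - y0) / eps
        return J

    def ekf_step(x, P, z, h, Q, R):
        """One EKF predict/update cycle assuming random-walk state dynamics."""
        # Predict: with F = I, only the covariance grows by the process noise.
        P_pred = P + Q
        # Linearize the observation model around the predicted state.
        H = numerical_jacobian(h, x)
        # Innovation, innovation covariance, and Kalman gain.
        y = z - h(x)
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        # Update state and covariance.
        x_new = x + K @ y
        P_new = (np.eye(x.size) - K @ H) @ P_pred
        return x_new, P_new
    ```

    The robust variants mentioned in the abstract would typically modify the innovation weighting; that step is omitted here.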

    Formant paths tracking using Linear Prediction based methods

    This paper focuses on formants as basic parameters for vowel recognition. Two different LP-based algorithms for formant finding are used: spectral peak picking and root extraction, each of which obtains very good path estimates. The methods are compared graphically in our application ‘WaveBlaster’.
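    The root-extraction approach mentioned above is standard: fit an LP polynomial to the signal, take the roots of the prediction polynomial, and convert pole angles to frequencies. A minimal sketch (autocorrelation-method LPC; not the paper's exact implementation) might look like:

    ```python
    import numpy as np

    def lpc_coeffs(x, order):
        """LPC via the autocorrelation method (solve the Yule-Walker equations)."""
        r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
        R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
        a = np.linalg.solve(R, r[1:order + 1])
        return np.concatenate(([1.0], -a))  # prediction polynomial A(z)

    def formants_by_roots(x, fs, order=12):
        """Root-extraction formant estimation: roots of A(z) -> frequencies in Hz."""
        a = lpc_coeffs(x, order)
        roots = np.roots(a)
        roots = roots[np.imag(roots) > 0]           # one root per conjugate pair
        freqs = np.angle(roots) * fs / (2 * np.pi)  # pole angle -> frequency
        return np.sort(freqs)
    ```

    Peak picking would instead locate maxima of the LPC spectral envelope; root extraction avoids the frequency-resolution limit of that search.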

    Multirate Frequency Transformations: Wideband AM-FM Demodulation with Applications to Signal Processing and Communications

    The AM-FM (amplitude and frequency modulation) signal model finds numerous applications in image processing, communications, and speech processing. The traditional approaches to demodulating signals in this category are the analytic-signal approach, frequency tracking, or the energy-operator approach. These approaches, however, assume that the amplitude and frequency components are slowly time-varying, i.e., narrowband, and incur significant demodulation error in wideband scenarios. In this thesis, we extend a two-stage approach to wideband AM-FM demodulation that combines multirate frequency transformations (MFT), enacted through a combination of multirate systems, with traditional demodulation techniques, e.g., the Teager-Kaiser energy separation algorithm (ESA), to large wideband-to-narrowband conversion factors. The MFT module comprises multirate interpolation and heterodyning and converts the wideband AM-FM signal into a narrowband signal, while the demodulation module, such as ESA, demodulates the narrowband signal into constituent amplitude and frequency components that are then transformed back to yield estimates for the wideband signal. This MFT-ESA approach is then applied to the problems of: (a) wideband image demodulation and fingerprint demodulation, where multidimensional energy separation is employed, (b) wideband first-formant demodulation in vowels, and (c) wideband CPM demodulation with partial-response signaling, to demonstrate its validity in both monocomponent and multicomponent scenarios as an effective multicomponent AM-FM signal demodulation and analysis technique for image processing, speech processing, and communications applications.
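    The Teager-Kaiser energy operator and the DESA-2 energy separation algorithm referred to above have standard discrete forms: Ψ[x](n) = x(n)² − x(n−1)x(n+1), with instantaneous frequency and amplitude recovered from Ψ applied to the signal and its symmetric difference. A minimal sketch (DESA-2 only; the thesis's MFT stage is not shown):

    ```python
    import numpy as np

    def teager(x):
        """Teager-Kaiser energy operator: psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
        return x[1:-1] ** 2 - x[:-2] * x[2:]

    def desa2(x):
        """DESA-2 energy separation: instantaneous frequency (rad/sample) and amplitude."""
        psi_x = teager(x)
        y = x[2:] - x[:-2]              # symmetric difference x(n+1) - x(n-1)
        psi_y = teager(y)
        psi_x = psi_x[1:-1]             # align sample ranges of the two operators
        omega = 0.5 * np.arccos(np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0))
        amp = 2.0 * psi_x / np.sqrt(psi_y)
        return omega, amp
    ```

    For a pure sinusoid A·cos(Ωn + φ), Ψ[x] = A² sin²Ω, so the ratio above recovers Ω and A exactly; the narrowband assumption is what breaks down in the wideband case the thesis addresses.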

    Power-Weighted LPC Formant Estimation

    A power-weighted formant frequency estimation procedure based on Linear Predictive Coding (LPC) is presented. It works by pre-emphasizing the dominant spectral components of an input signal, which allows a subsequent estimation step to extract formant frequencies with greater accuracy. The new power-weighted formant estimator improves on the accuracy of traditional LPC formant estimation for different classes of synthetic signals and for speech. Power-weighted LPC significantly and reliably outperforms LPC and its variants at the task of formant estimation on the VTR-Formants dataset, a database of Vocal Tract Resonance (VTR) frequency trajectories obtained by human experts for the first three formant frequencies. This performance gain is evident over a range of filter orders.

    Speech Communication

    Contains reports on three research projects. U.S. Air Force Cambridge Research Laboratories under Contract F19628-72-C-0181; National Institutes of Health (Grant 5 RO1 NS04332-09); Joint Services Electronics Programs (U.S. Army, U.S. Navy, and U.S. Air Force) under Contract DAAB07-71-C-0300; M.I.T. Lincoln Laboratory Purchase Order CC-57.

    Idealized computational models for auditory receptive fields

    This paper presents a theory by which idealized models of auditory receptive fields can be derived in a principled, axiomatic manner from a set of structural properties that enable invariance of receptive field responses under natural sound transformations and ensure internal consistency between spectro-temporal receptive fields at different temporal and spectral scales. For defining a time-frequency transformation of a purely temporal sound signal, it is shown that the framework allows a new way of deriving the Gabor and Gammatone filters, as well as a novel family of generalized Gammatone filters with additional degrees of freedom to obtain different trade-offs between the spectral selectivity and the temporal delay of time-causal temporal window functions. When applied to the definition of a second layer of receptive fields from a spectrogram, the framework leads to two canonical families of spectro-temporal receptive fields, in terms of spectro-temporal derivatives of either spectro-temporal Gaussian kernels for non-causal time or the combination of a time-causal generalized Gammatone filter over the temporal domain and a Gaussian filter over the log-spectral domain. For each filter family, the spectro-temporal receptive fields can be either separable over the time-frequency domain or adapted to local glissando transformations that represent variations in logarithmic frequencies over time. Within each domain of either non-causal or time-causal time, these receptive field families are derived by uniqueness from the assumptions. It is demonstrated how the presented framework allows for computation of basic auditory features for audio processing, and that it leads to predictions about auditory receptive fields with good qualitative similarity to biological receptive fields measured in the inferior colliculus (ICC) and primary auditory cortex (A1) of mammals.
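    The classical Gammatone filter that the paper generalizes has the impulse response g(t) = tⁿ⁻¹ e^(−2πbt) cos(2πf_c t), with center frequency f_c, bandwidth parameter b, and filter order n. A minimal sampled sketch (parameter values here are illustrative, not from the paper):

    ```python
    import numpy as np

    def gammatone_ir(fc, bw, fs, order=4, dur=0.05):
        """Sampled gammatone impulse response: t^(n-1) e^{-2 pi b t} cos(2 pi fc t).

        fc: center frequency (Hz), bw: bandwidth parameter b (Hz),
        fs: sample rate (Hz), order: filter order n, dur: duration (s).
        """
        t = np.arange(int(dur * fs)) / fs
        g = t ** (order - 1) * np.exp(-2 * np.pi * bw * t) * np.cos(2 * np.pi * fc * t)
        return g / np.max(np.abs(g))   # peak-normalize for comparison/plotting
    ```

    The gamma envelope tⁿ⁻¹e^(−2πbt) makes the filter time-causal with a finite temporal delay; the paper's generalized family adds further degrees of freedom to trade that delay against spectral selectivity.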

    A new recursive algorithm for time-varying autoregressive (TVAR) model estimation and its application to speech analysis

    This paper proposes a new state-regularized (SR), QR-decomposition-based recursive least squares (QRRLS) algorithm with a variable forgetting factor (VFF) for recursive coefficient estimation of time-varying autoregressive (AR) models. It employs the estimated coefficients as prior information to minimize the exponentially weighted observation error, which leads to reduced variance and bias compared with the traditional regularized RLS algorithm. It also increases tracking speed by introducing a new measure of convergence status to control the forgetting factor. Simulations using synthetic and real speech signals show that the proposed method has improved tracking performance and reduced estimation-error variance compared with conventional TVAR modeling methods during rapid changes of the AR coefficients. © 2012 IEEE. The 2012 IEEE International Symposium on Circuits and Systems (ISCAS), Seoul, Korea, 20-23 May 2012. In IEEE International Symposium on Circuits and Systems Proceedings, 2012, p. 1026-102
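    The baseline the paper builds on is exponentially weighted RLS for TVAR coefficients: old data are discounted by a forgetting factor λ < 1 so the estimates can track time-varying dynamics. A minimal sketch of that baseline (fixed λ, no state regularization, QR decomposition, or VFF control, all of which are the paper's contributions):

    ```python
    import numpy as np

    def rls_tvar(x, order=2, lam=0.98, delta=100.0):
        """Recursive least squares with exponential forgetting for a TVAR model.

        Predicts x(n) from its `order` past samples; lam < 1 discounts old data.
        Returns the coefficient-estimate trajectory, one row per sample.
        """
        a = np.zeros(order)              # current AR coefficient estimate
        P = delta * np.eye(order)        # inverse correlation matrix estimate
        track = []
        for n in range(order, len(x)):
            phi = x[n - order:n][::-1]   # regressor: [x(n-1), ..., x(n-order)]
            e = x[n] - phi @ a           # a priori prediction error
            k = P @ phi / (lam + phi @ P @ phi)     # gain vector
            a = a + k * e
            P = (P - np.outer(k, phi @ P)) / lam
            track.append(a.copy())
        return np.array(track)
    ```

    A variable forgetting factor, as in the paper, would replace the fixed `lam` with one driven by a convergence-status measure: small during rapid coefficient changes, close to 1 in steady state.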

    Speech Recognition in noisy environment using Deep Learning Neural Network

    Recent research in the field of automatic speaker recognition has shown that methods based on deep neural networks provide better performance than other statistical classifiers. On the other hand, these methods usually require adjustment of a significant number of parameters. The goal of this thesis is to show that selecting appropriate parameter values can significantly improve the speaker recognition performance of methods based on deep neural networks. The reported study introduces an approach to automatic speaker recognition based on deep neural networks and the stochastic gradient descent algorithm. It particularly focuses on three parameters of stochastic gradient descent: the learning rate, and the hidden- and input-layer dropout rates. Additional attention was devoted to speaker recognition under noisy conditions, so two experiments were conducted in the scope of this thesis. The first experiment was intended to demonstrate that optimizing the observed parameters of stochastic gradient descent can improve speaker recognition performance in the absence of noise. It was conducted in two phases: in the first, the recognition rate was observed while the hidden-layer dropout rate and the learning rate were varied and the input-layer dropout rate was held constant; in the second, the recognition rate was observed while the input-layer dropout rate and the learning rate were varied and the hidden-layer dropout rate was held constant. The second experiment was intended to show that optimizing these parameters can improve speaker recognition performance even under noisy conditions; thus, different noise levels were artificially applied to the original speech signal.
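    The three tuned quantities above have simple mechanical definitions. As a generic sketch (not the thesis's network or training code), inverted dropout zeroes a unit with the given probability and rescales the survivors, and the learning rate scales each SGD update:

    ```python
    import numpy as np

    def dropout(h, rate, rng, train=True):
        """Inverted dropout: zero units with probability `rate`, rescale the rest
        by 1/(1-rate) so the expected activation is unchanged at test time."""
        if not train or rate == 0.0:
            return h
        mask = rng.random(h.shape) >= rate
        return h * mask / (1.0 - rate)

    def sgd_step(w, grad, lr):
        """Plain stochastic gradient descent update with learning rate lr."""
        return w - lr * grad
    ```

    The input-layer and hidden-layer dropout rates studied in the thesis are simply this `rate` applied at different layers of the network.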