Kalman tracking of linear predictor and harmonic noise models for noisy speech enhancement
This paper presents a speech enhancement method based on the tracking and denoising of the formants of a linear prediction (LP) model of the spectral envelope of speech and the parameters of a harmonic noise model (HNM) of its excitation. The main advantages of tracking and denoising the prominent energy contours of speech are the efficient use of the spectral and temporal structures of successive speech frames and the mitigation of the processing artefact known as ‘musical noise’ or ‘musical tones’. The formant-tracking linear prediction (FTLP) model estimation consists of three stages: (a) speech pre-cleaning based on spectral amplitude estimation, (b) formant tracking across successive speech frames using the Viterbi method, and (c) Kalman filtering of the formant trajectories across successive speech frames. The HNM parameters for the excitation signal comprise the voiced/unvoiced decision, the fundamental frequency, the harmonics’ amplitudes, and the variance of the noise component of the excitation. A frequency-domain pitch extraction method is proposed that searches for the peak signal-to-noise ratios (SNRs) at the harmonics. For each speech frame, several pitch candidates are calculated, and an estimate of the pitch trajectory across successive frames is obtained using a Viterbi decoder. The trajectories of the noisy excitation harmonics across successive speech frames are modeled and denoised using Kalman filters. The proposed method is used to deconstruct noisy speech, denoise its model parameters, and then reconstitute speech from its cleaned parts. Experimental evaluations show the performance gains of the formant tracking, pitch extraction, and noise reduction stages.
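The per-trajectory Kalman denoising described above can be sketched as a scalar random-walk filter applied to one formant (or harmonic) track. The function name and noise variances below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def kalman_track(observations, q=50.0**2, r=200.0**2):
    """Denoise a single formant trajectory with a random-walk Kalman filter.

    observations : noisy formant frequencies (Hz), one per frame
    q : process-noise variance (how fast the formant may move)
    r : observation-noise variance (how noisy the raw estimates are)
    """
    x = observations[0]   # state estimate (Hz)
    p = r                 # state variance
    smoothed = []
    for z in observations:
        # predict: random-walk model, uncertainty grows by q
        p = p + q
        # update: blend the prediction with the new noisy measurement
        k = p / (p + r)            # Kalman gain
        x = x + k * (z - x)
        p = (1.0 - k) * p
        smoothed.append(x)
    return np.array(smoothed)

# example: a noisy F1 contour fluctuating around 500 Hz
rng = np.random.default_rng(0)
track = 500.0 + rng.normal(0.0, 80.0, size=100)
clean = kalman_track(track)
```

Lowering `q` relative to `r` smooths more aggressively; in the paper, each formant and excitation-harmonic trajectory is tracked by its own filter.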
A low-delay 8 Kb/s backward-adaptive CELP coder
Code-excited linear prediction (CELP) coding is an efficient technique for compressing speech. Communications-quality speech can be obtained at bit rates below 8 Kb/s. However, relatively large coding delays are needed to buffer the input speech in order to perform the LPC analysis. A low-delay 8 Kb/s CELP coder is introduced in which the short-term predictor is based on past synthesized speech. A new distortion measure that improves the tracking of the formant filter is discussed. Formal listening tests showed that the performance of the backward-adaptive coder is almost as good as that of the conventional CELP coder.
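The LPC analysis responsible for the buffering delay can be sketched with the standard autocorrelation method and Levinson-Durbin recursion; a backward-adaptive coder runs the same kind of analysis on past synthesized speech instead of buffered input. This is a generic sketch, not the paper's coder:

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns the prediction polynomial [1, a1, ..., a_order]."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient from the current prediction error
        k = -np.dot(a[:i], r[i:0:-1]) / err
        a[1:i + 1] += k * a[i - 1::-1]
        err *= (1.0 - k * k)
    return a

# example: recover a known 2nd-order all-pole model from its output
rng = np.random.default_rng(1)
x = np.zeros(4000)
e = rng.normal(size=4000)
for t in range(2, 4000):
    x[t] = 1.2 * x[t - 1] - 0.8 * x[t - 2] + e[t]
a = lpc(x, order=2)   # close to the true polynomial [1, -1.2, 0.8]
```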
Asymmetric discrimination of non-speech tonal analogues of vowels
Published in final edited form as: J Exp Psychol Hum Percept Perform. 2019 February; 45(2): 285–300. doi:10.1037/xhp0000603. Directional asymmetries reveal a universal bias in vowel perception favoring extreme vocalic articulations, which lead to acoustic vowel signals with dynamic formant trajectories and well-defined spectral prominences due to the convergence of adjacent formants. The present experiments investigated whether this bias reflects speech-specific processes or general properties of spectral processing in the auditory system. Toward this end, we examined whether analogous asymmetries in perception arise with non-speech tonal analogues that approximate some of the dynamic and static spectral characteristics of naturally produced /u/ vowels executed with more versus less extreme lip gestures. We found a qualitatively similar but weaker directional effect with two-component tones varying in both the dynamic changes and proximity of their spectral energies. In subsequent experiments, we pinned down the phenomenon using tones that varied in one or both of these two acoustic characteristics. We found comparable asymmetries with tones that differed exclusively in their spectral dynamics, and no asymmetries with tones that differed exclusively in their spectral proximity or in both spectral features. We interpret these findings as evidence that dynamic spectral changes are a critical cue for eliciting asymmetries in non-speech tone perception, but that the potential contribution of general auditory processes to asymmetries in vowel perception is limited.
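Two-component tonal analogues of this kind can be approximated as a pair of sinusoids in which the lower component either glides (mimicking dynamic formant movement) or stays static. The frequencies below are illustrative assumptions, not the study's actual stimulus values:

```python
import numpy as np

def two_component_tone(f1_start, f1_end, f2, dur=0.3, sr=16000):
    """Synthesize a two-component tone: component 1 glides linearly from
    f1_start to f1_end Hz while component 2 stays fixed at f2 Hz.
    (All frequency values here are illustrative placeholders.)"""
    t = np.arange(int(dur * sr)) / sr
    # instantaneous frequency of the gliding component
    f1_t = np.linspace(f1_start, f1_end, t.size)
    phase1 = 2 * np.pi * np.cumsum(f1_t) / sr   # integrate frequency -> phase
    phase2 = 2 * np.pi * f2 * t
    tone = 0.5 * np.sin(phase1) + 0.5 * np.sin(phase2)
    return tone / np.max(np.abs(tone))

# dynamic analogue: lower component glides; static analogue: it stays put
glide = two_component_tone(350.0, 300.0, 800.0)
static = two_component_tone(300.0, 300.0, 800.0)
```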
Refining a Deep Learning-based Formant Tracker using Linear Prediction Methods
In this study, formant tracking is investigated by refining the formants
tracked by an existing data-driven tracker, DeepFormants, using the formants
estimated in a model-driven manner by linear prediction (LP)-based methods. As
LP-based formant estimation methods, conventional covariance analysis (LP-COV)
and the recently proposed quasi-closed phase forward-backward (QCP-FB) analysis
are used. In the proposed refinement approach, the contours of the three lowest
formants are first predicted by the data-driven DeepFormants tracker, and the
predicted formants are replaced frame-wise with local spectral peaks shown by
the model-driven LP-based methods. The refinement procedure can be plugged into
the DeepFormants tracker with no need for any new data learning. Two refined
DeepFormants trackers were compared with the original DeepFormants and with
five known traditional trackers using the popular vocal tract resonance (VTR)
corpus. The results indicated that the data-driven DeepFormants trackers
outperformed the conventional trackers and that the best performance was
obtained by refining the formants predicted by DeepFormants using QCP-FB
analysis. In addition, by tracking formants using VTR speech that was corrupted
by additive noise, the study showed that the refined DeepFormants trackers were
more resilient to noise than the reference trackers. In general, these results
suggest that LP-based model-driven approaches, which have traditionally been
used in formant estimation, can be combined with a modern data-driven tracker
easily with no further training to improve the tracker's performance. Comment: Computer Speech and Language, Vol. 81, Article 101515, June 202
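The frame-wise replacement step can be sketched as follows: convert a frame's LP polynomial roots to candidate resonance frequencies, then snap each formant predicted by the data-driven tracker to the nearest candidate. The function and its inputs are hypothetical, not the released DeepFormants or QCP-FB code:

```python
import numpy as np

def refine_formants(pred_formants, lp_coeffs, sr=16000):
    """Replace each predicted formant with the nearest LP spectral peak.

    pred_formants : formant frequencies (Hz) predicted by a data-driven
                    tracker for one frame (hypothetical input)
    lp_coeffs     : LP polynomial [1, a1, ..., ap] for the same frame,
                    from any LP variant (e.g. covariance or QCP-FB)
    """
    # roots of the LP polynomial -> candidate resonance frequencies
    roots = np.roots(lp_coeffs)
    roots = roots[np.imag(roots) > 0]            # keep one root per pair
    cand = np.angle(roots) * sr / (2 * np.pi)    # radians -> Hz
    cand = np.sort(cand[cand > 0])
    # frame-wise replacement: snap each prediction to the closest peak
    refined = []
    for f in pred_formants:
        if cand.size:
            refined.append(cand[np.argmin(np.abs(cand - f))])
        else:
            refined.append(f)   # no candidates: keep the prediction
    return refined
```

Applied per frame to the three lowest formant contours, this needs no retraining of the neural tracker.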
Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals
In this paper, we propose a new method for the accurate estimation and
tracking of formants in speech signals using time-varying quasi-closed-phase
(TVQCP) analysis. Conventional formant tracking methods typically adopt a
two-stage estimate-and-track strategy wherein an initial set of formant
candidates are estimated using short-time analysis (e.g., 10--50 ms), followed
by a tracking stage based on dynamic programming or a linear state-space model.
One of the main disadvantages of these approaches is that the tracking stage,
however good it may be, cannot improve upon the formant estimation accuracy of
the first stage. The proposed TVQCP method provides a single-stage formant
tracking that combines the estimation and tracking stages into one. TVQCP
analysis combines three approaches to improve formant estimation and tracking:
(1) it uses temporally weighted quasi-closed-phase analysis to derive
closed-phase estimates of the vocal tract with reduced interference from the
excitation source, (2) it increases the residual sparsity by using L1-norm
optimization, and (3) it uses time-varying linear prediction analysis over long
time windows (e.g., 100--200 ms) to impose a continuity constraint on the vocal
tract model and hence on the formant trajectories. Formant tracking experiments
with a wide variety of synthetic and natural speech signals show that the
proposed TVQCP method performs better than conventional and popular formant
tracking tools, such as Wavesurfer and Praat (based on dynamic programming),
the KARMA algorithm (based on Kalman filtering), and DeepFormants (based on
deep neural networks trained in a supervised manner). Matlab scripts for the
proposed method can be found at: https://github.com/njaygowda/ftrac
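The temporal weighting in (1) can be illustrated with a generic weighted linear prediction solver, in which samples dominated by the glottal excitation receive small weights. This is a sketch of the weighting idea only, not the TVQCP implementation, which additionally imposes sparsity and time-varying continuity constraints:

```python
import numpy as np

def weighted_lp(x, w, order=10):
    """Temporally weighted linear prediction: minimize
    sum_n w[n] * (x[n] - sum_k a_k x[n-k])^2.
    Down-weighting samples near the main glottal excitation (QCP-style)
    reduces the source's interference with the vocal-tract estimate."""
    n = np.arange(order, len(x))
    # past-sample matrix: row for time n holds x[n-1], ..., x[n-order]
    X = np.stack([x[n - k] for k in range(1, order + 1)], axis=1)
    y = x[n]
    W = w[n]
    # weighted normal equations: (X^T W X) a = X^T W y
    a = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))
    return np.concatenate(([1.0], -a))   # LP polynomial [1, a1, ..., ap]
```

With uniform weights this reduces to ordinary covariance-method LP; a QCP-style weighting function would instead attenuate the open-phase samples of each glottal cycle.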
A Reinvestigation of the Extended Kalman Filter applied to Formant Tracking
This paper examines the application of the Extended Kalman Filter to formant tracking. The derivation of the Jacobian matrix for the Extended Kalman Filter procedure is given, and it is shown how robustness can be incorporated into the procedure. Results are presented that illustrate the formant-tracking ability of the non-robust and robust Extended Kalman Filter algorithms.
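A generic EKF predict/update cycle for such a tracker might look as follows. The Jacobian is approximated by finite differences here, whereas the paper derives it analytically; the state and observation models are placeholders:

```python
import numpy as np

def ekf_step(x, P, z, h, F, Q, R, eps=1e-4):
    """One predict/update cycle of an Extended Kalman Filter.

    x, P : state mean and covariance (e.g. formant frequencies/bandwidths)
    z    : observation vector for the current frame
    h    : nonlinear observation function (state -> expected observation)
    F    : linear state-transition matrix
    Q, R : process and observation noise covariances
    """
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # finite-difference Jacobian H[i, j] = d h_i / d x_j at the prediction
    hx = h(x)
    H = np.zeros((hx.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        H[:, j] = (h(x + dx) - hx) / eps
    # update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - hx)
    P = (np.eye(x.size) - K @ H) @ P
    return x, P
```

A robust variant would additionally bound or re-weight the innovation `z - hx` so that outlier frames do not pull the track off course.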
PYIN: A FUNDAMENTAL FREQUENCY ESTIMATOR USING PROBABILISTIC THRESHOLD DISTRIBUTIONS
© 2014 IEEE.
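pYIN builds on YIN's cumulative-mean-normalized difference function, but replaces YIN's single absolute threshold with a distribution of thresholds, each contributing a pitch candidate. A sketch of that candidate-generation step (omitting pYIN's probabilistic weighting and HMM tracking) might be:

```python
import numpy as np

def cmndf(frame, max_lag):
    """YIN's cumulative-mean-normalized difference function for lags 1..max_lag."""
    d = np.array([np.sum((frame[:-lag] - frame[lag:]) ** 2)
                  for lag in range(1, max_lag + 1)])
    return d * np.arange(1, max_lag + 1) / np.maximum(np.cumsum(d), 1e-12)

def pitch_candidates(frame, sr, thresholds, max_lag=400):
    """For each threshold, take the first lag where the normalized
    difference dips below it -- the multiple-threshold idea behind pYIN,
    sketched without its probabilistic weighting or HMM smoothing."""
    nd = cmndf(frame, max_lag)
    cands = set()
    for th in thresholds:
        below = np.flatnonzero(nd < th)
        if below.size:
            cands.add(sr / (below[0] + 1))   # index 0 corresponds to lag 1
    return sorted(cands)
```

In pYIN proper, each threshold carries a prior probability, and the resulting weighted candidates are decoded over time with a hidden Markov model to produce a smooth F0 track.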