408 research outputs found
Extraction of vocal-tract system characteristics from speechsignals
We propose methods to track natural variations in the characteristics of the vocal-tract system from speech signals. We are especially interested in the cases where these characteristics vary over time, as happens in dynamic sounds such as consonant-vowel transitions. We show that the selection of appropriate analysis segments is crucial in these methods, and we propose a selection based on estimated instants of significant excitation. These instants are obtained by a method based on the average group-delay property of minimum-phase signals. In voiced speech, they correspond to the instants of glottal closure. The vocal-tract system is characterized by its formant parameters, which are extracted from the analysis segments. Because the segments are always at the same relative position in each pitch period, in voiced speech the extracted formants are consistent across successive pitch periods. We demonstrate the results of the analysis for several difficult cases of speech signals
On the use of voice descriptors for glottal source shape parameter estimation
International audienceThis paper summarizes the results of our investigations into estimating the shape of the glottal excitation source from speech signals. We employ the Liljencrants-Fant (LF) model describing the glottal flow and its derivative. The one-dimensional glottal source shape parameter Rd describes the transition in voice quality from a tense to a breathy voice. The parameter Rd has been derived from a statistical regression of the R waveshape parameters which parameterize the LF model. First, we introduce a variant of our recently proposed adaptation and range extension of the Rd parameter regression. Secondly, we discuss in detail the aspects of estimating the glottal source shape parameter Rd using the phase minimization paradigm. Based on the analysis of a large number of speech signals we describe the major conditions that are likely to result in erroneous Rd estimates. Based on these findings we investigate into means to increase the robustness of the Rd parameter estimation. We use Viterbi smoothing to suppress unnatural jumps of the estimated Rd parameter contours within short time segments. Additionally, we propose to steer the Viterbi algorithm by exploiting the covariation of other voice descriptors to improve Viterbi smoothing. The novel Viterbi steering is based on a Gaussian Mixture Model (GMM) that represents the joint density of the voice descriptors and the Open Quotient (OQ) estimated from corresponding electroglottographic (EGG) signals. A conversion function derived from the mixture model predicts OQ from the voice descriptors. Converted to Rd it defines an additional prior probability to adapt the partial probabilities of the Viterbi algorithm accordingly. Finally, we evaluate the performances of the phase minimization based methods using both variants to adapt and extent the Rd regression on one synthetic test set as well as in combination with Viterbi smoothing and each variant of the novel Viterbi steering on one test set of natural speech. The experimental findings exhibit improvements for both Viterbi approaches
Alternating minimisation for glottal inverse filtering
A new method is proposed for solving the glottal inverse filtering (GIF) problem. The goal of GIF is to separate an acoustical speech signal into two parts: the glottal airflow excitation and the vocal tract filter. To recover such information one has to deal with a blind deconvolution problem. This ill-posed inverse problem is solved under a deterministic setting, considering unknowns on both sides of the underlying operator equation. A stable reconstruction is obtained using a double regularization strategy, alternating between fixing either the glottal source signal or the vocal tract filter. This enables not only splitting the nonlinear and nonconvex problem into two linear and convex problems, but also allows the use of the best parameters and constraints to recover each variable at a time. This new technique, called alternating minimization glottal inverse filtering (AM-GIF), is compared with two other approaches: Markov chain Monte Carlo glottal inverse filtering (MCMC-GIF), and iterative adaptive inverse filtering (IAIF), using synthetic speech signals. The recent MCMC-GIF has good reconstruction quality but high computational cost. The state-of-the-art IAIF method is computationally fast but its accuracy deteriorates, particularly for speech signals of high fundamental frequency (F0). The results show the competitive performance of the new method: With high F0, the reconstruction quality is better than that of IAIF and close to MCMC-GIF while reducing the computational complexity by two orders of magnitude.Peer reviewe
- …