268 research outputs found

    Glottal Excitation Extraction of Voiced Speech - Jointly Parametric and Nonparametric Approaches

    The goal of this dissertation is to develop methods to recover glottal flow pulses, which contain biometric information about the speaker. The excitation information estimated from an observed speech utterance is modeled as the source of an inverse problem. Windowed linear prediction analysis and inverse filtering are first used to deconvolve the speech signal and obtain a rough estimate of the glottal flow pulses. Linear prediction and its inverse filter can largely eliminate the vocal-tract response, which is usually modeled as an infinite impulse response filter. Vocal-tract components that remain in the estimate after inverse filtering are then removed by maximum-phase/minimum-phase decomposition, implemented by applying the complex cepstrum to the initial estimate of the glottal pulses. The additive and residual errors from inverse filtering are suppressed by computing the cepstrum representations with higher-order statistics. Features provided directly by the glottal source's cepstrum representation, together with fitting parameters for the estimated pulses, form feature patterns that were applied to a minimum-distance classifier to realize a speaker identification system with a very limited number of subjects.
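
    The first stage described above, linear-prediction inverse filtering, can be sketched as follows. This is a minimal illustration on a toy all-pole "vocal tract"; the order-2 model, signal lengths, and filter coefficients are assumptions for demonstration, not the dissertation's configuration:

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_inverse_filter(frame, order):
    """Estimate LPC coefficients by the autocorrelation method and
    inverse-filter the frame to approximate the excitation residual."""
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, mode="full")[len(w) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    A = np.concatenate(([1.0], -a))        # A(z) = 1 - sum_k a_k z^{-k}
    return lfilter(A, [1.0], frame)        # residual ~ glottal excitation

# Toy example: an impulse train driven through an all-pole "vocal tract".
excitation = np.zeros(400)
excitation[::80] = 1.0                     # one glottal impulse per period
speech = lfilter([1.0], [1.0, -1.3, 0.9], excitation)
residual = lpc_inverse_filter(speech, order=2)
# The largest residual sample falls on one of the excitation impulses.
```

    Because the synthetic tract is exactly all-pole, the inverse filter whitens the signal and the residual peaks line up with the glottal impulses; real speech leaves the remaining components that the complex-cepstrum stage addresses.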

    Time-Varying Modeling of Glottal Source and Vocal Tract and Sequential Bayesian Estimation of Model Parameters for Speech Synthesis

    Speech is generated by articulators acting on a phonatory source. Identifying this phonatory source and the articulatory geometry are individually challenging, ill-posed problems, called speech separation and articulatory inversion, respectively. There is a trade-off between the decomposition and the recovered articulatory geometry, because multiple articulatory configurations can produce the same speech. Moreover, if measurements are obtained only from a microphone, they provide no invasive insight into the vocal apparatus, adding a further challenge to an already difficult problem. A joint non-invasive estimation strategy that couples articulatory and phonatory knowledge would lead to better articulatory speech synthesis. In this thesis, a joint estimation strategy for speech separation and articulatory geometry recovery is studied. Unlike previous periodic/aperiodic decomposition methods that assume a stationary speech model within each frame, the proposed model offers a non-stationary speech decomposition method. A parametric glottal source model and an articulatory vocal tract response are represented in a dynamic state-space formulation, and the unknown parameters of the speech generation components are estimated using sequential Monte Carlo methods under specific assumptions. The proposed approach is compared with other glottal inverse filtering methods, including iterative adaptive inverse filtering, state-space inverse filtering, and the quasi-closed phase method. Masters Thesis, Electrical Engineering, 201
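
    Sequential Monte Carlo estimation of the kind used above can be illustrated with a minimal bootstrap particle filter tracking a single slowly drifting parameter from noisy observations. This is a generic sketch; the random-walk state model and noise levels are stand-ins, not the thesis's glottal/vocal-tract formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy state space: one slowly drifting parameter observed in noise, a
# stand-in for the glottal/vocal-tract parameters estimated in the thesis.
T, N = 200, 500                            # time steps, particles
true_x = 1.0 + np.cumsum(rng.normal(0.0, 0.05, T))
obs = true_x + rng.normal(0.0, 0.3, T)

particles = rng.normal(1.0, 0.5, N)
estimates = np.empty(T)
for t in range(T):
    particles = particles + rng.normal(0.0, 0.05, N)   # propagate state
    w = np.exp(-0.5 * ((obs[t] - particles) / 0.3) ** 2)
    w /= w.sum()                                       # likelihood weights
    estimates[t] = np.dot(w, particles)                # posterior mean
    particles = particles[rng.choice(N, N, p=w)]       # multinomial resample
```

    Resampling at every step is the simplest (bootstrap) variant; practical implementations usually resample only when the effective sample size drops.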

    Tools for voice source analysis: an updated Aalto Aparat and a database of continuous speech with a simultaneous electroglottographic signal

    This thesis presents two tools for voice source analysis: the updated Aalto Aparat inverse filtering programme, and a database of continuous Finnish speech with simultaneous electroglottography (EGG). A new glottal inverse filtering method, quasi-closed phase glottal inverse filtering (QCP), has been implemented in Aalto Aparat, and the usability of the programme has been improved: results of the computations can now be transferred to other analysis programmes more efficiently, and a comprehensive manual for Aparat has been compiled. The database of continuous speech and EGG contains 20 recitations of a Finnish text by 10 male and 10 female native Finnish speakers. The recitations were recorded with a headset condenser microphone and EGG electrodes in an anechoic chamber, and the full database contains almost an hour of material. The data can be used, e.g., when evaluating new GIF methods.

    LF model based glottal source parameter estimation by extended Kalman filtering

    A new algorithm for glottal source parameter estimation of voiced speech, based on the Liljencrants-Fant (LF) model, is presented in this work. Each pitch period of the inverse-filtered glottal flow derivative is divided into two phases at the glottal closing instant, and an extended Kalman filter is applied iteratively to estimate the shape-controlling parameters of both phases. By searching for the minimal mean square error between the reconstructed LF pulse and the original signal, an optimal set of estimates is obtained. Preliminary experimental results show that the proposed algorithm is effective over a wide range of LF parameters, voice qualities, and noise levels; its accuracy, especially for estimation of the return-phase parameters, compares favourably with standard time-domain fitting methods while requiring a significantly lower computational load.
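
    The minimum-MSE fitting criterion can be illustrated on the return phase alone. This toy sketch grid-searches the time constant of an exponential return phase; the extended Kalman filter and the full LF waveform are not reproduced here, and all parameter values are assumptions:

```python
import numpy as np

def return_phase(t, Ee, eps):
    """Exponential return phase of the glottal flow derivative after
    the closing instant: -Ee * exp(-eps * t)."""
    return -Ee * np.exp(-eps * t)

rng = np.random.default_rng(1)
fs = 8000
t = np.arange(50) / fs                      # 50 samples after closure
true_Ee, true_eps = 0.8, 600.0
observed = return_phase(t, true_Ee, true_eps) + rng.normal(0.0, 0.01, t.size)

# Grid search for the minimum mean-square-error fit of the return-phase
# time constant, mirroring the MSE criterion used by the algorithm.
eps_grid = np.linspace(100.0, 1500.0, 141)  # steps of 10
errs = [np.mean((observed - return_phase(t, true_Ee, e)) ** 2)
        for e in eps_grid]
best_eps = float(eps_grid[int(np.argmin(errs))])
```

    An EKF replaces this exhaustive search with a recursive linearised update, which is where the lower computational load reported above comes from.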

    Glottal-synchronous speech processing

    No full text
    Glottal-synchronous speech processing is a field of speech science in which the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this fixed segmentation can fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for detecting GCIs and GOIs from the electroglottograph signal, with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications, including speech dereverberation, where SNR is improved by up to 5 dB, and prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment in real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.
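
    Once GCIs are available, glottal-synchronous framing itself is straightforward. A minimal sketch follows; the GCIs here are synthetic and evenly spaced, standing in for the output of a detector such as SIGMA or YAGA:

```python
import numpy as np

def glottal_synchronous_frames(x, gcis):
    """Cut a signal into pitch-synchronous frames, each spanning the two
    pitch periods around a glottal closure instant (GCI)."""
    return [x[prev:nxt] for prev, nxt in zip(gcis, gcis[2:])]

fs = 8000
x = np.sin(2 * np.pi * 100.0 * np.arange(fs) / fs)   # 100 Hz "voiced" tone
gcis = list(range(0, fs, fs // 100))                 # one GCI per period
frames = glottal_synchronous_frames(x, gcis)
# Every frame covers exactly two 80-sample pitch periods.
```

    Two-period frames are a common choice because they allow overlap-add reconstruction with period-length hops; real GCIs are, of course, irregularly spaced, so frame lengths vary with the local pitch.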

    Aspiration noise during phonation : synthesis, analysis, and pitch-scale modification

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 139-145). By Daryush Mehta.
    The current study investigates the synthesis and analysis of aspiration noise in synthesized and spoken vowels. Based on the linear source-filter model of speech production, we implement a vowel synthesizer in which the aspiration noise source is temporally modulated by the periodic source waveform. Modulations in the noise source waveform, and their synchrony with the periodic source, are shown to be salient for natural-sounding vowel synthesis. After developing the synthesis framework, we review past approaches to separating the two additive components of the model. A challenge for analysis based on this model is accurate estimation of the aspiration noise component, which contains energy across the frequency spectrum and temporal characteristics due to modulations in the noise source. Spectral harmonic/noise component analysis of spoken vowels shows evidence of noise modulations, with peaks in the estimated noise source component synchronous both with the open phase of the periodic source and with time instants of glottal closure. Inspired by this observation of natural modulations in the aspiration noise source, we develop an alternative approach to accurate pitch-scale modification. The proposed strategy takes a dual-processing approach in which the periodic and noise components of the speech signal are separately analyzed, modified, and re-synthesized. The periodic component is modified using our implementation of time-domain pitch-synchronous overlap-add, and the noise component is handled by modifying characteristics of its source waveform. Since we have modeled an inherent coupling between the original periodic and aspiration noise sources, the modification algorithm is designed to preserve the synchrony between temporal modulations of the two sources. The reconstructed modified signal is perceived to be natural-sounding and generally reduces artifacts that are typically heard in current modification techniques.
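
    The core synthesis idea, aspiration noise amplitude-modulated in synchrony with the periodic source, can be sketched in a few lines. The pulse shape, pitch, and modulation depth below are illustrative assumptions, not the thesis's synthesizer parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
fs, f0, dur = 16000, 120.0, 0.5
t = np.arange(int(fs * dur)) / fs

# Periodic source: a half-wave-rectified, squared cosine pulse train
# stands in for the periodic glottal flow waveform.
periodic = np.maximum(0.0, np.cos(2.0 * np.pi * f0 * t)) ** 2

# Aspiration noise amplitude-modulated synchronously by the periodic
# source, as in the modulated-noise vowel synthesizer described above.
aspiration = 0.2 * periodic * rng.normal(0.0, 1.0, t.size)
source = periodic + aspiration             # combined voice source
```

    Because the noise gain follows the periodic waveform sample by sample, the two components stay coupled, which is the synchrony the pitch-scale modification strategy above is designed to preserve.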