12 research outputs found

    Estimación de la frecuencia fundamental de señales de voz usando transformada wavelet

    In fundamental-frequency estimation of speech signals using the wavelet transform, it is common to exploit the fact that local maxima occur across the decomposition scales in the neighbourhood of the glottal closure instant (GCI). Such methods rely on correlating the positions of the local maxima across several decomposition scales. This is not straightforward, however, because a speech signal contains many local maxima and, moreover, the scales corresponding to high frequencies are easily affected by noise. A method is proposed based on determining and correlating the distances for each decomposition scale, which remains effective under white Gaussian noise perturbations. Its performance is compared, on the Keele Pitch Database, with the SIFT (Simplified Inverse Filtering Tracking) method, a fundamental-frequency estimation method commonly used in commercial systems.

    A Quantitative Assessment of Group Delay Methods for Identifying Glottal Closures in Voiced Speech

    Measures based on the group delay of the LPC residual have been used by a number of authors to identify the time instants of glottal closure in voiced speech. In this paper, we discuss the theoretical properties of three such measures and present a new measure with useful properties. We give a quantitative assessment of each measure's ability to detect glottal closure instants, evaluated using a speech database that includes a direct measurement of glottal activity from a Laryngograph/EGG signal. We find that, when using a fixed-length analysis window, the best measures can detect the instant of glottal closure in 97% of larynx cycles with a standard deviation of 0.6 ms, and that in 9% of these cycles an additional excitation instant is found that normally corresponds to glottal opening. We show that some improvement in detection rate may be obtained if the analysis window length is adapted to the speech pitch. If the measures are applied to the preemphasized speech instead of to the LPC residual, the timing accuracy worsens but the detection rate improves slightly. We assess the computational cost of evaluating the measures and present new recursive algorithms that give a substantial reduction in computation in all cases.
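
As a rough illustration of how a group-delay measure can locate an excitation instant, the sketch below computes an energy-weighted group delay over a sliding window and marks its negative-going zero crossings. It is not taken from the paper: the function names, the window length, and the synthetic impulse train standing in for an LPC residual are all assumptions made for the example. It uses the identity that the energy-weighted average of the group delay of a real frame equals the centre of gravity of its energy, so no FFT is needed.

```python
def energy_weighted_group_delay(signal, win_len):
    """Energy-weighted group delay for each window start, relative to the window centre."""
    half = (win_len - 1) / 2
    eps = 1e-12  # assumption: small constant to avoid dividing by zero in silent windows
    delays = []
    for start in range(len(signal) - win_len + 1):
        frame = signal[start:start + win_len]
        energy = sum(s * s for s in frame)
        # Centre of gravity of the frame energy, in samples from the window start.
        centroid = sum(n * s * s for n, s in enumerate(frame)) / (energy + eps)
        delays.append(centroid - half)
    return delays


def negative_going_zero_crossings(delays, win_len):
    """Sample indices where the delay crosses from positive to non-positive."""
    half = (win_len - 1) // 2
    return [m + half for m in range(1, len(delays)) if delays[m - 1] > 0 >= delays[m]]


# Hypothetical test signal: unit impulses every 80 samples, standing in for an LPC residual.
period, win_len = 80, 41  # odd window, shorter than one period
impulse_positions = list(range(50, 400, period))
residual = [0.0] * 400
for p in impulse_positions:
    residual[p] = 1.0

delays = energy_weighted_group_delay(residual, win_len)
detected = negative_going_zero_crossings(delays, win_len)
print(detected)
```

On this idealised input the crossings coincide with the impulse positions. Real residuals are noisy and contain secondary excitations, which is why the abstract reports detection rates and timing spread rather than exact hits.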

    Fundamental frequency estimation of low-quality electroglottographic signals

    Fundamental frequency (fo) is often estimated from electroglottographic (EGG) signals. Due to the nature of the method, the quality of EGG signals may be impaired by features such as amplitude or baseline drifts, mains hum, or noise. The potential adverse effects of these factors on fo estimation have to date not been investigated. Here, the performance of thirteen algorithms for estimating fo was tested on 147 synthesized EGG signals with varying degrees of signal quality deterioration. Algorithm performance was assessed through the standard deviation σfo of the difference between known and estimated fo data, expressed in octaves. With very few exceptions, simulated mains hum and amplitude and baseline drifts did not influence fo results, even though some algorithms consistently outperformed others. When increasing either cycle-to-cycle fo variation or the degree of subharmonics, the SIGMA algorithm had the best performance (max. σfo = 0.04). That algorithm was, however, more easily disturbed by typical EGG equipment noise, whereas the NDF and Praat's auto-correlation algorithms performed best in this category (σfo = 0.01). These results suggest that the algorithm for fo estimation of EGG signals needs to be selected specifically for each particular data set. Overall, estimated fo data should be interpreted with care.
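
The performance measure described above can be sketched in a few lines. This is a minimal illustration, assuming that the difference "expressed in octaves" is the base-2 logarithm of the ratio of estimated to known fo, and that the population standard deviation is meant. The abstract does not state either detail, and the function name and sample values are hypothetical.

```python
import math
import statistics


def sigma_fo_octaves(known_hz, estimated_hz):
    """Standard deviation of the estimated-vs-known fo difference, in octaves."""
    if len(known_hz) != len(estimated_hz):
        raise ValueError("known and estimated sequences must have equal length")
    # Assumption: an fo difference in octaves is log2 of the frequency ratio.
    diffs = [math.log2(est / ref) for ref, est in zip(known_hz, estimated_hz)]
    # Assumption: population standard deviation (the source does not say which).
    return statistics.pstdev(diffs)


# Hypothetical data: one octave error upwards and one downwards.
known = [100.0, 100.0, 100.0, 100.0]
estimated = [100.0, 200.0, 100.0, 50.0]
sigma = sigma_fo_octaves(known, estimated)
print(sigma)
```

Working in octaves makes the error relative rather than absolute, so a 5 Hz miss at 100 Hz counts for more than a 5 Hz miss at 400 Hz.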

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science in which the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.

    Novel multiscale methods for nonlinear speech analysis

    This thesis presents exploratory research on applying a nonlinear multiscale formalism, the Microcanonical Multiscale Formalism (MMF), to the analysis of speech signals. Derived from principles of statistical physics, the MMF allows accurate geometric analysis of the nonlinear dynamics of complex signals. It relies on estimating local geometrical parameters, the singularity exponents (SE), which quantify the degree of predictability at each point of the signal domain. When correctly defined and estimated, these exponents provide valuable information about the local dynamics of complex signals and have been used successfully in many applications ranging from signal representation to inference and prediction. We show the relevance of the MMF to speech analysis and develop several applications that demonstrate the strength and potential of the formalism. Using the MMF, this thesis introduces: a novel and accurate text-independent phonetic segmentation algorithm; a novel waveform coder; a robust, accurate algorithm for detecting glottal closure instants; a closed-form solution to the problem of sparse linear prediction analysis; and an efficient algorithm for multipulse approximation of the excitation source signal.