9 research outputs found

    Detection of Glottal Closure Instants based on the Microcanonical Multiscale Formalism

    This paper presents a novel algorithm for automatic detection of Glottal Closure Instants (GCI) from the speech signal. Our approach is based on a novel multiscale method that relies on precise estimation of a multiscale parameter at each time instant in the signal domain. This parameter quantifies the degree of signal singularity at each sample from a multiscale point of view, so its value can be used to classify signal samples accordingly. We use this property to develop a simple algorithm for detection of GCIs, and we show that for clean speech our algorithm performs almost as well as a recent state-of-the-art method. Next, through a comprehensive comparison in the presence of 14 different types of noise, we show that our method is more accurate (particularly at very low SNRs). Our method has a lower computation time than the others and does not rely on an estimate of the pitch period or on any critical choice of parameters.

    Pitch-based speech perturbation measures using a novel GCI detection algorithm: Application to pathological voice classification

    Classical pitch-based perturbation measures, such as Jitter and Shimmer, are generally based on pitch-mark detection algorithms that assume the existence of a periodic pitch pattern and/or rely on the linear source-filter speech model. While these assumptions can hold for normal speech, they are generally not valid for pathological speech, which can present strong aperiodicity, nonlinearity and turbulence/noise. Recently, we introduced a novel nonlinear algorithm for Glottal Closure Instants (GCI) detection which has the strong advantage of not making such assumptions. In this paper, we use this new algorithm to compute standard pitch-based perturbation measures and compare its performance to that of the widely used tool PRAAT. We address the task of classification between normal and pathological speech, and carry out the experiments using the popular MEEI database. The results show that our algorithm leads to significantly higher classification accuracy than PRAAT. Moreover, some important statistical features become significantly discriminative, whereas they are meaningless when using PRAAT (in the sense that they have almost no discrimination power).
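
As a rough illustration of the perturbation measures mentioned in this abstract, the sketch below computes local Jitter and Shimmer from a list of GCI positions. It uses the common "local" definitions (mean absolute difference between consecutive cycles, divided by the mean value). The function names and the choice of one peak amplitude per cycle are assumptions made for illustration; this is not the authors' implementation, nor PRAAT's code.

```python
def periods_from_gcis(gci_times):
    """Cycle lengths: differences between consecutive GCI times."""
    return [t1 - t0 for t0, t1 in zip(gci_times, gci_times[1:])]


def local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def jitter_local(gci_times):
    """Local jitter: cycle-to-cycle variation of the period (needs >= 3 GCIs)."""
    return local_perturbation(periods_from_gcis(gci_times))


def shimmer_local(cycle_amplitudes):
    """Local shimmer: cycle-to-cycle variation of the amplitude.

    `cycle_amplitudes` is assumed to hold one peak amplitude per glottal cycle.
    """
    return local_perturbation(cycle_amplitudes)


# Hypothetical GCI times in milliseconds and per-cycle amplitudes.
gcis_ms = [0, 10, 22, 32]
amplitudes = [1.0, 0.8, 1.0]
jitter = jitter_local(gcis_ms)
shimmer = shimmer_local(amplitudes)
```

Because both measures are ratios, they do not depend on the unit used for the GCI times or on the recording gain, only on the accuracy of the detected GCIs.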

    Novel multiscale methods for nonlinear speech analysis

    This thesis presents exploratory research on the application of a nonlinear multiscale formalism, called the Microcanonical Multiscale Formalism (MMF), to the analysis of speech signals. Derived from principles in Statistical Physics, the MMF allows accurate analysis of the nonlinear dynamics of complex signals. It relies on the estimation of local geometrical parameters, the singularity exponents (SE), which quantify the degree of predictability at each point of the signal domain. When correctly defined and estimated, these exponents can provide valuable information about the local dynamics of complex signals, and they have been successfully used in many applications ranging from signal representation to inference and prediction. We show the relevance of the MMF to speech analysis and develop several applications to show the strength and potential of the formalism. Using the MMF, in this thesis we introduce: a novel and accurate text-independent phonetic segmentation algorithm, a novel waveform coder, a robust and accurate algorithm for detection of the Glottal Closure Instants, a closed-form solution for the problem of sparse linear prediction analysis and, finally, an efficient algorithm for estimation of the excitation source signal.

    Efficient GCI detection for efficient sparse linear prediction

    We propose a unified nonlinear approach that offers an efficient closed-form solution for the problem of sparse linear prediction analysis. The approach is based on our previous work on minimization of the weighted l2-norm of the prediction error. The l2-norm is weighted so that less emphasis is given to the prediction error around the Glottal Closure Instants (GCI), where the error is expected to attain its largest values; the resulting cost function therefore approaches the ideal l0-norm cost function for sparse residual recovery. As such, the method requires knowledge of the GCIs. In this paper we use our recently developed GCI detection algorithm, which is particularly suitable for this problem as it does not rely on the residuals themselves for detection of GCIs. We show that our GCI detection algorithm provides slightly better sparsity properties in comparison to a recent powerful GCI detection algorithm. Moreover, as the computational cost of our GCI detection algorithm is quite low, the computational cost of the overall solution is considerably lower.
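
The idea of a weighted l2-norm with a closed-form solution can be sketched as a weighted least-squares problem. The example below is a minimal illustration, not the authors' method: the weighting function (a weight of `low` within `half_width` samples of each GCI, 1 elsewhere) and all function names are assumptions, since the abstract does not specify them.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gci_weights(n_samples, gcis, half_width=2, low=0.0):
    """Assumed weighting: `low` near each GCI, 1.0 everywhere else."""
    w = [1.0] * n_samples
    for g in gcis:
        for n in range(max(0, g - half_width), min(n_samples, g + half_width + 1)):
            w[n] = low
    return w


def weighted_lpc(x, order, w):
    """Minimise sum_n w[n] * (x[n] - sum_k a[k] * x[n-k-1])**2 in closed form."""
    R = [[0.0] * order for _ in range(order)]
    r = [0.0] * order
    for n in range(order, len(x)):
        past = [x[n - k - 1] for k in range(order)]
        for i in range(order):
            r[i] += w[n] * past[i] * x[n]
            for j in range(order):
                R[i][j] += w[n] * past[i] * past[j]
    # Weighted normal equations R a = r.
    return solve_linear_system(R, r)


# Synthetic test signal: a stable 2nd-order all-pole filter excited by an
# impulse train whose impulses play the role of GCIs.
a_true = [1.2, -0.7]
gcis = list(range(10, 400, 50))
x = [0.0] * 400
for n in range(400):
    excitation = 1.0 if n in gcis else 0.0
    x[n] = excitation
    if n >= 1:
        x[n] += a_true[0] * x[n - 1]
    if n >= 2:
        x[n] += a_true[1] * x[n - 2]

a_weighted = weighted_lpc(x, 2, gci_weights(len(x), gcis, half_width=0))
a_plain = weighted_lpc(x, 2, [1.0] * len(x))
```

On this synthetic signal the excitation is exactly sparse, so discarding the samples at the impulses lets the weighted solution recover the filter coefficients, whereas the unweighted solution is biased by the large errors at those samples.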

    GCI detection from raw speech using a fully-convolutional network

    Glottal Closure Instants (GCI) detection consists of automatically detecting, from the speech signal, the temporal locations of the most significant excitation of the vocal tract. It is used in many speech analysis and processing applications, and various algorithms have been proposed for this purpose. Recently, new approaches using convolutional neural networks have emerged, with encouraging results. Following this trend, we propose a simple approach that performs a regression from the speech waveform to a target signal from which the GCIs are easily obtained by peak-picking. However, the ground-truth GCIs used for training and evaluation are usually extracted from EGG signals, which are not reliable and often not available. To overcome this problem, we propose to train our network on high-quality synthetic speech with perfect ground truth. The performance of the proposed algorithm is compared with that of three other state-of-the-art approaches using publicly available datasets, and the impact of using controlled synthetic or real speech signals in the training stage is investigated. The experimental results demonstrate that the proposed method obtains similar or better results than other state-of-the-art algorithms, and that using large synthetic datasets with many speakers offers better generalization ability than using a smaller database of real speech and EGG signals.
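
The peak-picking step mentioned in this abstract can be sketched as follows. This is a generic illustration under stated assumptions: the `threshold` and `min_distance` parameters, the function name and the greedy strongest-first selection are choices made here, not details taken from the paper.

```python
def pick_peaks(target, threshold=0.5, min_distance=1):
    """Return indices of local maxima of `target` that reach `threshold`.

    Peaks closer than `min_distance` samples to a stronger peak are dropped.
    `target` stands for the network's regression output (assumed to be a
    sequence with one value per speech sample).
    """
    candidates = [
        n
        for n in range(1, len(target) - 1)
        if target[n] >= threshold
        and target[n] > target[n - 1]
        and target[n] >= target[n + 1]
    ]
    kept = []
    # Visit the strongest peaks first so that weaker neighbours are discarded.
    for n in sorted(candidates, key=lambda i: target[i], reverse=True):
        if all(abs(n - k) >= min_distance for k in kept):
            kept.append(n)
    return sorted(kept)


# Hypothetical regression output with two clear peaks and one weak bump.
target = [0.0, 0.1, 0.9, 0.2, 0.0, 0.3, 0.1, 0.0, 0.6, 1.0, 0.4, 0.0]
gci_indices = pick_peaks(target, threshold=0.5)
gci_times = [n / 16000 for n in gci_indices]  # assuming a 16 kHz sampling rate
```

Setting `min_distance` to roughly the shortest plausible pitch period (in samples) is one way to avoid detecting two GCIs within a single glottal cycle.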

    Détection automatique des instants de fermeture glottale dans les voix pathologiques (Automatic detection of glottal closure instants in pathological voices)

    Speech processing is an important topic in engineering science, combining signal processing and medical knowledge. Speech is used as an information carrier in many industrial applications, such as speech recognition, human-machine interfaces and many others. The study of speech could enable the differential diagnosis of diseases whose symptoms include voice disorders. Some of these diseases are due to a malfunction of the vocal cords. Speech-based methods have been developed in recent years to identify the instants when the vocal cords come into contact; these instants are called GCIs (Glottal Closure Instants). The ground truth is obtained via electroglottography (EGG). EGG provides a signal that reflects the movement of the vocal cords. Several methods can extract the GCIs automatically from this signal. These methods give different and sometimes wrong results for non-pathological voices, and are probably not suitable for pathological voices. It is therefore difficult to obtain a reliable ground truth. A ground truth is nevertheless essential for developing and studying GCI detection methods. This was the main subject of my work during the internship.