18 research outputs found

    Singing voice modeling and synthesis using visual features extracted from ultrasound and optical images of articulators

    No full text
    This thesis reports newly developed methods for extracting relevant features from images of the articulators in rare singing styles: traditional Corsican and Sardinian polyphonies, Byzantine music, and Human Beat Box. We collected data and modeled them using machine learning methods, in particular recent deep learning methods. We first modeled tongue ultrasound image sequences, which carry relevant articulatory information but are difficult to interpret without specialized skills in ultrasound imaging. We developed methods to automatically extract the upper contour of the tongue displayed in ultrasound images. Our tongue contour extraction results are comparable with those reported in the literature, which could lead to applications in singing pedagogy. We then predicted the evolution of the vocal tract filter parameters from sequences of tongue and lip images, first on isolated-vowel databases and then on traditional Corsican singing. Applying the predicted filter parameters, combined with a vocal source acoustic model exploiting electroglottographic recordings, allowed us to synthesize singing voice excerpts from articulatory images (of the tongue and lips) and glottal activity, with results superior to those obtained with existing techniques reported in the literature.
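
    The source model mentioned above exploits electroglottographic (EGG) recordings. As a minimal, illustrative sketch only (not the thesis's actual source model), one common way to use an EGG signal is to locate glottal closure instants from peaks of its time derivative (the dEGG); the sampling rate, filter band and thresholds below are assumptions.

        import numpy as np
        from scipy.signal import butter, filtfilt, find_peaks

        def glottal_closure_instants(egg, fs=44100):
            """Return sample indices of glottal closure instant candidates from a raw EGG signal."""
            # Band-pass to remove baseline drift and high-frequency noise (illustrative band).
            b, a = butter(2, [40.0 / (fs / 2), 3000.0 / (fs / 2)], btype="band")
            egg_filt = filtfilt(b, a, egg)
            # dEGG: glottal closures show up as sharp positive peaks of the derivative.
            degg = np.diff(egg_filt)
            height = 0.3 * np.max(degg)      # illustrative peak threshold
            min_distance = int(fs / 500)     # assumes f0 stays below ~500 Hz
            peaks, _ = find_peaks(degg, height=height, distance=min_distance)
            return peaks

        # Successive closure instants give period estimates, hence an f0 contour:
        # f0 = fs / np.diff(glottal_closure_instants(egg, fs))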

    Management of dental malpositions in general dental practice

    No full text
    AIX-MARSEILLE2-BU Méd/Odontol. (130552103) / Sudoc (France)

    An Articulatory-Based Singing Voice Synthesis Using Tongue and Lips Imaging

    No full text
    Ultrasound imaging of the tongue and videos of lip movements can be used to investigate specific articulation in speech or singing voice. In this study, tongue and lip image sequences recorded during singing performance are used to predict vocal tract properties via Line Spectral Frequencies (LSF). We focused our work on the traditional Corsican singing style "Cantu in paghjella". A multimodal Deep Autoencoder (DAE) extracts salient descriptors directly from tongue and lip images. LSF values are then predicted from the most relevant of these features using a multilayer perceptron. A vocal tract model is derived from the predicted LSF, while a glottal flow model is computed from a synchronized electroglottographic recording. Articulatory-based singing voice synthesis is developed using both models. Both the prediction and the singing voice synthesis obtained with this method outperform the state-of-the-art method.
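
    The chain described above goes from DAE image features to MLP-predicted LSFs, then to a vocal tract filter excited by a glottal source derived from the electroglottographic signal. The sketch below illustrates only the final, generic step of such a source-filter scheme, not the paper's implementation: converting one frame of LSFs to LPC coefficients and filtering a source frame with the resulting all-pole filter. The LSF values, frame length and noise source are placeholders.

        import numpy as np
        from scipy.signal import lfilter

        def lsf_to_lpc(lsf):
            """Convert ascending line spectral frequencies (radians) to LPC coefficients A(z)."""
            # Even-indexed LSFs are roots of the symmetric polynomial P(z),
            # odd-indexed LSFs are roots of the antisymmetric polynomial Q(z).
            def poly_from_angles(angles):
                pol = np.array([1.0])
                for w in angles:
                    pol = np.convolve(pol, [1.0, -2.0 * np.cos(w), 1.0])
                return pol
            P = poly_from_angles(lsf[0::2])
            Q = poly_from_angles(lsf[1::2])
            if len(lsf) % 2 == 0:
                P = np.convolve(P, [1.0, 1.0])        # trivial root at z = -1
                Q = np.convolve(Q, [1.0, -1.0])       # trivial root at z = +1
            else:
                Q = np.convolve(Q, [1.0, 0.0, -1.0])  # trivial roots at z = +/- 1
            return (0.5 * (P + Q))[:-1]               # A(z) = (P(z) + Q(z)) / 2

        # One synthesis frame: excite the all-pole vocal tract filter 1/A(z)
        # with a glottal source frame (plain noise here as a stand-in).
        lsf_frame = np.sort(np.random.uniform(0.05, 3.0, 16))  # placeholder LSFs
        a = lsf_to_lpc(lsf_frame)
        source = np.random.randn(512)                           # placeholder source frame
        frame = lfilter([1.0], a, source)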

    Tongue contour extraction from ultrasound images based on deep neural network

    No full text
    Studying tongue motion during speech using ultrasound is a standard procedure; however, automatic ultrasound image labelling remains a challenge, as standard tongue shape extraction methods typically require human intervention. This article presents a method based on deep neural networks to automatically extract tongue contours from speech ultrasound images. We use a deep autoencoder trained to learn the relationship between an image and its related contour, so that the model is able to automatically reconstruct contours from the ultrasound image alone. We use an automatic labelling algorithm instead of time-consuming hand-labelling during the training process. We then evaluate the performance of both automatic labelling and contour extraction against hand-labelling. The observed results show quality scores comparable to the state of the art.
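
    As an illustration of the kind of image-to-contour mapping described above, the sketch below trains a small convolutional regressor that outputs one normalized contour height per image column. This is not the deep autoencoder reported in the article; the architecture, image size and training data are illustrative assumptions.

        import torch
        import torch.nn as nn

        class ContourNet(nn.Module):
            """Map a single-channel ultrasound frame to one contour height per image column."""
            def __init__(self, width=64):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
                    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
                    nn.Flatten(),
                )
                self.decoder = nn.Sequential(
                    nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
                    nn.Linear(256, width), nn.Sigmoid(),  # normalized y position per column
                )

            def forward(self, x):
                return self.decoder(self.encoder(x))

        model = ContourNet()
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        criterion = nn.MSELoss()

        frames = torch.rand(8, 1, 64, 64)   # placeholder ultrasound frames
        contours = torch.rand(8, 64)        # placeholder auto-labelled contours
        optimizer.zero_grad()
        loss = criterion(model(frames), contours)
        loss.backward()
        optimizer.step()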

    The heartbeat evoked potential does not support strong interoceptive sensibility in trait mindfulness.

    No full text
    The enhancement of body awareness is proposed as one of the cognitive mechanisms that characterize mindfulness. To date, this hypothesis is supported by self-report and behavioral measures but still lacks physiological evidence. The current study investigated the relation between trait mindfulness (i.e., individual differences in the ability to be mindful in daily life) and body awareness by combining a self-report measure (the Multidimensional Assessment of Interoceptive Awareness [MAIA] questionnaire) with analysis of the heartbeat evoked potential (HEP), an event-related potential reflecting the cortical processing of the heartbeat. HEP data were collected from 17 healthy participants during five minutes of resting state. In addition, each participant completed the Freiburg Mindfulness Inventory and the MAIA questionnaire. To take into account the substantial variability of HEP effects, analyses were replicated three times with the same participants (in three distinct sessions). First, group-level analyses showed that HEP amplitude and trait mindfulness do not correlate. Second, we observed that HEP amplitude could positively correlate with self-reported body awareness; however, this association was unreliable over time. Interestingly, we found that the HEP measure shows very poor reliability over time at the individual level, potentially explaining the lack of a reliable association between HEP and psychological traits. Lastly, a reliable positive correlation was found between self-reported trait mindfulness and body awareness. Taken together, these findings provide preliminary evidence that the HEP might not support the increased subjective body awareness in trait mindfulness, suggesting that objective and subjective measures of body awareness could be independent. This is the pre-peer-reviewed version of the following article: Verdonk, C., Trousselard, M., Di Bernardi Luft, C., Medani, T., Billaud, J.-B., Ramdani, C., Canini, F., Claverie, D., Jaumard-Hakoun, A., & Vialatte, F. (2021). The heartbeat evoked potential does not support strong interoceptive sensibility in trait mindfulness. Psychophysiology, 00, 1–13. https://doi.org/10.1111/psyp.13891, which has been published in final form at https://doi.org/10.1111/psyp.13891. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.
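
    For readers unfamiliar with the HEP measure discussed above, the sketch below shows a typical way to compute it with MNE-Python: epoch the EEG around ECG R-peaks and average. The file name, ECG channel name, epoch window and baseline are assumptions and do not reproduce the study's exact pipeline.

        import mne

        # Hypothetical resting-state recording that includes an ECG channel.
        raw = mne.io.read_raw_fif("resting_state_raw.fif", preload=True)
        # Detect R-peaks on the ECG channel and turn them into events.
        ecg_events, _, _ = mne.preprocessing.find_ecg_events(raw, ch_name="ECG")
        # Epoch the EEG around each heartbeat and average to obtain the HEP.
        epochs = mne.Epochs(raw, ecg_events, event_id=999, tmin=-0.1, tmax=0.6,
                            baseline=(-0.1, 0.0), preload=True)
        hep = epochs.average()   # one heartbeat evoked potential per session
        hep.plot()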