
    Ultrax: An Animated Midsagittal Vocal Tract Display for Speech Therapy

    Speech sound disorders (SSD) are the most common communication impairment in childhood and can hamper social development and learning. Current speech therapy interventions rely predominantly on the auditory skills of the child, as little technology is available to assist in the diagnosis and therapy of SSDs. Real-time visualisation of tongue movements has the potential to bring enormous benefit to speech therapy. Ultrasound scanning offers this possibility, although its display may be hard to interpret. Our ultimate goal is to exploit ultrasound to track tongue movement while displaying a simplified, diagrammatic vocal tract that is easier for the user to interpret. In this paper, we outline a general approach to this problem, combining a latent space model with a dimensionality-reducing model of vocal tract shapes. We assess the feasibility of this approach using magnetic resonance imaging (MRI) scans to train a model of vocal tract shapes, which is animated using electromagnetic articulography (EMA) data from the same speaker. Index Terms: ultrasound, speech therapy, vocal tract visualisation
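
    As a minimal sketch of the approach outlined above, the following Python fragment trains a low-dimensional shape model on MRI-derived contours and learns a latent mapping from sparse EMA coordinates to the model's coefficients. The file names, the PCA/ridge choices, and the dimensions are illustrative assumptions, not the authors' implementation.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import Ridge

        # Hypothetical training data: flattened midsagittal contours traced
        # from MRI, and frame-synchronous EMA coil coordinates (same speaker).
        mri_shapes = np.load("mri_contours.npy")   # (n_frames, n_points * 2)
        ema_coils = np.load("ema_frames.npy")      # (n_frames, n_coils * 2)

        # Dimensionality-reducing model of vocal tract shapes: a few PCA
        # modes capture most of the articulatory variation.
        shape_model = PCA(n_components=6).fit(mri_shapes)
        codes = shape_model.transform(mri_shapes)

        # Latent mapping: predict shape coefficients from sparse EMA data.
        mapping = Ridge(alpha=1.0).fit(ema_coils, codes)

        def animate_frame(ema_frame):
            """Return a full diagrammatic contour for one EMA frame."""
            code = mapping.predict(ema_frame[None, :])
            return shape_model.inverse_transform(code).reshape(-1, 2)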

    Registration and statistical analysis of the tongue shape during speech production

    This thesis analyzes the human tongue shape during speech production. First, a semi-supervised approach is derived for estimating the tongue shape from volumetric magnetic resonance imaging data of the human vocal tract. Results of this extraction are used to derive parametric tongue models. Next, a framework is presented for registering sparse motion capture data of the tongue by means of such a model. This method makes it possible to generate full three-dimensional animations of the tongue. Finally, a multimodal and statistical text-to-speech system is developed that is able to synthesize audio and synchronized tongue motion from text. German Research Foundation
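
    The registration step described above can be illustrated with a short sketch: given a linear parametric tongue model, solve for the shape weights that move a handful of model vertices onto the captured sensor positions. The array shapes and the ridge term are assumptions for illustration, not the thesis implementation.

        import numpy as np

        def register_frame(targets, mean, basis, vertex_ids, reg=1e-2):
            """Fit model weights so selected vertices hit sparse targets.

            targets:    (k, 3) sensor positions for one motion capture frame
            mean:       (n, 3) mean tongue shape
            basis:      (m, n, 3) linear shape modes of the parametric model
            vertex_ids: (k,) model vertices closest to the capture sensors
            """
            A = basis[:, vertex_ids, :].reshape(len(basis), -1).T  # (3k, m)
            b = (targets - mean[vertex_ids]).ravel()               # (3k,)
            # Ridge regularization keeps the sparse fit on the model manifold.
            w = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ b)
            return mean + np.tensordot(w, basis, axes=1)  # full 3D tongue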

    A multilinear tongue model derived from speech related MRI data of the human vocal tract

    We present a multilinear statistical model of the human tongue that captures anatomical and tongue-pose-related shape variations separately. The model is derived from 3D magnetic resonance imaging data of 11 speakers sustaining speech-related vocal tract configurations. The extraction is performed by a minimally supervised method built on an image segmentation approach and a template fitting technique. Furthermore, it uses image denoising to deal with possibly corrupt data, palate surface information reconstruction to handle palatal tongue contacts, and a bootstrap strategy to refine the obtained shapes. Our evaluation concludes that limiting the degrees of freedom for the anatomical and speech-related variations to 5 and 4, respectively, produces a model that can reliably register unknown data while avoiding overfitting effects. Finally, we show that it can be used to generate a plausible tongue animation by tracking sparse motion capture data.
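
    To make the separation of anatomy and pose concrete, here is a minimal sketch of evaluating such a multilinear model: a core tensor (which the paper would obtain from a Tucker-style decomposition of the registered training shapes) is contracted with a 5-dimensional anatomy code and a 4-dimensional pose code. The random core below is a stand-in so the fragment runs on its own.

        import numpy as np

        rng = np.random.default_rng(0)
        n_vertices = 3000
        core = rng.standard_normal((5, 4, n_vertices * 3))  # stand-in core
        mean_shape = rng.standard_normal(n_vertices * 3)    # stand-in mean

        def tongue_shape(anatomy, pose):
            """Mean + core contracted with anatomy and pose weights."""
            offsets = np.einsum("i,j,ijk->k", anatomy, pose, core)
            return (mean_shape + offsets).reshape(n_vertices, 3)

        # Same speaker (fixed anatomy code), two different articulations:
        speaker = rng.standard_normal(5)
        shape_a = tongue_shape(speaker, rng.standard_normal(4))
        shape_b = tongue_shape(speaker, rng.standard_normal(4))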

    Reconstructing the full tongue contour from EMA/X-ray microbeam


    Automated location of orofacial landmarks to characterize airway morphology in anaesthesia via deep convolutional neural networks

    Background: A reliable anticipation of a difficult airway may notably enhance safety during anaesthesia. In current practice, clinicians use bedside screenings based on manual measurements of patients' morphology. Objective: To develop and evaluate algorithms for the automated extraction of orofacial landmarks that characterize airway morphology. Methods: We defined 27 frontal + 13 lateral landmarks. We collected n=317 pairs of pre-surgery photos from patients undergoing general anaesthesia (140 females, 177 males). As the ground-truth reference for supervised learning, landmarks were independently annotated by two anaesthesiologists. We trained two ad hoc deep convolutional neural network architectures based on InceptionResNetV2 (IRNet) and MobileNetV2 (MNet) to predict simultaneously: (a) whether each landmark is visible or not (occluded, out of frame), and (b) its 2D coordinates (x, y). We implemented successive stages of transfer learning, combined with data augmentation, and added custom top layers to these networks, whose weights were fully tuned for our application. Performance in landmark extraction was evaluated by 10-fold cross-validation (CV) and compared against 5 state-of-the-art deformable models. Results: With the annotators' consensus as the 'gold standard', our IRNet-based network performed comparably to humans in the frontal view: median CV loss L=1.277·10⁻³, inter-quartile range (IQR) [1.001, 1.660]; versus median 1.360, IQR [1.172, 1.651], and median 1.352, IQR [1.172, 1.619], for each annotator against consensus, respectively. MNet yielded slightly worse results: median 1.471, IQR [1.139, 1.982]. In the lateral view, both networks attained performances statistically poorer than humans: median CV loss L=2.141·10⁻³, IQR [1.676, 2.915], and median 2.611, IQR [1.898, 3.535], respectively; versus median 1.507, IQR [1.188, 1.988], and median 1.442, IQR [1.147, 2.010] for the two annotators. However, standardized effect sizes in CV loss were small: 0.0322 and 0.0235 (non-significant) for IRNet, and 0.1431 and 0.1518 (p<0.05) for MNet; therefore quantitatively similar to humans. The best-performing state-of-the-art model (a deformable regularized Supervised Descent Method, SDM) behaved comparably to our DCNNs in the frontal scenario, but markedly worse in the lateral view. Conclusions: We successfully trained two DCNN models for the recognition of 27 + 13 orofacial landmarks pertaining to the airway. Using transfer learning and data augmentation, they were able to generalize without overfitting, reaching expert-like performance in CV. Our IRNet-based methodology achieved satisfactory identification and location of landmarks, particularly in the frontal view, where it performed at the level of anaesthesiologists. In the lateral view its performance declined, although with a non-significant effect size. Independent authors have also reported lower lateral performance, as certain landmarks may not be clearly salient points, even for a trained human eye. BERC.2022-2025 BCAM Severo Ochoa accreditation CEX2021-001142-S / MICIN / AEI / 10.13039/50110001103
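
    The two-headed architecture described in the Methods can be sketched in a few lines of Keras. The head sizes, losses, and freezing schedule below are illustrative assumptions; only the pretrained trunk (InceptionResNetV2), the custom top layers, and the joint visibility-plus-coordinates output follow the abstract.

        import tensorflow as tf

        N_LANDMARKS = 27  # frontal view; the lateral model would use 13

        base = tf.keras.applications.InceptionResNetV2(
            include_top=False, weights="imagenet", input_shape=(299, 299, 3))
        base.trainable = False  # first transfer-learning stage: frozen trunk

        x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
        x = tf.keras.layers.Dense(256, activation="relu")(x)
        visibility = tf.keras.layers.Dense(
            N_LANDMARKS, activation="sigmoid", name="visibility")(x)
        coords = tf.keras.layers.Dense(
            2 * N_LANDMARKS, name="coords")(x)  # (x, y) per landmark

        model = tf.keras.Model(base.input, [visibility, coords])
        model.compile(
            optimizer="adam",
            loss={"visibility": "binary_crossentropy", "coords": "mse"})
        # Later stages would unfreeze deeper blocks and fine-tune at a low
        # learning rate on augmented copies of the n=317 photo pairs.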

    Towards 3D facial morphometry: facial image analysis applications in anesthesiology and 3D spectral nonrigid registration

    In anesthesiology, the detection and anticipation of difficult tracheal intubation is crucial for patient safety. When undergoing general anesthesia, a patient who is unexpectedly difficult to intubate risks potentially life-threatening complications with poor clinical outcomes, ranging from severe harm to brain damage or death. Conversely, in cases of suspected difficulty, specific equipment and personnel will be called upon to increase safety and the chances of successful intubation. Research in anesthesiology has associated a number of morphological features of the face and neck with higher risk of difficult intubation. Detecting and analyzing these and other potential features, thus allowing the prediction of difficult tracheal intubation in a robust, objective, and automatic way, may therefore improve patient safety. In this thesis, we first present a method to automatically classify images of the mouth cavity according to the visibility of certain oropharyngeal structures. This method is then integrated into a novel and completely automatic method, based on frontal and profile images of the patient's face, to predict the difficulty of intubation. We also provide a new database of three-dimensional (3D) facial scans and present the initial steps towards a complete 3D model of the face suitable for facial morphometry applications, including difficult tracheal intubation prediction. To develop and test our proposed method, we collected a large database of multimodal recordings of over 2700 patients undergoing general anesthesia. In the first part of this thesis, using two-dimensional (2D) facial image analysis methods, we automatically extract morphological and appearance-based features from these images. These are used to train a classifier, which learns to discriminate between patients who are easy or difficult to intubate. We validate our approach on two different scenarios, one of them close to a real-world clinical scenario, using 966 patients, and demonstrate that the proposed method achieves performance comparable to medical diagnosis-based predictions by experienced anesthesiologists. In the second part of this thesis, we focus on the development of a new 3D statistical model of the face to overcome some of the limitations of 2D methods. We first present EPFL3DFace, a new database of 3D facial expression scans, containing 120 subjects performing 35 different facial expressions. Then, we develop a nonrigid alignment method to register the scans and allow for statistical analysis. Our method is based on spectral geometry processing and makes use of an implicit representation of the scans in order to be robust to noise and holes in the surfaces. It presents the significant advantage of reducing the number of free parameters to optimize in the alignment process by two orders of magnitude. We apply our method to the collected data and discuss qualitative results. At its current level of performance, our fully automatic method to predict difficult intubation already has the potential to reduce the cost and increase the availability of such predictions by not relying on qualified anesthesiologists with years of medical training. Further data collection to increase the number of patients who are difficult to intubate, as well as the extraction of morphological features from a 3D representation of the face, are key elements for further improving performance.
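
    As a minimal sketch of the first part (extracted features in, easy/difficult label out), assuming hypothetical feature and label arrays; the classifier family and scoring below are illustrative, not the thesis's exact pipeline:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        features = np.load("face_features.npy")  # hypothetical (n_patients, d)
        difficult = np.load("labels.npy")        # hypothetical 0/1 outcomes

        # Class weighting matters: difficult intubations are a small minority.
        clf = RandomForestClassifier(n_estimators=300, class_weight="balanced")
        scores = cross_val_score(clf, features, difficult,
                                 cv=10, scoring="roc_auc")
        print(f"10-fold AUC: {scores.mean():.3f} +/- {scores.std():.3f}")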

    Normal human craniofacial growth and development from 0 to 4 years

    Knowledge of human craniofacial growth (increase in size) and development (change in shape) is important in the clinical treatment of a range of conditions that affect it. This study uses an extensive collection of clinical CT scans to investigate craniofacial growth and development over the first 48 months of life, to detail how the cranium changes in form (size and shape) in each sex, and to examine how these changes are associated with the growth and development of various soft tissues, such as the brain, eyes and tongue, and the expansion of the nasal cavity. This is achieved through multivariate analyses of cranial form based on 3D landmarks and semi-landmarks, and through analyses of linear dimensions and cranial volumes. The results highlight accelerations and decelerations in cranial form changes throughout early childhood. They show that from 0 to 12 months the cranium undergoes greater changes in form than from 12 to 48 months. However, in terms of the development of overall cranial shape, there is no significant sexual dimorphism in the age range considered in this study. Consequently, a single model of human craniofacial growth and development is presented for future studies examining the physio-mechanical interactions of craniofacial growth.
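
    A minimal sketch of such a landmark-based analysis, under the standard geometric-morphometrics assumptions (generalized Procrustes alignment, then PCA of the aligned coordinates); the input array is hypothetical, and a full form analysis would additionally retain centroid size rather than normalizing it away:

        import numpy as np
        from sklearn.decomposition import PCA

        def procrustes_align(landmarks, iters=5):
            """Iteratively align (n_crania, n_points, 3) landmark sets."""
            X = landmarks - landmarks.mean(axis=1, keepdims=True)  # center
            X /= np.linalg.norm(X, axis=(1, 2), keepdims=True)     # unit size
            ref = X[0]
            for _ in range(iters):
                for i, shape in enumerate(X):
                    u, _, vt = np.linalg.svd(shape.T @ ref)
                    X[i] = shape @ u @ vt          # optimal rotation to ref
                ref = X.mean(axis=0)
                ref /= np.linalg.norm(ref)
            return X

        landmarks = np.load("cranial_landmarks.npy")  # hypothetical data
        aligned = procrustes_align(landmarks.astype(float))
        pcs = PCA(n_components=5).fit_transform(
            aligned.reshape(len(aligned), -1))  # major axes of shape change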

    Towards a complete 3D morphable model of the human head

    Three-dimensional Morphable Models (3DMMs) are powerful statistical tools for representing the 3D shapes and textures of an object class. Here we present the most complete 3DMM of the human head to date, which includes face, cranium, ears, eyes, teeth and tongue. To achieve this, we propose two methods for combining existing 3DMMs of different overlapping head parts: (i) use a regressor to complete missing parts of one model using the other, and (ii) use the Gaussian Process framework to blend covariance matrices from multiple models. Thus we build a new combined face-and-head shape model that blends the variability and facial detail of an existing face model (the LSFM) with the full head modelling capability of an existing head model (the LYHM). Then we construct and fuse a highly detailed ear model to extend the variation of the ear shape. Eye and eye-region models are incorporated into the head model, along with basic models of the teeth, tongue and inner mouth cavity. The new model achieves state-of-the-art performance. We use our model to reconstruct full head representations from single, unconstrained images, allowing us to parameterize craniofacial shape and texture, along with ear shape, eye gaze and eye color. Comment: 18 pages, 18 figures; submitted to Transactions on Pattern Analysis and Machine Intelligence (TPAMI) on the 9th of October as an extension of the original oral CVPR paper: arXiv:1903.0378
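
    A minimal sketch of method (ii), under the simplifying assumption that both models are reduced to eigenbases over a shared set of stacked vertex coordinates; the paper's Gaussian Process framework is richer, so treat this purely as an illustration of covariance blending:

        import numpy as np

        def blend_models(basis_a, var_a, basis_b, var_b, w=0.5):
            """Blend two PCA-style shape models into one eigenbasis.

            basis_*: (n_dims, k) eigenvector columns over shared vertices
            var_*:   (k,) corresponding eigenvalues (mode variances)
            """
            cov = (w * (basis_a * var_a) @ basis_a.T
                   + (1 - w) * (basis_b * var_b) @ basis_b.T)
            vals, vecs = np.linalg.eigh(cov)       # re-diagonalize the blend
            order = np.argsort(vals)[::-1]         # largest modes first
            return vecs[:, order], np.clip(vals[order], 0.0, None)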

    Neural Modeling and Imaging of the Cortical Interactions Underlying Syllable Production

    This paper describes a neural model of speech acquisition and production that accounts for a wide range of acoustic, kinematic, and neuroimaging data concerning the control of speech movements. The model is a neural network whose components correspond to regions of the cerebral cortex and cerebellum, including premotor, motor, auditory, and somatosensory cortical areas. Computer simulations of the model verify its ability to account for compensation to lip and jaw perturbations during speech. Specific anatomical locations of the model's components are estimated, and these estimates are used to simulate fMRI experiments of simple syllable production with and without jaw perturbations. National Institute on Deafness and Other Communication Disorders (R01 DC02852, R01 DC01925)
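
    The perturbation-compensation behaviour being simulated can be illustrated with a toy feedforward-plus-feedback loop: a planned articulator command is corrected online by the sensed error, so a transient jaw load is partially compensated. Gains and dynamics here are illustrative only, not the model's actual equations.

        import numpy as np

        T, K_FB = 200, 0.6                 # time steps, feedback gain
        target = np.sin(np.linspace(0, np.pi, T))  # planned articulator path
        state, prev_target, trace = 0.0, 0.0, []
        for t in range(T):
            perturb = -0.1 if 80 <= t < 120 else 0.0   # transient jaw load
            feedforward = target[t] - prev_target       # planned increment
            feedback = K_FB * (target[t] - state)       # sensed error term
            state += feedforward + feedback + perturb
            prev_target = target[t]
            trace.append(state)
        # `trace` deviates at perturbation onset and recovers afterwards,
        # the qualitative signature of feedback compensation.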
