106 research outputs found

    The Electromagnetic Articulography Mandarin Accented English (EMA-MAE) Corpus of Acoustic and 3D Articulatory Kinematic Data

    There is a significant need for more comprehensive electromagnetic articulography (EMA) datasets that can provide matched acoustic and articulatory kinematic data with good spatial and temporal resolution. The Marquette University Electromagnetic Articulography Mandarin Accented English (EMA-MAE) corpus provides kinematic and acoustic data from 40 gender- and dialect-balanced speakers: 20 Midwestern standard American English L1 speakers and 20 Mandarin Accented English (MAE) L2 speakers, half from the Beijing dialect region and half from the Shanghai dialect region. Three-dimensional EMA data were collected at a 400 Hz sampling rate using the NDI Wave system, with articulatory sensors on the midsagittal lips, lower incisors, and tongue blade and dorsum, plus the lateral lip corner and tongue body. Sensors provide three-dimensional position data as well as two-dimensional orientation data representing the orientation of the sensor plane. Data have been corrected for head movement relative to a fixed reference sensor and adjusted using a biteplate calibration system to place them in an articulatory working space relative to each subject's individual midsagittal and maxillary occlusal planes. Speech materials include isolated words chosen to focus on specific contrasts between English and Mandarin, as well as sentences and paragraphs for continuous speech, totaling approximately 45 minutes of data per subject. A beta version of the EMA-MAE corpus is now available, and the full corpus is being prepared for public release to help advance research in areas such as pronunciation modeling, acoustic-articulatory inversion, L1-L2 comparisons, pronunciation error detection, and accent modification training.
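    The head-movement correction and biteplate re-referencing described above amount to two rigid transforms. A minimal sketch of that processing, assuming per-frame reference-sensor rotation matrices are available; the array names, shapes, and file layout are illustrative, not the corpus's actual format:

```python
import numpy as np

def correct_head_movement(art_pos, ref_pos, ref_rot):
    """Express articulator positions in the head-fixed reference frame.

    art_pos: (T, 3) articulator sensor positions in device coordinates
    ref_pos: (T, 3) reference sensor positions per frame
    ref_rot: (T, 3, 3) reference sensor rotation matrices per frame
    """
    # Rotate each frame's offset from the reference sensor into that
    # sensor's own coordinate frame (inverse rotation = transpose).
    return np.einsum('tji,tj->ti', ref_rot, art_pos - ref_pos)

def apply_biteplate_calibration(pos, R_bp, t_bp):
    """Map head-corrected positions into the occlusal/midsagittal frame.

    R_bp, t_bp: a rigid transform estimated once per subject from the
    biteplate recording (hypothetical inputs).
    """
    return pos @ R_bp.T + t_bp
```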

    Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion

    Acoustic-to-articulatory inversion, the estimation of articulatory kinematics from an acoustic waveform, is a challenging but important problem. Accurate estimation of articulatory movements has the potential for significant impact on our understanding of speech production, on our capacity to assess and treat pathologies in a clinical setting, and on speech technologies such as computer-aided pronunciation assessment and audio-video synthesis. However, because of the complex and speaker-specific relationship between articulation and acoustics, existing approaches to inversion do not generalize well across speakers. As acquiring speaker-specific kinematic data for training is not feasible in many practical applications, this remains an important and open problem. This paper proposes a novel approach to acoustic-to-articulatory inversion, Parallel Reference Speaker Weighting (PRSW), which requires no kinematic data for the target speaker and only a small amount of acoustic adaptation data. PRSW hypothesizes that acoustic and kinematic similarities are correlated, and builds speaker-adapted articulatory models from acoustically derived weights. The system was assessed using a 20-speaker data set of synchronous acoustic and electromagnetic articulography (EMA) kinematic data. Results demonstrate that by restricting the reference group to a subset of speakers with strong individual speaker-dependent inversion performance, the PRSW method attains kinematic-independent acoustic-to-articulatory inversion performance nearly matching that of the speaker-dependent model, with an average correlation of 0.62 versus 0.63. This indicates that given a sufficiently complete and appropriately selected reference speaker set for adaptation, it is possible to create effective articulatory models without kinematic training data.
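    The core of PRSW is a convex combination of reference speakers' articulatory models, weighted by acoustic closeness to the target. A minimal sketch of that idea, assuming each reference speaker is summarized by parallel acoustic and articulatory parameter vectors; the distance metric and exponential weighting are illustrative choices, not the paper's exact recipe:

```python
import numpy as np

def prsw_weights(target_acoustic, ref_acoustic, beta=1.0):
    """Weight reference speakers by acoustic closeness to the target.

    target_acoustic: (D,) acoustic summary of the target's adaptation data
    ref_acoustic:    (N, D) acoustic summaries of the N reference speakers
    """
    d = np.linalg.norm(ref_acoustic - target_acoustic, axis=1)
    w = np.exp(-beta * d)            # closer speakers get larger weights
    return w / w.sum()

def adapted_articulatory_model(weights, ref_articulatory):
    """Convex combination of the references' articulatory model parameters.

    ref_articulatory: (N, D_art), e.g. stacked HMM state means per speaker
    """
    return weights @ ref_articulatory
```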

    3D Scanning, Imaging, and Printing in Orthodontics


    In vitro scan accuracy and time efficiency in various implant-supported fixed partial denture situations.

    OBJECTIVES To compare the accuracy and time efficiency of different digital workflows in 3 implant-supported fixed partial denture situations. METHODS Three partially edentulous maxillary models with 2 implants (Model 1: implants at the lateral incisor sites; Model 2: implants at the right canine and first molar sites; Model 3: implants at the right first premolar and first molar sites) were digitized (ATOS Capsule 200MV120, n=1) for reference scans. Test scans were performed for direct (Primescan, DDW-P; Trios 3, DDW-T) and indirect (IDW) digital workflows (n=14). For IDW, stone casts (type IV) were obtained from vinylsiloxanether impressions and digitized (S600 Arti). The scan/impression and post-processing times were recorded. Reference and test scans were superimposed (GOM Inspect) to calculate 3D point, inter-implant distance, and angular deviations. Kruskal-Wallis and Mann-Whitney tests were used for trueness and precision analyses (α=.05). RESULTS The tested workflows affected the trueness (P≀.030) and precision (P<.001) of scans (3D point, inter-implant distance, and angular deviations) within models. DDW-P had the highest accuracy (3D point deviations) for models 1 and 3 (P≀.046). IDW had the lowest accuracy for model 2 (P<.01). DDW-P had the highest accuracy (inter-implant distance deviations) for model 3 (P≀.048). The direct digital workflow mostly led to lower angular deviations (P≀.040) and higher precision for models 2 (mesiodistal direction) and 3 (P<.001). The time for the direct digital workflow was shorter (P<.001), with DDW-P more efficient than DDW-T (P=.008). CONCLUSION The direct digital workflow was more accurate and efficient than the indirect digital workflow in the tested partial edentulism situations with 2 implants. CLINICAL SIGNIFICANCE The tested intraoral scanners can be recommended for accurate and efficient impressions of anterior and posterior 3- or 4-unit implant-supported fixed partial dentures.
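    Once the test and reference scans are superimposed, the deviation metrics reported above reduce to simple vector geometry. A minimal sketch, assuming implant center positions and axis directions have already been extracted from the aligned scans (the function and variable names are hypothetical):

```python
import numpy as np

def inter_implant_distance_deviation(ref_pts, test_pts):
    """Deviation of the distance between the two implants (in scan units).

    ref_pts, test_pts: (2, 3) implant center positions in the aligned scans
    """
    return abs(np.linalg.norm(test_pts[1] - test_pts[0])
               - np.linalg.norm(ref_pts[1] - ref_pts[0]))

def angular_deviation_deg(ref_axis, test_axis):
    """Angle between corresponding implant axis directions, in degrees."""
    cosang = np.dot(ref_axis, test_axis) / (
        np.linalg.norm(ref_axis) * np.linalg.norm(test_axis))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
```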

    A statistical shape space model of the palate surface trained on 3D MRI scans of the vocal tract

    We describe a minimally supervised method for computing a statistical shape space model of the palate surface. The model is created from a corpus of volumetric magnetic resonance imaging (MRI) scans collected from 12 speakers. We extract a 3D mesh of the palate from each speaker, then train the model using principal component analysis (PCA). The palate model is then tested using 3D MRI from another corpus and evaluated using a high-resolution optical scan. We find that the error is low even when only a handful of measured coordinates are available. In both cases, our approach yields promising results. It can be applied to extract the palate shape from MRI data, and could also be useful for other modalities, such as electromagnetic articulography (EMA) and ultrasound tongue imaging (UTI).
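    A minimal sketch of the two stages described above: training a PCA shape space from meshes in vertex correspondence, then recovering a full palate surface from a handful of measured coordinates. The ridge term and variable names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def train_palate_pca(X, k):
    """Train a PCA shape space from flattened palate meshes.

    X: (n_speakers, 3V) meshes in vertex correspondence; k: components kept
    """
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]                       # mean shape, (k, 3V) basis

def fit_from_sparse(mu, P, idx, y, lam=1e-3):
    """Reconstruct a full palate from a few measured coordinates.

    idx: indices into the flattened mesh that were measured; y: their values
    """
    A = P[:, idx].T                         # (m, k) restricted basis
    b = y - mu[idx]
    # Ridge-regularized least squares keeps the fit stable when m is small.
    c = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
    return mu + c @ P                       # full reconstructed surface
```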

    Effective 3D Geometric Matching for Data Restoration and Its Forensic Application

    3D geometric matching is the technique of detecting similar patterns among multiple objects. It is an important and fundamental problem that facilitates many tasks in computer graphics and vision, including shape comparison and retrieval, data fusion, scene understanding, object recognition, and data restoration. For example, 3D scans of an object from different angles are matched and stitched together to form the complete geometry; in medical image analysis, the motion of deforming organs is modeled and predicted by matching a series of CT images. The problem is challenging and remains unsolved, especially when the similar patterns are (1) small and lacking in geometric saliency, or (2) incomplete due to scanning occlusion and data damage. We study reliable matching algorithms that can tackle these difficulties, together with their application to data restoration. Data restoration is the problem of restoring a fragmented or damaged model to its original complete state. It is a new area with direct applications in scientific fields such as forensics and archaeology. In this dissertation, we study novel, effective geometric matching algorithms, including curve matching, surface matching, pairwise matching, multi-piece matching, and template matching. We demonstrate their application in an integrated digital pipeline of skull reassembly, skull completion, and facial reconstruction, developed to support the state-of-the-art forensic skull/facial reconstruction processing pipeline in law enforcement.
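    As a concrete example of pairwise rigid matching, the classic iterative-closest-point loop alternates nearest-neighbor correspondences with a least-squares (Kabsch) transform estimate. A minimal sketch of that standard baseline, not the dissertation's own algorithm:

```python
import numpy as np
from scipy.spatial import cKDTree

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) mapping point set P onto Q."""
    cP, cQ = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cP).T @ (Q - cQ))
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

def icp(src, dst, iters=50):
    """Rigidly align point cloud src to dst by iterative closest point."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(dst)
    for _ in range(iters):
        idx = tree.query(src @ R.T + t)[1]  # nearest-neighbor matches
        R, t = kabsch(src, dst[idx])        # re-fit transform to matches
    return R, t
```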

    Speaker Independent Acoustic-to-Articulatory Inversion

    Acoustic-to-articulatory inversion, the determination of articulatory parameters from acoustic signals, is a difficult but important problem for many speech processing applications, such as automatic speech recognition (ASR) and computer-aided pronunciation training (CAPT). In recent years, several approaches have been successfully implemented for speaker-dependent models with parallel acoustic and kinematic training data. However, in many practical applications inversion is needed for new speakers for whom no articulatory data are available. To address this problem, this dissertation introduces a novel speaker adaptation approach called Parallel Reference Speaker Weighting (PRSW), based on parallel acoustic and articulatory Hidden Markov Models (HMMs). This approach uses a robust normalized articulatory space and palate-referenced articulatory features, combined with speaker-weighted adaptation, to form an inversion mapping for new speakers that can accurately estimate articulatory trajectories. The proposed PRSW method is evaluated on the newly collected Marquette Electromagnetic Articulography Mandarin Accented English (EMA-MAE) corpus using 20 native English speakers. Cross-speaker inversion results show that given a good selection of reference speakers with consistent acoustic and articulatory patterns, the PRSW approach gives good speaker-independent inversion performance even without kinematic training data.
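    Inversion performance in this line of work is typically scored as the average Pearson correlation between estimated and measured articulator trajectories (the 0.62 versus 0.63 figures quoted earlier). A minimal sketch of that evaluation metric, with illustrative array shapes:

```python
import numpy as np

def mean_channel_correlation(est, ref):
    """Average Pearson correlation over articulator channels.

    est, ref: (T, C) estimated and measured articulator trajectories
    """
    e = est - est.mean(axis=0)
    r = ref - ref.mean(axis=0)
    corr = (e * r).sum(axis=0) / (
        np.linalg.norm(e, axis=0) * np.linalg.norm(r, axis=0))
    return corr.mean()
```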

    Registration and statistical analysis of the tongue shape during speech production

    This thesis analyzes the human tongue shape during speech production. First, a semi-supervised approach is derived for estimating the tongue shape from volumetric magnetic resonance imaging data of the human vocal tract. The results of this extraction are used to derive parametric tongue models. Next, a framework is presented for registering sparse motion capture data of the tongue by means of such a model, which makes it possible to generate full three-dimensional animations of the tongue. Finally, a multimodal and statistical text-to-speech system is developed that is able to synthesize audio and synchronized tongue motion from text. (Funded by the German Research Foundation.)

    A pilot study for the digital replacement of a distorted dentition acquired by Cone Beam Computed Tomography (CBCT)

    Introduction: Cone beam CT (CBCT) is becoming a routine imaging modality for the maxillofacial region. In patients with intra-oral metallic objects, imaging causes streak artefacts, which impair the virtual model by obliterating the teeth. This is a major obstacle for occlusal registration and for the fabrication of orthognathic wafers to guide the surgical correction of dentofacial deformities. Aims and Objectives: To develop a method of replacing the inaccurate CBCT images of the dentition with an accurate representation, and to test the feasibility of the technique in the clinical environment. Materials and Method: Impressions of the teeth are acquired and acrylic baseplates constructed on dental casts incorporating radiopaque registration markers. The appliances are fitted and a preoperative CBCT is performed. Impressions are taken of the dentition with the devices in situ and dental models produced. The models are scanned to produce a virtual model. The images of both the patient and the model are imported into a virtual reality software program and aligned on the virtual markers, which allows alignment of the dentition without relying on the teeth for superimposition. The occlusal surfaces of the dentition can then be replaced with the occlusal image of the model. Results: The absolute mean distance of the mesh between the markers in the skulls was in the region of 0.09 mm ± 0.03 mm; the replacement dentition had an absolute mean distance of about 0.24 mm ± 0.09 mm. In patients, the absolute mean distance between markers increased to 0.14 mm ± 0.03 mm. It was not possible to establish the discrepancies in the patients' dentition, since the original image of the dentition is inherently inaccurate. Conclusion: It is possible to replace the CBCT virtual dentition of cadaveric skulls with an accurate representation to create a composite skull, and the feasibility study was successful in the clinical arena. This could be a significant advancement in the accuracy of surgical prediction planning, with the ultimate goal of fabricating a physical orthognathic wafer using reverse engineering.
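    The accuracy figures reported above are absolute mean surface distances between superimposed meshes. A minimal sketch of how such a figure can be computed once the two meshes have been aligned on the radiopaque markers (the vertex arrays are hypothetical inputs):

```python
import numpy as np
from scipy.spatial import cKDTree

def absolute_mean_distance(verts_a, verts_b):
    """Mean and SD of unsigned nearest-neighbor distances from A to B.

    verts_a, verts_b: (Na, 3) and (Nb, 3) vertex arrays of aligned meshes
    """
    d, _ = cKDTree(verts_b).query(verts_a)
    return d.mean(), d.std()
```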

    Three dimensional study to quantify the relationship between facial hard and soft tissue movement as a result of orthognathic surgery

    Introduction: Prediction of soft tissue changes following orthognathic surgery has been frequently attempted in the past decades. It has gradually progressed from the classic "cut and paste" of photographs to computer-assisted 2D surgical prediction planning; finally, comprehensive 3D surgical planning was introduced to help surgeons and patients decide on the magnitude and direction of surgical movements, as well as the type of surgery to be considered for the correction of facial dysmorphology. A wealth of experience has been gained, and the extensive published literature has augmented our knowledge of facial soft tissue behaviour and improved the ability to closely simulate facial changes following orthognathic surgery, particularly after the introduction of three-dimensional imaging into medical research and clinical applications. Several approaches have been considered to mathematically predict soft tissue changes in three dimensions following orthognathic surgery; the most common are the finite element model and the mass tensor model, which have been developed into software packages currently used in clinical practice. In general, these methods produce an acceptable level of prediction accuracy, but studies have shown limited accuracy at specific regions of the face, in particular the areas around the lips.
    Aims: The aim of this project was to conduct a comprehensive assessment of hard and soft tissue changes following orthognathic surgery and to introduce a new method for predicting facial soft tissue changes.
    Methodology: The study was carried out on the pre- and post-operative CBCT images of 100 patients who received orthognathic surgery treatment at Glasgow Dental Hospital and School, Glasgow, UK. Three groups of patients were included in the analysis: those who underwent Le Fort I maxillary advancement surgery, bilateral sagittal split mandibular advancement surgery, or bimaxillary advancement surgery. A generic facial mesh was used to standardise the information obtained from each patient's facial image, and principal component analysis (PCA) was applied to capture the correlations between the skeletal surgical displacement and the resulting soft tissue changes. The identified relationship between hard and soft tissue was then applied to a new set of preoperative 3D facial images, and the predicted results were compared to the actual surgical changes measured from the post-operative 3D facial images. A set of validation studies was conducted, comprising:
    • A comparison between voxel-based registration and surface registration for analysing changes following orthognathic surgery. The results showed no statistically significant difference between the two methods; however, voxel-based registration proved more reliable, as it preserved the link between the soft tissue and the skeletal structures of the face during image registration. Accordingly, voxel-based registration was the method of choice for superimposition of the pre- and post-operative images. The result of this study was published in a refereed journal.
    • Direct DICOM slice landmarking: a novel technique to quantify the direction and magnitude of skeletal surgical movements, representing a new approach to quantifying maxillary and mandibular surgical displacement in three dimensions. The technique measures the distances of corresponding landmarks digitised directly on DICOM image slices in relation to three-dimensional reference planes. The accuracy of the measurements was assessed against a set of "gold standard" measurements extracted from simulated model surgery; the results confirmed the accuracy of the method to within 0.34 mm, and it was therefore applied in this study. The results of this validation were published in a peer-refereed journal.
    • The use of a generic mesh to assess soft tissue changes using stereophotogrammetry. The generic facial mesh played a major role in the dense correspondence analysis of the soft tissue: the conformed generic mesh carries the geometric information of the individual facial mesh onto which it is conformed (elastically deformed), so the accuracy of the conformation is essential to guarantee an accurate replica of the individual facial characteristics. The results showed an acceptable overall mean conformation error of 1 mm. The results of this study were accepted for publication in a peer-refereed scientific journal.
    Skeletal tissue analysis was performed using the validated direct DICOM slice landmarking method, while soft tissue analysis was performed using dense correspondence analysis. The soft tissue analysis was novel and produced a comprehensive description of facial changes in response to orthognathic surgery; the results were accepted for publication in a refereed scientific journal. The main soft tissue changes associated with Le Fort I surgery were advancement of the midface region combined with widening of the paranasal region, upper lip, and nostrils; minor changes were noted at the tip of the nose and the oral commissures. The main soft tissue changes associated with mandibular advancement surgery were advancement and downward displacement of the chin and lower lip regions, with limited widening of the lower lip, slight reversion of the lower lip vermilion, minimal backward displacement of the upper lip, and minimal changes at the oral commissures. The main soft tissue changes associated with bimaxillary advancement surgery were generalized advancement of the middle and lower thirds of the face combined with widening of the paranasal, upper lip, and nostril regions. In the Le Fort I cases, the correlation between the facial soft tissue changes and the skeletal surgical movements was assessed using PCA, with leave-one-out cross-validation applied to the 30 Le Fort I osteotomy cases to make effective use of the data for the prediction algorithm. The prediction accuracy of soft tissue changes showed a mean error ranging from 0.0006 mm ± 0.582 mm at the nose region to −0.0316 mm ± 2.1996 mm over the other facial regions.
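    A minimal sketch of the prediction idea described above: each training case's skeletal and soft tissue displacement fields are paired in a joint PCA, and a new case's soft tissue response is estimated from its skeletal displacement alone, scored by leave-one-out cross-validation. The dimensions, component count, and signed-error summary are illustrative assumptions, not the thesis's exact algorithm:

```python
import numpy as np

def predict_soft_tissue(hard, soft, test_hard, k=10):
    """Predict a soft tissue displacement field from skeletal displacement.

    hard: (n, Dh) training skeletal displacements; soft: (n, Ds) soft tissue
    displacement fields in dense correspondence; test_hard: (Dh,) new case.
    """
    Dh = hard.shape[1]
    X = np.hstack([hard, soft])              # joint hard/soft training matrix
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    P = Vt[:k]                               # joint principal components
    Ph, Ps = P[:, :Dh], P[:, Dh:]
    # Fit coefficients to the skeletal block alone, then use them to
    # synthesize the coupled soft tissue block.
    c = np.linalg.lstsq(Ph.T, test_hard - mu[:Dh], rcond=None)[0]
    return mu[Dh:] + c @ Ps

def loocv_mean_error(hard, soft):
    """Leave-one-out cross-validation with a signed mean error per case."""
    errs = [np.mean(predict_soft_tissue(np.delete(hard, i, 0),
                                        np.delete(soft, i, 0),
                                        hard[i]) - soft[i])
            for i in range(len(hard))]
    return np.mean(errs), np.std(errs)
```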