650 research outputs found

    A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images

    Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is, however, limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. Imaging the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 subjects performing linguistically motivated speech tasks, alongside the corresponding first-ever public-domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each subject. Comment: 27 pages, 6 figures, 5 tables, submitted to Nature Scientific Data

    Real-Time Magnetic Resonance Imaging

    Real-time magnetic resonance imaging (RT-MRI) allows for imaging dynamic processes as they occur, without relying on any repetition or synchronization. This is made possible by modern MRI technology such as fast-switching gradients and parallel imaging. It is compatible with many (but not all) MRI sequences, including spoiled gradient echo, balanced steady-state free precession, and single-shot rapid acquisition with relaxation enhancement. RT-MRI has earned an important role in both diagnostic imaging and image guidance of invasive procedures. Its unique diagnostic value is prominent in areas of the body that undergo substantial and often irregular motion, such as the heart, gastrointestinal system, upper airway vocal tract, and joints. Its value in interventional procedure guidance is prominent for procedures that require multiple forms of soft-tissue contrast, as well as flow information. In this review, we discuss the history of RT-MRI, fundamental tradeoffs, enabling technology, established applications, and current trends.

    Cardiac magnetic resonance assessment of central and peripheral vascular function in patients undergoing renal sympathetic denervation as predictor for blood pressure response

    Background: Most trials of catheter-based renal sympathetic denervation (RDN) describe a proportion of patients without blood pressure response. Recently, we were able to show that arterial stiffness, measured by invasive pulse wave velocity (IPWV), is an excellent predictor of blood pressure response. However, given its invasiveness, IPWV is less suitable as a selection criterion for patients undergoing RDN. Consequently, we aimed to investigate the value of cardiac magnetic resonance (CMR) based measures of arterial stiffness in predicting the outcome of RDN, with IPWV as the reference. Methods: Patients underwent CMR prior to RDN to assess ascending aortic distensibility (AAD), total arterial compliance (TAC), and systemic vascular resistance (SVR). In a second step, central aortic blood pressure was estimated from ascending aortic area change and flow sequences and used to re-calculate total arterial compliance (cTAC). Additionally, IPWV was acquired. Results: Thirty-two patients (24 responders and 8 non-responders) were available for analysis. AAD, TAC, and cTAC were higher in responders; IPWV was higher in non-responders. SVR did not differ between the groups. Patients with AAD, cTAC, or TAC above the median and IPWV below the median had significantly better blood pressure response. Receiver operating characteristic (ROC) curves predicting blood pressure response for IPWV, AAD, cTAC, and TAC revealed areas under the curve of 0.849, 0.828, 0.776, and 0.753 (p = 0.004, 0.006, 0.021, and 0.035, respectively). Conclusions: Beyond IPWV, AAD, cTAC, and TAC appear to be useful outcome predictors for RDN in patients with hypertension. CMR-derived markers of arterial stiffness might serve as non-invasive selection criteria for RDN.
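    The predictor comparison above rests on ROC analysis. As a minimal numpy-only sketch (the scores below are made-up illustrative values, not study data), the area under a ROC curve for a single continuous marker can be computed directly from the two groups' scores:

```python
import numpy as np

def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the probability that a randomly drawn positive case scores
    higher than a randomly drawn negative case (ties count 0.5)."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    # Compare every positive score against every negative score
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)

# Hypothetical distensibility values: responders tend to score higher
auc = roc_auc([4.1, 3.8, 5.0, 4.6], [3.2, 3.5, 2.9])  # -> 1.0 here
```

    For a marker that is higher in non-responders (as IPWV is in this study), one would negate the scores, which maps an AUC of a to 1 - a.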

    Fast upper airway magnetic resonance imaging for assessment of speech production and sleep apnea

    The human upper airway is involved in various functions, including speech, swallowing, and respiration. Magnetic resonance imaging (MRI) can visualize the motion of the upper airway and has been used in scientific studies to understand the dynamics of vocal tract shaping during speech and for assessment of upper airway abnormalities related to obstructive sleep apnea and swallowing disorders. Acceleration technologies in MRI are crucial for improving spatiotemporal resolution or spatial coverage. Recent technical work on upper airway MRI has focused on developing state-of-the-art image acquisition methods for improved dynamic imaging of the upper airway and automatic image analysis methods for efficient and accurate quantification of upper airway parameters of interest. This review covers fast upper airway magnetic resonance (MR) acquisition and reconstruction, MR experimental issues, image analysis techniques, and applications, mainly with respect to studies of speech production and sleep apnea.

    Registration and statistical analysis of the tongue shape during speech production

    This thesis analyzes the human tongue shape during speech production. First, a semi-supervised approach is derived for estimating the tongue shape from volumetric magnetic resonance imaging data of the human vocal tract. Results of this extraction are used to derive parametric tongue models. Next, a framework is presented for registering sparse motion capture data of the tongue by means of such a model. This method makes it possible to generate full three-dimensional animations of the tongue. Finally, a multimodal and statistical text-to-speech system is developed that is able to synthesize audio and synchronized tongue motion from text. Funded by the German Research Foundation.

    Using Synchronized Audio Mapping to Predict Velar and Pharyngeal Wall Locations during Dynamic MRI Sequences

    Automatic tongue, velum (i.e., soft palate), and pharyngeal movement tracking systems provide a significant benefit for the analysis of dynamic speech movements. Studies using ultrasound, x-ray, and magnetic resonance imaging (MRI) have examined the dynamic behavior of the articulators during speech. Simulating the movement of the tongue, velum, and pharynx is often limited by image segmentation obstacles, so movements of the velar structures are typically segmented through manual tracking. These methods are extremely time-consuming, and inherent noise, motion artifacts, air interfaces, and refractions often complicate computer-based automatic tracking. Furthermore, image segmentation and processing techniques for velopharyngeal structures often suffer from leakage issues related to the poor image quality of the MRI and the lack of recognizable boundaries between the velum and pharynx during moments of contact. Computer-based tracking algorithms can overcome these disadvantages by using machine learning techniques together with the corresponding speech signals, which can be treated as prior information. The purpose of this study is to illustrate a methodology for tracking the velum and pharynx in an MRI sequence using a Hidden Markov Model (HMM) and Mel-Frequency Cepstral Coefficients (MFCCs) computed from the corresponding audio signals. Auditory features such as MFCCs have been widely used in Automatic Speech Recognition (ASR) systems. Our method uses a customized version of the traditional audio feature extraction approach to extract visual features from the outer boundaries of the velum and the pharynx, marked (as selected pixels) by a novel method. The reduced audio features help to shrink the search space of the HMM and improve system performance. Three hundred consecutive images were tagged by the researcher. Two hundred of these images and the corresponding audio features (5 seconds) were used to train the HMM, and a 2.5-second audio file was used to test the model. The error rate was measured by calculating the minimum distance between predicted and actual markers. Our model was able to track and animate dynamic articulators during speech in real time with an overall accuracy of 81% at a one-pixel threshold. The predicted markers (pixels) indicated the segmented structures even when the contours of the contact areas were fuzzy and unrecognizable.
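    The MFCC extraction underlying the audio features above can be sketched in a few lines. This is a generic, numpy-only illustration of the standard pipeline (frame, power spectrum, mel filterbank, log, DCT) with arbitrary parameter choices such as sample rate and filterbank size, not the customized extraction used in the study:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC pipeline: frame -> power spectrum -> mel filterbank
    -> log -> DCT-II, keeping the first n_ceps coefficients."""
    # Overlapping Hamming-windowed frames (trailing partial frame dropped)
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = np.asarray(signal)[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular mel filterbank spanning 0 Hz to Nyquist
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)

    # Decorrelate the log-mel energies with a DCT-II basis
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_mels))
    return logmel @ basis.T

# One second of a 440 Hz tone -> a (frames x coefficients) feature matrix
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(tone)  # shape (97, 13)
```

    In an HMM-based tracker of the kind described, each frame's coefficient vector would serve as the observation, with the hidden states tied to marker configurations on the velum and pharyngeal wall.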

    Recent Advances in Signal Processing

    Signal processing is a critical component of most new technological inventions and of a wide variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian, and have always favored closed-form tractability over real-world accuracy; these constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be grouped into five areas depending on the application at hand: image processing, speech processing, communication systems, time-series analysis, and educational packages, in that order. The book has the advantage of providing a collection of applications that are completely independent and self-contained; the interested reader can therefore choose any chapter and skip to another without losing continuity.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The proceedings of the MAVEBA Workshop, held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and the classification of vocal pathologies.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The Models and Analysis of Vocal Emissions with Biomedical Applications (MAVEBA) workshop came into being in 1999 from the strongly felt need to share know-how, objectives, and results between areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze (Florence), Italy.
