668 research outputs found

    Mechanism of and Threshold Biomechanical Conditions for Falsetto Voice Onset

    The sound source of a voice is produced by the self-excited oscillation of the vocal folds. In modal voice production, a drastic increase in transglottal pressure after vocal fold closure works as a driving force that develops self-excitation. Another type of vocal fold oscillation with less pronounced glottal closure observed in falsetto voice production has been accounted for by the mucosal wave theory. The classical theory assumes a quasi-steady flow, and the expected driving force onto the vocal folds under wavelike motion is derived from the Bernoulli effect. However, wavelike motion is not always observed during falsetto voice production. More importantly, the application of the quasi-steady assumption to a falsetto voice with a fundamental frequency of several hundred hertz is unsupported by experiments. These considerations suggested that the mechanism of falsetto voice onset may be essentially different from that explained by the mucosal wave theory. In this paper, an alternative mechanism is submitted that explains how self-excitation reminiscent of the falsetto voice could be produced independent of the glottal closure and wavelike motion. This new explanation is derived through analytical procedures by employing only general unsteady equations of motion for flow and solids. The analysis demonstrated that a convective acceleration of a flow induced by rapid wall movement functions as a negative damping force, leading to the self-excitation of the vocal folds. The critical subglottal pressure and volume flow are expressed as functions of vocal fold biomechanical properties, geometry, and voice fundamental frequency. The analytically derived conditions are qualitatively and quantitatively reasonable in view of reported measurement data of the thresholds required for falsetto voice onset. 
An understanding of the voice onset mechanism and explicit mathematical descriptions of the thresholds would be beneficial for the diagnosis and treatment of voice diseases and the development of artificial vocal folds.

    Analysis of phonation onsets in vowel production, using information from glottal area and flow estimate

    A multichannel dataset comprising high-speed videoendoscopy images, electroglottography signals, and free-field microphone signals was used to investigate phonation onsets in vowel production. Use of the multichannel data enabled simultaneous analysis of the two main aspects of phonation: glottal area, extracted from the high-speed videoendoscopy images, and glottal flow, estimated from the microphone signal using glottal inverse filtering. Pulse-wise parameterization of the glottal area and glottal flow indicates that there is no single dominant way to initiate quasi-stable phonation. The trajectories of fundamental frequency and normalized amplitude quotient, extracted from glottal area and estimated flow, may differ markedly during onsets. The location and steepness of the amplitude envelopes of the two signals were observed to be closely related, and quantitative analysis supported the hypothesis that glottal area and flow do not carry essentially different amplitude information during vowel onsets. Linear models were used to predict the phonation onset times from the characteristics of the subsequent steady phonation. The phonation onset time of glottal area was found to have good predictability from a combination of the fundamental frequency and the normalized amplitude quotient of the glottal flow, as well as the gender of the speaker. For the phonation onset time of glottal flow, the best linear model was obtained using the fundamental frequency and the normalized amplitude quotient of the glottal flow as predictors. Peer reviewed
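    The abstract above relies on the normalized amplitude quotient (NAQ) without defining it. A minimal sketch follows, assuming the commonly used definition NAQ = f_ac / (d_peak * T), where f_ac is the peak-to-peak amplitude of one pulse, d_peak is the magnitude of the steepest negative slope, and T is the period. The function names and the raised-cosine test pulse are illustrative assumptions, not taken from the paper.

```python
import math


def raised_cosine_pulse(num_samples):
    """Hypothetical test pulse: one full raised-cosine cycle in [0, 1]."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / num_samples))
            for n in range(num_samples)]


def normalized_amplitude_quotient(pulse, sample_rate_hz):
    """Return NAQ = f_ac / (d_peak * T) for a single pulse.

    pulse          -- samples of one cycle (glottal flow or glottal area)
    sample_rate_hz -- sampling rate of the pulse in Hz
    """
    if len(pulse) < 3:
        raise ValueError("need at least three samples in a cycle")
    period_s = len(pulse) / sample_rate_hz
    # Peak-to-peak amplitude of the pulse.
    f_ac = max(pulse) - min(pulse)
    # First difference, scaled to amplitude units per second.
    derivative = [(b - a) * sample_rate_hz for a, b in zip(pulse, pulse[1:])]
    # Magnitude of the steepest negative slope (closing phase).
    d_peak = -min(derivative)
    if d_peak <= 0.0:
        raise ValueError("pulse has no falling edge")
    return f_ac / (d_peak * period_s)


if __name__ == "__main__":
    pulse = raised_cosine_pulse(200)
    print(normalized_amplitude_quotient(pulse, sample_rate_hz=20000.0))
```

    Because the slope is divided by the period, the quotient is dimensionless and does not depend on the sampling rate or on the absolute amplitude of the pulse, which is what allows values from the area and flow signals to be compared.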

    Biosimulation of Vocal Fold Inflammation and Healing

    Personalized, pre-emptive and predictive medicine is the capstone of contemporary medical care. The central aim of this dissertation is to address clinical challenges in prescribing personalized therapy to patients with acute phonotrauma. Inflammation and healing, which are innate tissue responses to mechanical stress/trauma, are regulated by a complex dynamic system. A systems biology approach, which combines empirical, mathematical and computational tools, was taken to study the biological complexity of this dynamic system in vocal fold injury. Computational agent-based models (ABMs) were developed to quantitatively characterize multiple cellular and molecular interactions around inflammation and healing. The models allowed for tests of various hypothetical effects of motion-based treatments in individuals with acute phonotrauma. A phonotrauma ABM was calibrated and verified with empirical data of a panel of inflammatory mediators, obtained from laryngeal secretions in individuals following experimentally induced phonotrauma and a randomly assigned motion-based treatment. A supplementary ABM of surgically induced vocal fold trauma was developed and subsequently calibrated and verified with empirical data of inflammatory mediators and extracellular matrix substances from rat studies, for the purpose of gaining insight into the "net effect" of cellular and molecular responses at the tissue level. ABM simulations reproduced and predicted trajectories of inflammatory mediators and extracellular matrix as seen in empirical data of phonotrauma and surgical vocal fold trauma. The simulation results illustrated a spectrum of inflammatory responses to phonotrauma, surgical trauma and motion-based treatments. The results suggested that resonant voice exercise may optimize the combination of para- and anti-inflammatory responses to accelerate healing. 
Moreover, the ABMs suggested that hyaluronan fragments might be an early molecular index of tissue damage that is sensitive to varying stress levels, from relatively low phonatory stress to high surgical stress. We propose that this translational application of biosimulation can be used to quantitatively chart individual healing trajectories, test the effects of different treatment options and, most importantly, provide new understanding of laryngeal health and healing. By placing biology on a firm mathematical foundation, this line of research has potential to influence the contour of scientific thinking and clinical care of vocal fold injury.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The Models and Analysis of Vocal Emissions with Biomedical Applications (MAVEBA) workshop came into being in 1999 from the keenly felt need to share know-how, objectives and results between areas that until then seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread to other areas of research such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy.

    A novel framework for high-quality voice source analysis and synthesis

    The analysis, parameterization and modeling of voice source estimates obtained via inverse filtering of recorded speech are some of the most challenging areas of speech processing, owing to the fact that humans produce a wide range of voice source realizations and that the voice source estimates commonly contain artifacts due to the non-linear time-varying source-filter coupling. Currently, the most widely adopted representation of the voice source signal is the Liljencrants-Fant (LF) model, which was developed in late 1985. Due to its overly simplistic interpretation of voice source dynamics, the LF model cannot represent the fine temporal structure of glottal flow derivative realizations, nor can it carry sufficient spectral richness to facilitate truly natural-sounding speech synthesis. In this thesis we have introduced Characteristic Glottal Pulse Waveform Parameterization and Modeling (CGPWPM), which constitutes an entirely novel framework for voice source analysis, parameterization and reconstruction. In a comparative evaluation of CGPWPM and the LF model we have demonstrated that the proposed method is able to preserve higher levels of speaker-dependent information from the voice source estimates and realize more natural-sounding speech synthesis. In general, we have shown that CGPWPM-based speech synthesis rates highly on the scale of absolute perceptual acceptability and that speech signals are faithfully reconstructed on a consistent basis across speakers and genders. We have applied CGPWPM to voice quality profiling and to a text-independent voice quality conversion method. The proposed voice conversion method is able to achieve the desired perceptual effects, and the modified speech remained as natural sounding and intelligible as natural speech. 
In this thesis, we have also developed an optimal wavelet thresholding strategy for voice source signals which is able to suppress aspiration noise and still retain both the slow and the rapid variations in the voice source estimate. EThOS - Electronic Theses Online Service, GB, United Kingdom

    Numerical Study of Laryngeal Control of Phonation using Realistic Finite Element Models of a Canine Larynx

    While many may take it for granted, the human voice is an incredible feat. An average person can produce a great variety of voices and change voice characteristics agilely even without formal training. The last several decades of research have established that the production of voice is largely a mechanical process, i.e., the sustained vibration of the vocal folds driven by the glottal air flow. Since one has only a single pair of vocal folds, the versatility comes from the ability to change the mechanical status of the vocal folds, including vocal fold length and thickness, tension, and level of adduction, through activation of the laryngeal muscles. However, the relationship between laryngeal muscle activity and the characteristics of voice is not well understood due to limitations in experimental observation and simplifications in modelling and simulations. The science is still far behind the art. The current research aims to investigate, first, the relationship between laryngeal muscle activation and the posture of the vocal folds and, second, the relationship between voice source characteristics and vocal fold mechanical status using more comprehensive numerical models and simulations, thus improving the understanding of the role of each laryngeal muscle in voice control. To do so, (1) the mechanics involved in vocal fold posturing and vibration, especially muscle contraction, and (2) the realistic anatomical structure of the larynx must be considered properly. To achieve this goal, a numerical model of the larynx that is as realistic as possible was built. The geometry of the laryngeal components was reconstructed from high-resolution MRI (Magnetic Resonance Imaging) data of an excised canine larynx, which makes the representation of the muscles and their sub-compartments, cartilages, and other important anatomical features of the larynx more accurate. 
A previously proposed muscle activation model was implemented in a 3D finite element package and applied to the larynx model to simulate the action of laryngeal muscles. After validation of the numerical model against experimental data, extensive parametric studies involving different combinations of muscle activations were conducted to investigate how the voice source is controlled with laryngeal muscles. In the course of this study, some work was done to couple the same finite element tool with a Genetic Algorithm program to inversely determine model parameters in biomechanical models. The method was applied in a collaborative study on shape changes of a fish fin during swimming. This study is presented as a separate chapter at the end of this thesis. The method has potential application in determining parameters in vocal fold models and optimizing clinical vocal fold procedures. This thesis is essentially an assembly of the papers published by the author during the doctoral study, with the addition of an introductory chapter. Chapter 1 reviews the overall principles of voice production, the biomechanical basis of voice control, and past studies on voice control with a focus on the fundamental frequency. Chapter 2 describes the major numerical methods employed in this research with an emphasis on the finite element method. The muscle activation model is also described in this chapter. Chapter 3 describes the building of the larynx model from MRI data and its partial validation. Chapter 4 presents the application of the larynx model to posturing studies, including parametric activation of muscle groups and specific topics related to vocal fold posturing. Chapter 5 describes the change of vocal fold vibration dynamics under the influence of the interaction of the cricothyroid muscle and the thyroarytenoid muscle. The flow-structure interaction simulations were realized by coupling the larynx model to a simple Bernoulli flow model and using a two-stage simulation technique. 
Chapter 6 concludes the current thesis study and proposes suggestions for future studies. Chapter 7 is an independent study that is not related to voice control. It describes a numerical framework that inversely determines and validates model parameters of biomechanical models. The application of the proposed framework to a finite element model of a fish fin is presented.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the keenly felt need to share know-how, objectives and results between areas that until then seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread to other areas of research such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy.

    Numerical Modeling of Vocal Control and Patient-specific Surgical Planning of Type 1 Thyroplasty

    This study aims to develop knowledge about the roles of intrinsic laryngeal muscles in voice control in both healthy and disordered conditions through comprehensive computational models. The phonation simulator was built by combining a three-dimensional high-fidelity MRI-based model of the larynx, active muscle mechanics, and a fluid-structure-acoustic interaction model, which enabled the exploration of the underlying mechanisms linking individual and/or group muscle contractions, under both symmetric and asymmetric activations, to vocal fold posture, vocal fold vibration, and voice outcomes during voice production. The first part of this research extensively investigated the effects of cricothyroid and thyroarytenoid muscle activations on voice characteristics through a parametric study. The role of these intrinsic muscles in the adjustment of the geometrical and mechanical properties of vocal fold pre-phonatory posture, glottic flow aerodynamics, and acoustics, and how all these components interact, were explored. Results were comprehensively validated, and the link between elements of phonation was described in detail. In the next step, taking advantage of the model's ability to activate muscles individually, unilateral vocal fold paralysis was simulated, and the characteristics of disordered voice were analyzed. The voice simulator was then combined with the implant insertion model and a genetic algorithm method to build a computational framework for patient-specific surgical planning of type 1 thyroplasty. This surgery is a standard procedure for treating unilateral vocal fold paralysis; however, it is subject to challenges mainly due to the small size of the implant and the high sensitivity of the voice outcome to the implant shape and position. Therefore, although the patient's voice could be improved, the results might not be as satisfying as expected. 
Unlike actual surgery, which leaves very little room for trial and error, the presented computational framework allows the ideal implant to be found by optimizing the implant based on the patient's desired voice. Both healthy and diseased cases and the corrected case using the optimized implant were simulated. Results revealed that the optimized implant could restore the aerodynamic and acoustic features of the disordered voice in producing a sustained vowel utterance. Furthermore, the performance of the implant in the pitch gliding test, which was simulated using temporal activation of the cricothyroid and thyroarytenoid muscles based on the first part of the study, was evaluated. In the final step, a physics-informed neural network-based algorithm was presented to reconstruct the three-dimensional cyclic vibration of the vocal fold using two-dimensional sparse experimental data and the laws of physics. Key acoustic parameters and vibratory dynamics of the vocal folds and other parameters, such as flow rate, pressure distribution, and contact force, which are difficult to measure experimentally, were successfully predicted.

    Voice source characterization for prosodic and spectral manipulation

    The objective of this dissertation is to study and develop techniques to decompose the speech signal into its two main components: voice source and vocal tract. Our main efforts are on glottal pulse analysis and characterization. We want to explore the utility of this model in different areas of speech processing: speech synthesis, voice conversion or emotion detection, among others. Thus, we will study different techniques for prosodic and spectral manipulation. One of our requirements is that the methods should be robust enough to work with the large databases typical of speech synthesis. We use a speech production model in which the glottal flow produced by the vibrating vocal folds goes through the vocal (and nasal) tract cavities and is radiated by the lips. Removing the effect of the vocal tract from the speech signal to obtain the glottal pulse is known as inverse filtering. We use a parametric model of the glottal pulse directly in the source-filter decomposition phase. In order to validate the accuracy of the parametrization algorithm, we designed a synthetic corpus using LF glottal parameters reported in the literature, complemented with our own results from the vowel database. The results show that our method gives satisfactory results in a wide range of glottal configurations and at different levels of SNR. Our method using the whitened residual compared favorably to this reference, achieving high quality ratings (Good-Excellent). Our fully parametrized system scored lower than the other two, ranking in third place, but still higher than the acceptance threshold (Fair-Good). Next we proposed two methods for prosody modification, one for each of the residual representations explained above. The first method used our full parametrization system and frame interpolation to perform the desired changes in pitch and duration. 
The second method used resampling on the residual waveform and a frame selection technique to generate a new sequence of frames to be synthesized. The results showed that both methods are rated similarly (Fair-Good) and that more work is needed in order to achieve quality levels similar to the reference methods. As part of this dissertation, we have studied the application of our models in three different areas: voice conversion, voice quality analysis and emotion recognition. We have included our speech production model in a reference voice conversion system, to evaluate the impact of our parametrization in this task. The results showed that the evaluators preferred our method over the original one, rating it with a higher score on the MOS scale. To study voice quality, we recorded a small database of isolated, sustained Spanish vowels in four different phonation types (modal, rough, creaky and falsetto). Comparing the results with those reported in the literature, we found them to generally agree with previous findings. Some differences existed, but they could be attributed to the difficulties in comparing voice qualities produced by different speakers. At the same time we conducted experiments in the field of voice quality identification, with very good results. We have also evaluated the performance of an automatic emotion classifier based on GMM using glottal measures. For each emotion, we have trained a specific model using different features, comparing our parametrization to a baseline system using spectral and prosodic characteristics. The results of the test were very satisfactory, showing a relative error reduction of more than 20% with respect to the baseline system. The detection accuracy for the different emotions was also high, improving on the results of previously reported works using the same database. 
Overall, we can conclude that the glottal source parameters extracted using our algorithm have a positive impact in the field of automatic emotion classification.