75 research outputs found

    Models and Analysis of Vocal Emissions for Biomedical Applications

    Get PDF
    The Models and Analysis of Vocal Emissions with Biomedical Applications (MAVEBA) workshop came into being in 1999 from the particularly felt need of sharing know-how, objectives and results between areas that until then seemed quite distinct such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread also in other aspects of research such as occupational voice disorders, neurology, rehabilitation, image and video analysis. MAVEBA takes place every two years always in Firenze, Italy

    Models and Analysis of Vocal Emissions for Biomedical Applications

    Get PDF
    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the particularly felt need of sharing know-how, objectives and results between areas that until then seemed quite distinct such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread also in other aspects of research such as occupational voice disorders, neurology, rehabilitation, image and video analysis. MAVEBA takes place every two years always in Firenze, Italy. This edition celebrates twenty years of uninterrupted and succesfully research in the field of voice analysis

    On the use of voice descriptors for glottal source shape parameter estimation

    Get PDF
    International audienceThis paper summarizes the results of our investigations into estimating the shape of the glottal excitation source from speech signals. We employ the Liljencrants-Fant (LF) model describing the glottal flow and its derivative. The one-dimensional glottal source shape parameter Rd describes the transition in voice quality from a tense to a breathy voice. The parameter Rd has been derived from a statistical regression of the R waveshape parameters which parameterize the LF model. First, we introduce a variant of our recently proposed adaptation and range extension of the Rd parameter regression. Secondly, we discuss in detail the aspects of estimating the glottal source shape parameter Rd using the phase minimization paradigm. Based on the analysis of a large number of speech signals we describe the major conditions that are likely to result in erroneous Rd estimates. Based on these findings we investigate into means to increase the robustness of the Rd parameter estimation. We use Viterbi smoothing to suppress unnatural jumps of the estimated Rd parameter contours within short time segments. Additionally, we propose to steer the Viterbi algorithm by exploiting the covariation of other voice descriptors to improve Viterbi smoothing. The novel Viterbi steering is based on a Gaussian Mixture Model (GMM) that represents the joint density of the voice descriptors and the Open Quotient (OQ) estimated from corresponding electroglottographic (EGG) signals. A conversion function derived from the mixture model predicts OQ from the voice descriptors. Converted to Rd it defines an additional prior probability to adapt the partial probabilities of the Viterbi algorithm accordingly. Finally, we evaluate the performances of the phase minimization based methods using both variants to adapt and extent the Rd regression on one synthetic test set as well as in combination with Viterbi smoothing and each variant of the novel Viterbi steering on one test set of natural speech. The experimental findings exhibit improvements for both Viterbi approaches

    Ihmisen äänentuoton analysointi käänteissuodatuksen, suurnopeuskuvauksen ja elektroglottografian avulla

    Get PDF
    Human voice production was studied using three methods: inverse filtering, digital high-speed imaging of the vocal folds, and electroglottography. The primary goal was to evaluate an inverse filtering method by comparing inverse filtered glottal flow estimates with information obtained by the other methods. More detailed examination of the human voice source behavior was also included in the work. Material from two experiments was analyzed in this study. The data of the first experiment consisted of simultaneous recordings of acoustic speech signal, electroglottogram, and high-speed imaging acquired during sustained vowel phonations. Inverse filtered glottal flow estimates were compared with glottal area waveforms derived from the image material by calculating pulse shape parameters from the signals. The material of the second experiment included recordings of acoustic speech signal and electroglottogram during phonations of sustained vowels. This material was utilized for the analysis of the opening phase and the closing phase of vocal fold vibration. The evaluated inverse filtering method was found to produce mostly reasonable estimates of glottal flow. However, the parameters of the system have to be set appropriately, which requires experience on inverse filtering and speech production. The flow estimates often showed a two-stage opening phase with two instants of rapid increase in the flow derivative. The instant of glottal opening detected in the electroglottogram was often found to coincide with an increase in the flow derivative. The instant of minimum flow derivative was found to occur mostly during the last quarter of the closing phase and it was shown to precede the closing peak of the differentiated electroglottogram.Ihmisen puheentuottoa tutkittiin kolmella menetelmällä: käänteissuodatuksella, äänihuulten digitaalisella suurnopeuskuvauksella ja elektroglottografialla. Päätavoitteena oli tarkastella erään käänteissuodatusmenetelmän toimintaa vertailemalla näillä menetelmillä saatua informaatiota äänihuulten värähtelystä. Lisäksi tutkittiin tarkemmin eräitä äänilähteen käyttäytymisen yksityiskohtia. Tutkimuksessa analysoitiin aineistoa kahdesta koejärjestelystä. Ensimmäisessä kokeessa tallennettiin samanaikaisesti äänisignaali, elektroglottogrammi ja suurnopeuskuvamateriaalia äänihuulista koehenkilöiden tuottaessa pitkiä vokaaleita. Käänteissuodatuksella saaduista glottisvirtausestimaateista sekä kuvamateriaalin ilmaisemasta ääniraon pinta-alavaihtelusta laskettiin pulssiparametreja, joiden avulla vertailtiin virtauksen ja ääniraon pinta-alan käyttäytymistä. Toisen koejärjestelyn aineisto koostui äänisignaalista ja elektroglottogrammista, jotka oli tallennettu vokaaliääntöjen aikana. Tämän materiaalin perusteella analysoitiin ääniraon avautumis- ja sulkeutumisvaihetta. Tarkastellun käänteissuodatusmenetelmän todettiin tuottavan enimmäkseen luotettavia virtausestimaatteja edellyttäen, että menetelmän parametrit asetetaan tarkoituksenmukaisesti, mikä vaatii käyttäjältä kokemusta käänteissuodatuksesta ja ihmisen puheentuotosta. Glottisvirtauksen avautumisvaiheen havaittiin olevan useissa virtausestimaateissa kaksivaiheinen siten, että virtauksen kasvu voimistuu nopeasti kahdessa kohdassa sulkeutumisen ja maksimivirtauksen välillä. Virtauksen kasvun todettiin usein voimistuvan elektroglottogrammista tunnistetun ääniraon avautumishetken lähellä. Virtauksen derivaatan minimikohdan havaittiin sijoittuvan enimmäkseen virtauksen sulkeutumisvaiheen viimeiseen neljännekseen, ja sen osoitettiin esiintyvän ennen elektroglottogrammin derivaatan minimikohtaa

    Comparison of laryngoscopic, glottal and vibratory parameters among Estill qualities – Case study

    Get PDF
    Estill Voice Training (EVT) is an effective educational system for developing and controlling distinct voice qualities used in contemporary commercial singing. EVT teaches six vocal qualities that differ at 13 levels. This study aims to investigate whether the distinct vocal qualities taught by EVT can be systematically differentiated based on laryngoscopic observations and vocal fold oscillation parameters. To investigate the differences in six EVT qualities, laryngeal dimensions and glottal area waveform parameters were measured in a single female subject who performed it in one-octave scale. Glottis Analysis Tools (GAT) were used to measure these parameters and phonovibrograms were obtained from the analysis. The resulting data were subjected to factor analysis to identify the systematic differences between EVT qualities. High-speed videolaryngoscopy analysis revealed a significant influence of vocal qualities on vocal fold oscillations. The factor analysis of the data identified three factors based on laryngeal dimension and four factors derived from GAT parameters. The first GAT factor was influenced by posterior adduction and distinguished belt quality from other qualities, suggesting a significant influence of the aryepiglottic sphincter. The second GAT factor contained parameters derived from glottal length and amplitude, suggesting a relationship not only with vocal registers but also with laryngeal height. The third GAT factor was best related to body-cover figure and phonation type (membranous medialization), while the fourth GAT factor was related to the amplitude-length-ratio. These findings suggest that vocal fold oscillations can be used to distinguish between Estill voice qualities

    A novel framework for high-quality voice source analysis and synthesis

    Get PDF
    The analysis, parameterization and modeling of voice source estimates obtained via inverse filtering of recorded speech are some of the most challenging areas of speech processing owing to the fact humans produce a wide range of voice source realizations and that the voice source estimates commonly contain artifacts due to the non-linear time-varying source-filter coupling. Currently, the most widely adopted representation of voice source signal is Liljencrants-Fant's (LF) model which was developed in late 1985. Due to the overly simplistic interpretation of voice source dynamics, LF model can not represent the fine temporal structure of glottal flow derivative realizations nor can it carry the sufficient spectral richness to facilitate a truly natural sounding speech synthesis. In this thesis we have introduced Characteristic Glottal Pulse Waveform Parameterization and Modeling (CGPWPM) which constitutes an entirely novel framework for voice source analysis, parameterization and reconstruction. In comparative evaluation of CGPWPM and LF model we have demonstrated that the proposed method is able to preserve higher levels of speaker dependant information from the voice source estimates and realize a more natural sounding speech synthesis. In general, we have shown that CGPWPM-based speech synthesis rates highly on the scale of absolute perceptual acceptability and that speech signals are faithfully reconstructed on consistent basis, across speakers, gender. We have applied CGPWPM to voice quality profiling and text-independent voice quality conversion method. The proposed voice conversion method is able to achieve the desired perceptual effects and the modified speech remained as natural sounding and intelligible as natural speech. In this thesis, we have also developed an optimal wavelet thresholding strategy for voice source signals which is able to suppress aspiration noise and still retain both the slow and the rapid variations in the voice source estimate.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
    corecore