
    Development of an Electromagnetic Glottal Waveform Sensor for Applications in High Acoustic Noise Environments

    The challenges of measuring speech signals in the presence of strong background noise cannot be easily addressed with traditional acoustic technology. A recent solution to the problem combines acoustic sensor measurements with real-time, non-acoustic detection of an aspect of the speech production process. While significant advances have been made in that area using low-power radar-based techniques, drawbacks inherent to the operation of such sensors have yet to be surmounted. One imperative scientific objective is therefore to devise new, non-invasive, non-acoustic sensor topologies that offer improvements in sensitivity, robustness, and acoustic bandwidth. This project investigates a novel design that directly senses the glottal flow waveform by measuring variations in the electromagnetic properties of neck tissues during voiced segments of speech. The approach explores two distinct sensor configurations, namely the "six-element" and the "parallel-plate" resonators. The research focuses on modeling the biological load and the resonator prototypes using multi-transmission line (MTL) and finite element (FE) simulation tools. Finally, bench tests performed with both prototypes on phantom loads as well as human subjects are presented.

    The weight of phonetic substance in the structure of sound inventories

    In the research field initiated by Lindblom & Liljencrants in 1972, we illustrate the possibility of giving substance to phonology, predicting the structure of phonological systems from non-phonological principles, be they listener-oriented (perceptual contrast and stability) or speaker-oriented (articulatory contrast and economy). For vowel systems we proposed the Dispersion-Focalisation Theory (Schwartz et al., 1997b). With the DFT, we can predict vowel systems using two competing perceptual constraints weighted by two parameters, λ and α respectively. The first aims at increasing auditory distances between vowel spectra (dispersion); the second aims at increasing the perceptual salience of each spectrum through formant proximities (focalisation). We also introduced new variants based on research in physics - namely, the phase space (λ, α) and the polymorphism of a given phase, or superstructures in phonological organisations (Vallée et al., 1999) - which allow us to generate 85.6% of the 342 UPSID systems with 3 to 7 vowel qualities. No similar theory for consonants seems to exist yet. We therefore present in detail a typology of consonants, and then suggest ways to explain the predominance of plosives over fricatives and of voiceless over voiced consonants by i) comparing them with language acquisition data at the babbling stage and looking at the capacity to acquire relatively different linguistic systems in relation with the main degrees of freedom of the articulators; ii) showing that the places "preferred" for each manner are at least partly conditioned by the morphological constraints that facilitate or complicate, make possible or impossible, the needed articulatory gestures, e.g. the complexity of the articulatory control for voicing and the aerodynamics of fricatives. A rather strict coordination between the glottis and the oral constriction is needed to produce acceptable voiced fricatives (Mawass et al., 2000).
We determine that the region where the combinations of Ag (glottal area) and Ac (constriction area) values result in a balance between the voice and noise components is indeed very narrow. We thus demonstrate that some of the main tendencies in the phonological vowel and consonant structures of the world's languages can be explained, at least in part, by sensorimotor constraints, and argue that phonology can indeed take part in a theory of Perception-for-Action-Control.
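The dispersion/focalisation trade-off described above can be sketched as an energy minimisation over formant space. The snippet below is a toy illustration only: the distance and salience formulas, and the specific weights, are simplified stand-ins for the actual DFT cost function, with `lam` and `alpha` playing the role of the paper's λ and α.

```python
import itertools

def dft_energy(vowels, lam=0.3, alpha=0.25):
    """Toy Dispersion-Focalisation energy for a vowel system.

    vowels: list of (F1, F2) pairs on a perceptual (Bark-like) scale.
    Lower energy = a better system under the two competing constraints.
    """
    # Dispersion: penalise small auditory distances between vowel pairs.
    dispersion = 0.0
    for (f1a, f2a), (f1b, f2b) in itertools.combinations(vowels, 2):
        d2 = (f1a - f1b) ** 2 + lam * (f2a - f2b) ** 2
        dispersion += 1.0 / max(d2, 1e-9)
    # Focalisation: reward formant proximity within each vowel
    # (formant convergence increases perceptual salience).
    focalisation = sum(1.0 / max((f2 - f1) ** 2, 1e-9) for f1, f2 in vowels)
    return dispersion - alpha * focalisation

# A well-dispersed /i a u/-like triangle beats a crowded front cluster.
triangle = [(3.0, 15.0), (7.5, 11.0), (3.5, 6.0)]
cluster = [(3.0, 15.0), (3.5, 14.0), (4.0, 13.0)]
```

Minimising such an energy over candidate inventories is what lets a theory of this kind predict which n-vowel systems are typologically common.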

    Estimation of Subglottal Pressure, Vocal Fold Collision Pressure, and Intrinsic Laryngeal Muscle Activation From Neck-Surface Vibration Using a Neural Network Framework and a Voice Production Model

    The ambulatory assessment of vocal function can be significantly enhanced by having access to physiologically based features that describe underlying pathophysiological mechanisms in individuals with voice disorders. This type of enhancement can improve methods for the prevention, diagnosis, and treatment of behaviorally based voice disorders. Unfortunately, the direct measurement of important vocal features such as subglottal pressure, vocal fold collision pressure, and laryngeal muscle activation is impractical in laboratory and ambulatory settings. In this study, we introduce a method to estimate these features during phonation from a neck-surface vibration signal through a framework that integrates a physiologically relevant model of voice production and machine learning tools. The signal from a neck-surface accelerometer is first processed using subglottal impedance-based inverse filtering to yield an estimate of the unsteady glottal airflow. Seven aerodynamic and acoustic features are extracted from the neck surface accelerometer and an optional microphone signal. A neural network architecture is selected to provide a mapping between the seven input features and subglottal pressure, vocal fold collision pressure, and cricothyroid and thyroarytenoid muscle activation. This non-linear mapping is trained solely with 13,000 Monte Carlo simulations of a voice production model that utilizes a symmetric triangular body-cover model of the vocal folds. The performance of the method was compared against laboratory data from synchronous recordings of oral airflow, intraoral pressure, microphone, and neck-surface vibration in 79 vocally healthy female participants uttering consecutive /pæ/ syllable strings at comfortable, loud, and soft levels.
The mean absolute error and root-mean-square error for estimating the mean subglottal pressure were 191 Pa (1.95 cm H2O) and 243 Pa (2.48 cm H2O), respectively, which are comparable with previous studies but with the key advantage of not requiring subject-specific training and yielding more output measures. The validation of vocal fold collision pressure and laryngeal muscle activation was performed with synthetic values as reference. These initial results provide valuable insight for further vocal fold model refinement and constitute a proof of concept that the proposed machine learning method is a feasible option for providing physiologically relevant measures for laboratory and ambulatory assessment of vocal function.
    Affiliations: Ibarra, Emiro J. (Universidad Técnica Federico Santa María, Chile); Parra, Jesús A. (Universidad Técnica Federico Santa María, Chile); Alzamendi, Gabriel Alejandro (Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática, Universidad Nacional de Entre Ríos - CONICET, Argentina); Cortés, Juan P. (Universidad Técnica Federico Santa María, Chile); Espinoza, Víctor M. (Universidad de Chile, Chile); Mehta, Daryush D. (Center for Laryngeal Surgery and Voice Rehabilitation, United States); Hillman, Robert E. (Center for Laryngeal Surgery and Voice Rehabilitation, United States); Zañartu, Matías (Universidad Técnica Federico Santa María, Chile).
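The core of the mapping stage described above is a small feed-forward network: seven aerodynamic/acoustic features in, four physiological estimates out. The sketch below shows only the shape of such a forward pass; the layer sizes, activation, and random weights are illustrative assumptions (in the study, the weights come from training on the voice-model simulations, not from random initialisation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: 7 input features -> 16 hidden units -> 4 outputs
# (subglottal pressure, collision pressure, CT activation, TA activation).
W1, b1 = rng.normal(size=(16, 7)) * 0.3, np.zeros(16)
W2, b2 = rng.normal(size=(4, 16)) * 0.3, np.zeros(4)

def estimate_physiology(features):
    """Forward pass of a small MLP: 7 aerodynamic/acoustic features in,
    4 normalised physiological estimates out."""
    x = np.asarray(features, dtype=float)
    h = np.tanh(W1 @ x + b1)   # non-linear hidden layer
    return W2 @ h + b2         # linear output layer

out = estimate_physiology(rng.normal(size=7))  # stand-in feature vector
```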

    Modal Locking Between Vocal Fold Oscillations and Vocal Tract Acoustics

    During voiced speech, the vocal folds interact with the vocal tract acoustics. The resulting glottal source-resonator coupling has been observed using mathematical and physical models as well as in in vivo phonation. We propose a computational time-domain model of the full speech apparatus that contains a feedback mechanism from the vocal tract acoustics to the vocal fold oscillations. It is based on the numerical solution of ordinary and partial differential equations defined on vocal tract geometries obtained by magnetic resonance imaging. The model is used to simulate rising and falling pitch glides of [ɑ, i] in the fundamental frequency (f_o) interval [145 Hz, 315 Hz]. The interval contains the first vocal tract resonance f_R1 and the first formant F1 of [i] as well as the fractions f_R1/5, f_R1/4, and f_R1/3 of the first resonance of [ɑ]. The glide simulations reveal a locking pattern in the f_o trajectory approximately at f_R1 of [i]. The resonance fractions of [ɑ] produce perturbations in the pressure signal at the lips but no locking.
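A locking pattern of the kind described can be flagged post hoc from a simulated f_o trajectory: during a glide, f_o normally sweeps through any given frequency quickly, but near a locking resonance it dwells there. The helper below is a simple illustration of that idea (the function name, tolerance, and test signals are invented; the paper detects locking from its full time-domain simulations, not this way).

```python
def locking_interval(fo_track, f_res, tol=5.0):
    """Longest run of consecutive samples in a pitch-glide f_o trajectory
    that stay within +/- tol Hz of a resonance frequency f_res."""
    best = run = 0
    for fo in fo_track:
        run = run + 1 if abs(fo - f_res) <= tol else 0
        best = max(best, run)
    return best

# A linear 145-315 Hz glide crosses 250 Hz quickly; a glide that locks
# near 250 Hz dwells there much longer.
linear = [145 + 170 * i / 199 for i in range(200)]
locked = linear[:120] + [250.0] * 40 + linear[120:]
```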

    Stochastic mechanical model of vocal folds for producing jitter and for identifying pathologies through real voices

    Jitter, in voice production applications, is a random phenomenon characterized by the deviation of the glottal cycle length with respect to a mean value. Its study can help in identifying pathologies related to the vocal folds, according to the values obtained through the different ways of measuring it. This paper proposes a stochastic model with three control parameters to generate jitter, based on a deterministic one-mass model of the vocal fold dynamics, and identifies the parameters of the stochastic model from experimentally obtained real voice signals. To solve the corresponding stochastic inverse problem, the cost function used is based on the distance between the probability density functions of the random variables associated with the fundamental frequencies of the experimental and simulated voices, and also on the distance between features extracted from the simulated and experimental voice signals to calculate jitter. The results show that the proposed model is valid, and voice samples are synthesized with the identified parameters for normal and pathological cases. The strategy adopted is itself novel, chiefly because a solution to the inverse problem was actually obtained. In addition to the three parameters used to construct the jitter model, a parameter related to the bandwidth of the power spectral density function of the stochastic process is discussed as a measure of the quality of the generated signal. A study of the influence of all the main parameters is also performed. Of all the novelties introduced by the paper, the identification of the model parameters for pathological cases is perhaps the most interesting.
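The quantity at the heart of this abstract, jitter as random deviation of the glottal cycle length around a mean, can be made concrete in a few lines. The sketch below generates cycle lengths as a mean period plus Gaussian noise, a deliberately crude stand-in for the paper's three-parameter stochastic model, and measures local jitter in the standard way (mean absolute difference of consecutive periods over the mean period).

```python
import random
import statistics

def synthesize_cycle_lengths(n_cycles, t0=0.008, sigma=1.5e-4, seed=0):
    """Glottal cycle lengths as a mean period t0 (s) plus a random
    perturbation of spread sigma -- an illustrative stand-in for the
    paper's stochastic jitter model."""
    rng = random.Random(seed)
    return [t0 + rng.gauss(0.0, sigma) for _ in range(n_cycles)]

def local_jitter_percent(periods):
    """Local jitter: mean absolute difference between consecutive cycle
    lengths, relative to the mean cycle length, in percent."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / statistics.mean(periods)

periods = synthesize_cycle_lengths(200)
jit = local_jitter_percent(periods)
```

Identifying the model parameters then amounts to matching statistics such as this one (and the distribution of the implied fundamental frequencies) between synthesized and recorded voices.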

    Application of Poincare-Mapping of Voiced-Speech Segments for Emotion Sensing

    The following paper introduces a group of novel speech-signal descriptors that reflect phoneme-pronunciation variability and that can be considered potentially useful features for emotion sensing. The proposed group includes a set of statistical parameters of Poincare maps derived for the formant-frequency evolution and energy evolution of voiced-speech segments. Two groups of Poincare-map characteristics were considered in the research: descriptors of sample scatter, which reflect the magnitude of phone-uttering variations, and descriptors of cross-correlations that exist among samples and that evaluate the consistency of variations. It has been shown that including the proposed characteristics in the pool of commonly used speech descriptors results in a noticeable increase (around 10%) in emotion sensing performance. A standard pattern recognition methodology was adopted for evaluating the proposed descriptors, with the assumption that three- or four-dimensional feature spaces can provide sufficient emotion sensing. Binary decision trees were selected for data classification, as they provide detailed information on the emotion-specific discriminative power of various speech descriptors.
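The two descriptor groups mentioned, sample scatter and cross-correlation of the Poincare map (x[n], x[n+1]), can be illustrated with generic Poincare-map statistics. The set below (SD1/SD2-style spreads plus lag-1 correlation) is a common choice for such maps, not necessarily the exact descriptor set used in the paper.

```python
import statistics

def poincare_descriptors(series):
    """Descriptors of the Poincare map (x[n], x[n+1]) of a per-frame
    speech feature, e.g. a formant-frequency track.

    sd1: spread across the identity line (frame-to-frame variation),
    sd2: spread along it (overall variation),
    r:   lag-1 correlation (consistency of the variations).
    """
    x, y = series[:-1], series[1:]
    sd1 = (statistics.pvariance([b - a for a, b in zip(x, y)]) / 2) ** 0.5
    sd2 = (statistics.pvariance([b + a for a, b in zip(x, y)]) / 2) ** 0.5
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return sd1, sd2, cov / (sx * sy)

# Erratic frame-to-frame formant jumps: large sd1, low r.
jumpy = [500.0, 620.0, 510.0, 640.0, 505.0, 630.0, 515.0, 650.0]
# A smooth glide: small sd1, r close to 1.
smooth = [500.0, 520.0, 540.0, 560.0, 580.0, 600.0, 620.0, 640.0]
```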

    Fitting a biomechanical model of the folds to high-speed video data through Bayesian estimation

    High-speed video recording of the vocal folds during sustained phonation has become a widespread diagnostic tool, and the development of imaging techniques able to perform automated tracking and analysis of relevant glottal cues, such as fold edge position or glottal area, is an active research field. In this paper, a vocal fold vibration analysis method based on processing visual data through a biomechanical model of the laryngeal dynamics is proposed. The procedure relies on a Bayesian non-stationary estimation of the biomechanical model parameters and state to fit the fold edge position extracted from the high-speed video endoscopic data. This finely tuned dynamical model is then used as a state transition model in a Bayesian setting, and it allows one to obtain a physiologically motivated estimate of the upper and lower vocal fold edge positions. Based on model prediction, a hypothesis on the lower fold position can be made even under the complete fold occlusion occurring at the end of the closed phase and the beginning of the open phase of the glottal cycle. To demonstrate the suitability of the procedure, the method is assessed on a set of audiovisual recordings featuring high-speed video endoscopic data from healthy subjects producing sustained voiced phonation with different laryngeal settings.
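The key Bayesian idea, fuse a model prediction with a video measurement, and fall back on the prediction alone when the fold is occluded, can be shown with a single scalar correction step. This is a deliberately simplified linear-Gaussian (Kalman) stand-in for the paper's non-stationary estimator; the function and variable names are illustrative.

```python
def kalman_update(x_pred, p_pred, z, r):
    """One scalar Bayesian correction step: fuse a model-predicted fold
    edge position (mean x_pred, variance p_pred) with a video measurement
    z of variance r. If the fold is occluded (z is None), keep the
    model prediction unchanged."""
    if z is None:
        return x_pred, p_pred
    k = p_pred / (p_pred + r)                       # Kalman gain
    return x_pred + k * (z - x_pred), (1 - k) * p_pred

# Equal confidence in model and measurement: the estimate lands halfway.
x_est, p_est = kalman_update(1.0, 1.0, 2.0, 1.0)
```

Iterating predict/update along the video frames is what lets the tuned dynamical model carry the lower-fold estimate through the occluded part of each glottal cycle.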

    Waveguide physical modeling of vocal tract acoustics: flexible formant bandwidth control from increased model dimensionality

    Digital waveguide physical modeling is often used as an efficient representation of acoustical resonators such as the human vocal tract. Building on the basic one-dimensional (1-D) Kelly-Lochbaum tract model, various speech synthesis techniques demonstrate improvements to the wave scattering mechanisms in order to better approximate wave propagation in the complex vocal system. Some of these techniques are discussed in this paper, with particular reference to an alternative approach in the form of a two-dimensional (2-D) waveguide mesh model. Emphasis is placed on its ability to produce vowel spectra similar to those present in natural speech, and on how it improves upon the 1-D model. The tract area function is accommodated as model width, rather than translated into acoustic impedance, and as such offers extra control as an additional bounding limit to the model. Results show that the 2-D model introduces approximately linear control over formant bandwidths, allowing realistic values to be attained across a range of vowels. Similarly, the 2-D model allows the application of theoretical reflection values within the tract, which, when applied to the 1-D model, result in small formant bandwidths and, hence, unnatural-sounding synthesized vowels.
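In the 1-D Kelly-Lochbaum model referenced above, the area function is converted to per-junction reflection coefficients (acoustic impedance being inversely proportional to area), and each junction scatters the travelling pressure waves. A minimal sketch of those two standard ingredients:

```python
import math

def reflection_coefficients(areas):
    """Kelly-Lochbaum reflection coefficients between adjacent tube
    sections of a 1-D area function:
    k_i = (A_i - A_{i+1}) / (A_i + A_{i+1})."""
    return [(a1 - a2) / (a1 + a2) for a1, a2 in zip(areas, areas[1:])]

def scatter(f_in, b_in, k):
    """One scattering junction for pressure waves: f_in is the right-going
    wave arriving from the left section, b_in the left-going wave arriving
    from the right section. Returns (wave sent right, wave sent left)."""
    f_out = (1 + k) * f_in - k * b_in   # transmitted plus reflected
    b_out = k * f_in + (1 - k) * b_in
    return f_out, b_out

# Equal adjacent areas give k = 0: the junction is transparent.
assert scatter(1.0, 0.5, 0.0) == (1.0, 0.5)
```

The 2-D waveguide mesh replaces this chain of 1-D junctions with a grid of scattering nodes, which is what yields the extra bandwidth control the abstract describes.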
