56 research outputs found

    VOICE BIOMETRICS FUSION FOR ENHANCED SECURITY AND SPEAKER RECOGNITION: A COMPREHENSIVE REVIEW

    The scope of this paper is purposefully limited to the 15 voice biometrics modalities discussed by Jain et al. (2004). The place of Voice within their classification scheme is reexamined in light of important developments that have taken place since 2010. Additionally, elements are added to Mayhew’s (2018) overview of the history of biometrics as an attempt to fill in gaps concerning Voice. All this leads to a reassessment of voice biometrics and how it relates to other biometric modalities. Speech segments that carry extremely high identity vector loads are discussed. The main assertion of this paper is that increased computing power, advanced algorithms, and the deployment of Artificial Intelligence have made voice biometrics optimal for use. Furthermore, the analysis of the compatibility among modalities, the estimation of inconvenience penalty, and the calculation of the arithmetic distances between various modalities indicate that the fusion of {Voice + Face}, {Voice + Fingerprint}, {Voice + Iris}, and {Voice + Signature} on the one hand, and of {Voice + Face + Fingerprint} and {Voice + Fingerprint + Signature} on the other, offers the best liveness assurance against hacking, spoofing, and other malicious activities.

    Estimation of Subglottal Pressure, Vocal Fold Collision Pressure, and Intrinsic Laryngeal Muscle Activation From Neck-Surface Vibration Using a Neural Network Framework and a Voice Production Model

    The ambulatory assessment of vocal function can be significantly enhanced by having access to physiologically based features that describe underlying pathophysiological mechanisms in individuals with voice disorders. This type of enhancement can improve methods for the prevention, diagnosis, and treatment of behaviorally based voice disorders. Unfortunately, the direct measurement of important vocal features such as subglottal pressure, vocal fold collision pressure, and laryngeal muscle activation is impractical in laboratory and ambulatory settings. In this study, we introduce a method to estimate these features during phonation from a neck-surface vibration signal through a framework that integrates a physiologically relevant model of voice production and machine learning tools. The signal from a neck-surface accelerometer is first processed using subglottal impedance-based inverse filtering to yield an estimate of the unsteady glottal airflow. Seven aerodynamic and acoustic features are extracted from the neck surface accelerometer and an optional microphone signal. A neural network architecture is selected to provide a mapping between the seven input features and subglottal pressure, vocal fold collision pressure, and cricothyroid and thyroarytenoid muscle activation. This non-linear mapping is trained solely with 13,000 Monte Carlo simulations of a voice production model that utilizes a symmetric triangular body-cover model of the vocal folds. The performance of the method was compared against laboratory data from synchronous recordings of oral airflow, intraoral pressure, microphone, and neck-surface vibration in 79 vocally healthy female participants uttering consecutive /pæ/ syllable strings at comfortable, loud, and soft levels. 
The mean absolute error and root-mean-square error for estimating the mean subglottal pressure were 191 Pa (1.95 cm H2O) and 243 Pa (2.48 cm H2O), respectively, which are comparable with previous studies but with the key advantage of not requiring subject-specific training and yielding more output measures. The validation of vocal fold collision pressure and laryngeal muscle activation was performed with synthetic values as reference. These initial results provide valuable insight for further vocal fold model refinement and constitute a proof of concept that the proposed machine learning method is a feasible option for providing physiologically relevant measures for laboratory and ambulatory assessment of vocal function.

    Author affiliations:
    Ibarra, Emiro J.: Universidad Técnica Federico Santa María; Chile
    Parra, Jesús A.: Universidad Técnica Federico Santa María; Chile
    Alzamendi, Gabriel Alejandro: Universidad Nacional de Entre Ríos. Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática - Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática; Argentina
    Cortés, Juan P.: Universidad Técnica Federico Santa María; Chile
    Espinoza, Víctor M.: Universidad de Chile; Chile
    Mehta, Daryush D.: Center For Laryngeal Surgery And Voice Rehabilitation; United States
    Hillman, Robert E.: Center For Laryngeal Surgery And Voice Rehabilitation; United States
    Zañartu, Matías: Universidad Técnica Federico Santa María; Chile
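The error figures quoted in the abstract above can be illustrated with a minimal sketch. It shows how a mean absolute error and a root-mean-square error are computed, and checks the Pa to cm H2O conversions reported in the abstract. The pressure samples are hypothetical illustration values, not data from the study, and the conversion factor (1 cm H2O = 98.0665 Pa, the conventional value) is an assumption, since the abstract does not state which factor was used.

```python
import math

# Assumed conventional conversion factor: 1 cm H2O = 98.0665 Pa.
PA_PER_CM_H2O = 98.0665


def pa_to_cm_h2o(pressure_pa):
    """Convert a pressure from pascals to centimetres of water."""
    return pressure_pa / PA_PER_CM_H2O


def mean_absolute_error(estimates, references):
    """Average magnitude of the estimation error."""
    return sum(abs(e - r) for e, r in zip(estimates, references)) / len(estimates)


def root_mean_square_error(estimates, references):
    """Square root of the average squared estimation error."""
    squared = sum((e - r) ** 2 for e, r in zip(estimates, references))
    return math.sqrt(squared / len(estimates))


# Hypothetical subglottal pressure values in Pa (illustration only).
estimated_pa = [800.0, 1000.0, 650.0, 1200.0]
reference_pa = [700.0, 1100.0, 900.0, 1150.0]

mae = mean_absolute_error(estimated_pa, reference_pa)
rmse = root_mean_square_error(estimated_pa, reference_pa)
print(f"MAE  = {mae:.1f} Pa ({pa_to_cm_h2o(mae):.2f} cm H2O)")
print(f"RMSE = {rmse:.1f} Pa ({pa_to_cm_h2o(rmse):.2f} cm H2O)")

# Unit check on the figures reported in the abstract.
print(round(pa_to_cm_h2o(191), 2), round(pa_to_cm_h2o(243), 2))
```

The root-mean-square error is never smaller than the mean absolute error for the same data, which is consistent with the reported pair of 191 Pa and 243 Pa.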

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.

    Numerical and Experimental Investigations on Vocal Fold Approximation in Healthy and Simulated Unilateral Vocal Fold Paralysis

    We have developed a novel surgical/computational model for the investigation of unilateral vocal fold paralysis (UVFP) which will be used to inform future in silico approaches to improve surgical outcomes in type I thyroplasty. Healthy phonation (HP) was achieved using cricothyroid suture approximation on both sides of the larynx to generate symmetrical vocal fold closure. Following high-speed videoendoscopy (HSV) capture, sutures on the right side of the larynx were removed, partially releasing tension unilaterally and generating asymmetric vocal fold closure characteristic of UVFP (sUVFP condition). HSV revealed symmetric vibration in HP, while in sUVFP the sutured side demonstrated a higher frequency (10–11%). For the computational model, ex vivo magnetic resonance imaging (MRI) scans were captured at three configurations: non-approximated (NA), HP, and sUVFP. A finite-element method (FEM) model was built, in which cartilage displacements from the MRI images were used to prescribe the adduction, and the vocal fold deformation was simulated before the eigenmode calculation. The results showed that the frequency comparison between the two sides was consistent with observations from HSV. This alignment between the surgical and computational models supports the future application of these methods for the investigation of treatment for UVFP.

    Direct measurement and modeling of intraglottal, subglottal, and vocal fold collision pressures during phonation in an individual with a hemilaryngectomy

    The purpose of this paper is to report on the first in vivo application of a recently developed transoral, dual-sensor pressure probe that directly measures intraglottal, subglottal, and vocal fold collision pressures during phonation. Synchronous measurement of intraglottal and subglottal pressures was accomplished using two miniature pressure sensors mounted on the end of the probe and inserted transorally in a 78-year-old male who had previously undergone surgical removal of his right vocal fold for treatment of laryngeal cancer. The endoscopist used one hand to position the custom probe against the surgically medialized scar band that replaced the right vocal fold and used the other hand to position a transoral endoscope to record laryngeal high-speed videoendoscopy of the vibrating left vocal fold contacting the pressure probe. Visualization of the larynx during sustained phonation allowed the endoscopist to place the dual-sensor pressure probe such that the proximal sensor was positioned intraglottally and the distal sensor subglottally. The proximal pressure sensor was verified to be in the strike zone of vocal fold collision during phonation when the intraglottal pressure signal exhibited three characteristics: an impulsive peak at the start of the closed phase, a rounded peak during the open phase, and a minimum value around zero immediately preceding the impulsive peak of the subsequent phonatory cycle. Numerical voice production modeling was applied to validate model-based predictions of vocal fold collision pressure using kinematic vocal fold measures. The results successfully demonstrated feasibility of in vivo measurement of vocal fold collision pressure in an individual with a hemilaryngectomy, motivating ongoing data collection that is designed to aid in the development of vocal dose measures that incorporate vocal fold impact collision and stresses.

    Author affiliations:
    Mehta, Daryush D.: Massachusetts General Hospital; United States
    Kobler, James B.: Massachusetts General Hospital; United States
    Zeitels, Steven M.: Harvard Medical School. Department of Medicine. Massachusetts General Hospital; United States
    Zañartu, Matías: Universidad Técnica Federico Santa María; Chile
    Ibarra, Emiro J.: Universidad Técnica Federico Santa María; Chile
    Alzamendi, Gabriel Alejandro: Universidad Nacional de Entre Ríos. Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática - Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática; Argentina
    Manriquez, Rodrigo: Universidad Técnica Federico Santa María; Chile
    Erath, Byron D.: Clarkson University; United States
    Peterson, Sean D.: University of Waterloo; Canada
    Petrillo, Robert H.: Center For Laryngeal Surgery and Voice Rehabilitation; United States
    Hillman, Robert E.: Center For Laryngeal Surgery and Voice Rehabilitation; United States. Harvard Medical School. Department of Medicine. Massachusetts General Hospital; United States

    Prosodically Conditioned Realization of Voiced Stops and Vowels in Yucatecan Spanish

    This dissertation investigates the acoustic nature and distribution of prosodic strengthening in relation to the Prosodic Word domain and prosodic prominence in Yucatecan Spanish. In order to do so, phonologically voiced stops and word-initial vowels were examined in a corpus of sociolinguistic interviews and a read speech task with 16–21 speakers of the variety. The results provide evidence for prosodic strengthening of both voiced stops and word-initial vowels. The acoustic manifestations of prosodic strengthening of voiced stops are (i) longer duration, (ii) greater change in intensity, and, in extreme cases of strengthening, (iii) presence of a release burst. Strengthening of word-initial vowels is manifested through glottalization, which is present in the first portion of the vowel. Prosodic strengthening occurs in PW-initial position and especially under lexical stress, although accentuation may also play a role. Thus, prosodic strengthening is used to indicate (post)lexical prominence and boundaries at the PW level. In terms of speaker-specific variation, Yucatec Maya language dominance does not appear to favor more strengthened realizations either of voiced stops or word-initial vowels, while gender has no effect on the distribution of strengthened realizations. Finally, a proposal is made for the strengthening of voiced stops and glottalization of word-initial vowels being used to mark the left edges of a recursive PW in Yucatecan Spanish.

    Voice breaking and its relation to body mass and testosterone level in the Siberian Crane (Leucogeranus leucogeranus)

    Vocal development of cranes (Gruidae) has attracted scientific interest due to its special stage, voice breaking. During voice breaking, chicks of different crane species produce calls with two fundamental frequencies that correspond to those in adult low-frequency and juvenile high-frequency vocalizations. However, triggers that affect voice breaking in cranes are mainly unknown. Here we studied the voice breaking in the Siberian Crane (Leucogeranus leucogeranus) and tested its relation to the body mass and testosterone level. We analyzed 5846 calls, 39 body mass measurements and 60 blood samples from 11 Siberian Crane chicks in 8 ages from 2.5 to 18 months of life together with 90 body mass measurements and 61 blood samples from 24 Siberian Crane adults. The individual duration of voice breaking and dates of its onset, culmination and completion depended neither on the body mass nor on the testosterone level at various ages. But we found a correlation between the testosterone level and mean deltas of percentages of the high and low frequency components in Siberian Crane calls between the closest recording sessions. We also observed some coincidence in time between the mean dates of voice breaking onset and the termination of body mass gain (at 7.5 months of age), and between the mean dates of voice breaking completion and the start of a new breeding season. Similar relations have been shown previously for some other crane species. We also showed for the first time that the mean dates of voice breaking culmination correlated with the significant increase of the testosterone level (at 10.5 months of age). So, we suggest that voice breaking in cranes may be triggered by the end of chicks’ body growth, is stimulated by the increase of testosterone level and ends soon after adult cranes stop taking care of their chicks.

    Tonal split and laryngeal contrast of onset consonant in Lili Wu Chinese

    Descriptive and Comparative Linguistics

    Acoustic and videoendoscopic techniques to improve voice assessment via relative fundamental frequency

    Quantitative measures of laryngeal muscle tension are needed to improve assessment and track clinical progress. Although relative fundamental frequency (RFF) shows promise as an acoustic estimate of laryngeal muscle tension, it is not yet transferable to the clinic. The purpose of this work was to refine algorithmic estimation of RFF, as well as to enhance the knowledge surrounding the physiological underpinnings of RFF. The first study used a large database of voice samples collected from 227 speakers with voice disorders and 256 typical speakers to evaluate the effects of fundamental frequency estimation techniques and voice sample characteristics on algorithmic RFF estimation. By refining fundamental frequency estimation using the Auditory Sawtooth Waveform Inspired Pitch Estimator—Prime (Auditory-SWIPE′) algorithm and accounting for sample characteristics via the acoustic measure, pitch strength, algorithmic errors related to the accuracy and precision of RFF were reduced by 88.4% and 17.3%, respectively. The second study sought to characterize the physiological factors influencing acoustic outputs of RFF estimation. A group of 53 speakers with voice disorders and 69 typical speakers each produced the utterance, /ifi/, while simultaneous recordings were collected using a microphone and flexible nasendoscope. Acoustic features calculated via the microphone signal were examined in reference to the physiological initiation and termination of vocal fold vibration. The features that corresponded with these transitions were then implemented into the RFF algorithm, leading to significant improvements in the precision of the RFF algorithm to reflect the underlying physiological mechanisms for voicing offsets (p < .001, V = .60) and onsets (p < .001, V = .54) when compared to manual RFF estimation. 
The third study further elucidated the physiological underpinnings of RFF by examining the contribution of vocal fold abduction to RFF during intervocalic voicing offsets. Vocal fold abductory patterns were compared to RFF values in a subset of speakers from the second study, comprising young adults, older adults, and older adults with Parkinson’s disease. Abductory patterns were not significantly different among the three groups; however, vocal fold abduction was observed to play a significant role in measures of RFF at voicing offset. By improving algorithmic estimation and elucidating aspects of the underlying physiology affecting RFF, this work adds to the utility of RFF for use in conjunction with current clinical techniques to assess laryngeal muscle tension.

    2021-09-29T00:00:00
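As background for the RFF measure discussed in the abstract above, the sketch below shows the commonly used idea of expressing each vocal cycle's fundamental frequency in semitones relative to a steady-state reference cycle. The abstract itself does not give a formula, so this is an assumption about the usual definition and not the refined algorithm described in the work. The function name and the frequency values are hypothetical.

```python
import math


def relative_f0_semitones(cycle_f0_hz, reference_f0_hz):
    """Express a cycle's f0 in semitones relative to a reference f0.

    Uses 12 * log2(ratio), which equals roughly 39.86 * log10(ratio).
    """
    return 12.0 * math.log2(cycle_f0_hz / reference_f0_hz)


# Hypothetical f0 values (Hz) for cycles approaching a voicing offset,
# with the first, steady-state cycle taken as the reference.
offset_cycle_f0_hz = [200.0, 199.0, 197.5, 195.0]
reference_hz = offset_cycle_f0_hz[0]

rff_values = [relative_f0_semitones(f0, reference_hz) for f0 in offset_cycle_f0_hz]
for index, value in enumerate(rff_values, start=1):
    print(f"cycle {index}: {value:+.2f} ST")
```

A value of 0 ST means the cycle matches the reference, and negative values indicate a drop in fundamental frequency relative to the steady-state portion of the vowel.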