67 research outputs found

    Perceptual Information Loss due to Impaired Speech Production

    Phonological classes define articulatory-free and articulatory-bound phone attributes. A deep neural network is used to estimate the probability of phonological classes from the speech signal. In theory, a unique combination of phone attributes forms a phoneme identity. Probabilistic inference of phonological classes thus enables estimation of their compositional phoneme probabilities. A novel information-theoretic framework is devised to quantify the information conveyed by each phone attribute and to assess the speech production quality for perception of phonemes. As a use case, we hypothesize that disruption in speech production leads to information loss in phone attributes, and thus confusion in phoneme identification. We quantify the amount of information loss due to dysarthric articulation recorded in the TORGO database. A novel information measure is formulated to evaluate the deviation from an ideal phone attribute production, leading us to distinguish healthy production from pathological speech.
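
The composition of phoneme probabilities from phone-attribute posteriors can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's method: the toy inventory `PHONEME_ATTRIBUTES`, the independence assumption across attributes, and the use of a per-attribute KL divergence from an ideal binary target as the "information loss".

```python
import math

# Hypothetical toy inventory: each phoneme is a tuple of binary phone
# attributes (voiced, nasal, labial). A real inventory is much larger.
PHONEME_ATTRIBUTES = {
    "p": (0, 0, 1),
    "b": (1, 0, 1),
    "m": (1, 1, 1),
    "t": (0, 0, 0),
    "d": (1, 0, 0),
    "n": (1, 1, 0),
}


def phoneme_probabilities(class_posteriors):
    """Compose phoneme probabilities from attribute posteriors.

    Assumes the attributes are independent, so a phoneme's score is the
    product of p (attribute present) or 1 - p (attribute absent).
    """
    scores = {}
    for phoneme, attrs in PHONEME_ATTRIBUTES.items():
        score = 1.0
        for p, a in zip(class_posteriors, attrs):
            score *= p if a == 1 else 1.0 - p
        scores[phoneme] = score
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("posteriors match no phoneme in the inventory")
    # Normalise over the phonemes that exist in the inventory.
    return {ph: s / total for ph, s in scores.items()}


def attribute_information_loss(class_posteriors, target_phoneme, eps=1e-12):
    """Per-attribute loss in bits relative to an ideal binary production.

    For an ideal target a in {0, 1}, KL(ideal || estimated) reduces to
    -log2 of the probability assigned to the correct attribute value.
    """
    attrs = PHONEME_ATTRIBUTES[target_phoneme]
    losses = []
    for p, a in zip(class_posteriors, attrs):
        q = p if a == 1 else 1.0 - p
        losses.append(-math.log2(max(q, eps)))
    return losses


if __name__ == "__main__":
    posteriors = [0.9, 0.1, 0.8]  # made-up values for one speech frame
    print(phoneme_probabilities(posteriors))
    print(attribute_information_loss(posteriors, "b"))
```

In this sketch a perfectly produced attribute contributes zero bits of loss, and the loss grows as the estimated posterior drifts away from the target value.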

    Acoustic and perceptual assessment of stop consonants produced by normal and dysarthric speakers

    Thesis (Ph.D.)--Harvard--Massachusetts Institute of Technology Division of Health Sciences and Technology, 2000. Includes bibliographical references (p. 286-290). By Kelly Lynn Poort. Ph.D.

    The Relationship of Somatosensory Perception and Fine-Force Control in the Adult Human Orofacial System

    The orofacial area stands apart from other body systems in that it possesses a unique performance anatomy whereby oral musculature inserts directly into the underlying cutaneous skin, allowing for the generation of complex three-dimensional deformations of the orofacial system. This anatomical substrate provides for the tight temporal synchrony between self-generated cutaneous somatosensation and oromotor control during functional behaviors in this region and provides the feedback needed to learn and maintain skilled orofacial behaviors. The Directions into Velocity of Articulators (DIVA) model highlights the importance of the bidirectional relationship between sensation and production in the orofacial region in children learning speech. This relationship has not been as well established in the adult orofacial system. The purpose of this observational study was to begin assessing the perception-action relationship in healthy adults and to describe how this relationship may be altered as a function of healthy aging. This study was designed to determine the correspondence between orofacial cutaneous perception using vibrotactile detection thresholds (VDT) and low-level static and dynamic force control tasks in three representative age cohorts. Correlational relationships among measures of somatosensory capacity and low-level skilled orofacial force control were determined for 60 adults (19-84 years). Significant correlational relationships were identified using non-parametric Spearman’s correlations with alpha set at 0.1 between the 5 Hz test probe and several 0.5 N low-level force control assessments in the static and slow ramp-and-hold condition. These findings indicate that as vibrotactile detection thresholds increase (labial sensation decreases), ability to maintain a low-level force endpoint decreases.
    Group data were analyzed using non-parametric Kruskal-Wallis tests, which identified significant differences between the 5 Hz test frequency probe and various 0.5 N skilled force assessments for group variables such as age, pure tone hearing assessments, sex, speech usage and smoking history. Future studies will begin the process of modeling this complex multivariate relationship in healthy individuals before moving to a disordered population.
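
As a rough illustration of the rank correlation used above, the following sketch computes Spearman's rho with average ranks for ties. The function names and the sample values are hypothetical and are not data from the study; they only show how a rising detection threshold paired with a rising force error yields a positive rho.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Made-up illustration values, not measurements from the study.
    vdt_threshold = [0.8, 1.1, 1.5, 2.0, 2.6]
    force_error = [0.02, 0.03, 0.03, 0.05, 0.07]
    print(spearman_rho(vdt_threshold, force_error))
```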

    PAoS Markers: Trajectory Analysis of Selective Phonological Posteriors for Assessment of Progressive Apraxia of Speech

    Progressive apraxia of speech (PAoS) is a progressive motor speech disorder associated with neurodegenerative disease causing impairment of phonetic encoding and motor speech planning. Clinical observation and acoustic studies show that duration analysis provides reliable cues for diagnosis of the disease progression and severity of articulatory disruption. The goal of this paper is to develop computational methods for objective evaluation of duration and trajectory of speech articulation. We use phonological posteriors as speech features. Phonological posteriors consist of probabilities of phonological classes estimated for every short segment of the speech signal. PAoS encompasses lengthening of duration, which is more pronounced in vowels; we thus hypothesize that a small subset of phonological classes provides stronger evidence for duration and trajectory analysis. These classes are determined through analysis of linear prediction coefficients (LPC). To enable trajectory analysis without phonetic alignment, we exploit phonological structures defined through quantization of phonological posteriors. Duration and trajectory analysis are conducted on blocks of multiple consecutive segments possessing similar phonological structures. Moreover, unique phonological structures are identified for every severity condition.
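
The idea of measuring duration on blocks of consecutive segments that share a phonological structure can be sketched as follows. This is a simplified illustration under stated assumptions: posteriors are quantized to one bit per class with a hypothetical `threshold` of 0.5, "similar" structures are taken to mean identical binary codes, and `frame_shift_ms` is an assumed frame step. None of these choices is confirmed by the abstract.

```python
from itertools import groupby


def quantize(posteriors, threshold=0.5):
    """Turn each frame of class posteriors into a binary structure code."""
    return [tuple(int(p >= threshold) for p in frame) for frame in posteriors]


def structure_blocks(posteriors, frame_shift_ms=10.0, threshold=0.5):
    """Group consecutive frames with identical codes and report durations.

    Returns a list of (code, duration_ms) pairs in temporal order, which
    needs no phonetic alignment.
    """
    codes = quantize(posteriors, threshold)
    blocks = []
    for code, run in groupby(codes):
        n_frames = sum(1 for _ in run)
        blocks.append((code, n_frames * frame_shift_ms))
    return blocks


if __name__ == "__main__":
    # Made-up posteriors for two phonological classes over five frames.
    frames = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.7], [0.1, 0.9], [0.3, 0.8]]
    for code, duration in structure_blocks(frames):
        print(code, duration)
```

Longer blocks for the same code would then correspond to lengthened articulation of that structure.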

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The MAVEBA Workshop proceedings, held on a biannual basis, collect the scientific papers presented as both oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and classification of vocal pathologies.

    Dysarthric speech analysis and automatic recognition using phase based representations

    Dysarthria is a neurological speech impairment which usually results in the loss of motor speech control due to muscular atrophy and poor coordination of articulators. Dysarthric speech is more difficult to model with machine learning algorithms, due to inconsistencies in the acoustic signal and to limited amounts of training data. This study reports a new approach for the analysis and representation of dysarthric speech, and applies it to improve ASR performance. The zeros of the Z-transform (ZZT) are investigated for dysarthric vowel segments. The analysis shows evidence of a phase-based acoustic phenomenon that is responsible for the way the distribution of zero patterns relates to speech intelligibility. It is investigated whether such phase-based artefacts can be systematically exploited to understand their association with intelligibility. A metric, the phase slope deviation (PSD), is introduced, based on slope deviations observed in the unwrapped phase spectrum of dysarthric vowel segments. The metric compares the differences between the slopes of dysarthric vowels and typical vowels. The PSD shows a strong and nearly linear correspondence with the intelligibility of the speaker, and it is shown to hold for two separate databases of dysarthric speakers. A systematic procedure for correcting the underlying phase deviations results in a significant improvement in ASR performance for speakers with severe and moderate dysarthria. In addition, information encoded in the phase component of the Fourier transform of dysarthric speech is exploited in the group delay spectrum. Its properties are found to represent disordered speech more effectively than the magnitude spectrum. Dysarthric ASR performance was significantly improved using phase-based cepstral features in comparison to the conventional MFCCs. A combined approach utilising the benefits of PSD corrections and phase-based features was found to surpass all previous performance on the UASPEECH database of dysarthric speech.
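
To make the notion of a slope in the unwrapped phase spectrum concrete, here is a minimal sketch. It is not the thesis's PSD definition: the helper names, the least-squares fit over all bins, and the plain difference of two slopes are assumptions chosen for illustration. The example uses a pure delay, whose unwrapped phase is a straight line with slope equal to minus the delay in samples.

```python
import cmath
import math


def dft_half(signal):
    """Plain DFT for bins 0..n/2 (slow, but dependency-free)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def unwrap(phases):
    """Remove 2*pi jumps so that consecutive phases differ by at most pi."""
    out = [phases[0]]
    for p in phases[1:]:
        delta = p - out[-1]
        delta -= 2 * math.pi * round(delta / (2 * math.pi))
        out.append(out[-1] + delta)
    return out


def phase_slope(signal):
    """Least-squares slope of unwrapped phase versus angular frequency.

    Note: bins with near-zero magnitude have unreliable phase; this
    sketch does not mask them.
    """
    n = len(signal)
    spectrum = dft_half(signal)
    phase = unwrap([cmath.phase(c) for c in spectrum])
    omega = [2 * math.pi * k / n for k in range(len(spectrum))]
    mo = sum(omega) / len(omega)
    mp = sum(phase) / len(phase)
    num = sum((w - mo) * (p - mp) for w, p in zip(omega, phase))
    den = sum((w - mo) ** 2 for w in omega)
    return num / den


def phase_slope_deviation(test_segment, reference_segment):
    """Hypothetical deviation score: difference of the two phase slopes."""
    return phase_slope(test_segment) - phase_slope(reference_segment)


if __name__ == "__main__":
    reference = [0.0] * 16
    reference[3] = 1.0  # impulse delayed by 3 samples
    test = [0.0] * 16
    test[5] = 1.0  # impulse delayed by 5 samples
    print(phase_slope(reference), phase_slope_deviation(test, reference))
```

The negative of this slope is the average group delay, which connects the sketch to the group delay spectrum mentioned in the abstract.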