Objective intelligibility assessment of pathological speakers
Intelligibility is a primary measure for the assessment of pathological speech. Traditionally, it is measured using a perceptual test, which is by definition subjective in nature. Consequently, there is great interest in reliable, automatic and therefore objective methods. This paper presents such a method that incorporates an automatic speech recognizer (ASR) for producing features that characterize the pronunciations of a speaker and an intelligibility prediction model (IPM) for converting these features into an intelligibility score. High correlations (about 0.90) between objective and perceptual scores are obtained with a system comprising two different speech recognizers: one with traditional acoustic models relating acoustical observations to triphone states and one using phonological features as an intermediate layer between the acoustical observations and the phonetic states.
Combining phonological and acoustic ASR-free features for pathological speech intelligibility assessment
Intelligibility is widely used to measure the severity of articulatory problems in pathological speech. Recently, a number of automatic intelligibility assessment tools have been developed. Most of them use automatic speech recognizers (ASR) to compare the patient's utterance with the target text. These methods are bound to one language and tend to be less accurate when speakers hesitate or make reading errors. To circumvent these problems, two different ASR-free methods were developed over the last few years, only making use of the acoustic or phonological properties of the utterance. In this paper, we demonstrate that these ASR-free techniques are also able to predict intelligibility in other languages. Moreover, they prove to be complementary, resulting in even better intelligibility predictions when both methods are combined.
Simulating dysarthric speech for training data augmentation in clinical speech applications
Training machine learning algorithms for speech applications requires large,
labeled training data sets. This is problematic for clinical applications where
obtaining such data is prohibitively expensive because of privacy concerns or
lack of access. As a result, clinical speech applications are typically
developed using small data sets with only tens of speakers. In this paper, we
propose a method for simulating training data for clinical applications by
transforming healthy speech to dysarthric speech using adversarial training. We
evaluate the efficacy of our approach using both objective and subjective
criteria. We present the transformed samples to five experienced
speech-language pathologists (SLPs) and ask them to identify the samples as
healthy or dysarthric. The results reveal that the SLPs identify the
transformed speech as dysarthric 65% of the time. In a pilot classification
experiment, we show that by using the simulated speech samples to balance an
existing dataset, the classification accuracy improves by about 10% after data
augmentation.
Comment: Will appear in Proc. of ICASSP 201
Cross-linguistic study of vocal pathology: perceptual features of spasmodic dysphonia in French-speaking subjects
Clinical characterisation of Spasmodic Dysphonia of the adductor type (SD) in French speakers by Klap and colleagues (1993) appears to differ from that of SD in English. This perceptual analysis aims to describe the phonetic features of French SD. A video of 6 French speakers with SD supplied by Klap and colleagues was analysed for frequency of phonatory breaks, pitch breaks, harshness, creak, breathiness and falsetto voice, rate of production, and quantity of speech output. In contrast to English SD, the French-speaking SD patients demonstrated no evidence of pitch breaks, but phonatory breaks, harshness and breathiness were prominent features. This verifies the French authors’ (1993) clinical description. These findings suggest that phonetic properties of a specific language may affect the manifestation of pathology in neurogenic voice disorders.
Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment
Speech intelligibility assessment plays an important role in the therapy of
patients suffering from pathological speech disorders. Automatic and objective
measures are desirable to assist therapists in their traditionally subjective
and labor-intensive assessments. In this work, we investigate a novel approach
for obtaining such a measure using the divergence in disentangled latent speech
representations of a parallel utterance pair, obtained from a healthy reference
and a pathological speaker. Experiments on an English database of Cerebral
Palsy patients, using all available utterances per speaker, show high and
significant correlation values (R = -0.9) with subjective intelligibility
measures, while having only minimal deviation (±0.01) across four different
reference speaker pairs. We also demonstrate the robustness of the proposed
method (R = -0.89 deviating ±0.02 over 1000 iterations) by considering a
significantly smaller number of utterances per speaker. Our results are among
the first to show that disentangled speech representations can be used for
automatic pathological speech intelligibility assessment, resulting in a
reference-speaker-pair-invariant method, applicable in scenarios with only a few
utterances available.
Comment: Submitted to INTERSPEECH202
Efficacy of speech intervention using electropalatography with a cochlear implant user
Electropalatography (EPG) has become relatively well established as a safe and convenient technique for use in the assessment, diagnosis and treatment of children and adults with articulation disorders. EPG's wide applicability is reflected in the range of different cases that has been researched in recent years. Some research has been carried out using EPG therapy for deaf individuals who use hearing aids; however, there are no similar studies for cochlear implant users. The purpose of this single case study is to explore the technique of EPG as a therapeutic intervention to treat voiceless velar stop consonant sound production in a deaf child cochlear implant user. EPG therapy was offered as a last resort when traditional therapy failed to achieve specific changes. During therapy, a list of familiar words was practised, using the visual feedback provided by EPG. The client's articulation was assessed using objective (EPG printouts) and subjective (listener ratings) measures at four assessment points. Changes were found to be statistically significant. Generalization of the newly acquired skills to untaught words containing voiceless velars was also observed. The results are discussed in the broader context of implications of this type of therapy with deaf clients.
DIA: a tool for objective intelligibility assessment of pathological speech
Intelligibility is generally accepted to be a very relevant measure in the assessment of pathological speech. In clinical practice, intelligibility is measured using one of the many existing perceptual tests. These tests usually have the drawback that they employ unnatural speech material (e.g. nonsense words) and that they cannot fully exclude errors due to the listener's bias. This raises the need for an objective and automated tool to measure intelligibility. Here, we present the Dutch Intelligibility Assessment (DIA), an objective tool that aids the speech therapist in evaluating the intelligibility of persons with pathological speech. This tool will soon be made publicly available.