Automated Voice Pathology Discrimination from Continuous Speech Benefits from Analysis by Phonetic Context
In contrast to previous studies that look only at discriminating pathological voice from normal voice, in this study we focus on the discrimination between cases of spasmodic dysphonia (SD) and vocal fold palsy (VP) using automated analysis of speech recordings. The hypothesis is that discrimination will be enhanced by studying continuous speech, since the different pathologies are likely to have different effects in different phonetic contexts. We collected audio recordings of isolated vowels and of a read passage from 60 patients diagnosed with SD (N=38) or VP (N=22). Baseline classifiers on features extracted from the recordings taken as a whole gave a cross-validated unweighted average recall of up to 75% for discriminating the two pathologies. We used an automated method to divide the read passage into phone-labelled regions and built classifiers for each phone. Results show that the discriminability of the pathologies varied with phonetic context as predicted. Since different phone contexts provide different information about the pathologies, classification is improved by fusing phone predictions, achieving a classification accuracy of 83%. The work has implications for the differential diagnosis of voice pathologies and contributes to a better understanding of their impact on speech.
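The fusion of per-phone predictions described above can be sketched as follows. This is a minimal illustration, not the paper's actual method: it assumes each phone-specific classifier outputs a probability that the speaker has SD, and fuses by simple probability averaging; the phone labels and probabilities shown are hypothetical.

```python
import numpy as np

def fuse_phone_predictions(phone_probs):
    """Fuse per-phone classifier outputs into one speaker-level decision.

    phone_probs: dict mapping a phone label to the probability that the
    speaker has spasmodic dysphonia (SD), as estimated by the classifier
    trained on regions carrying that phone label. Fusion here is a plain
    average of probabilities; the paper's exact fusion rule may differ.
    """
    p_sd = float(np.mean(list(phone_probs.values())))
    return ("SD" if p_sd >= 0.5 else "VP"), p_sd

# Hypothetical per-phone outputs for one speaker:
probs = {"aa": 0.9, "iy": 0.7, "s": 0.4, "m": 0.8}
label, p_sd = fuse_phone_predictions(probs)
print(label, round(p_sd, 2))  # SD 0.7
```

Averaging lets phones where the pathologies are easily separated pull the decision in the right direction even when other phone contexts are uninformative, which is the intuition behind the reported gain from 75% to 83%.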
A Voice Disease Detection Method Based on MFCCs and Shallow CNN
The incidence rate of voice diseases is increasing year by year. The use of software for remote diagnosis is a growing technical trend with important practical value. Among voice diseases, common conditions that cause hoarseness include spasmodic dysphonia, vocal cord paralysis, vocal nodules, and vocal cord polyps. This paper presents a voice disease detection method that can be applied in a wide range of clinical settings. We cooperated with Xiangya Hospital of Central South University to collect voice samples from sixty-one patients. Mel Frequency Cepstrum Coefficient (MFCC) parameters are extracted as input features to represent the voice as data. An innovative model combining MFCC parameters with a single-convolution-layer CNN is proposed for fast calculation and classification. The highest accuracy achieved was 92%, exceeding previously reported results. We also use the Advanced Voice Function Assessment Databases (AVFAD) to evaluate the generalization ability of the proposed method, achieving an accuracy of 98%. Experiments on clinical and standard datasets show that, for the pathological detection of voice diseases, our method substantially improves both accuracy and computational efficiency.
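The pipeline the abstract describes (MFCC features fed into a single-convolution-layer CNN) can be sketched with NumPy to show the shape flow. The filter count, window width, number of classes, and random weights below are hypothetical placeholders, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input: 13 MFCC coefficients over 100 frames.
mfcc = rng.standard_normal((13, 100))

# Single convolution layer: 8 filters, each spanning all 13 coefficients
# over a 5-frame window ("valid" convolution along the time axis).
n_filters, width = 8, 5
filters = rng.standard_normal((n_filters, 13, width)) * 0.1

n_steps = mfcc.shape[1] - width + 1
feature_maps = np.empty((n_filters, n_steps))
for f in range(n_filters):
    for t in range(n_steps):
        feature_maps[f, t] = np.sum(filters[f] * mfcc[:, t:t + width])

relu = np.maximum(feature_maps, 0)   # ReLU activation
pooled = relu.max(axis=1)            # global max pooling -> (n_filters,)

# Dense softmax layer over 4 disease classes (placeholder weights).
W = rng.standard_normal((4, n_filters)) * 0.1
logits = W @ pooled
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)  # (4,)
```

A single convolution layer keeps the parameter count and inference cost low, which is consistent with the abstract's emphasis on fast calculation for remote diagnosis.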
Automated voice pathology discrimination from audio recordings benefits from phonetic analysis of continuous speech
In this paper we evaluate the hypothesis that automated methods for diagnosis of voice disorders from speech recordings would benefit from contextual information found in continuous speech. Rather than basing a diagnosis on how disorders affect the average acoustic properties of the speech signal, the idea is to exploit the possibility that different disorders will cause different acoustic changes within different phonetic contexts. Any differences in the pattern of effects across contexts would then provide additional information for discrimination of pathologies. We evaluate this approach using two complementary studies: the first uses a short phrase which is automatically annotated using a phonetic transcription, the second uses a long reading passage which is automatically annotated from text. The first study uses a single sentence recorded from 597 speakers in the Saarbrücken Voice Database to discriminate structural from neurogenic disorders. The results show that discrimination performance for these broad pathology classes improves from 59% to 67% unweighted average recall when classifiers are trained for each phone label and the results fused. Although the phonetic contexts improved discrimination, the overall sensitivity and specificity of the method seems insufficient for clinical application. We hypothesise that this is because of the limited contexts in the speech audio and the heterogeneous nature of the disorders. In the second study we address these issues by processing recordings of a long reading passage obtained from clinical recordings of 60 speakers with either spasmodic dysphonia or vocal fold paralysis. We show that discrimination performance increases from 80% to 87% unweighted average recall if classifiers are trained for each phone-labelled region and predictions fused. We also show that the sensitivity and specificity of a diagnostic test with this performance is similar to other diagnostic procedures in clinical use.
In conclusion, the studies confirm that the exploitation of contextual differences in the way disorders affect speech improves automated diagnostic performance, and that automated methods for phonetic annotation of reading passages are robust enough to extract useful diagnostic information.