
    Towards Personalized Synthesized Voices for Individuals with Vocal Disabilities: Voice Banking and Reconstruction

    When individuals lose the ability to produce their own speech, due to degenerative diseases such as motor neurone disease (MND) or Parkinson’s, they lose not only a functional means of communication but also a display of their individual and group identity. In order to build personalized synthetic voices, attempts have been made to capture the voice before it is lost, using a process known as voice banking. But, for some patients, the speech deterioration frequently coincides with or quickly follows diagnosis. Using HMM-based speech synthesis, it is now possible to build personalized synthetic voices with minimal data recordings and even disordered speech. The power of this approach is that it is possible to use the patient’s recordings to adapt existing voice models pre-trained on many speakers. When the speech has begun to deteriorate, the adapted voice model can be further modified in order to compensate for the disordered characteristics found in the patient’s speech. The University of Edinburgh has initiated a project for voice banking and reconstruction based on this speech synthesis technology. At the current stage of the project, more than fifteen patients with MND have already been recorded and five of them have received a reconstructed voice. In this paper, we present an overview of the project as well as subjective assessments of the reconstructed voices and feedback from patients and their families.

    Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection

    Background: Voice disorders affect patients profoundly, and acoustic tools can potentially measure voice function objectively. Disordered sustained vowels exhibit wide-ranging phenomena, from nearly periodic to highly complex, aperiodic vibrations, and increased "breathiness". Modelling and surrogate data studies have shown significant nonlinear and non-Gaussian random properties in these sounds. Nonetheless, existing tools are limited to analysing voices displaying near periodicity, and do not account for this inherent biophysical nonlinearity and non-Gaussian randomness, often using linear signal processing methods insensitive to these properties. They do not directly measure the two main biophysical symptoms of disorder: complex nonlinear aperiodicity, and turbulent, aeroacoustic, non-Gaussian randomness. Often these tools cannot be applied to more severe disordered voices, limiting their clinical usefulness.

Methods: This paper introduces two new tools to speech analysis: recurrence and fractal scaling, which overcome the range limitations of existing tools by addressing directly these two symptoms of disorder, together reproducing a "hoarseness" diagram. A simple bootstrapped classifier then uses these two features to distinguish normal from disordered voices.

Results: On a large database of subjects with a wide variety of voice disorders, these new techniques can distinguish normal from disordered cases, using quadratic discriminant analysis, with an overall correct classification performance of 91.8% ± 2.0%. The true positive classification performance is 95.4% ± 3.2%, and the true negative performance is 91.5% ± 2.3% (95% confidence). This is shown to outperform all combinations of the most popular classical tools.

Conclusions: Given the very large number of arbitrary parameters and computational complexity of existing techniques, these new techniques are far simpler and yet achieve clinically useful classification performance using only a basic classification technique. They do so by exploiting the inherent nonlinearity and turbulent randomness in disordered voice signals. They are widely applicable to the whole range of disordered voice phenomena by design. These new measures could therefore be used for a variety of practical clinical purposes.
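
The classification step described in this abstract (two features per voice, quadratic discriminant analysis, bootstrapped performance estimates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two feature values are synthetic placeholders standing in for the recurrence and fractal scaling measures, the out-of-bag bootstrap scheme is one common choice that the abstract does not specify, and all function names are hypothetical.

```python
import math
import random


def fit_gaussian(points):
    """Mean and 2x2 sample covariance of a list of (f1, f2) feature pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def log_density(point, mean, cov):
    """Log of a bivariate Gaussian density, dropping the constant shared by both classes."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (math.log(det) + maha)


def qda_fit(samples):
    """samples: list of ((f1, f2), label), label 0 = normal, 1 = disordered."""
    model = {}
    for label in (0, 1):
        pts = [x for x, y in samples if y == label]
        mean, cov = fit_gaussian(pts)
        model[label] = (mean, cov, math.log(len(pts) / len(samples)))
    return model


def qda_predict(model, point):
    """Pick the class with the larger log posterior (per-class covariance makes it quadratic)."""
    scores = {lab: log_density(point, m, c) + prior
              for lab, (m, c, prior) in model.items()}
    return max(scores, key=scores.get)


def bootstrap_accuracy(samples, rounds=200, seed=0):
    """Out-of-bag bootstrap: train on a resample, test on the samples left out."""
    rng = random.Random(seed)
    n = len(samples)
    accs = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        train = [samples[i] for i in idx]
        test = [samples[i] for i in range(n) if i not in chosen]
        # Skip resamples that cannot support a covariance estimate.
        usable = all(len({x for x, y in train if y == lab}) >= 3 for lab in (0, 1))
        if not test or not usable:
            continue
        model = qda_fit(train)
        accs.append(sum(qda_predict(model, x) == y for x, y in test) / len(test))
    if not accs:
        raise ValueError("no usable bootstrap resamples")
    accs.sort()
    mean = sum(accs) / len(accs)
    lo = accs[int(0.025 * (len(accs) - 1))]
    hi = accs[int(0.975 * (len(accs) - 1))]
    return mean, lo, hi


def make_synthetic(n_per_class=60, seed=1):
    """Synthetic placeholder features; NOT real voice measurements."""
    rng = random.Random(seed)
    normal = [((rng.gauss(0.3, 0.05), rng.gauss(0.6, 0.05)), 0) for _ in range(n_per_class)]
    disordered = [((rng.gauss(0.6, 0.10), rng.gauss(0.4, 0.10)), 1) for _ in range(n_per_class)]
    return normal + disordered


if __name__ == "__main__":
    mean, lo, hi = bootstrap_accuracy(make_synthetic())
    print(f"accuracy {mean:.3f} (95% interval {lo:.3f} to {hi:.3f})")
```

Fitting a separate covariance per class is what distinguishes quadratic from linear discriminant analysis; the percentile interval over bootstrap rounds is one way to obtain a spread comparable in spirit to the "± at 95% confidence" figures quoted above.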

    Simulating dysarthric speech for training data augmentation in clinical speech applications

    Training machine learning algorithms for speech applications requires large, labeled training data sets. This is problematic for clinical applications where obtaining such data is prohibitively expensive because of privacy concerns or lack of access. As a result, clinical speech applications are typically developed using small data sets with only tens of speakers. In this paper, we propose a method for simulating training data for clinical applications by transforming healthy speech to dysarthric speech using adversarial training. We evaluate the efficacy of our approach using both objective and subjective criteria. We present the transformed samples to five experienced speech-language pathologists (SLPs) and ask them to identify the samples as healthy or dysarthric. The results reveal that the SLPs identify the transformed speech as dysarthric 65% of the time. In a pilot classification experiment, we show that by using the simulated speech samples to balance an existing dataset, the classification accuracy improves by about 10% after data augmentation. Comment: Will appear in Proc. of ICASSP 201

    Exploring auditory-motor interactions in normal and disordered speech

    Auditory feedback plays an important role in speech motor learning and in the online correction of speech movements. Speakers can detect and correct auditory feedback errors at the segmental and suprasegmental levels during ongoing speech. The frontal brain regions that contribute to these corrective movements have also been shown to be more active during speech in persons who stutter (PWS) compared to fluent speakers. Further, various types of altered auditory feedback can temporarily improve the fluency of PWS, suggesting that atypical auditory-motor interactions during speech may contribute to stuttering disfluencies. To investigate this possibility, we have developed and improved Audapter, a software package that enables configurable dynamic perturbation of the spatial and temporal content of the speech auditory signal in real time. Using Audapter, we have measured the compensatory responses of PWS to static and dynamic perturbations of the formant content of auditory feedback and compared these responses with those from matched fluent controls. Our findings indicate deficient utilization of auditory feedback by PWS for short-latency online control of the spatial and temporal parameters of articulation during vowel production and during running speech. These findings provide further evidence that stuttering is associated with aberrant auditory-motor integration during speech.

    Brittany Bernal - Sensorimotor Adaptation of Vowel Production in Stop Consonant Contexts

    The purpose of this research is to measure the compensatory and adaptive articulatory response to shifted formants in auditory feedback, and to compare the resulting amount of sensorimotor learning that takes place in speakers upon saying the words /pep/ and /tet/. These words were chosen in order to analyze the coarticulatory effects of voiceless consonants /p/ and /t/ on sensorimotor adaptation of the vowel /e/. The formant perturbations were done using the Audapt software, which takes an input speech sample and plays it back to the speaker in real-time via headphones. Formants are high-energy acoustic resonance patterns measured in hertz that reflect positions of articulators during the production of speech sounds. The two lowest frequency formants (F1 and F2) can uniquely distinguish among the vowels of American English. For this experiment, Audapt shifted F1 down and F2 up, and those who adapt were expected to shift in the opposite direction of the perturbation. The formant patterns and vowel boundaries were analyzed using TF32 and S+ software, which led to conclusions about the adaptive responses. Manipulating auditory feedback by shifting formant values is hypothesized to elicit sensorimotor adaptation, a form of short-term motor learning. The amount of adaptation is expected to be greater for the word /pep/ than for /tet/ because there is less competition for articulatory placement of the tongue during production of bilabial consonants. This methodology could be further developed to help those with motor speech disorders remedy their speech errors with much less conscious effort than traditional therapy techniques.
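
The expected direction of adaptation (F1 shifted down and F2 up, with speakers compensating in the opposite direction) can be made concrete with a small sketch. The measure below, the fraction of the applied shift that the speaker opposes, is an assumption chosen for illustration rather than a quantity named in the abstract, and the function name and all frequency values are hypothetical.

```python
def compensation_fraction(baseline_hz, produced_hz, shift_hz):
    """Fraction of an applied formant shift that the speaker opposes.

    baseline_hz: formant produced before any perturbation
    produced_hz: formant produced while feedback is perturbed
    shift_hz:    perturbation applied to the feedback (negative = shifted down)

    Positive result: the speaker moved against the shift (compensation).
    Negative result: the speaker moved with the shift.
    """
    if shift_hz == 0:
        raise ValueError("shift_hz must be non-zero")
    return -(produced_hz - baseline_hz) / shift_hz


# Hypothetical numbers: F1 feedback shifted down by 100 Hz, speaker raises F1 by 30 Hz.
f1 = compensation_fraction(baseline_hz=500.0, produced_hz=530.0, shift_hz=-100.0)

# Hypothetical numbers: F2 feedback shifted up by 150 Hz, speaker lowers F2 by 30 Hz.
f2 = compensation_fraction(baseline_hz=1800.0, produced_hz=1770.0, shift_hz=150.0)

print(f1, f2)
```

Under this convention, the hypothesis stated above would predict a larger positive value for /pep/ than for /tet/.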

    Analysis of Vocal Disorders in a Feature Space

    This paper provides a way to classify vocal disorders for clinical applications. This goal is achieved by means of geometric signal separation in a feature space. Typical quantities from chaos theory (like entropy, correlation dimension and first Lyapunov exponent) and some conventional ones (like autocorrelation and spectral factor) are analysed and evaluated, in order to provide entries for the feature vectors. A way of quantifying the amount of disorder is proposed by means of a healthy index that measures the distance of a voice sample from the centre of mass of both healthy and sick clusters in the feature space. A successful application of the geometrical signal separation is reported, concerning distinction between normal and disordered phonation. Comment: 12 pages, 3 figures, accepted for publication in Medical Engineering & Physics
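
The idea of an index based on distances to the two cluster centres can be sketched as below. The abstract does not give the formula, so the normalisation used here (distance to the sick centroid divided by the sum of both distances) is an assumption for illustration only, as are the function names and the toy feature vectors.

```python
import math


def centroid(vectors):
    """Centre of mass of a list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def healthy_index(sample, healthy_vectors, sick_vectors):
    """Hypothetical index in [0, 1]: 1 at the healthy centroid, 0 at the sick centroid."""
    d_healthy = math.dist(sample, centroid(healthy_vectors))
    d_sick = math.dist(sample, centroid(sick_vectors))
    if d_healthy + d_sick == 0:
        raise ValueError("healthy and sick centroids coincide with the sample")
    return d_sick / (d_healthy + d_sick)


# Toy two-dimensional feature vectors (placeholders, not real measurements).
healthy = [[0.0, 0.0], [2.0, 0.0]]   # centroid at (1, 0)
sick = [[5.0, 0.0], [7.0, 0.0]]      # centroid at (6, 0)

print(healthy_index([3.5, 0.0], healthy, sick))
```

In practice the feature vectors would hold quantities such as those the abstract lists (entropy, correlation dimension, and so on), and features on different scales would normally be standardised before computing Euclidean distances.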

    Historical Analyses of Disordered Handwriting

    Handwritten texts carry significant information, extending beyond the meaning of their words. Modern neurology, for example, benefits from the interpretation of the graphic features of writing and drawing for the diagnosis and monitoring of diseases and disorders. This article examines how handwriting analysis can be used, and has been used historically, as a methodological tool for the assessment of medical conditions and how this enhances our understanding of historical contexts of writing. We analyze handwritten material, writing tests and letters, from patients in an early 20th-century psychiatric hospital in southern Germany (Irsee/Kaufbeuren). In this institution, early psychiatrists assessed handwriting features, providing us with novel insights into the earliest practices of psychiatric handwriting analysis, which can be connected to Berkenkotter’s research on medical admission records. We finally consider the degree to which historical handwriting bears semiotic potential to explain the psychological state and personality of a writer, and how future research in written communication should approach these sources.