3 research outputs found

    "Hey Siri, do you understand me?": Virtual Assistants and Dysarthria

    Voice-activated devices are becoming commonplace: people can use their voice to control smartphones and smart vacuum robots, and to interact with their smart homes through virtual assistant devices like Amazon Echo or Google Home. The spread of such voice-controlled devices has been made possible by the increasing capabilities of natural language processing, and it generally has a positive impact on device accessibility, e.g., for people with disabilities. However, a consequence of these devices embracing voice control is that people with dysarthria or other speech impairments may be unable to control their intelligent environments, at least with proficiency. This paper investigates to what extent people with dysarthria can use and be understood by the three most common virtual assistants, namely Siri, Google Assistant, and Amazon Alexa. Starting from the sentences in the TORGO database of dysarthric articulation, the differences between these assistants are investigated and discussed. Preliminary results show that the three virtual assistants have comparable performance, with a recognition accuracy in the range of 50-60%.
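    As an illustration of the kind of evaluation described above, the sketch below computes sentence-level recognition accuracy by comparing an assistant's transcripts against the reference prompts. It is a minimal sketch: the paper's exact matching criterion is not given in the abstract, so exact match after a simple normalization is assumed, and the example sentences and function names are hypothetical.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so superficial differences are ignored."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def recognition_accuracy(references: list[str], transcripts: list[str]) -> float:
    """Fraction of prompts whose transcript matches the reference exactly
    after normalization (one transcript per reference sentence)."""
    hits = sum(normalize(r) == normalize(t) for r, t in zip(references, transcripts))
    return hits / len(references)

# Hypothetical example with two reference prompts and assistant transcripts
refs = ["the quick brown fox jumps over the lazy dog", "open the window"]
hyps = ["The quick brown fox jumps over the lazy dog.", "open the widow"]
print(recognition_accuracy(refs, hyps))  # 0.5
```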

    Assessing Virtual Assistant Capabilities with Italian Dysarthric Speech

    The usage of smartphone-based virtual assistants (e.g., Siri or Google Assistant) is growing, and their spread has been made possible largely by the increasing capabilities of natural language processing; they generally have a positive impact on device accessibility, e.g., for people with disabilities. However, people with dysarthria or other speech impairments may be unable to use these virtual assistants with proficiency. This paper investigates to what extent people with ALS-induced dysarthria can be understood by, and obtain consistent answers from, three widely used smartphone-based assistants, namely Siri, Google Assistant, and Cortana. In particular, we focus on the recognition of Italian dysarthric speech, to study the behavior of the virtual assistants with this specific population, for which no relevant studies are available. We collected and recorded suitable speech samples from people with dysarthria in a dedicated center of the Molinette hospital in Turin, Italy. Starting from those recordings, the differences between these assistants, in terms of speech recognition and consistency of answers, are investigated and discussed. Results highlight different performance among the virtual assistants. For speech recognition, Google Assistant is the most promising, with a word error rate of around 25% per sentence. For consistency of answers, Siri and Google Assistant provide coherent answers around 60% of the time.
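    The word error rate figures reported above follow the standard definition: the number of word substitutions, deletions, and insertions needed to turn the recognized sentence into the reference, divided by the reference length. The sketch below is a minimal, generic implementation of that metric (the paper's own evaluation pipeline is not described in the abstract); the example sentences are hypothetical.

```python
import numpy as np

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed with standard Levenshtein alignment over words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i, j] = edit distance between ref[:i] and hyp[:j]
    dp = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    dp[:, 0] = np.arange(len(ref) + 1)
    dp[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i, j] = min(dp[i - 1, j] + 1,         # deletion
                           dp[i, j - 1] + 1,         # insertion
                           dp[i - 1, j - 1] + cost)  # substitution / match
    return dp[len(ref), len(hyp)] / max(len(ref), 1)

# Hypothetical example: reference prompt vs. an assistant's transcription
print(wer("turn on the kitchen light", "turn on the chicken light"))  # 0.2
```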

    Dysarthric speech analysis and automatic recognition using phase based representations

    Dysarthria is a neurological speech impairment which usually results in the loss of motor speech control due to muscular atrophy and poor coordination of the articulators. Dysarthric speech is more difficult to model with machine learning algorithms, due to inconsistencies in the acoustic signal and to limited amounts of training data. This study reports a new approach for the analysis and representation of dysarthric speech, and applies it to improve ASR performance. The Zeros of the Z-Transform (ZZT) are investigated for dysarthric vowel segments, showing evidence of a phase-based acoustic phenomenon that is responsible for the way the distribution of zero patterns relates to speech intelligibility. It is investigated whether such phase-based artefacts can be systematically exploited to understand their association with intelligibility. A metric based on the phase slope deviation (PSD) observed in the unwrapped phase spectrum of dysarthric vowel segments is introduced. The metric compares the differences between the slopes of dysarthric vowels and typical vowels. The PSD shows a strong and nearly linear correspondence with the intelligibility of the speaker, and this is shown to hold for two separate databases of dysarthric speakers. A systematic procedure for correcting the underlying phase deviations results in a significant improvement in ASR performance for speakers with severe and moderate dysarthria. In addition, information encoded in the phase component of the Fourier transform of dysarthric speech is exploited in the group delay spectrum. Its properties are found to represent disordered speech more effectively than the magnitude spectrum. Dysarthric ASR performance was significantly improved using phase-based cepstral features in comparison to conventional MFCCs. A combined approach utilising the benefits of PSD corrections and phase-based features was found to surpass all previously reported performance on the UASPEECH database of dysarthric speech.
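    To make the phase-based idea concrete, the sketch below estimates the slope of the unwrapped phase spectrum of a vowel segment and takes the difference between a dysarthric and a typical vowel as a rough slope-deviation measure. This is only an illustrative approximation of the PSD concept: the thesis's exact estimation procedure (windowing, frequency range, normalization) is not given in the abstract, so a Hann window and a least-squares fit over all positive-frequency bins are assumed, and the function names are hypothetical.

```python
import numpy as np

def phase_slope(segment: np.ndarray, n_fft: int = 1024) -> float:
    """Slope of the unwrapped phase spectrum of a windowed vowel segment,
    estimated by a least-squares linear fit over the positive-frequency bins."""
    windowed = segment * np.hanning(len(segment))
    spectrum = np.fft.rfft(windowed, n=n_fft)
    phase = np.unwrap(np.angle(spectrum))
    bins = np.arange(len(phase))
    slope, _intercept = np.polyfit(bins, phase, 1)
    return slope

def phase_slope_deviation(dysarthric_vowel: np.ndarray,
                          typical_vowel: np.ndarray) -> float:
    """Absolute difference between the phase slopes of a dysarthric vowel
    segment and a typical (reference) vowel segment."""
    return abs(phase_slope(dysarthric_vowel) - phase_slope(typical_vowel))

# Hypothetical usage with two vowel segments sampled at 16 kHz:
# dysarthric = load_vowel("dysarthric_aa.wav"); typical = load_vowel("typical_aa.wav")
# print(phase_slope_deviation(dysarthric, typical))
```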