7 research outputs found

    Assessing Virtual Assistant Capabilities with Italian Dysarthric Speech

    The usage of smartphone-based virtual assistants (e.g., Siri or Google Assistant) is growing. Their spread has been made possible by the increasing capabilities of natural language processing, and it generally has a positive impact on device accessibility, e.g., for people with disabilities. However, people with dysarthria or other speech impairments may be unable to use these virtual assistants proficiently. This paper investigates to what extent people with ALS-induced dysarthria can be understood by, and receive consistent answers from, three widely used smartphone-based assistants, namely Siri, Google Assistant, and Cortana. In particular, we focus on the recognition of Italian dysarthric speech, to study the behavior of the virtual assistants with this specific population, for which no relevant studies are available. We collected and recorded suitable speech samples from people with dysarthria in a dedicated center of the Molinette hospital in Turin, Italy. Starting from those recordings, the differences between the assistants, in terms of speech recognition and consistency of answers, are investigated and discussed. Results highlight different performance among the virtual assistants. For speech recognition, Google Assistant is the most promising, with a word error rate of around 25% per sentence. For consistency of answers, instead, Siri and Google Assistant provide coherent answers around 60% of the time.

    On the Impact of Dysarthric Speech on Contemporary ASR Cloud Platforms

    The spread of voice-driven devices has a positive impact on people with disabilities in smart environments, since such devices allow them to perform a series of daily activities that were difficult or impossible before. As a result, their quality of life and autonomy increase. However, the speech recognition technology employed in such devices has limitations for people with communication disorders, like dysarthria. People with dysarthria may be unable to control their smart environments, at least with the needed proficiency, and this problem may negatively affect the perceived reliability of the entire environment. By exploiting the TORGO database of speech samples pronounced by people with dysarthria, this paper compares the accuracy of dysarthric speech recognition as achieved by three speech recognition cloud platforms, namely IBM Watson Speech-to-Text, Google Cloud Speech, and Microsoft Azure Bing Speech. Such services, indeed, are used in many virtual assistants deployed in smart environments, such as Google Home. The goal is to investigate whether such cloud platforms are usable to recognize dysarthric speech, and to understand which of them is the most suitable for people with dysarthria. Results suggest that the three platforms have comparable performance in recognizing dysarthric speech, and that the accuracy of the recognition is related to the speech intelligibility of the person. Overall, the platforms are limited when the dysarthric speech intelligibility is low (80-90% word error rate), while they improve to a word error rate of 15-25% for people without abnormalities in their speech intelligibility.
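    Both studies above report results as word error rate (WER), the standard ASR metric: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. A minimal sketch of the computation (the example sentences are illustrative, not taken from the studies):

    ```python
    def wer(reference: str, hypothesis: str) -> float:
        """Word error rate: (substitutions + deletions + insertions)
        divided by the reference length, via word-level Levenshtein distance."""
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = edit distance between ref[:i] and hyp[:j]
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i  # delete all remaining reference words
        for j in range(len(hyp) + 1):
            dp[0][j] = j  # insert all remaining hypothesis words
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                               dp[i][j - 1] + 1,          # insertion
                               dp[i - 1][j - 1] + cost)   # match / substitution
        return dp[len(ref)][len(hyp)] / len(ref)

    # 1 substitution ("the" -> "a") over 5 reference words -> 0.2
    print(wer("turn on the kitchen light", "turn on a kitchen light"))
    ```

    On this scale, the reported 80-90% WER for low-intelligibility speech means that most words in an utterance are misrecognized, which explains why such users cannot reliably control a voice-driven environment.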

    Interacting with smart virtual assistants for individuals with dysarthria: a comparative study on usability and user preferences

    This study explores the effectiveness and user experience of different interaction methods used by individuals with dysarthria when engaging with Smart Virtual Assistants (SVAs). It focuses on three primary modalities: direct speech commands through Alexa, non-verbal voice cues via the Daria system, and eye gaze control. The objective is to assess the usability, workload, and user preferences associated with each method, catering to the varying communication capabilities of individuals with dysarthria. While Alexa and Daria facilitate voice-based interactions, eye gaze control offers an alternative for those unable to use voice commands, including users with severe dysarthria. This comparative study, conducted with eight participants with dysarthria, aims to determine how the usability of each interaction method varies. The results indicated that non-verbal voice interactions, particularly with the Daria system, were favored because of their lower workload and ease of use. The eye gaze technology, while viable, presented challenges in terms of higher workload and poorer usability. These findings highlight the necessity of diversifying interaction methods with SVAs to accommodate the unique needs of individuals with dysarthria.

    The use of speech recognition technology by people living with Amyotrophic Lateral Sclerosis: a scoping review

    More than 80% of people living with Amyotrophic Lateral Sclerosis (plwALS) develop difficulties with their speech, affecting communication, self-identity, and quality of life. Automatic speech recognition (ASR) technology is becoming a common way to interact with a broad range of devices, to find information, and to control the environment. ASR can be problematic for people with acquired neurogenic motor speech difficulties (dysarthria). Given that the field is rapidly developing, a scoping review is warranted.

    A study of virtual assistants in the context of intellectual functional diversity (original title: "Estudio de asistentes virtuales en el entorno de la Diversidad funcional intelectual")

    Máster en Investigación e Innovación en TIC. Intellectual functional diversity refers to cognitive limitations that affect a person's mental functioning and adaptive behavior. People in this group have difficulties socializing and memorizing, a reduced tolerance for mistakes, and communication problems stemming from limited language development. The latter creates a barrier that can lead to quick abandonment of tasks requiring persistence and repetition. The emergence of voice assistants, software agents endowed with artificial intelligence whose main mode of interaction is the voice, has led to new devices such as smart displays and smart speakers. Brands such as Amazon, Apple, Google, Samsung, and Microsoft have released speakers that embed their own voice assistants. These devices can carry out tasks usually performed through graphical interfaces, listening for a wake word before accepting requests. The present study focuses on the use of commercial voice assistants on smart speakers by people with intellectual functional diversity. Sessions were held with a total of 26 participants divided into 3 homogeneous groups, who performed various types of tasks with the voice assistants on smart speakers. Analysis of the collected data revealed the limitations these assistants present when used by people with intellectual functional diversity, since the conditions of their environment had not been taken into account. In addition, design guidelines are proposed that aim to remove the barriers the participants encountered during the study.

    Apraxia World: Deploying a Mobile Game and Automatic Speech Recognition for Independent Child Speech Therapy

    Children with speech sound disorders typically improve pronunciation quality by undergoing speech therapy, which must be delivered frequently and with high intensity to be effective. As such, clinic sessions are supplemented with home practice, often under caregiver supervision. However, traditional home practice can grow boring for children due to monotony. Furthermore, practice frequency is limited by caregiver availability, making it difficult for some children to reach the required therapy dosage. To address these issues, this dissertation presents a novel speech therapy game to increase engagement, and explores automatic pronunciation evaluation techniques to afford children independent practice. The therapy game, called Apraxia World, delivers customizable, repetition-based speech therapy while children play through platformer-style levels using typical on-screen tablet controls; children complete in-game speech exercises to collect assets required to progress through the levels. Additionally, Apraxia World provides pronunciation feedback according to an automated pronunciation evaluation system running locally on the tablet.
Apraxia World offers two advantages over current commercial and research speech therapy games: first, the game provides extended gameplay to support long therapy treatments; second, it affords some practice independence via automatic pronunciation evaluation, allowing caregivers to lightly supervise instead of directly administer the practice. Pilot testing indicated that children enjoyed the game-based therapy much more than traditional practice and that the exercises did not interfere with gameplay. During a longitudinal study, children made clinically significant pronunciation improvements while playing Apraxia World at home. Furthermore, children remained engaged in the game-based therapy over the two-month testing period, and some even wanted to continue playing post-study. The second part of the dissertation explores word- and phoneme-level pronunciation verification for child speech therapy applications. Word-level pronunciation verification is accomplished using a child-specific template-matching framework, in which an utterance is compared against correctly and incorrectly pronounced examples of the word. This framework identified mispronounced words better than both a standard automated baseline and co-located caregivers. Phoneme-level mispronunciation detection is investigated using a technique from the second-language learning literature: training phoneme-specific classifiers with phonetic posterior features. This method also outperformed the standard baseline and, more significantly, identified mispronunciations better than student clinicians.
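The template-matching idea described above, comparing an utterance against correctly and incorrectly pronounced examples and labeling it by its nearest template, can be sketched as follows. The dissertation's exact acoustic features and alignment method are not specified in this abstract; the sketch below assumes dynamic time warping (DTW) over placeholder per-frame feature vectors, a common choice for comparing speech sequences of different lengths.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature sequences
    (each a list of equal-length per-frame feature vectors)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(seq_a[i - 1], seq_b[j - 1])  # local frame distance
            # extend the cheapest of the three allowed warping moves
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]

def classify(utterance, correct_templates, incorrect_templates):
    """Label an utterance 'correct' if its nearest template (by DTW)
    is a correctly pronounced example, else 'incorrect'."""
    best_correct = min(dtw_distance(utterance, t) for t in correct_templates)
    best_incorrect = min(dtw_distance(utterance, t) for t in incorrect_templates)
    return "correct" if best_correct <= best_incorrect else "incorrect"
```

In practice the templates would be child-specific recordings, which is what lets the framework adapt to an individual child's speech better than a generic ASR baseline.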