
    Speeching: Mobile Crowdsourced Speech Assessment to Support Self-Monitoring and Management for People with Parkinson's

    We present Speeching, a mobile application that uses crowdsourcing to support the self-monitoring and management of speech and voice issues for people with Parkinson's (PwP). The application allows participants to audio-record short voice tasks, which are then rated and assessed by crowd workers. Speeching feeds these results back to users, showing them how they were perceived by listeners unconnected to them (and thus not accustomed to their speech patterns). We conducted our study in two phases. First, we assessed the feasibility of utilising the crowd to provide ratings of speech and voice that are comparable to those of experts. We then conducted a trial to evaluate how PwP valued the feedback provided through Speeching. Our study highlights how applications like Speeching open up new opportunities for self-monitoring in digital health and wellbeing, and provide a means for those without regular access to clinical assessment services to practise their speech and receive meaningful feedback on it.
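
    The feasibility question in the first study phase is essentially whether aggregated crowd ratings track expert ratings. A minimal sketch of that kind of check, using rank correlation over invented per-clip ratings on a hypothetical 1-5 scale (not the paper's data or metric):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-clip ratings on a 1-5 scale (invented, not the paper's data).
# Each row holds ratings from three crowd workers for one recorded voice task.
crowd_ratings = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
])
expert_ratings = np.array([4, 2, 5, 1, 3])  # one expert score per clip

# Aggregate each clip's crowd ratings, then check how well the aggregate
# tracks the expert scores via rank correlation.
crowd_mean = crowd_ratings.mean(axis=1)
rho, p_value = spearmanr(crowd_mean, expert_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```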

    Modeling Spoken Information Queries for Virtual Assistants: Open Problems, Challenges and Opportunities

    Virtual assistants are becoming increasingly important speech-driven Information Retrieval platforms that assist users with various tasks. We discuss open problems and challenges in modeling spoken information queries for virtual assistants, and list opportunities where Information Retrieval methods and research can be applied to improve the quality of virtual assistant speech recognition. We discuss how query domain classification, knowledge graphs and user interaction data, and query personalization can help improve the accurate recognition of spoken information domain queries. Finally, we provide a brief overview of current problems and challenges in speech recognition.

    Comment: SIGIR '23, the 46th International ACM SIGIR Conference on Research & Development in Information Retrieval
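
    As a concrete (hypothetical) illustration of one such opportunity, the toy sketch below re-ranks an ASR n-best list with entity log-priors of the kind a query domain classifier or knowledge graph could supply. The transcripts, scores, and priors are all invented for the example, not taken from the paper:

```python
# Toy illustration: rescoring an ASR n-best list with knowledge-graph-style
# entity priors. All hypotheses, scores, and priors are made-up values.
nbest = [
    # (transcript, ASR log-score)
    ("play songs by the beatles", -4.1),
    ("play songs by the beetles", -3.9),  # acoustically close, wrong entity
    ("play songs by the battles", -5.0),
]

# Hypothetical log-priors derived from entity popularity (e.g. knowledge
# graph or user interaction statistics for the "music" query domain).
entity_log_prior = {
    "play songs by the beatles": -0.2,
    "play songs by the beetles": -2.5,
    "play songs by the battles": -3.0,
}

# Combine the ASR score with the domain/entity prior and re-rank: the prior
# flips the 1-best from "beetles" to the intended "beatles".
best = max(nbest, key=lambda h: h[1] + entity_log_prior[h[0]])
print("1-best after rescoring:", best[0])
```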

    Design of text generator application with OpenAI GPT-3

    The growing need for text content today motivates the development of systems that can reduce the effort of text creation. Text generation is currently done manually and has various shortcomings, especially time constraints, human error, limited creativity, and writing that tends to be repetitive, all of which can reduce the quality and diversity of the sentences produced. This research designed an AI-based text generator application using the GPT-3 language model to generate text automatically and help overcome these obstacles. Applying this app increases efficiency and productivity, supports the writer's ideas and creativity, automates routine tasks, and produces engaging, communicative sentences. The app's ability to generate text quickly and accurately, and to personalize it, makes it valuable in various fields. The method used in this research is to implement the GPT-3 API in the text generator application, so that the application can connect to the GPT-3 engine through a customized prompting method. The output of the application is text adjusted to the user's needs through keywords entered in the web interface. The results show that the text generator application is good enough to be implemented in various fields, especially text content generation.
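
    A minimal sketch of the kind of integration the abstract describes, assuming the legacy (pre-1.0) openai Python client and a GPT-3 completion model; the prompt template and parameters here are illustrative guesses, not the application's actual prompting method:

```python
import openai  # legacy pre-1.0 client, where Completion.create is available

openai.api_key = "YOUR_API_KEY"

def generate_text(keywords: str) -> str:
    # Wrap the user's keywords in a prompt template; this template is a
    # placeholder, not the application's actual (modified) prompting method.
    prompt = f"Write a short, engaging paragraph about: {keywords}\n"
    response = openai.Completion.create(
        engine="text-davinci-003",  # a GPT-3 completion model
        prompt=prompt,
        max_tokens=200,
        temperature=0.7,
    )
    return response.choices[0].text.strip()

# The keywords would normally arrive from the web interface form.
print(generate_text("renewable energy for small businesses"))
```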

    The Application of Echo State Networks to Atypical Speech Recognition

    Automatic speech recognition (ASR) techniques have improved substantially over the past few years with the rise of new deep learning architectures. Recent sequence-to-sequence models achieve high accuracy by utilizing the attention mechanism, which evaluates and learns the strength of relationships between elements in sequences. Despite being highly accurate, commercial ASR models have a weakness when it comes to accessibility: they have difficulty evaluating and transcribing speech from individuals with unique vocal features, such as those with dysarthria or heavy accents, as well as deaf and hard-of-hearing individuals. Current methodologies for processing vocal data revolve around convolutional feature extraction layers, which dull the sequential nature of the data. Alternatively, reservoir computing has gained popularity for its ability to translate input data into changing network states, preserving the overall feature complexity of the input. Echo state networks (ESNs), a type of reservoir computing mechanism employing a random recurrent neural network, have shown promise in a number of time series classification tasks. This work explores the integration of ESNs into deep learning ASR models. A novel approach that uses the echo state network as a feature extractor was explored and evaluated with the Listen, Attend and Spell and Transformer models as baseline architectures. The models were trained on 960 hours of LibriSpeech audio data and tuned on various atypical speech data, including the Torgo dysarthric speech dataset and the University of Memphis SPAL dataset. The ESN-based Echo, Listen, Attend, and Spell model produced more accurate transcriptions on the LibriSpeech test set than the ESN-based Transformer. The baseline Transformer model achieved a 43.4% word error rate on the Torgo test set after full network tuning. A prototype ASR system was developed that utilizes both the developed model and commercial smart-assistant language models; the system runs on a Raspberry Pi 4 using the Assistant Relay framework.
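
    For readers unfamiliar with the technique, the sketch below shows the core ESN computation this work builds on: a fixed random reservoir maps each sequence of input frames to a sequence of network states, which a trainable attention-based encoder could then consume in place of convolutional features. The dimensions and scaling constants are illustrative, not the thesis's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the thesis's settings): 40 acoustic features
# per frame, a 500-unit reservoir.
n_inputs, n_reservoir = 40, 500
spectral_radius, input_scale = 0.9, 0.5

# Fixed random weights: in an ESN only the downstream readout/encoder is
# trained; the reservoir itself is never updated.
W_in = rng.uniform(-1, 1, (n_reservoir, n_inputs)) * input_scale
W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # echo state property

def reservoir_states(frames):
    """Map a (T, n_inputs) feature sequence to a (T, n_reservoir) state sequence."""
    x = np.zeros(n_reservoir)
    states = []
    for u in frames:
        # Each new state mixes the current input drive with the fading
        # "echo" of previous states.
        x = np.tanh(W_in @ u + W @ x)
        states.append(x)
    return np.stack(states)

# Stand-in for a sequence of audio feature frames; the resulting states would
# feed the attention-based encoder instead of convolutional features.
features = rng.normal(size=(100, n_inputs))
print(reservoir_states(features).shape)  # (100, 500)
```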