
    Speeching: Mobile Crowdsourced Speech Assessment to Support Self-Monitoring and Management for People with Parkinson's

    We present Speeching, a mobile application that uses crowdsourcing to support the self-monitoring and management of speech and voice issues for people with Parkinson's (PwP). The application allows participants to audio-record short voice tasks, which are then rated and assessed by crowd workers. Speeching feeds these results back to users, showing them how they were perceived by listeners unconnected to them (and thus not accustomed to their speech patterns). We conducted our study in two phases. First, we assessed the feasibility of utilising the crowd to provide ratings of speech and voice that are comparable to those of experts. We then conducted a trial to evaluate how PwP valued the feedback provided through Speeching. Our study highlights how applications like Speeching open up new opportunities for self-monitoring in digital health and wellbeing, and provide a means for those without regular access to clinical assessment services to practise their speech and receive meaningful feedback on it.
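
    The feasibility question in the first study phase is essentially whether aggregated crowd ratings track expert ratings. A minimal sketch of that kind of check, using rank correlation over invented per-clip ratings on a hypothetical 1-5 scale (not the paper's data or metric):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-clip ratings on a 1-5 scale (invented, not the paper's data).
# Each row holds ratings from three crowd workers for one recorded voice task.
crowd_ratings = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
])
expert_ratings = np.array([4, 2, 5, 1, 3])  # one expert score per clip

# Aggregate each clip's crowd ratings, then check how well the aggregate
# tracks the expert scores via rank correlation.
crowd_mean = crowd_ratings.mean(axis=1)
rho, p_value = spearmanr(crowd_mean, expert_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```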

    Modeling Spoken Information Queries for Virtual Assistants: Open Problems, Challenges and Opportunities

    Virtual assistants are becoming increasingly important speech-driven Information Retrieval platforms that assist users with various tasks. We discuss open problems and challenges in modeling spoken information queries for virtual assistants, and list opportunities where Information Retrieval methods and research can be applied to improve the quality of virtual assistant speech recognition. We discuss how query domain classification, knowledge graphs and user interaction data, and query personalization can help improve the accurate recognition of spoken information domain queries. Finally, we provide a brief overview of current problems and challenges in speech recognition.

    Comment: SIGIR '23, the 46th International ACM SIGIR Conference on Research & Development in Information Retrieval
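
    As a concrete (hypothetical) illustration of one such opportunity, the toy sketch below re-ranks an ASR n-best list with entity log-priors of the kind a query domain classifier or knowledge graph could supply. The transcripts, scores, and priors are all invented for the example, not taken from the paper:

```python
# Toy illustration: rescoring an ASR n-best list with knowledge-graph-style
# entity priors. All hypotheses, scores, and priors are made-up values.
nbest = [
    # (transcript, ASR log-score)
    ("play songs by the beatles", -4.1),
    ("play songs by the beetles", -3.9),  # acoustically close, wrong entity
    ("play songs by the battles", -5.0),
]

# Hypothetical log-priors derived from entity popularity (e.g. knowledge
# graph or user interaction statistics for the "music" query domain).
entity_log_prior = {
    "play songs by the beatles": -0.2,
    "play songs by the beetles": -2.5,
    "play songs by the battles": -3.0,
}

# Combine the ASR score with the domain/entity prior and re-rank: the prior
# flips the 1-best from "beetles" to the intended "beatles".
best = max(nbest, key=lambda h: h[1] + entity_log_prior[h[0]])
print("1-best after rescoring:", best[0])
```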

    Design of text generator application with OpenAI GPT-3

    The growing need for text content today motivates the development of systems that can reduce the effort of text creation. Text generation is currently done manually and has various shortcomings, especially time constraints, human error, limited creativity, and writing that tends to be repetitive, all of which can reduce the quality and diversity of the sentences produced. This research designed an AI-based text generator application using the GPT-3 language model to generate text automatically and help overcome these obstacles. Applying this app increases efficiency and productivity, supports the writer's ideas and creativity, automates routine tasks, and produces engaging, communicative sentences. The app's ability to generate text quickly and accurately, and to personalize it, makes it valuable in various fields. The method used in this research is to implement the GPT-3 API in the text generator application, so that the application can connect to the GPT-3 engine through a customized prompting method. The output of the application is text adjusted to the user's needs through keywords entered in the web interface. The results show that the text generator application is good enough to be implemented in various fields, especially text content generation.
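
    A minimal sketch of the kind of integration the abstract describes, assuming the legacy (pre-1.0) openai Python client and a GPT-3 completion model; the prompt template and parameters here are illustrative guesses, not the application's actual prompting method:

```python
import openai  # legacy pre-1.0 client, where Completion.create is available

openai.api_key = "YOUR_API_KEY"

def generate_text(keywords: str) -> str:
    # Wrap the user's keywords in a prompt template; this template is a
    # placeholder, not the application's actual (modified) prompting method.
    prompt = f"Write a short, engaging paragraph about: {keywords}\n"
    response = openai.Completion.create(
        engine="text-davinci-003",  # a GPT-3 completion model
        prompt=prompt,
        max_tokens=200,
        temperature=0.7,
    )
    return response.choices[0].text.strip()

# The keywords would normally arrive from the web interface form.
print(generate_text("renewable energy for small businesses"))
```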

    The Application of Echo State Networks to Atypical Speech Recognition

    Automatic speech recognition (ASR) techniques have improved substantially over the past few years with the rise of new deep learning architectures. Recent sequence-to-sequence models achieve high accuracy by utilizing the attention mechanism, which evaluates and learns the strength of relationships between elements in sequences. Despite being highly accurate, commercial ASR models have a weakness when it comes to accessibility: they have difficulty evaluating and transcribing speech from individuals with unique vocal features, such as those with dysarthria or heavy accents, as well as deaf and hard-of-hearing individuals. Current methodologies for processing vocal data revolve around convolutional feature extraction layers, which dull the sequential nature of the data. Alternatively, reservoir computing has gained popularity for its ability to translate input data into changing network states, preserving the overall feature complexity of the input. Echo state networks (ESNs), a type of reservoir computing mechanism employing a random recurrent neural network, have shown promise in a number of time series classification tasks. This work explores the integration of ESNs into deep learning ASR models. A novel approach that uses the echo state network as a feature extractor was explored and evaluated with the Listen, Attend and Spell and Transformer models as baseline architectures. The models were trained on 960 hours of LibriSpeech audio data and tuned on various atypical speech data, including the Torgo dysarthric speech dataset and the University of Memphis SPAL dataset. The ESN-based Echo, Listen, Attend, and Spell model produced more accurate transcriptions on the LibriSpeech test set than the ESN-based Transformer. The baseline Transformer model achieved a 43.4% word error rate on the Torgo test set after full network tuning. A prototype ASR system was developed that utilizes both the developed model and commercial smart-assistant language models; the system runs on a Raspberry Pi 4 using the Assistant Relay framework.
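
    For readers unfamiliar with the technique, the sketch below shows the core ESN computation this work builds on: a fixed random reservoir maps each sequence of input frames to a sequence of network states, which a trainable attention-based encoder could then consume in place of convolutional features. The dimensions and scaling constants are illustrative, not the thesis's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the thesis's settings): 40 acoustic features
# per frame, a 500-unit reservoir.
n_inputs, n_reservoir = 40, 500
spectral_radius, input_scale = 0.9, 0.5

# Fixed random weights: in an ESN only the downstream readout/encoder is
# trained; the reservoir itself is never updated.
W_in = rng.uniform(-1, 1, (n_reservoir, n_inputs)) * input_scale
W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # echo state property

def reservoir_states(frames):
    """Map a (T, n_inputs) feature sequence to a (T, n_reservoir) state sequence."""
    x = np.zeros(n_reservoir)
    states = []
    for u in frames:
        # Each new state mixes the current input drive with the fading
        # "echo" of previous states.
        x = np.tanh(W_in @ u + W @ x)
        states.append(x)
    return np.stack(states)

# Stand-in for a sequence of audio feature frames; the resulting states would
# feed the attention-based encoder instead of convolutional features.
features = rng.normal(size=(100, n_inputs))
print(reservoir_states(features).shape)  # (100, 500)
```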