Search CORE

10,545 research outputs found

Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation

Author: Geng Mengzhe
Hu Shujie
Jin Zengrui
Li Guinan
Liu Xunying
Wang Huimeng
Wang Tianzi
Xu Haoning
Publication venue
Publication date: 31/12/2023
Field of study

Automatic recognition of dysarthric speech remains a highly challenging task to date. Neuro-motor conditions and co-occurring physical disabilities create difficulty in large-scale data collection for ASR system development. Adapting SSL pre-trained ASR models to limited dysarthric speech via data-intensive parameter fine-tuning leads to poor generalization. To this end, this paper presents an extensive comparative study of various data augmentation approaches to improve the robustness of pre-trained ASR model fine-tuning to dysarthric speech. These include: a) conventional speaker-independent perturbation of impaired speech; b) speaker-dependent speed perturbation, or GAN-based adversarial perturbation of normal, control speech based on their time alignment against parallel dysarthric speech; c) novel Spectral basis GAN-based adversarial data augmentation operating on non-parallel data. Experiments conducted on the UASpeech corpus suggest GAN-based data augmentation consistently outperforms fine-tuned Wav2vec2.0 and HuBERT models using no data augmentation and speed perturbation across different data expansion operating points by statistically significant word error rate (WER) reductions up to 2.01% and 0.96% absolute (9.03% and 4.63% relative) respectively on the UASpeech test set of 16 dysarthric speakers. After cross-system outputs rescoring, the best system produced the lowest published WER of 16.53% (46.47% on very low intelligibility) on UASpeech.Comment: To appear at IEEE ICASSP 202

arXiv.org e-Print Archive

Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

Author: Franco Horacio
Mitra Vikramjit
Sivaraman Ganesh
Yılmaz Emre
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019
Field of study

The rapid population aging has stimulated the development of assistive devices that provide personalized medical support to the needies suffering from various etiologies. One prominent clinical application is a computer-assisted speech training system which enables personalized speech therapy to patients impaired by communicative disorders in the patient's home environment. Such a system relies on the robust automatic speech recognition (ASR) technology to be able to provide accurate articulation feedback. With the long-term aim of developing off-the-shelf ASR systems that can be incorporated in clinical context without prior speaker information, we compare the ASR performance of speaker-independent bottleneck and articulatory features on dysarthric speech used in conjunction with dedicated neural network-based acoustic models that have been shown to be robust against spectrotemporal deviations. We report ASR performance of these systems on two dysarthric speech datasets of different characteristics to quantify the achieved performance gains. Despite the remaining performance gap between the dysarthric and normal speech, significant improvements have been reported on both datasets using speaker-independent ASR architectures.Comment: to appear in Computer Speech & Language - https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial text overlap with arXiv:1807.1094

arXiv.org e-Print Archive

Radboud Repository

ScholarBank@NUS

Automatic Transcription of Northern Prinmi Oral Art: Approaches and Challenges to Automatic Speech Recognition for Language Documentation

Author: Bechler Connor
Publication venue: UKnowledge
Publication date: 01/01/2023
Field of study

One significant issue facing language documentation efforts is the transcription bottleneck: each documented recording must be transcribed and annotated, and these tasks are extremely labor intensive (Ćavar et al., 2016). Researchers have sought to accelerate these tasks with partial automation via forced alignment, natural language processing, and automatic speech recognition (ASR) (Neubig et al., 2020). Neural network—especially transformer-based—approaches have enabled large advances in ASR over the last decade. Models like XLSR-53 promise improved performance on under-resourced languages by leveraging massive data sets from many different languages (Conneau et al., 2020). This project extends these efforts to a novel context, applying XLSR-53 to Northern Prinmi, a Tibeto-Burman Qiangic language spoken in Southwest China (Daudey & Pincuo, 2020). Specifically, this thesis aims to answer two questions. First, is the XLSR-53 ASR model useful for first-pass transcription of oral art recordings from Northern Prinmi, an under-resourced tonal language? Second, does preprocessing target transcripts to combine grapheme clusters—multi-character representations of lexical tones and characters with modifying diacritics—into more phonologically salient units improve the model\u27s predictions? Results indicate that—with substantial adaptations—XLSR-53 will be useful for this task, and that preprocessing to combine grapheme clusters does improve model performance

University of Kentucky

Follow-up question handling in the IMIX and Ritel systems: A comparative study

Author: A. Max
B. W. Van Schooten
Den Akker
G. Illouz
O. Gal Ibert
R. Op
S. Rosset
Publication venue: Cambridge University Press
Publication date: 01/01/2009
Field of study

One of the basic topics of question answering (QA) dialogue systems is how follow-up questions should be interpreted by a QA system. In this paper, we shall discuss our experience with the IMIX and Ritel systems, for both of which a follow-up question handling scheme has been developed, and corpora have been collected. These two systems are each other's opposites in many respects: IMIX is multimodal, non-factoid, black-box QA, while Ritel is speech, factoid, keyword-based QA. Nevertheless, we will show that they are quite comparable, and that it is fruitful to examine the similarities and differences. We shall look at how the systems are composed, and how real, non-expert, users interact with the systems. We shall also provide comparisons with systems from the literature where possible, and indicate where open issues lie and in what areas existing systems may be improved. We conclude that most systems have a common architecture with a set of common subtasks, in particular detecting follow-up questions and finding referents for them. We characterise these tasks using the typical techniques used for performing them, and data from our corpora. We also identify a special type of follow-up question, the discourse question, which is asked when the user is trying to understand an answer, and propose some basic methods for handling it

CiteSeerX

University of Twente Research Information

The Lowlands team at TRECVID 2007

Author: Aly Robin
Hauff Claudia
Heeren Willemijn
Hiemstra Djoerd
Jong Franciska de
Ordelman Roeland
Verschoor Thijs
Vries Arjen de
Publication venue: NIST
Publication date: 01/01/2008
Field of study

In this report we summarize our methods and results for the search tasks in\ud TRECVID 2007. We employ two different kinds of search: purely ASR based and\ud purely concept based search. However, there is not significant difference of the\ud performance of the two systems. Using neighboring shots for the combination of\ud two concepts seems to be beneficial. General preprocessing of queries increased\ud the performance and choosing detector sources helped. However, for all automatic\ud search components we need to perform further investigations

Radboud Repository

University of Twente Research Information

Integrated urban water management in Texas: a review to inform a one water approach for the future

Author: Ashmore Jacqueline
Cherne-Hendrick Margaret
Marttin Victor
Publication venue: Boston University Institute for Sustainable Energy
Publication date: 01/04/2018
Field of study

Texas has considerable experience grappling with historic droughts as well as flooding associated with tropical storms and hurricanes, yet the State’s water management challenges are projected to increase. Urban densification, increased frequency and severity of droughts and floods, aging infrastructure, and a management system that is not reflective of the true cost of water all influence water risk. Integrated urban water management strategies, like ‘One Water’, represent an emerging management paradigm that emphasizes the interconnectedness of water throughout the water cycle and capitalizes on opportunities that arise from this holistic viewpoint. Here, we review water management practices in five Texas cities and examine how the One Water approach could represent a viable framework to maintain a reliable, sustainable, and affordable water supply for the future. We also examine financial and business models that establish a foundational pathway towards the ‘utility of the future’ and the One Water paradigm more broadly

Boston University Institutional Repository (OpenBU)

Piggybacking on an Autonomous Hauler: Business Models Enabling a System-of-Systems Approach to Mapping an Underground Mine

Author: Borg Markus
Olsson Thomas
Svensson John
Publication venue
Publication date: 01/01/2017
Field of study

With ever-increasing productivity targets in mining operations, there is a growing interest in mining automation. In future mines, remote-controlled and autonomous haulers will operate underground guided by LiDAR sensors. We envision reusing LiDAR measurements to maintain accurate mine maps that would contribute to both safety and productivity. Extrapolating from a pilot project on reliable wireless communication in Boliden's Kankberg mine, we propose establishing a system-of-systems (SoS) with LIDAR-equipped haulers and existing mapping solutions as constituent systems. SoS requirements engineering inevitably adds a political layer, as independent actors are stakeholders both on the system and SoS levels. We present four SoS scenarios representing different business models, discussing how development and operations could be distributed among Boliden and external stakeholders, e.g., the vehicle suppliers, the hauling company, and the developers of the mapping software. Based on eight key variation points, we compare the four scenarios from both technical and business perspectives. Finally, we validate our findings in a seminar with participants from the relevant stakeholders. We conclude that to determine which scenario is the most promising for Boliden, trade-offs regarding control, costs, risks, and innovation must be carefully evaluated.Comment: Preprint of industry track paper accepted for the 25th IEEE International Conference on Requirements Engineering (RE'17

arXiv.org e-Print Archive

Crossref

Swedish Institute of Computer Science Publications Database

Vocabulary size influences spontaneous speech in native language users: Validating the use of automatic speech recognition in individual differences research

Author: Hintz F.
Jongman S.
Khoe Y.
Publication venue: 'SAGE Publications'
Publication date: 30/03/2020
Field of study

Previous research has shown that vocabulary size affects performance on laboratory word production tasks. Individuals who know many words show faster lexical access and retrieve more words belonging to pre-specified categories than individuals who know fewer words. The present study examined the relationship between receptive vocabulary size and speaking skills as assessed in a natural sentence production task. We asked whether measures derived from spontaneous responses to every-day questions correlate with the size of participants’ vocabulary. Moreover, we assessed the suitability of automatic speech recognition for the analysis of participants’ responses in complex language production data. We found that vocabulary size predicted indices of spontaneous speech: Individuals with a larger vocabulary produced more words and had a higher speech-silence ratio compared to individuals with a smaller vocabulary. Importantly, these relationships were reliably identified using manual and automated transcription methods. Taken together, our results suggest that spontaneous speech elicitation is a useful method to investigate natural language production and that automatic speech recognition can alleviate the burden of labor-intensive speech transcription

MPG.PuRe