967 research outputs found

Automatic Speech Recognition Errors Detection Using Supervised Learning Techniques

Author: El Hannani A.
Errattahi R.
Hain T.
Ouahmane H.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/06/2016

In recent years, many advances have been made in the field of Automatic Speech Recognition (ASR). However, the persistent presence of ASR errors is limiting the widespread adoption of speech technology in real-life applications. This motivates attempts to find alternative techniques to automatically detect and correct ASR errors, which can be especially effective when the user cannot tune the features, the models, or the decoder of the ASR system, or when the transcription serves as input to downstream systems such as machine translation, information retrieval, and question answering. In this paper, we present an ASR error detection system targeted at substitution and insertion errors. The proposed system is based on supervised learning techniques and uses input features derived only from the ASR output words, and hence should be usable with any ASR system. Applying this system to TV program transcription data identifies 40.30% of the recognition errors generated by the ASR system.

White Rose Research Online

Fast Keyword Spotting in Telephone Speech

Author: Nouza J.
Silovsky J.
Publication venue: Společnost pro radioelektronické inženýrství
Publication date: 01/01/2009

In this paper, we present a system designed for detecting keywords in telephone speech. We focus not only on achieving high accuracy but also on very short processing time. The keyword spotting system can run in three modes: a) an off-line mode requiring less than 0.1xRT, b) an on-line mode with minimal (2 s) latency, and c) a repeated spotting mode, in which pre-computed values allow for additional acceleration. Its performance is evaluated on recordings of Czech spontaneous telephone speech using rather large and complex keyword lists.

Directory of Open Access Journals

DSpace@TUL

Digital library of Brno University of Technology

uC: Ubiquitous Collaboration Platform for Multimodal Team Interaction Support

Author: Carstens Deborah
Converse Patrick D
Fiore Stephen M
Gurbuz Sabri
Kepuska Veton Z
Metcalf David
Rodriguez Walter
Publication venue: CSUSB ScholarWorks
Publication date: 01/01/2008

A human-centered computing platform that improves teamwork and transforms the “human-computer interaction experience” for distributed teams is presented. This Ubiquitous Collaboration, or uC (“you see”), platform's objective is to transform distributed teamwork (i.e., work occurring when teams of workers and learners are geographically dispersed and often interacting at different times). It achieves this goal through a multimodal team interaction interface realized through a reconfigurable open architecture. The approach taken is to integrate: (1) an intuitive speech- and video-centric multi-modal interface to augment more conventional methods (e.g., mouse, stylus and touch), (2) an open and reconfigurable architecture supporting information gathering, and (3) a machine intelligent approach to analysis and management of heterogeneous live and stored sensor data to support collaboration. The system will transform how teams of people interact with computers by drawing on both the virtual and physical environment.

CSUSB ScholarWorks

Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals

Author: Nagasaka Shogo
Nakashima Ryo
Taniguchi Tadahiro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/03/2016

Human infants can discover words directly from unsegmented speech signals without any explicitly labeled data. In this paper, we develop a novel machine learning method called nonparametric Bayesian double articulation analyzer (NPB-DAA) that can directly acquire language and acoustic models from observed continuous speech signals. For this purpose, we propose an integrative generative model that combines a language model and an acoustic model into a single generative model called the "hierarchical Dirichlet process hidden language model" (HDP-HLM). The HDP-HLM is obtained by extending the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by Johnson et al. An inference procedure for the HDP-HLM is derived using the blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure enables the simultaneous and direct inference of language and acoustic models from continuous speech signals. Based on the HDP-HLM and its inference procedure, we developed a novel double articulation analyzer. By assuming HDP-HLM as a generative model of observed time series data, and by inferring latent variables of the model, the method can analyze latent double articulation structure, i.e., hierarchically organized latent words and phonemes, of the data in an unsupervised manner. The novel unsupervised double articulation analyzer is called NPB-DAA. The NPB-DAA can automatically estimate double articulation structure embedded in speech signals. We also carried out two evaluation experiments using synthetic data and actual human continuous speech signals representing Japanese vowel sequences. In the word acquisition and phoneme categorization tasks, the NPB-DAA outperformed a conventional double articulation analyzer (DAA) and a baseline automatic speech recognition system whose acoustic model was trained in a supervised manner.
Comment: 15 pages, 7 figures, Draft submitted to IEEE Transactions on Autonomous Mental Development (TAMD

arXiv.org e-Print Archive

Improving the translation environment for professional translators

Author: Augustinus Liesbeth
Bulté Bram
Buysschaert Joost
Coppers Sven
Daems Joke
Heyman Geert
Hoste Veronique
Lefever Els
Luyten Kris
Macken Lieve
Moens Marie-Francine
Pelemans Joris
Rigouts Terryn Ayla
Steurs Frieda
Tezcan Arda
Van den Bergh Jan
van der Lek-Ciudin Iulianna
Van Eynde Frank
Vanallemeersch Tom
Vandeghinste Vincent
Verwimp Lyan
Wambacq Patrick
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.

Multidisciplinary Digital Publishing Institute

Ghent University Academic Bibliography

The radiological report: a compromise between structured reporting and natural language processing

Author: Nobel Johan Martijn
Publication venue: Maastricht University
Publication date: 01/01/2023

Maastricht University Research Portal

ASR Error Detection via Audio-Transcript entailment

Author: Konam Sandeep
Meripo Nimshi Venkat
Publication date: 21/07/2022

Despite the improved performance of the latest Automatic Speech Recognition (ASR) systems, transcription errors are still unavoidable. These errors can have a considerable impact in critical domains such as healthcare, when used to help with clinical documentation. Therefore, detecting ASR errors is a critical first step in preventing further error propagation to downstream applications. To this end, we propose a novel end-to-end approach for ASR error detection using audio-transcript entailment. To the best of our knowledge, we are the first to frame this problem as an end-to-end entailment task between the audio segment and its corresponding transcript segment. Our intuition is that there should be a bidirectional entailment between audio and transcript when there is no recognition error and vice versa. The proposed model utilizes an acoustic encoder and a linguistic encoder to model the speech and transcript respectively. The encoded representations of both modalities are fused to predict the entailment. Since doctor-patient conversations are used in our experiments, a particular emphasis is placed on medical terms. Our proposed model achieves classification error rates (CER) of 26.2% on all transcription errors and 23% on medical errors specifically, leading to improvements upon a strong baseline by 12% and 15.4%, respectively.
Comment: Accepted to Interspeech 202

arXiv.org e-Print Archive

Wake-Up-Word Speech Recognition

Author: Këpuska Veton
Publication venue: 'IntechOpen'
Publication date: 13/06/2011

IntechOpen