Minimizing word error rate in a dyslexic reading-oriented ASR engine using phoneme refinement and alternative pronunciation
Little attention has been given to detecting miscues in text read by dyslexic children through an automatic speech recognition (ASR) engine. In an ASR system, miscues are measured by word error rate (WER) and miscue detection rate (MDR). WER must be kept low and MDR high at all times to achieve better recognition. This paper focuses on minimizing word error rate by formulating a better model for perspicuous representation of the input data. Such a representation takes into account phoneme refinement and alternative pronunciation for Bahasa Melayu (BM) speech data uttered by dyslexic children. Based on the literature, several other optimal models of input data and their recognition results were compared. It is found that phoneme refinement and alternative pronunciation produced better recognition results, as evidenced in the performance metrics (lower WER and higher MDR), which are 25% and 80.77% respectively
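The WER figure quoted above is the standard word-level edit-distance measure. As a minimal sketch (not the dyslexic-reading engine described in the abstract), WER can be computed as the Levenshtein distance between reference and hypothesis word sequences, normalized by the reference length:

```python
# Minimal WER sketch: Levenshtein distance over whitespace-tokenized
# words, divided by reference length. Illustrative only.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("a b c d", "a x c")` is 0.5: one substitution plus one deletion over a four-word reference.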
Simultaneous Multispeaker Segmentation for Automatic Meeting Recognition
Vocal activity detection is an important technology for both automatic speech recognition and automatic speech understanding. In meetings, participants typically vocalize for only a fraction of the recorded time, and standard vocal activity detection algorithms for close-talk microphones have been shown to be ineffective. This is primarily due to the problem of crosstalk, in which a participant's speech appears on other participants' microphones, making it hard to attribute detected speech to its correct speaker. We describe an automatic multichannel segmentation system for meeting recognition, which accounts for both the observed acoustics and the inferred vocal activity states of all participants using joint multi-participant models. Our experiments show that this approach almost completely eliminates the crosstalk problem. Recent improvements to the baseline reduce the development set word error rate, achieved by a state-of-the-art multi-pass speech recognition system, by 62% relative to manual segmentation. We also observe significant performance improvements on unseen data
System-independent ASR error detection and classification using Recurrent Neural Network
This paper addresses errors in continuous Automatic Speech Recognition (ASR) in two stages: error detection and error type classification. Unlike the majority of research in this field, we propose to handle recognition errors independently from the ASR decoder. We first establish an effective set of generic features derived exclusively from the recognizer output to compensate for the absence of ASR decoder information. Then, we apply variant Recurrent Neural Network (V-RNN) based models for error detection and error type classification. Such models learn information additional to the recognized word classification using label dependency. As a result, experiments on the Multi-Genre Broadcast Media corpus have shown that the proposed generic feature set achieves competitive performance compared to state-of-the-art systems in both tasks. Furthermore, we have shown that a V-RNN trained on the proposed feature set is an effective classifier for ASR error detection, with an accuracy of 85.43%
Anti-spoofing Methods for Automatic Speaker Verification System
Growing interest in automatic speaker verification (ASV) systems has led to significant quality improvement of spoofing attacks on them. Many research works confirm that, despite the low equal error rate (EER), ASV systems are still vulnerable to spoofing attacks. In this work we overview different acoustic feature spaces and classifiers to determine reliable and robust countermeasures against spoofing attacks. We compared several spoofing detection systems presented so far on the development and evaluation datasets of the Automatic Speaker Verification Spoofing and Countermeasures (ASVspoof) Challenge 2015. Experimental results presented in this paper demonstrate that the combined use of magnitude and phase information provides a substantial input into the efficiency of the spoofing detection systems. Also, wavelet-based features show impressive results in terms of equal error rate. In our overview we compare spoofing performance for systems based on different classifiers. Comparison results demonstrate that the linear SVM classifier outperforms the conventional GMM approach. However, many researchers, inspired by the great success of deep neural network (DNN) approaches in automatic speech recognition, applied DNNs to the spoofing detection task and obtained quite low EER for known and unknown types of spoofing attacks.
Comment: 12 pages, 0 figures, published in Springer Communications in Computer and Information Science (CCIS) vol. 66
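The EER reported throughout the spoofing literature is the operating point where false-accept and false-reject rates are equal. As a hedged sketch (the scores below are illustrative, not from the ASVspoof 2015 data), EER can be approximated by sweeping candidate thresholds over the observed scores:

```python
# Illustrative EER computation from verification scores: sweep every
# observed score as a threshold and return the point where the
# false-accept rate (FAR) and false-reject rate (FRR) are closest.
def eer(genuine_scores, impostor_scores):
    best = None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

Production toolkits typically interpolate the ROC curve rather than sweep raw scores, but the crossing-point idea is the same.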
WiSeBE: Window-based Sentence Boundary Evaluation
Sentence Boundary Detection (SBD) has been a major research topic since Automatic Speech Recognition transcripts have been used for further Natural Language Processing tasks like Part-of-Speech Tagging, Question Answering or Automatic Summarization. But what about evaluation? Are standard evaluation metrics like precision, recall, F-score or classification error, and, more importantly, evaluating an automatic system against a unique reference, enough to conclude how well an SBD system is performing given the final application of the transcript? In this paper we propose Window-based Sentence Boundary Evaluation (WiSeBE), a semi-supervised metric for evaluating Sentence Boundary Detection systems based on multi-reference (dis)agreement. We evaluate and compare the performance of different SBD systems over a set of YouTube transcripts using WiSeBE and standard metrics. This double evaluation gives an understanding of how WiSeBE is a more reliable metric for the SBD task.
Comment: In proceedings of the 17th Mexican International Conference on Artificial Intelligence (MICAI), 201
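The exact WiSeBE definition is given in the paper; as a loose illustration of the underlying idea of window-based multi-reference agreement, one might count a hypothesized boundary as supported by a reference when that reference places a boundary within a small word-position window, and average over references (names and window size here are assumptions for illustration):

```python
# Loose illustration (NOT the exact WiSeBE metric) of window-based
# boundary matching against multiple references. Boundaries are word
# indices; a hypothesis boundary "agrees" with a reference if that
# reference has a boundary within +/- `window` positions.
def window_agreement(hyp_bounds, ref_bounds_list, window=2):
    if not hyp_bounds:
        return 0.0
    per_ref = []
    for ref in ref_bounds_list:
        hits = sum(any(abs(b - r) <= window for r in ref) for b in hyp_bounds)
        per_ref.append(hits / len(hyp_bounds))
    # Average agreement across all human references.
    return sum(per_ref) / len(per_ref)
```

A hypothesis matching one annotator but not another thus scores between 0 and 1, reflecting inter-annotator disagreement rather than a single gold standard.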
ASR error management for improving spoken language understanding
This paper addresses the problem of automatic speech recognition (ASR) error detection and its use for improving spoken language understanding (SLU) systems. In this study, the SLU task consists in automatically extracting, from ASR transcriptions, semantic concepts and concept/value pairs in, e.g., a touristic information system. An approach is proposed for enriching the set of semantic labels with error-specific labels and by using a recently proposed neural approach based on word embeddings to compute well-calibrated ASR confidence measures. Experimental results are reported showing that it is possible to significantly decrease the Concept/Value Error Rate with a state-of-the-art system, outperforming previously published results on the same experimental data. It is also shown that, by combining an SLU approach based on conditional random fields with a neural encoder/decoder attention-based architecture, it is possible to effectively identify confidence islands and uncertain semantic output segments useful for deciding appropriate error handling actions by the dialogue manager strategy.
Comment: Interspeech 2017, Aug 2017, Stockholm, Sweden. 201
“Can You Give Me Another Word for Hyperbaric?”: Improving Speech Translation Using Targeted Clarification Questions
We present a novel approach for improving communication success between users of speech-to-speech translation systems by automatically detecting errors in the output of automatic speech recognition (ASR) and statistical machine translation (SMT) systems. Our approach initiates system-driven targeted clarification about errorful regions in user input and repairs them given user responses. Our system has been evaluated by unbiased subjects in live mode, and results show improved success of communication between users of the system. Index Terms: speech translation, error detection, error correction, spoken dialog systems.