
39 research outputs found

English Broadcast News Speech Recognition by Humans and Machines

Author: Dibert Tom
Huang Yinghui
Kaiser-Schatzlein Alice
Kingsbury Brian
Kurata Gakuto
Picheny Michael
Samko Bern
Saon George
Suzuki Masayuki
Thomas Samuel
Tuske Zoltan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/04/2019

With recent advances in deep learning, considerable attention has been given to achieving automatic speech recognition performance close to human performance on tasks like conversational telephone speech (CTS) recognition. In this paper we evaluate the usefulness of these proposed techniques on broadcast news (BN), a similarly challenging task. We also perform a set of recognition measurements to understand how close the achieved automatic speech recognition results are to human performance on this task. On two publicly available BN test sets, DEV04F and RT04, our speech recognition system using LSTM and residual network based acoustic models with a combination of n-gram and neural network language models performs at 6.5% and 5.9% word error rate. By achieving new performance milestones on these test sets, our experiments show that techniques developed on other related tasks, like CTS, can be transferred to achieve similar performance. In contrast, the best measured human word error rate on these test sets is much lower, at 3.6% and 2.8% respectively, indicating that there is still room for new techniques and improvements in this space to reach human performance levels.

Comment: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other work.

arXiv.org e-Print Archive

Crossref

Proceedings of the ACM SIGIR Workshop "Searching Spontaneous Conversational Speech"

Author: Raaijmakers Stephan
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 27/07/2007

University of Twente Research Information

CMU GALE Speech-to-Text System

Author: Florian Metze
Qin Jin
Roger Hsiao
Tanja Schultz
Udhyakumar Nallasamy
Publication date: 01/01/2010

This paper describes the latest Speech-to-Text system developed for the Global Autonomous Language Exploitation ("GALE") domain by Carnegie Mellon University (CMU). This system uses discriminative training, bottle-neck features and other techniques that were not used in previous versions of our system, and is trained on 1150 hours of data from a variety of Arabic speech sources. In this paper, we show how different lexica, pre-processing, and system combination techniques can be used to improve the final output, and provide analysis of the improvements achieved by the individual techniques.

CiteSeerX

The European Language Resources and Technologies Forum: Shaping the Future of the Multilingual Digital Europe

Author: Baroni Paola
Bel Núria
Budin Gerhard
Calzolari Nicoletta
Choukri Khalid
Goggi Sara
Mariani Joseph
Monachini Monica
Odijk Jan
Piperidis Stelios
Quochi Valeria
Soria Claudia
Toral Antonio
Publication venue: Istituto di Linguistica Computazionale del CNR - Pisa, ITALY

Proceedings of the 1st FLaReNet Forum on the European Language Resources and Technologies, held in Vienna, at the Austrian Academy of Science, on 12-13 February 2009

PUblication MAnagement

Heterophonic speech recognition using composite phones

Author: CJ Leggetter
DL Hinton
F Jelinek
GE Dahl
H Soltau
JP Olive
K Kirchhoff
L Lamel
M Abushariaha
T Demeechai
Y El-Imam
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Automated Regression Testing Approach To Expansion And Refinement Of Speech Recognition Grammars

Author: Dookhoo Raul
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2008

This thesis describes an approach to automated regression testing for speech recognition grammars. A prototype Audio Regression Tester called ART has been developed using Microsoft's Speech API and C#. ART allows a user to perform any of three tasks: automatically generate a new XML-based grammar file from standardized SQL database entries, record and cross-reference audio files for use by an underlying speech recognition engine, and perform regression tests with the aid of an oracle grammar. ART takes as input a wave sound file containing speech and a newly created XML grammar file. It then simultaneously executes two tests: one with the wave file and the new grammar file and the other with the wave file and the oracle grammar. The comparison result of the tests is used to determine whether the test was successful or not. This allows rapid exhaustive evaluations of additions to grammar files to guarantee forward process as the complexity of the voice domain grows. The data used in this research to derive results were taken from the LifeLike project. However, the capabilities of ART extend beyond LifeLike. The results gathered have shown that using a person's recorded voice to do regression testing is as effective as having the person do live testing. A cost-benefit analysis, using two published equations, one for Cost and the other for Benefit, was also performed to determine if automated regression testing is really more effective than manual testing. Cost captures the salaries of the engineers who perform regression testing tasks and Benefit captures revenue gains or losses related to changes in product release time. ART had a higher benefit of 21461.08 when compared to manual regression testing, which had a benefit of 21393.99. Coupled with its excellent error detection rates, ART has proven to be very efficient and cost-effective in speech grammar creation and refinement.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Discourse Cohesion in Chinese-English Statistical Machine Translation

Author: Steele David
Publication venue: 'University of Sheffield Conference Proceedings'
Publication date: 01/09/2019

In discourse, cohesion is a required component of meaningful and well organised text. It establishes the relationship between different elements in the text using a number of devices such as pronouns, determiners, and conjunctions. In translation, a well translated document will display the correct cohesion and use of cohesive devices that are pertinent to the language. However, not all languages have the same cohesive devices or use them in the same way. In statistical machine translation this is a particular barrier to generating smooth translations, especially when sentences in parallel corpora are being treated in isolation and no extra meaning or cohesive context is provided beyond the sentential level. In this thesis, focussing on Chinese and English as the language pair, we examine discourse cohesion in statistical machine translation, looking at ways that systems can leverage discourse cues and signals in order to produce smoother translations. We also provide a statistical model that improves translation output by adding additional tokens within text that can be used to leverage extra information. A significant part of this research involved visualising many of the results and system outputs, and so an overview of two important pieces of visualisation software that we developed is also included.

White Rose E-theses Online