English Broadcast News Speech Recognition by Humans and Machines
With recent advances in deep learning, considerable attention has been given
to achieving automatic speech recognition performance close to human
performance on tasks like conversational telephone speech (CTS) recognition. In
this paper we evaluate the usefulness of these proposed techniques on broadcast
news (BN), a similarly challenging task. We also perform a set of recognition
measurements to understand how close the achieved automatic speech recognition
results are to human performance on this task. On two publicly available BN
test sets, DEV04F and RT04, our speech recognition system using LSTM and
residual network based acoustic models with a combination of n-gram and neural
network language models performs at 6.5% and 5.9% word error rate. By achieving
new performance milestones on these test sets, our experiments show that
techniques developed on other related tasks, like CTS, can be transferred to
achieve similar performance. In contrast, the best measured human word error
rate on these test sets is much lower, at 3.6% and 2.8% respectively,
indicating that there is still room for new techniques and improvements in this
space, to reach human performance levels.
Comment: © 2019 IEEE.
Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development
Annotation graphs and annotation servers offer infrastructure to support the
analysis of human language resources in the form of time-series data such as
text, audio and video. This paper outlines areas of common need among empirical
linguists and computational linguists. After reviewing examples of data and
tools used or under development for each of several areas, it proposes a common
framework for future tool development, data annotation and resource sharing
based upon annotation graphs and servers.
Comment: 8 pages, 6 figures
Many uses, many annotations for large speech corpora: Switchboard and TDT as case studies
This paper discusses the challenges that arise when large speech corpora
receive an ever-broadening range of diverse and distinct annotations. Two case
studies of this process are presented: the Switchboard Corpus of telephone
conversations and the TDT2 corpus of broadcast news. Switchboard has undergone
two independent transcriptions and various types of additional annotation, all
carried out as separate projects that were dispersed both geographically and
chronologically. The TDT2 corpus has also received a variety of annotations,
but all directly created or managed by a core group. In both cases, issues
arise involving the propagation of repairs, consistency of references, and the
ability to integrate annotations having different formats and levels of detail.
We describe a general framework whereby these issues can be addressed
successfully.
Comment: 7 pages, 2 figures
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
NIST 2007 Language Recognition Evaluation: From the Perspective of IIR
PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200
Spoken Language Learning System: an online conversational spoken language learning system
Thesis: M. Eng., Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (leaves 75-77).
The Spoken Language Learning System (SLLS) is intended to be an engaging, educational, and extensible spoken language learning system showcasing the multilingual capabilities of the Spoken Language Systems Group's (SLS) systems. The motivation behind SLLS is to satisfy both the demand for spoken language learning in an increasingly multi-cultural society and the desire for continued development of the multilingual systems at SLS. SLLS is an integration of an Internet presence with augmentations to SLS's Mandarin systems built within the Galaxy architecture, focusing on the situation of an English speaker learning Mandarin. We offer language learners the ability to listen to spoken phrases and simulated conversations online, engage in interactive dynamic conversations over the telephone, and review audio and visual feedback of their conversations. We also provide a wide array of administration and maintenance features online for teachers and administrators to facilitate continued system development and user interaction, such as lesson plan creation, vocabulary management, and a requests forum. User studies have shown that there is an appreciation for the potential of the system and that the core operation is intuitive and entertaining. The studies have also helped to illuminate the vast array of future work necessary to further polish the language learning experience and reduce the administrative burden. The focus of this thesis is the creation of the first iteration of SLLS; we believe we have taken the first step down the long but hopeful path towards helping people speak a foreign language.
by Tien-Lok Jonathan Lau.