Open Source Multi-Language Audio Database for Spoken Language Processing Applications

Authors: Andrew Hwang, Brian Wong, Chandra Sekharvootkuri, Eldar Tokhtamyshev, Jiang Wu, Montri Karnjanadecha, Stephen A Zahorian
Publication date: 06/03/2020

Abstract: Over the past few decades, research in automatic speech recognition and automatic speaker recognition has been greatly facilitated by the sharing of large annotated speech databases such as those distributed by the Linguistic Data Consortium (LDC). Open sources, particularly websites such as YouTube, contain vast and varied speech recordings in a range of languages. However, these "open sources" of speech data are largely untapped as resources for speech research. This paper describes a project to collect, organize, and annotate a large collection of this speech data. The data consists of approximately 30 hours of speech in each of three languages: English, Mandarin Chinese, and Russian. Each of the 900 recordings has been orthographically transcribed at the sentence/phrase level by human listeners. Some of the issues related to working with this low-quality, varied, noisy speech data in three languages are also described.

CiteSeerX