Search CORE

2,073 research outputs found

Analysis of Unsupervised and Noise-Robust Speaker-Adaptive HMM-Based Speech Synthesis Systems toward a Unified ASR and TTS Framework

Author: Dines John
Gibson Matthew
Guan Yong
King Simon
Lincoln Mike
Tian Jilei
Yamagishi Junichi
Publication venue
Publication date: 01/01/2009
Field of study

For the 2009 Blizzard Challenge we have built an unsupervised version of the HTS-2008 speaker-adaptive HMM-based speech synthesis system for English, and a noise robust version of the systems for Mandarin. They are designed from a multidisciplinary application point of view in that we attempt to integrate the components of the TTS system with other technologies such as ASR. All the average voice models are trained exclusively from recognized, publicly available, ASR databases. Multi-pass LVCSR and confidence scores calculated from confusion network are used for the unsupervised systems, and noisy data recorded in cars or public spaces is used for the noise robust system. We believe the developed systems form solid benchmarks and provide good connections to ASR fields. This paper describes the development of the systems and reports the results and analysis of their evaluation

CiteSeerX

Edinburgh Research Archive

Edinburgh Research Explorer

ATCSpeech: a multilingual pilot-controller speech corpus from real Air Traffic Control environment

Author: Chen Zhengmao
Li Dan
Lin Yi
Tan Xianlong
Wang Bing
Wu Xiping
Yang Bo
Yang Zhongping
Publication venue: 'International Speech Communication Association'
Publication date: 26/11/2019
Field of study

Automatic Speech Recognition (ASR) is greatly developed in recent years, which expedites many applications on other fields. For the ASR research, speech corpus is always an essential foundation, especially for the vertical industry, such as Air Traffic Control (ATC). There are some speech corpora for common applications, public or paid. However, for the ATC, it is difficult to collect raw speeches from real systems due to safety issues. More importantly, for a supervised learning task like ASR, annotating the transcription is a more laborious work, which hugely restricts the prospect of ASR application. In this paper, a multilingual speech corpus (ATCSpeech) from real ATC systems, including accented Mandarin Chinese and English, is built and released to encourage the non-commercial ASR research in ATC domain. The corpus is detailly introduced from the perspective of data amount, speaker gender and role, speech quality and other attributions. In addition, the performance of our baseline ASR models is also reported. A community edition for our speech database can be applied and used under a special contrast. To our best knowledge, this is the first work that aims at building a real and multilingual ASR corpus for the air traffic related research

arXiv.org e-Print Archive

Crossref

Speech Recognition

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

Directory of Open Access Books (DOAB)

Saudi Accented Arabic Voice Bank

Author: Alenazi Ammar
Alghamdi Mansour
Alhargan Fayez
Alkanhal Mohammed
Alkhairy Ashraf
Eldesouki Munir
Publication venue: King Saud University. Production and hosting by Elsevier B.V.
Publication date: 31/12/2008
Field of study

AbstractThe aim of this paper is to present an Arabic speech database that represents Arabic native speakers from all the cities of Saudi Arabia. The database is called the Saudi Accented Arabic Voice Bank (SAAVB). Preparing the prompt sheets, selecting the right speakers and transcribing their speech are some of the challenges that faced the project team. The procedures that meet these challenges are highlighted. SAAVB consists of 1033 speakers speak in Modern Standard Arabic with a Saudi accent. The SAAVB content is analyzed and the results are illustrated. The content was verified internally and externally by IBM Cairo and can be used to train speech engines such as automatic speech recognition and speaker verification systems

Elsevier - Publisher Connector

Robust Speech Recognition for Adverse Environments

Author: Liu Chao-Hong
Wu Chung-Hsien
Publication venue: 'IntechOpen'
Publication date: 28/11/2012
Field of study

IntechOpen