Analysis of Unsupervised and Noise-Robust Speaker-Adaptive HMM-Based Speech Synthesis Systems toward a Unified ASR and TTS Framework
For the 2009 Blizzard Challenge we have built an unsupervised version of the HTS-2008 speaker-adaptive HMM-based speech synthesis system for English, and a noise-robust version of the system for Mandarin. They are designed from a multidisciplinary application point of view in that we attempt to integrate the components of the TTS system with other technologies such as ASR. All the average voice models are trained exclusively from recognized, publicly available ASR databases. Multi-pass LVCSR and confidence scores calculated from confusion networks are used for the unsupervised systems, and noisy data recorded in cars or public spaces is used for the noise-robust system. We believe the developed systems form solid benchmarks and provide good connections to ASR fields. This paper describes the development of the systems and reports the results and analysis of their evaluation.
ATCSpeech: a multilingual pilot-controller speech corpus from real Air Traffic Control environment
Automatic Speech Recognition (ASR) has developed greatly in recent years, which has expedited many applications in other fields. For ASR research, a speech corpus is always an essential foundation, especially for vertical industries such as Air Traffic Control (ATC). Some speech corpora exist for common applications, either public or paid. For ATC, however, it is difficult to collect raw speech from real systems due to safety issues. More importantly, for a supervised learning task like ASR, annotating the transcriptions is even more laborious, which greatly restricts the prospects of ASR applications. In this paper, a multilingual speech corpus (ATCSpeech) from real ATC systems, including accented Mandarin Chinese and English, is built and released to encourage non-commercial ASR research in the ATC domain. The corpus is introduced in detail from the perspective of data amount, speaker gender and role, speech quality, and other attributes. In addition, the performance of our baseline ASR models is reported. A community edition of our speech database can be applied for and used under a special contract. To the best of our knowledge, this is the first work that aims at building a real, multilingual ASR corpus for air traffic related research.
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and the methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other speech processing applications that are able to operate in real-world environments, such as mobile communication services and smart homes.
Saudi Accented Arabic Voice Bank
The aim of this paper is to present an Arabic speech database that represents native Arabic speakers from all the cities of Saudi Arabia. The database is called the Saudi Accented Arabic Voice Bank (SAAVB). Preparing the prompt sheets, selecting the right speakers, and transcribing their speech are some of the challenges that faced the project team. The procedures used to meet these challenges are highlighted. SAAVB comprises 1033 speakers who speak Modern Standard Arabic with a Saudi accent. The SAAVB content is analyzed and the results are illustrated. The content was verified internally and externally by IBM Cairo and can be used to train speech engines such as automatic speech recognition and speaker verification systems.