Search CORE

2,766 research outputs found


Unsupervised intralingual and cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction

Authors: Byrne William; Gibson Matthew
Publication venue: IEEE Transactions on Audio, Speech, and Language Processing
Publication date: 01/01/2010

Hidden Markov model (HMM)-based speech synthesis systems possess several advantages over concatenative synthesis systems. One such advantage is the relative ease with which HMM-based systems can be adapted to speakers not present in the training data. Speaker adaptation methods from HMM-based automatic speech recognition (ASR) are adopted for this task. For unsupervised speaker adaptation, previous work has used a supplementary set of acoustic models to estimate the transcription of the adaptation data. First, this paper presents an approach to unsupervised speaker adaptation of HMM-based speech synthesis models that avoids the need for such supplementary acoustic models. This is achieved by defining a mapping between HMM-based synthesis models and ASR-style models via a two-pass decision tree construction process. Second, it is shown that this mapping also enables unsupervised adaptation of HMM-based speech synthesis models without performing linguistic analysis of the estimated transcription of the adaptation data. Third, this paper demonstrates how the technique lends itself to unsupervised cross-lingual adaptation of HMM-based speech synthesis models and explains the advantages of such an approach. Finally, listener evaluations reveal that the proposed unsupervised adaptation methods deliver performance approaching that of supervised adaptation.
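The mapping between synthesis models and ASR-style models can be illustrated with a toy sketch. This is not the authors' implementation: the trees are hand-written rather than grown from data by likelihood-based splitting, and the context feature names (`left`, `centre`, `right`, `stress`), leaf names, and helper functions are hypothetical. The first-pass tree asks only triphone (ASR-style) questions; each of its leaves is then refined by a second-pass subtree that may ask full-context questions. Because every synthesis leaf sits under exactly one first-pass leaf, an ASR-style model can be selected from triphone context alone, with no linguistic analysis.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Union

# Hypothetical context: feature name -> value for one phone.
Context = Dict[str, str]

# ASR-style (triphone) features; the first pass may only ask about these.
TRIPHONE_KEYS: Set[str] = {"left", "centre", "right"}


@dataclass
class Leaf:
    name: str  # name of the tied model / state cluster


@dataclass
class Node:
    key: str                # context feature the question inspects
    values: FrozenSet[str]  # answer is "yes" if ctx[key] is in this set
    yes: "Tree"
    no: "Tree"


Tree = Union[Node, Leaf]


def descend(tree: Tree, ctx: Context) -> str:
    """Follow the questions down to a leaf and return its name."""
    while isinstance(tree, Node):
        tree = tree.yes if ctx[tree.key] in tree.values else tree.no
    return tree.name


def uses_only(tree: Tree, keys: Set[str]) -> bool:
    """True if every question in the tree inspects one of `keys`."""
    if isinstance(tree, Leaf):
        return True
    return tree.key in keys and uses_only(tree.yes, keys) and uses_only(tree.no, keys)


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, Leaf):
        return [tree.name]
    return leaves(tree.yes) + leaves(tree.no)


def synth_to_asr(second: Dict[str, Tree]) -> Dict[str, str]:
    """Map each synthesis leaf to the first-pass (ASR-style) leaf it refines."""
    return {s: asr for asr, sub in second.items() for s in leaves(sub)}


def synthesis_model(first: Tree, second: Dict[str, Tree], ctx: Context) -> str:
    """Full-context lookup: first-pass leaf, then its second-pass subtree."""
    return descend(second[descend(first, ctx)], ctx)


VOWELS = frozenset({"a", "e", "i", "o", "u"})
STRESSED = frozenset({"1"})

# First pass: triphone questions only.
first = Node("centre", VOWELS,
             yes=Node("left", frozenset({"sil"}),
                      yes=Leaf("asr_vowel_after_sil"), no=Leaf("asr_vowel")),
             no=Leaf("asr_consonant"))

# Second pass: each first-pass leaf is refined with full-context questions.
second: Dict[str, Tree] = {
    "asr_vowel_after_sil": Node("stress", STRESSED,
                                yes=Leaf("syn_v_sil_stressed"),
                                no=Leaf("syn_v_sil_unstressed")),
    "asr_vowel": Node("stress", STRESSED,
                      yes=Leaf("syn_v_stressed"), no=Leaf("syn_v_unstressed")),
    "asr_consonant": Leaf("syn_consonant"),
}

assert uses_only(first, TRIPHONE_KEYS)

full_ctx = {"left": "k", "centre": "a", "right": "t", "stress": "1"}
print(synthesis_model(first, second, full_ctx))                    # syn_v_stressed
print(descend(first, {"left": "k", "centre": "a", "right": "t"}))  # asr_vowel
print(synth_to_asr(second)["syn_v_stressed"])                      # asr_vowel
```

The second lookup succeeds on a context that has no `stress` feature at all, which is the property the two-pass restriction is meant to guarantee.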

Apollo (Cambridge)

Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models

Author: Gibson Matthew
Publication date: 01/01/2009

CiteSeerX

Apollo (Cambridge)

Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction

Authors: Byrne WJ; Gibson M; Hirsimaki T; Karhila R; Kurimo M
Publication venue: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
Publication date: 01/01/2010

This paper demonstrates how unsupervised cross-lingual adaptation of HMM-based speech synthesis models may be performed without explicit knowledge of the language of the adaptation data. A two-pass decision tree construction technique is deployed for this purpose. Using parallel translated datasets, cross-lingual and intralingual adaptation are compared in a controlled manner. Listener evaluations reveal that the proposed method delivers performance approaching that of unsupervised intralingual adaptation.

CiteSeerX

Apollo (Cambridge)

Speech Emotion Recognition Using Multi-hop Attention Mechanism

Authors: Byun Seokhyun; Dey Subhadeep; Jung Kyomin; Yoon Seunghyun
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 09/05/2019

In this paper, we are interested in exploiting the textual and acoustic data of an utterance for the speech emotion classification task. The baseline approach models the audio and text information independently using two deep neural networks (DNNs), whose outputs are then fused for classification. Instead of using knowledge from the two modalities separately, we propose a framework that exploits acoustic information in tandem with lexical data. The proposed framework uses two bidirectional long short-term memory (BLSTM) networks to obtain hidden representations of the utterance. Furthermore, we propose an attention mechanism, referred to as multi-hop attention, which is trained to automatically infer the correlation between the modalities. The multi-hop attention first computes the relevant segments of the textual data corresponding to the audio signal. The relevant textual data is then used to attend to parts of the audio signal. To evaluate the proposed system, experiments are performed on the IEMOCAP dataset. Experimental results show that the proposed technique outperforms the state-of-the-art system by a relative improvement of 6.5% in weighted accuracy.

Comment: 5 pages, accepted as a conference paper at ICASSP 2019 (oral presentation)
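The two hops can be sketched in pure Python as follows. This is a minimal sketch, not the paper's model: it assumes dot-product attention scores and uses the last audio hidden state as the first-hop query; both choices, the function names, and the toy vectors (standing in for BLSTM outputs) are assumptions for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def attend(query: Vector, states: List[Vector]) -> Tuple[Vector, List[float]]:
    """Dot-product attention: weight each state by softmax(query . state)."""
    weights = softmax([dot(query, s) for s in states])
    dim = len(states[0])
    context = [sum(w * s[j] for w, s in zip(weights, states)) for j in range(dim)]
    return context, weights


def multi_hop(audio_states: List[Vector], text_states: List[Vector]):
    # Hop 1: an audio summary (assumed: last hidden state) selects relevant text.
    text_context, text_weights = attend(audio_states[-1], text_states)
    # Hop 2: the attended text is then used to attend to parts of the audio.
    audio_context, audio_weights = attend(text_context, audio_states)
    # Fused representation that a classification layer would consume.
    fused = audio_context + text_context
    return fused, text_weights, audio_weights


audio = [[0.0, 0.0], [5.0, 0.0]]  # toy audio hidden states (2 frames, dim 2)
text = [[1.0, 0.0], [0.0, 1.0]]   # toy text hidden states (2 tokens, dim 2)
fused, tw, aw = multi_hop(audio, text)
print(len(fused), round(sum(tw), 6), round(sum(aw), 6))  # 4 1.0 1.0
```

In this toy input the audio summary points along the first axis, so hop 1 concentrates almost all weight on the first text token, and hop 2 then favours the audio frame most aligned with that token.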

arXiv.org e-Print Archive

Crossref

Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

Authors: Bourlard Hervé; Garner Philip N.; Tong Sibo
Publication date: 23/01/2018

Multilingual models for automatic speech recognition (ASR) are attractive because they have been shown to benefit from more training data and lend themselves better to adaptation to under-resourced languages. However, initialisation from monolingual context-dependent models leads to an explosion of context-dependent states. Connectionist Temporal Classification (CTC) is a potential solution, as it performs well with monophone labels. We investigate multilingual CTC in the context of adaptation and regularisation techniques that have been shown to be beneficial in more conventional contexts. The multilingual model is trained with the CTC loss function to model a universal phone set based on the International Phonetic Alphabet (IPA). Learning Hidden Unit Contributions (LHUC) is investigated for language adaptive training. In addition, dropout during cross-lingual adaptation is studied as a way to mitigate overfitting. Experiments show that the performance of the universal phoneme-based CTC system can be improved by applying LHUC, and that the system is extensible to new phonemes during cross-lingual adaptation. Updating all the parameters yields consistent improvements on limited data. Applying dropout during adaptation can further improve the system and achieve performance competitive with Deep Neural Network / Hidden Markov Model (DNN/HMM) systems on limited data.
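LHUC, as commonly formulated, rescales each hidden unit by an amplitude 2·σ(r), which lies in (0, 2), where r is a small per-unit parameter vector and only r is updated during LHUC adaptation. The sketch below shows that rescaling with one LHUC vector per language, which is one hedged reading of "language adaptive training"; the language keys, function names, and toy activations are hypothetical, and the abstract does not say which layers carry LHUC parameters.

```python
import math
from typing import Dict, List

Vector = List[float]


def lhuc_amplitude(r: float) -> float:
    """2 * sigmoid(r), computed stably; lies in [0, 2] and equals 1 at r = 0."""
    if r >= 0:
        return 2.0 / (1.0 + math.exp(-r))
    e = math.exp(r)  # avoids overflow of exp(-r) for very negative r
    return 2.0 * e / (1.0 + e)


def apply_lhuc(hidden: Vector, r: Vector) -> Vector:
    """Scale each hidden-unit activation by its own LHUC amplitude."""
    return [lhuc_amplitude(ri) * hi for hi, ri in zip(hidden, r)]


# Hypothetical per-language LHUC vectors for a 3-unit hidden layer.
# A zero vector leaves the shared network unchanged (every amplitude is 1).
lhuc_params: Dict[str, Vector] = {
    "lang_a": [0.0, 0.0, 0.0],
    "lang_b": [1.5, -2.0, 0.0],
}

hidden = [0.4, -1.2, 0.8]  # toy activations from a shared hidden layer
print(apply_lhuc(hidden, lhuc_params["lang_a"]))  # [0.4, -1.2, 0.8]
print(apply_lhuc(hidden, lhuc_params["lang_b"]))
```

Initialising r at zero for a new language means adaptation starts exactly from the shared multilingual network, which is one reason the 2·σ(r) parameterisation is convenient.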

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline

Authors: Bu Hui; Du Jiayu; Na Xingyu; Wu Bengu; Zheng Hao
Publication date: 16/09/2017

An open-source Mandarin speech corpus called AISHELL-1 is released. It is by far the largest corpus suitable for conducting speech recognition research and building speech recognition systems for Mandarin. The recording procedure, including the audio capturing devices and environments, is presented in detail. The preparation of the related resources, including transcriptions and lexicon, is described. The corpus is released with a Kaldi recipe. Experimental results imply that the quality of the audio recordings and transcriptions is promising.

Comment: Oriental COCOSDA 201

arXiv.org e-Print Archive

Crossref