2,285 research outputs found

Context-Dependent Acoustic Modeling without Explicit Phone Clustering

Author: Beck Eugen
Ney Hermann
Raissi Tina
Schlüter Ralf
Publication date: 15/05/2020

Phoneme-based acoustic modeling for large-vocabulary automatic speech recognition takes advantage of phoneme context. The large number of context-dependent (CD) phonemes and their highly varying statistics require tying or smoothing to enable robust training. Usually, Classification and Regression Trees are used for phonetic clustering, which is standard in Hidden Markov Model (HMM)-based systems. However, this solution introduces a secondary training objective and does not allow for end-to-end training. In this work, we address direct phonetic context modeling for the hybrid Deep Neural Network (DNN)/HMM that does not build on any phone clustering algorithm for the determination of the HMM state inventory. By performing different decompositions of the joint probability of the center phoneme state and its left and right contexts, we obtain a factorized network consisting of different components, trained jointly. Moreover, the representation of the phonetic context for the network relies on phoneme embeddings. The recognition accuracy of our proposed models on the Switchboard task is comparable to, and slightly better than, that of the hybrid model using the standard state-tying decision trees. Comment: Submitted to Interspeech 202
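
As an illustrative sketch only, two of the possible chain-rule decompositions of the joint probability mentioned in the abstract can be written as follows. The notation is assumed here and is not taken from the listing: x is the acoustic input, \sigma_c the center phoneme state, and \phi_\ell and \phi_r the left and right context phonemes. Which decompositions the authors actually use is not stated in the abstract.

```latex
% Assumed notation (hypothetical, not from the source):
%   x        acoustic input
%   \sigma_c center phoneme state
%   \phi_\ell, \phi_r  left and right context phonemes
\begin{align}
p(\phi_\ell, \sigma_c, \phi_r \mid x)
  &= p(\phi_r \mid \sigma_c, \phi_\ell, x)\; p(\sigma_c \mid \phi_\ell, x)\; p(\phi_\ell \mid x) \\
  &= p(\phi_\ell \mid \sigma_c, \phi_r, x)\; p(\sigma_c \mid \phi_r, x)\; p(\phi_r \mid x)
\end{align}
```

Both lines follow from the chain rule of probability alone. In a factorized network, each factor on the right-hand side would correspond to one jointly trained component.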

arXiv.org e-Print Archive

Crossref

Croatian Speech Recognition

Author: Ivo Ipsic
Sanda Martincic-Ipsic
Publication venue: 'IntechOpen'
Publication date: 16/08/2010

IntechOpen

Multidialectal Spanish acoustic modeling for speech recognition

Author: Albino Nogueiras
Asunción Moreno
Ferreiros
Gibbon
Heeringa
Imperl
Kirchhoff
Köhler
Lipski
Mónica Caballero
Schultz
Publication venue: 'Elsevier BV'

Crossref

Decision tree-based acoustic models for speech recognition

Author: J Ajmera
J Droppo
Jitendra Ajmera
JT Foote
L Breiman
Masami Akamine
OR Duda
PC Woodland
R Teunen
S Young
V Tyagi
X Chen
Publication venue: 'Springer Science and Business Media LLC'

Crossref

HMM-based synthesis of child speech

Author: Berkling Kay
King Simon
Watts Oliver
Yamagishi Junichi
Publication date: 01/01/2008

The synthesis of child speech presents challenges both in the collection of data and in the building of a synthesiser from that data. Because only limited data can be collected, and the domain of that data is constrained, it is difficult to obtain the type of phonetically-balanced corpus usually used in speech synthesis. As a consequence, building a synthesiser from this data is difficult. Concatenative synthesisers are not robust to corpora with many missing units (as is likely when the corpus content is not carefully designed), so we chose to build a statistical parametric synthesiser using the HMM-based system HTS. This technique has previously been shown to perform well for limited amounts of data, and for data collected under imperfect conditions. We compared six different configurations of the synthesiser, using both speaker-dependent and speaker-adaptive modelling techniques, and using varying amounts of data. The output from these systems was evaluated alongside natural and vocoded speech in a Blizzard-style listening test.

CiteSeerX

Edinburgh Research Archive

Edinburgh Research Explorer

Advances in the application of support vector machines as probabilistic estimators for continuous automatic speech recognition

Author: Bolaños Alonso Daniel
Publication date: 01/01/2008

Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, November 200

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

Speech Technologies for Serbian and Kindred South Slavic Languages

Author: Darko Pekar
Marko Janev
Milan Secujski
Niksa Jakovljevic
Radovan Obradovic
Vlado Delic
Publication venue: 'IntechOpen'
Publication date: 16/08/2010

IntechOpen

Towards Personalized Synthesized Voices for Individuals with Vocal Disabilities: Voice Banking and Reconstruction

Author: King Simon
Veaux Christophe
Yamagishi Junichi
Publication date: 01/01/2013

When individuals lose the ability to produce their own speech, due to degenerative diseases such as motor neurone disease (MND) or Parkinson’s, they lose not only a functional means of communication but also a display of their individual and group identity. In order to build personalized synthetic voices, attempts have been made to capture the voice before it is lost, using a process known as voice banking. But, for some patients, the speech deterioration frequently coincides with or quickly follows diagnosis. Using HMM-based speech synthesis, it is now possible to build personalized synthetic voices with minimal data recordings and even disordered speech. The power of this approach is that it is possible to use the patient’s recordings to adapt existing voice models pre-trained on many speakers. When the speech has begun to deteriorate, the adapted voice model can be further modified in order to compensate for the disordered characteristics found in the patient’s speech. The University of Edinburgh has initiated a project for voice banking and reconstruction based on this speech synthesis technology. At the current stage of the project, more than fifteen patients with MND have already been recorded and five of them have received a reconstructed voice. In this paper, we present an overview of the project as well as subjective assessments of the reconstructed voices and feedback from patients and their families.

CiteSeerX

Edinburgh Research Explorer