590 research outputs found

Data-driven Speech Enhancement: from Non-negative Matrix Factorization to Deep Representation Learning

Author: Xiang Yang
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2022

Objective Intelligibility Assessment of Text-to-Speech Systems Through Utterance Verification

Author: Bourlard Hervé
Magimai.-Doss Mathew
Rasipuram Ramya
Ullmann Raphael
Publication venue: Idiap
Publication date: 19/04/2015

Objective assessment of synthetic speech intelligibility can be a useful tool for the development of text-to-speech (TTS) systems, as it provides a reproducible and inexpensive alternative to subjective listening tests. Recent work showed that the intelligibility of synthetic speech can be assessed objectively by comparing two sequences of phoneme class conditional probabilities, corresponding to instances of synthetic and human reference speech, respectively. In this paper, we build on those findings to propose a novel approach that formulates objective intelligibility assessment as an utterance verification problem using hidden Markov models, thereby alleviating the need for human reference speech. Specifically, given each text input to the TTS system, the proposed approach automatically verifies the words in the output synthetic speech signal and estimates an intelligibility score based on word recall statistics. We evaluate the proposed approach on the 2011 Blizzard Challenge data, and show that the estimated scores and the subjective intelligibility scores are highly correlated (Pearson’s |R| = 0.94).
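
As a rough illustration of the scoring described in the abstract above, the following sketch computes a word recall score from per-word accept/reject decisions, and the Pearson correlation between objective and subjective scores. The function names and the boolean input format are assumptions made for illustration only. The HMM-based verifier itself is not shown, and this is not the authors' implementation.

```python
from math import sqrt


def word_recall(accepted):
    """Fraction of expected words that a (hypothetical) verifier accepted.

    `accepted` holds one boolean per word of the input text: True if the
    word was verified in the synthetic speech, False otherwise.
    """
    accepted = list(accepted)
    if not accepted:
        raise ValueError("need at least one word")
    return sum(1 for ok in accepted if ok) / len(accepted)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

In such a setup, one recall score per TTS system would be paired with that system's subjective intelligibility score before calling `pearson_r`.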

Infoscience - École polytechnique fédérale de Lausanne

Making Faces - State-Space Models Applied to Multi-Modal Signal Processing

Author: Lehn-Schiøler Tue
Publication venue: Technical University of Denmark
Publication date: 01/01/2005

Online Research Database In Technology

A novel framework for noise robust ASR using cochlear implant-like spectrally reduced speech

Author: André Goalic
Cong-Thanh Do
Dominique Pastor
Publication venue: Elsevier BV

Crossref

Deep neural network context embeddings for model selection in rich-context HMM synthesis

Author: King Simon
Merritt Thomas
Watts Oliver
Wu Zhizheng
Yamagishi Junichi
Publication date: 06/09/2015

This paper introduces a novel form of parametric synthesis that uses context embeddings produced by the bottleneck layer of a deep neural network to guide the selection of models in a rich-context HMM-based synthesiser. Rich-context synthesis – in which Gaussian distributions estimated from single linguistic contexts seen in the training data are used for synthesis, rather than more conventional decision tree-tied models – was originally proposed to address over-smoothing due to averaging across contexts. Our previous investigations have confirmed experimentally that averaging across different contexts is indeed one of the largest factors contributing to the limited quality of statistical parametric speech synthesis. However, a possible weakness of the rich context approach as previously formulated is that a conventional tied model is still used to guide selection of Gaussians at synthesis time. Our proposed approach replaces this with context embeddings derived from a neural network. Index Terms: speech synthesis, hidden Markov model, deep neural networks, rich context, embedding.
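
The abstract above does not say how the embeddings guide selection. One plausible reading, sketched here purely as an assumption, is a nearest-neighbour lookup: pick the rich-context model whose stored embedding lies closest to the embedding of the target context. The function name, the dictionary layout and the Euclidean distance are all hypothetical choices, not details from the paper.

```python
from math import dist


def select_model(target_embedding, candidate_embeddings):
    """Return the name of the candidate whose embedding is nearest the target.

    `candidate_embeddings` maps a (hypothetical) rich-context model name to
    the embedding of the context it was trained on. Euclidean distance is an
    assumption; the paper's actual selection criterion is not given here.
    """
    if not candidate_embeddings:
        raise ValueError("need at least one candidate model")
    return min(
        candidate_embeddings,
        key=lambda name: dist(target_embedding, candidate_embeddings[name]),
    )
```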

CiteSeerX

Edinburgh Research Explorer

A Posterior-Based Multistream Formulation for G2P Conversion

Author: Magimai.-Doss Mathew
Razavi Marzieh
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 19/03/2017

Infoscience - École polytechnique fédérale de Lausanne

Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping

Author: King Simon
Oura Keiichiro
Tokuda Keiichi
Wester Mirjam
Yamagishi Junichi
Publication venue: Elsevier BV
Publication date: 01/07/2012

Crossref

Edinburgh Research Explorer

Audio source separation for music in low-latency and high-latency scenarios

Author: Marxer Piñón Ricard
Publication venue: Universitat Pompeu Fabra
Publication date: 01/01/2013

This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.
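
To make the Tikhonov-regularized spectrum decomposition mentioned in the abstract above more concrete, here is a minimal sketch. It assumes a fixed basis matrix whose columns are spectral templates and solves the ridge-type problem min_g ||x - Bg||^2 + lam * ||g||^2 in closed form. The function name, the basis layout and the value of `lam` are illustrative assumptions; the thesis may use a different basis or additional constraints.

```python
import numpy as np


def tikhonov_decompose(spectrum, basis, lam=1.0):
    """Estimate component gains g minimizing ||x - B g||^2 + lam * ||g||^2.

    spectrum: 1-D array x of length n_bins (one magnitude-spectrum frame).
    basis:    2-D array B of shape (n_bins, n_components), hypothetical templates.
    lam:      regularization weight (assumed value, chosen for illustration).
    """
    basis = np.asarray(basis, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    n_components = basis.shape[1]
    # Normal equations with Tikhonov term: (B^T B + lam * I) g = B^T x
    gram = basis.T @ basis + lam * np.eye(n_components)
    return np.linalg.solve(gram, basis.T @ spectrum)
```

Because the matrix on the left-hand side depends only on the basis and `lam`, it could be factorized once and reused for every frame, which is one reason a closed-form solution of this kind suits low-latency processing.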

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Tesis Doctorals en Xarxa

Speech Synthesis Based on Hidden Markov Models

Author: Nankaku Y.
Oura K.
Toda T.
Tokuda K.
Yamagishi J.
Zen H.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2013

Edinburgh Research Explorer