54 research outputs found

    Unified parametrization scheme for the speech signal in speech recognition (Esquema unificado de parametrización de la señal de voz en reconocimiento del habla)

    A correct choice of voice signal modeling methods is essential to obtain good results in automatic speech recognition. In this paper, we propose a unified view of the speech parametrization stage, in which conventional techniques such as Linear Prediction Coefficients and the mel-cepstrum filter bank appear as particular cases. The model incorporates a new deconvolution technique called root homomorphic deconvolution. A broad set of experimental results is also presented.
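    The classical homomorphic deconvolution that the proposed root variant builds on can be sketched in a few lines of NumPy. This is only an illustration: the function name and liftering length are arbitrary choices, and the familiar logarithm is used where the root variant would substitute a power (root) nonlinearity.

    ```python
    import numpy as np

    def cepstral_envelope(frame, n_lifter=30):
        """Classical homomorphic deconvolution of one speech frame.

        Low quefrencies of the real cepstrum carry the vocal-tract
        (envelope) component; the remainder carries the excitation.
        The paper's *root* variant would replace the log of the
        magnitude spectrum with a power (root) function.
        """
        spec = np.fft.rfft(frame)
        log_mag = np.log(np.maximum(np.abs(spec), 1e-10))  # avoid log(0)
        cep = np.fft.irfft(log_mag)                        # real cepstrum
        lifter = np.zeros_like(cep)
        lifter[:n_lifter] = 1.0                            # keep low quefrencies
        lifter[-(n_lifter - 1):] = 1.0                     # ...and their mirror
        return np.exp(np.fft.rfft(cep * lifter).real)      # smooth magnitude envelope
    ```

    Dividing the frame's magnitude spectrum by this envelope leaves the excitation component; a larger `n_lifter` retains more spectral detail in the envelope.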

    Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system

    Neural network-based models that generate glottal excitation waveforms from acoustic features have been found to improve quality in statistical parametric speech synthesis. Until now, however, these models have been trained separately from the acoustic model. This creates a mismatch between training and synthesis, as the synthesized acoustic features used as excitation model input differ from the original inputs on which the model was trained. Furthermore, due to errors in predicting the vocal tract filter, the original excitation waveforms do not provide perfect reconstruction of the speech waveform even if predicted without error. To address these issues and to make the excitation model more robust against errors in acoustic modeling, this paper proposes two modifications to the excitation model training scheme. First, the excitation model is trained in a connected manner, with inputs generated by the acoustic model. Second, the target glottal waveforms are re-estimated by performing glottal inverse filtering with the predicted vocal tract filters. The results show that both modifications improve performance as measured by MSE and MFCC distortion, and slightly improve the subjective quality of the synthetic speech.

    Speech vocoding for laboratory phonology

    Using phonological speech vocoding, we propose a platform for exploring relations between phonology and speech processing, and in broader terms, for exploring relations between the abstract and physical structures of a speech signal. Our goal is to take a step towards bridging phonology and speech processing and to contribute to the program of Laboratory Phonology. We show three application examples for laboratory phonology: compositional phonological speech modelling, a comparison of phonological systems and an experimental phonological parametric text-to-speech (TTS) system. The featural representations of the following three phonological systems are considered in this work: (i) Government Phonology (GP), (ii) the Sound Pattern of English (SPE), and (iii) the extended SPE (eSPE). Comparing GP- and eSPE-based vocoded speech, we conclude that the latter achieves slightly better results than the former. However, GP, the most compact phonological speech representation, performs comparably to the systems with a higher number of phonological features. The parametric TTS based on phonological speech representation, and trained from an unlabelled audiobook in an unsupervised manner, achieves intelligibility of 85% of the state-of-the-art parametric speech synthesis. We envision that the presented approach paves the way for researchers in both fields to form meaningful hypotheses that are explicitly testable using the concepts developed and exemplified in this paper. On the one hand, laboratory phonologists might test the applied concepts of their theoretical models, and on the other hand, the speech processing community may utilize the concepts developed for the theoretical phonological models to improve current state-of-the-art applications.

    Comparison of different order cumulants in a speech enhancement system by adaptive Wiener filtering

    The authors study speech enhancement algorithms based on the iterative Wiener filtering method of Lim and Oppenheim (1978), in which the AR spectral estimation of the speech is carried out using a second-order analysis. In their algorithms, however, the authors perform the AR estimation by means of a cumulant (third- and fourth-order) analysis. They provide a behavioral comparison between the cumulant algorithms and the classical autocorrelation one. Results are presented for the noise (additive white Gaussian noise) that allows the best improvement and for the noises (diesel engine and reactor noise) that lead to the worst one. An exhaustive empirical test shows that the cumulant algorithms outperform the original autocorrelation algorithm, especially at low SNR.
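    The second-order (autocorrelation) baseline that the cumulant algorithms are compared against can be sketched for a single frame. This is a minimal illustration, not the paper's implementation: the AR order, iteration count and white-noise assumption are arbitrary choices here, and the cumulant variants would replace the `_lpc` step with third- or fourth-order estimates.

    ```python
    import numpy as np

    def _lpc(x, order):
        """AR coefficients [1, a1..ap] and prediction-error power via
        the autocorrelation method (Levinson-Durbin recursion)."""
        m = len(x)
        r = np.correlate(x, x, mode="full")[m - 1:m + order] / m
        a = np.zeros(order + 1)
        a[0], e = 1.0, r[0]
        for i in range(1, order + 1):
            k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / e
            prev = a.copy()
            for j in range(1, i):
                a[j] = prev[j] + k * prev[i - j]
            a[i] = k
            e *= 1.0 - k * k
        return a, max(e, 1e-12)

    def iterative_wiener(noisy, noise_power, order=12, n_iter=3):
        """Single-frame Lim-Oppenheim iterative Wiener filtering.

        Each pass re-fits an all-pole speech spectrum to the current
        estimate and re-filters the *original* noisy spectrum with the
        resulting Wiener gain. noise_power is the PSD of the (assumed
        white) additive noise.
        """
        m = len(noisy)
        spec = np.fft.rfft(noisy)
        x = noisy
        for _ in range(n_iter):
            a, g = _lpc(x, order)
            aw = np.abs(np.fft.rfft(a, m)) ** 2      # |A(e^jw)|^2
            ps = g / np.maximum(aw, 1e-12)           # all-pole speech PSD
            h = ps / (ps + noise_power)              # Wiener gain
            x = np.fft.irfft(h * spec, m)
        return x
    ```

    Swapping the autocorrelation sequence `r` for cumulant-based estimates is the only change the paper's variants would require in this structure.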

    Learning HMM State Sequences from Phonemes for Speech Synthesis

    This paper presents a technique for learning hidden Markov model (HMM) state sequences from phonemes which, combined with the modified discrete cosine transform (MDCT), is useful for speech synthesis. Mel-cepstral spectral parameters, currently adopted in conventional methods as features for HMM acoustic modeling, do not allow direct reconstruction of the speech waveform. In contrast to these approaches, we use an analysis/synthesis technique based on the MDCT that guarantees perfect reconstruction of the signal frame feature vectors and allows for a 50% overlap between frames without increasing the data rate. Experimental results show that the spectrograms achieved with the suggested technique closely match the original spectrograms, and the quality of the synthesized speech is conveniently evaluated using the well-known Itakura-Saito measure.
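    The 50%-overlap perfect-reconstruction property of the MDCT claimed above can be verified with a direct matrix implementation; the frame length and function names below are illustrative choices, not the paper's.

    ```python
    import numpy as np

    def mdct_frame(x):
        """MDCT of a length-2N frame -> N coefficients."""
        n = len(x) // 2
        idx = np.arange(2 * n) + 0.5 + n / 2
        k = np.arange(n) + 0.5
        return np.cos(np.pi / n * np.outer(k, idx)) @ x

    def imdct_frame(coef):
        """IMDCT; the 2/N scaling matches a twice-applied Princen-Bradley window."""
        n = len(coef)
        idx = np.arange(2 * n) + 0.5 + n / 2
        k = np.arange(n) + 0.5
        return (2.0 / n) * (np.cos(np.pi / n * np.outer(idx, k)) @ coef)

    def analysis_synthesis(signal, n=64):
        """50%-overlap MDCT analysis/synthesis with a sine window.

        Each 2N-sample frame yields only N coefficients (no data-rate
        increase), yet time-domain aliasing cancels in the overlap-add,
        so interior samples are reconstructed exactly.
        """
        w = np.sin(np.pi / (2 * n) * (np.arange(2 * n) + 0.5))  # Princen-Bradley
        out = np.zeros(len(signal))
        for start in range(0, len(signal) - 2 * n + 1, n):
            frame = w * signal[start:start + 2 * n]
            out[start:start + 2 * n] += w * imdct_frame(mdct_frame(frame))
        return out
    ```

    Only the first and last half-frames, which receive a single windowed contribution, deviate from the input; every interior sample is recovered to machine precision.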

    New Method for Delexicalization and its Application to Prosodic Tagging for Text-to-Speech Synthesis

    This paper describes a new flexible delexicalization method based on a glottal-excitation parametric speech synthesis scheme. The system utilizes inverse-filtered glottal flow and all-pole modelling of the vocal tract. The method makes it possible to retain and manipulate all relevant prosodic features of any kind of speech. Most importantly, the features include voice quality, which has not been properly modeled in earlier delexicalization methods. The functionality of the new method was tested in a prosodic tagging experiment aimed at providing word prominence data for a text-to-speech synthesis system. The experiment confirmed the usefulness of the method and further corroborated earlier evidence that linguistic factors influence the perception of prosodic prominence.
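    The inverse-filtering step at the heart of such a scheme, removing an estimated all-pole vocal-tract filter to expose the glottal residual, reduces to FIR filtering with the AR polynomial. A toy sketch with a synthetic, hand-picked filter follows; in practice the coefficients would come from LPC analysis of the speech rather than being known in advance.

    ```python
    import numpy as np

    def inverse_filter(speech, a):
        """Inverse-filter speech with the AR polynomial a = [1, a1, ..., ap],
        returning the residual: e[n] = sum_j a[j] * x[n-j]."""
        return np.convolve(speech, a)[:len(speech)]

    # Toy demonstration: synthesize "speech" by exciting a known all-pole
    # filter 1/A(z) with an impulse train, then recover the excitation.
    a = np.array([1.0, -1.3, 0.49])        # stable: poles at radius 0.7
    excitation = np.zeros(200)
    excitation[::40] = 1.0                 # impulse-train "glottal" source
    speech = np.zeros_like(excitation)
    for i in range(len(speech)):           # all-pole synthesis, zero initial state
        speech[i] = excitation[i] - a[1] * speech[i - 1] - a[2] * speech[i - 2]
    residual = inverse_filter(speech, a)   # recovers the excitation exactly
    ```

    With estimated rather than exact coefficients the residual is only an approximation of the glottal flow derivative, which is why iterative refinement schemes are used in practice.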

    ICANDO: Intellectual Computer AssistaNt for Disabled Operators

    Publication in the conference proceedings of EUSIPCO, Florence, Italy, 200