    BUCEADOR hybrid TTS for Blizzard Challenge 2011

    This paper describes the Text-to-Speech (TTS) systems presented by the Buceador Consortium in the Blizzard Challenge 2011 evaluation campaign. The main system is a concatenative hybrid one that combines the strong points of statistical and unit-selection synthesis (robustness and segmental naturalness, respectively). The hybrid system achieved results significantly above average in terms of similarity and naturalness, with no significant differences from most of the other systems in the intelligibility task. This clearly improves on the performance achieved in previous participations and supports the validity of the proposed hybrid approach. In addition, an HMM-based system using an HNM-based vocoder was built for the ES1 intelligibility tasks.
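
    As a rough illustration of the hybrid idea summarized above, the sketch below shows one common way of combining the two paradigms: statistically generated acoustic trajectories act as targets, and a unit-selection search picks recorded units that match them while limiting join discontinuities. The function, cost definitions and data layout are illustrative assumptions, not the consortium's actual system.

```python
# Hypothetical sketch of hybrid unit selection: HMM-generated feature
# trajectories are used as targets for a Viterbi search over database units.
import numpy as np

def select_units(target_traj, candidates, join_weight=1.0):
    """target_traj : (T, D) array of statistically generated acoustic targets.
    candidates    : list of length T; candidates[t] is an (N_t, D) array of
                    acoustic features of the database units considered at step t.
    Returns the index of the chosen candidate unit at each step."""
    T = len(candidates)
    # target cost: distance between each candidate unit and the statistical target
    cost = [np.linalg.norm(c - target_traj[t], axis=1) for t, c in enumerate(candidates)]
    back = [np.zeros(len(c), dtype=int) for c in candidates]

    for t in range(1, T):
        # join cost: spectral discontinuity between consecutive units
        join = np.linalg.norm(candidates[t][:, None, :] - candidates[t - 1][None, :, :], axis=2)
        total = cost[t - 1][None, :] + join_weight * join   # shape (N_t, N_{t-1})
        back[t] = total.argmin(axis=1)
        cost[t] = cost[t] + total.min(axis=1)

    # backtrack the lowest-cost unit sequence
    path = [int(np.argmin(cost[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```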

    A uniform phase representation for the harmonic model in speech synthesis applications

    Feature-based vocoders, e.g., STRAIGHT, offer a way to manipulate the perceived characteristics of the speech signal in speech transformation and synthesis. For the harmonic model, which provides excellent perceived quality, features for the amplitude parameters already exist (e.g., Line Spectral Frequencies (LSF), Mel-Frequency Cepstral Coefficients (MFCC)). However, because of the wrapping of the phase parameters, phase features are more difficult to design. To randomize the phase of the harmonic model during synthesis, a voicing feature is commonly used to distinguish voiced from unvoiced segments. However, voice production allows smooth transitions between voiced and unvoiced states, which sometimes makes the voicing segmentation difficult to estimate. In this article, two phase features are suggested to represent the phase of the harmonic model in a uniform way, without a voicing decision. The synthesis quality of the resulting vocoder has been evaluated with subjective listening tests in the context of resynthesis, pitch scaling, and Hidden Markov Model (HMM)-based synthesis. The experiments show that the suggested signal model is comparable to STRAIGHT, or even better in some scenarios. They also reveal some limitations of the harmonic framework itself in the case of high fundamental frequencies. G. Degottex has been funded by the Swiss National Science Foundation (SNSF) (grants PBSKP2_134325, PBSKP2_140021), Switzerland, and the Foundation for Research and Technology-Hellas (FORTH), Heraklion, Greece. D. Erro has been funded by the Basque Government (BER2TEK, IE12-333) and the Spanish Ministry of Economy and Competitiveness (SpeechTech4All, TEC2012-38939-C03-03).
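
    For context, the sketch below shows a minimal harmonic-model frame synthesizer, making explicit where the amplitude and phase parameters discussed above enter the model, together with the conventional random-phase trick tied to a voicing decision that the paper's uniform phase representation avoids. It is a simplified illustration, not the vocoder proposed in the article.

```python
# Minimal harmonic-model resynthesis of one quasi-stationary frame:
# x(t) = sum_k a_k * cos(2*pi*k*f0*t + phi_k)
import numpy as np

def synth_harmonic_frame(f0, amps, phases, fs=16000, frame_len=400):
    """f0     : fundamental frequency in Hz.
    amps      : array of K harmonic amplitudes a_k.
    phases    : array of K harmonic phases phi_k in radians (the parameters
                whose wrapping makes feature design difficult)."""
    n = np.arange(frame_len) / fs
    k = np.arange(1, len(amps) + 1)[:, None]          # harmonic numbers 1..K
    return np.sum(amps[:, None] * np.cos(2 * np.pi * k * f0 * n + phases[:, None]), axis=0)

def randomize_upper_phases(phases, first_unvoiced_harmonic, rng=None):
    """Classic voicing-based trick: harmonics above a cutoff get uniform
    random phase. The paper's contribution is to avoid this hard decision."""
    rng = rng or np.random.default_rng()
    out = np.array(phases, dtype=float)
    out[first_unvoiced_harmonic:] = rng.uniform(-np.pi, np.pi,
                                                len(out) - first_unvoiced_harmonic)
    return out
```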

    Applying a new classifier fusion technique to audio segmentation

    This paper presents a new classifier fusion algorithm based on the confusion matrices of the classifiers, from which the corresponding precision and recall values are extracted. The only data needed to apply this new fusion method are the classes or labels assigned by each of the classifiers, together with the reference classes in the development part of the database. The proposed algorithm is described and applied to the fusion of the outputs of two audio segmentation systems that took part in the Albayzin 2012 evaluation campaign. The robustness of the algorithm has been assessed, achieving a relative reduction of the segmentation error of 6.28% when fusing the best- and worst-performing systems presented to the evaluation. This work was partially funded by the UPV/EHU (Ayudas para la Formación de Personal Investigador), the Basque Government (Ber2Tek project, IE12-333) and the Spanish Ministry of Economy and Competitiveness (SpeechTech4All project, http://speechtech4all.uvigo.es/, TEC2012-38939-C03-03).
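
    The abstract does not spell out the exact combination rule, so the sketch below only illustrates the general mechanism: per-class precision and recall values are computed from each classifier's development-set confusion matrix, and each classifier's label is then weighted by the precision it achieved for that class. The weighting scheme is an assumption for illustration, not the paper's algorithm.

```python
# Illustrative precision/recall-based fusion of classifier labels.
import numpy as np

def per_class_precision_recall(conf):
    """conf[i, j] = number of samples with reference class i predicted as class j."""
    conf = np.asarray(conf, dtype=float)
    precision = np.diag(conf) / np.maximum(conf.sum(axis=0), 1e-12)  # TP / predicted
    recall = np.diag(conf) / np.maximum(conf.sum(axis=1), 1e-12)     # TP / reference
    return precision, recall

def fuse_labels(predictions, precisions, n_classes):
    """predictions[i]    = class index predicted by classifier i for one sample.
    precisions[i][c]     = development-set precision of classifier i for class c."""
    scores = np.zeros(n_classes)
    for pred, prec in zip(predictions, precisions):
        scores[pred] += prec[pred]          # weight each vote by its reliability
    return int(np.argmax(scores))
```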

    Post-processing techniques for a speaker diarization system

    This paper presents post-processing techniques designed to improve the results of a speaker diarization system. Three techniques are proposed: refinement of the speech/non-speech segmentation, assimilation of short speech segments, and fusion of clusters belonging to the same speaker. These techniques have been implemented in a post-processing module that improves the result of the baseline system by 22.3%. The same module, applied without any tuning, improved the diarization error rate (DER) of another speaker diarization system with an architecture similar to the baseline by 21%, while no improvement was obtained on a system with a very different architecture. It has also been used with another database, improving the DER by 17%. These experiments prove the validity of the techniques developed. This work was partially funded by the UPV/EHU (Ayudas para la Formación de Personal Investigador), the Basque Government (BerbaTek project, IE09-262) and the Spanish Ministry of Science and Innovation (Buceador project, TEC2009-14094-C04-02).
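
    As an illustration of one of the three techniques, the sketch below shows a possible implementation of the assimilation of short segments: segments under a duration threshold are relabelled with the speaker of a neighbouring segment. The threshold and the merge rule are assumptions for illustration; the paper's actual criteria may differ.

```python
# Hypothetical short-segment assimilation for a diarization hypothesis.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds
    end: float
    speaker: str

    @property
    def dur(self):
        return self.end - self.start

def assimilate_short_segments(segments, min_dur=0.5):
    """Relabel segments shorter than min_dur with the speaker of the longer neighbour."""
    out = []
    for i, seg in enumerate(segments):
        has_prev = bool(out)
        has_next = i + 1 < len(segments)
        if seg.dur >= min_dur or not (has_prev or has_next):
            out.append(seg)
            continue
        prev_seg = out[-1] if has_prev else None
        next_seg = segments[i + 1] if has_next else None
        # choose the longer of the two neighbours as the absorbing speaker
        target = prev_seg if (next_seg is None or
                              (prev_seg is not None and prev_seg.dur >= next_seg.dur)) else next_seg
        out.append(Segment(seg.start, seg.end, target.speaker))
    return out
```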

    Open-source text to speech synthesis system for Iberian languages

    This paper presents a text-to-speech system based on statistical synthesis which, for the first time, allows speech to be generated in any of the four official languages of Spain, as well as in English, within a single system. Using the AhoTTS system already developed for Spanish and Basque as a starting point, support for Catalan, Galician and English has been added using available open-source modules. The resulting system, named multilingual AhoTTS, has been released as open source and is already being used in real applications.

    Automatic speaker recognition as a measurement of voice imitation and conversion

    Voices can be deliberately disguised by means of human imitation or voice conversion. The question arises as to what extent they can be modified by either method. In this paper, a set of speaker identification experiments is conducted: first, analysing prosodic features extracted from the voices of professional impersonators attempting to mimic a target voice and, second, using both intra-gender and cross-gender converted voices in a spectral-based speaker recognition system. The results show that the identification error rate increases when testing with imitated voices as well as with converted voices, especially for the cross-gender conversions.
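
    The sketch below shows the kind of measurement referred to above: a closed-set speaker identification error rate computed once on natural test material and once on imitated or converted versions of the same trials. The score matrix is assumed to come from some spectral-based recognizer backend; nothing here reproduces the paper's experimental setup.

```python
# Closed-set identification error rate from a trial-by-speaker score matrix.
import numpy as np

def identification_error_rate(score_matrix, true_speakers):
    """score_matrix[n, s] = score of test trial n against enrolled speaker s.
    true_speakers[n]      = index of the speaker who actually produced trial n."""
    decisions = np.argmax(score_matrix, axis=1)
    return float(np.mean(decisions != np.asarray(true_speakers)))

# Comparing natural vs. disguised test material then reduces to (hypothetical names):
# err_natural  = identification_error_rate(scores_natural,  labels)
# err_disguise = identification_error_rate(scores_converted, labels)
```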
