    BUCEADOR hybrid TTS for Blizzard Challenge 2011

    This paper describes the Text-to-Speech (TTS) systems presented by the Buceador Consortium in the Blizzard Challenge 2011 evaluation campaign. The main system is a concatenative hybrid one that combines the strong points of statistical and unit-selection synthesis (robustness and segmental naturalness, respectively). The hybrid system achieved results significantly above average in terms of similarity and naturalness, with no significant differences from most of the other systems in the intelligibility task. This clearly improves on the performance achieved in previous participations and supports the validity of the proposed hybrid approach. In addition, an HMM-based system using an HNM-based vocoder was built for the ES1 intelligibility tasks.
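
    As a rough illustration of the hybrid idea summarized above, the sketch below shows one common way of combining the two paradigms: statistically generated acoustic trajectories act as targets, and a unit-selection search picks recorded units that match them while limiting join discontinuities. The function, cost definitions and data layout are illustrative assumptions, not the consortium's actual system.

```python
# Hypothetical sketch of hybrid unit selection: HMM-generated feature
# trajectories are used as targets for a Viterbi search over database units.
import numpy as np

def select_units(target_traj, candidates, join_weight=1.0):
    """target_traj : (T, D) array of statistically generated acoustic targets.
    candidates    : list of length T; candidates[t] is an (N_t, D) array of
                    acoustic features of the database units considered at step t.
    Returns the index of the chosen candidate unit at each step."""
    T = len(candidates)
    # target cost: distance between each candidate unit and the statistical target
    cost = [np.linalg.norm(c - target_traj[t], axis=1) for t, c in enumerate(candidates)]
    back = [np.zeros(len(c), dtype=int) for c in candidates]

    for t in range(1, T):
        # join cost: spectral discontinuity between consecutive units
        join = np.linalg.norm(candidates[t][:, None, :] - candidates[t - 1][None, :, :], axis=2)
        total = cost[t - 1][None, :] + join_weight * join   # shape (N_t, N_{t-1})
        back[t] = total.argmin(axis=1)
        cost[t] = cost[t] + total.min(axis=1)

    # backtrack the lowest-cost unit sequence
    path = [int(np.argmin(cost[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```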

    A uniform phase representation for the harmonic model in speech synthesis applications

    Feature-based vocoders, e.g., STRAIGHT, offer a way to manipulate the perceived characteristics of the speech signal in speech transformation and synthesis. For the harmonic model, which provides excellent perceived quality, features for the amplitude parameters already exist (e.g., Line Spectral Frequencies (LSF), Mel-Frequency Cepstral Coefficients (MFCC)). However, because of the wrapping of the phase parameters, phase features are more difficult to design. To randomize the phase of the harmonic model during synthesis, a voicing feature is commonly used to distinguish voiced from unvoiced segments. However, voice production allows smooth transitions between voiced and unvoiced states, which sometimes makes the voicing segmentation difficult to estimate. In this article, two phase features are suggested to represent the phase of the harmonic model in a uniform way, without a voicing decision. The synthesis quality of the resulting vocoder has been evaluated with subjective listening tests in the context of resynthesis, pitch scaling, and Hidden Markov Model (HMM)-based synthesis. The experiments show that the suggested signal model is comparable to STRAIGHT, or even better in some scenarios. They also reveal some limitations of the harmonic framework itself in the case of high fundamental frequencies. G. Degottex has been funded by the Swiss National Science Foundation (SNSF) (grants PBSKP2_134325, PBSKP2_140021), Switzerland, and the Foundation for Research and Technology-Hellas (FORTH), Heraklion, Greece. D. Erro has been funded by the Basque Government (BER2TEK, IE12-333) and the Spanish Ministry of Economy and Competitiveness (SpeechTech4All, TEC2012-38939-C03-03).
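
    For context, the sketch below shows a minimal harmonic-model frame synthesizer, making explicit where the amplitude and phase parameters discussed above enter the model, together with the conventional random-phase trick tied to a voicing decision that the paper's uniform phase representation avoids. It is a simplified illustration, not the vocoder proposed in the article.

```python
# Minimal harmonic-model resynthesis of one quasi-stationary frame:
# x(t) = sum_k a_k * cos(2*pi*k*f0*t + phi_k)
import numpy as np

def synth_harmonic_frame(f0, amps, phases, fs=16000, frame_len=400):
    """f0     : fundamental frequency in Hz.
    amps      : array of K harmonic amplitudes a_k.
    phases    : array of K harmonic phases phi_k in radians (the parameters
                whose wrapping makes feature design difficult)."""
    n = np.arange(frame_len) / fs
    k = np.arange(1, len(amps) + 1)[:, None]          # harmonic numbers 1..K
    return np.sum(amps[:, None] * np.cos(2 * np.pi * k * f0 * n + phases[:, None]), axis=0)

def randomize_upper_phases(phases, first_unvoiced_harmonic, rng=None):
    """Classic voicing-based trick: harmonics above a cutoff get uniform
    random phase. The paper's contribution is to avoid this hard decision."""
    rng = rng or np.random.default_rng()
    out = np.array(phases, dtype=float)
    out[first_unvoiced_harmonic:] = rng.uniform(-np.pi, np.pi,
                                                len(out) - first_unvoiced_harmonic)
    return out
```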

    Applying a new classifier fusion technique to audio segmentation

    This paper presents a new classifier fusion algorithm based on the confusion matrices of the classifiers, from which the corresponding precision and recall values are extracted. The only data needed to apply this new fusion method are the classes or labels assigned by each of the classifiers, together with the reference classes in the development part of the database. The proposed algorithm is described and applied to the fusion of the outputs of two audio segmentation systems that took part in the Albayzin 2012 evaluation campaign. The robustness of the algorithm has been assessed, achieving a relative reduction of the segmentation error of 6.28% when fusing the best- and worst-performing systems presented to the evaluation. This work was partially funded by the UPV/EHU (Ayudas para la Formación de Personal Investigador), the Basque Government (Ber2Tek project, IE12-333) and the Spanish Ministry of Economy and Competitiveness (SpeechTech4All project, http://speechtech4all.uvigo.es/, TEC2012-38939-C03-03).
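
    The abstract does not spell out the exact combination rule, so the sketch below only illustrates the general mechanism: per-class precision and recall values are computed from each classifier's development-set confusion matrix, and each classifier's label is then weighted by the precision it achieved for that class. The weighting scheme is an assumption for illustration, not the paper's algorithm.

```python
# Illustrative precision/recall-based fusion of classifier labels.
import numpy as np

def per_class_precision_recall(conf):
    """conf[i, j] = number of samples with reference class i predicted as class j."""
    conf = np.asarray(conf, dtype=float)
    precision = np.diag(conf) / np.maximum(conf.sum(axis=0), 1e-12)  # TP / predicted
    recall = np.diag(conf) / np.maximum(conf.sum(axis=1), 1e-12)     # TP / reference
    return precision, recall

def fuse_labels(predictions, precisions, n_classes):
    """predictions[i]    = class index predicted by classifier i for one sample.
    precisions[i][c]     = development-set precision of classifier i for class c."""
    scores = np.zeros(n_classes)
    for pred, prec in zip(predictions, precisions):
        scores[pred] += prec[pred]          # weight each vote by its reliability
    return int(np.argmax(scores))
```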

    Post-processing techniques for a speaker diarization system

    This paper presents post-processing techniques designed to improve the results of a speaker diarization system. Three techniques are proposed: refinement of the speech/non-speech segmentation, assimilation of short speech segments, and fusion of clusters belonging to the same speaker. These techniques have been implemented in a post-processing module that improves the result of the baseline system by 22.3%. The same module, applied without any tuning, improved the diarization error rate (DER) of another speaker diarization system with an architecture similar to the baseline by 21%, while no improvement was obtained on a system with a very different architecture. It has also been used with another database, improving the DER by 17%. These experiments prove the validity of the techniques developed. This work was partially funded by the UPV/EHU (Ayudas para la Formación de Personal Investigador), the Basque Government (BerbaTek project, IE09-262) and the Spanish Ministry of Science and Innovation (Buceador project, TEC2009-14094-C04-02).
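
    As an illustration of one of the three techniques, the sketch below shows a possible implementation of the assimilation of short segments: segments under a duration threshold are relabelled with the speaker of a neighbouring segment. The threshold and the merge rule are assumptions for illustration; the paper's actual criteria may differ.

```python
# Hypothetical short-segment assimilation for a diarization hypothesis.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds
    end: float
    speaker: str

    @property
    def dur(self):
        return self.end - self.start

def assimilate_short_segments(segments, min_dur=0.5):
    """Relabel segments shorter than min_dur with the speaker of the longer neighbour."""
    out = []
    for i, seg in enumerate(segments):
        has_prev = bool(out)
        has_next = i + 1 < len(segments)
        if seg.dur >= min_dur or not (has_prev or has_next):
            out.append(seg)
            continue
        prev_seg = out[-1] if has_prev else None
        next_seg = segments[i + 1] if has_next else None
        # choose the longer of the two neighbours as the absorbing speaker
        target = prev_seg if (next_seg is None or
                              (prev_seg is not None and prev_seg.dur >= next_seg.dur)) else next_seg
        out.append(Segment(seg.start, seg.end, target.speaker))
    return out
```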

    Open-source text to speech synthesis system for Iberian languages

    This paper presents a text-to-speech system based on statistical synthesis which, for the first time, allows speech to be generated in any of the four official languages of Spain, as well as in English, within a single system. Using the AhoTTS system already developed for Spanish and Basque as a starting point, support for Catalan, Galician and English has been added using available open-source modules. The resulting system, named multilingual AhoTTS, has been released as open source and is already being used in real applications.

    Automatic speaker recognition as a measurement of voice imitation and conversion

    Voices can be deliberately disguised by means of human imitation or voice conversion. The question arises as to what extent they can be modified by either method. In this paper, a set of speaker identification experiments is conducted: first, analysing prosodic features extracted from the voices of professional impersonators attempting to mimic a target voice and, second, using both intra-gender and cross-gender converted voices in a spectral-based speaker recognition system. The results show that the identification error rate increases when testing with imitated voices as well as with converted voices, especially for the cross-gender conversions.
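
    The sketch below shows the kind of measurement referred to above: a closed-set speaker identification error rate computed once on natural test material and once on imitated or converted versions of the same trials. The score matrix is assumed to come from some spectral-based recognizer backend; nothing here reproduces the paper's experimental setup.

```python
# Closed-set identification error rate from a trial-by-speaker score matrix.
import numpy as np

def identification_error_rate(score_matrix, true_speakers):
    """score_matrix[n, s] = score of test trial n against enrolled speaker s.
    true_speakers[n]      = index of the speaker who actually produced trial n."""
    decisions = np.argmax(score_matrix, axis=1)
    return float(np.mean(decisions != np.asarray(true_speakers)))

# Comparing natural vs. disguised test material then reduces to (hypothetical names):
# err_natural  = identification_error_rate(scores_natural,  labels)
# err_disguise = identification_error_rate(scores_converted, labels)
```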
