    Trajectory Training Considering Global Variance for HMM-based Speech Synthesis

    This paper presents a novel method for training hidden Markov models (HMMs) for HMM-based speech synthesis. The primary goal of HMM parameter optimization is that parameters generated from the trained models exhibit properties similar to those of natural speech. This paper addresses two major problems of conventional training: 1) the inconsistency between the optimization criteria used in training and in synthesis; and 2) the over-smoothing caused by the statistical modeling process. The proposed method integrates the global variance (GV) criterion into a trajectory training method, giving a unified framework for training and synthesis that provides both a consistent optimization criterion and a closed-form solution for parameter generation. Experimental results show that the proposed method significantly improves the naturalness of synthetic speech. Index Terms: speech synthesis, hidden Markov models, training criterion, trajectory likelihood, global variance
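In conventional HMM-based synthesis, the closed-form generation step is the maximum-likelihood parameter generation (MLPG) solution c = (WᵀU⁻¹W)⁻¹WᵀU⁻¹μ, where W stacks static and delta windows and U holds the per-frame variances. The pure-Python sketch below shows that standard step for a single one-dimensional stream, together with the GV statistic. It is not the trajectory/GV training method proposed in the paper. The delta window 0.5·(c[t+1] − c[t−1]), the edge handling, and all function names are assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence


def delta_weights(t: int, T: int) -> Dict[int, float]:
    """Weights of the (assumed) delta window 0.5 * (c[t+1] - c[t-1]) at frame t.

    Frames outside [0, T-1] are replaced by the nearest edge frame (assumption).
    """
    prev_idx, next_idx = max(t - 1, 0), min(t + 1, T - 1)
    w: Dict[int, float] = {}
    w[next_idx] = w.get(next_idx, 0.0) + 0.5
    w[prev_idx] = w.get(prev_idx, 0.0) - 0.5
    return w


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def mlpg(mu_s: Sequence[float], var_s: Sequence[float],
         mu_d: Sequence[float], var_d: Sequence[float]) -> List[float]:
    """Closed-form ML trajectory c = (W^T U^-1 W)^-1 W^T U^-1 mu for one stream."""
    T = len(mu_s)
    R = [[0.0] * T for _ in range(T)]  # W^T U^-1 W
    r = [0.0] * T                      # W^T U^-1 mu
    for t in range(T):
        # Static row of W selects c[t]; its precision is 1 / var_s[t].
        p = 1.0 / var_s[t]
        R[t][t] += p
        r[t] += p * mu_s[t]
        # Delta row of W applies the window weights; precision 1 / var_d[t].
        p = 1.0 / var_d[t]
        w = delta_weights(t, T)
        for i, wi in w.items():
            r[i] += p * wi * mu_d[t]
            for j, wj in w.items():
                R[i][j] += p * wi * wj
    return solve(R, r)


def global_variance(c: Sequence[float]) -> float:
    """GV of a 1-D trajectory: mean squared deviation from its time average."""
    mean = sum(c) / len(c)
    return sum((x - mean) ** 2 for x in c) / len(c)


if __name__ == "__main__":
    # Hypothetical per-frame statistics: a step in the static means, zero-mean deltas.
    mu_s = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
    c = mlpg(mu_s, [1.0] * 6, [0.0] * 6, [0.1] * 6)
    print([round(x, 3) for x in c])
    print(global_variance(mu_s), global_variance(c))
```

With zero-mean delta statistics, the generated trajectory is smoother than the sequence of static means, so its GV comes out smaller. This is the over-smoothing effect that the GV criterion is meant to counteract.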

    Voice simulation through a text-to-speech converter based on hidden Markov models

    Human-machine interfaces, and speech synthesis in particular, are an important part of ambient intelligence systems. Speech synthesis is the artificial production of human speech. The main challenges for text-to-speech (TTS) systems are producing intelligible and natural-sounding synthetic speech, fully automating the process, and accepting input text that has not been modified from the original language. In this project, a complete state-of-the-art TTS system based on hidden Markov models (HMMs) was built. Acoustic-model adaptation algorithms were used, specifically Maximum A Posteriori (MAP) and Maximum Likelihood Linear Regression (MLLR). These algorithms make it possible to synthesize the voice of a target speaker from a small number of recordings that need not be phonetically balanced, because they start from base models previously trained for synthesis on phonetically balanced recordings. For this TTS process, a database was recorded for both a generic speaker and the target speaker, together with its written transcription. A training stage then built the acoustic models used in synthesis, applying different algorithms to estimate them, and finally the adaptation algorithms described above were applied. Once the acoustic models were obtained, synthetic speech was generated following the digital source-filter (excitation plus filter) model of speech production. The result is an artificial voice that aims to resemble the original speaker; this similarity was evaluated using dynamic programming. Finally, a web application was developed that uses the synthesis system to build a voice bank of its users.
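Two of the steps above can be sketched in a few lines. MAP adaptation of a Gaussian mean interpolates between the base-model (prior) mean and the target speaker's data, weighted by a prior weight τ and the per-frame occupancy counts. For the dynamic-programming similarity measure, the sketch assumes dynamic time warping (DTW) with a Euclidean frame distance, since the abstract does not name the exact algorithm. The function names, `tau`, and the distance choice are illustrative assumptions, not the project's code; MLLR is not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def map_adapt_mean(prior_mean: Vector, frames: Sequence[Vector],
                   occupancies: Sequence[float], tau: float) -> List[float]:
    """MAP mean update: (tau * mu0 + sum_t g_t * x_t) / (tau + sum_t g_t).

    prior_mean comes from the base (generic-speaker) model; frames and their
    occupancies g_t come from the target speaker's adaptation data.
    """
    total = sum(occupancies)
    num = [tau * m for m in prior_mean]
    for g, x in zip(occupancies, frames):
        for d, xd in enumerate(x):
            num[d] += g * xd
    return [n / (tau + total) for n in num]


def dtw_distance(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Cumulative cost of the best monotonic alignment of two frame sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            # Extend the cheapest of: insertion, deletion, or diagonal match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

For example, `map_adapt_mean([0.0], [[2.0], [4.0]], [1.0, 1.0], tau=2.0)` returns `[1.5]`, halfway between the prior mean and the data mean because τ equals the total occupancy, and `dtw_distance([[0.0], [1.0]], [[0.0], [0.0], [1.0]])` returns `0.0`, since DTW absorbs the difference in timing.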