Search CORE

3 research outputs found

THE ROLE OF GLOTTAL SOURCE PARAMETERS FOR HIGH-QUALITY TRANSFORMATION OF PERCEPTUAL AGE

Authors: Axel Roebel, Gilles Degottex, Nicolas Obin, Xavier Favory
Publication date: 24/04/2020

Abstract: The intuitive control of voice transformation (e.g., age/sex, emotions) is useful to extend the expressive repertoire of a voice. This paper explores the role of glottal source parameters for the control of voice transformation. First, the SVLN speech synthesizer (Separation of the Vocal-tract with the Liljencrants-Fant model plus Noise) is used to represent the glottal source parameters (and thus, voice quality) during speech analysis and synthesis. Then, a simple statistical method is presented to control speech parameters during voice transformation: a GMM is used to model the speech parameters of a voice, and regressions are then used to adapt the GMM's statistics (mean and variance) to a control parameter (e.g., age/sex, emotions). A subjective experiment conducted on the control of perceptual age proves the importance of the glottal source parameters for the control of voice transformation, and shows the efficiency of the statistical model in controlling voice parameters while preserving the high quality of the voice transformation.

CiteSeerX

Statistical Approach to Voice Quality Control in Esophageal Speech Enhancement

Authors: Hironori Doi, Hiroshi Saruwatari, Kenzo Yamamoto, Kiyohiro Shikano, Tomoki Toda
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

ICASSP2012: The 37th International Conference on Acoustics, Speech, and Signal Processing, March 25-30, Kyoto, Japan.

This paper describes a voice quality control method in statistical esophageal speech enhancement. Esophageal speech is produced by one of the alternative speaking methods for laryngectomees. Its naturalness and intelligibility are much lower than those of natural voices, and its voice quality sounds similar even if uttered by different laryngectomees. These issues are alleviated by a statistical voice conversion method from esophageal speech into normal speech (ES-to-Speech) based on eigenvoices. This method is capable of determining converted voice quality using a few target voice samples. In this paper, we propose ES-to-Speech using regression techniques to make it possible to manually control the converted voice quality by manipulating a few intuitively controllable parameters even if no target voice sample is available. The effectiveness of the proposed method is confirmed by experimental evaluations.

NAIST Academic Repository

Crossref

Statistical Approach to Voice Quality Control in Esophageal Speech Enhancement

Authors: Hironori Doi, Hiroshi Saruwatari, Kenzo Yamamoto, Kiyohiro Shikano, Tomoki Toda
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 02/03/2023

Institutional Repositories DataBase (IRDB)