6,310 research outputs found
Combining vocal tract length normalization with hierarchial linear transformations
Recent research has demonstrated the effectiveness of vocal tract length normalization (VTLN) as a rapid adaptation technique for statistical parametric speech synthesis. VTLN produces speech with naturalness preferable to that of MLLR-based adaptation techniques, being much closer in quality to that generated by the original av-erage voice model. However with only a single parameter, VTLN captures very few speaker specific characteristics when compared to linear transform based adaptation techniques. This paper pro-poses that the merits of VTLN can be combined with those of linear transform based adaptation in a hierarchial Bayesian frame-work, where VTLN is used as the prior information. A novel tech-nique for propagating the gender information from the VTLN prior through constrained structural maximum a posteriori linear regres-sion (CSMAPLR) adaptation is presented. Experiments show that the resulting transformation has improved speech quality with better naturalness, intelligibility and improved speaker similarity. Index Terms — Statistical parametric speech synthesis, hidden Markov models, speaker adaptation, vocal tract length normaliza-tion, constrained structural maximum a posteriori linear regression 1
Singing voice correction using canonical time warping
Expressive singing voice correction is an appealing but challenging problem.
A robust time-warping algorithm which synchronizes two singing recordings can
provide a promising solution. We thereby propose to address the problem by
canonical time warping (CTW) which aligns amateur singing recordings to
professional ones. A new pitch contour is generated given the alignment
information, and a pitch-corrected singing is synthesized back through the
vocoder. The objective evaluation shows that CTW is robust against
pitch-shifting and time-stretching effects, and the subjective test
demonstrates that CTW prevails the other methods including DTW and the
commercial auto-tuning software. Finally, we demonstrate the applicability of
the proposed method in a practical, real-world scenario
A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification
For practical automatic speaker verification (ASV) systems, replay attack
poses a true risk. By replaying a pre-recorded speech signal of the genuine
speaker, ASV systems tend to be easily fooled. An effective replay detection
method is therefore highly desirable. In this study, we investigate a major
difficulty in replay detection: the over-fitting problem caused by variability
factors in speech signal. An F-ratio probing tool is proposed and three
variability factors are investigated using this tool: speaker identity, speech
content and playback & recording device. The analysis shows that device is the
most influential factor that contributes the highest over-fitting risk. A
frequency warping approach is studied to alleviate the over-fitting problem, as
verified on the ASV-spoof 2017 database
Analysis of a Modern Voice Morphing Approach using Gaussian Mixture Models for Laryngectomees
This paper proposes a voice morphing system for people suffering from
Laryngectomy, which is the surgical removal of all or part of the larynx or the
voice box, particularly performed in cases of laryngeal cancer. A primitive
method of achieving voice morphing is by extracting the source's vocal
coefficients and then converting them into the target speaker's vocal
parameters. In this paper, we deploy Gaussian Mixture Models (GMM) for mapping
the coefficients from source to destination. However, the use of the
traditional/conventional GMM-based mapping approach results in the problem of
over-smoothening of the converted voice. Thus, we hereby propose a unique
method to perform efficient voice morphing and conversion based on GMM,which
overcomes the traditional-method effects of over-smoothening. It uses a
technique of glottal waveform separation and prediction of excitations and
hence the result shows that not only over-smoothening is eliminated but also
the transformed vocal tract parameters match with the target. Moreover, the
synthesized speech thus obtained is found to be of a sufficiently high quality.
Thus, voice morphing based on a unique GMM approach has been proposed and also
critically evaluated based on various subjective and objective evaluation
parameters. Further, an application of voice morphing for Laryngectomees which
deploys this unique approach has been recommended by this paper.Comment: 6 pages, 4 figures, 4 tables; International Journal of Computer
Applications Volume 49, Number 21, July 201
- …