
    Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

    This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms. Our model achieves a mean opinion score (MOS) of 4.53, comparable to a MOS of 4.58 for professionally recorded speech. To validate our design choices, we present ablation studies of key components of our system and evaluate the impact of using mel spectrograms as the input to WaveNet instead of linguistic, duration, and F0 features. We further demonstrate that using a compact acoustic intermediate representation enables significant simplification of the WaveNet architecture. Comment: Accepted to ICASSP 2018
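    The two-stage design the abstract describes (a feature prediction network followed by a neural vocoder) can be sketched compactly. The following is a minimal, hypothetical PyTorch skeleton, not the authors' implementation: dot-product attention stands in for the paper's location-sensitive attention, the vocoder is reduced to a feed-forward stub rather than a WaveNet, and all class names, dimensions, and hyperparameters are illustrative assumptions.

```python
# Hypothetical, heavily simplified sketch of the Tacotron 2 pipeline above.
import torch
import torch.nn as nn

N_MELS = 80  # mel-spectrogram channels (assumed, as in the paper)

class FeaturePredictionNet(nn.Module):
    """Seq2seq net: character embeddings -> mel-scale spectrogram frames."""
    def __init__(self, vocab_size=128, emb_dim=256, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid, batch_first=True,
                               bidirectional=True)
        self.decoder = nn.LSTMCell(N_MELS + 2 * hid, hid)
        self.attn_proj = nn.Linear(hid, 2 * hid)
        self.mel_out = nn.Linear(hid, N_MELS)

    def forward(self, chars, n_frames):
        memory, _ = self.encoder(self.embed(chars))      # (B, T, 2*hid)
        B = chars.size(0)
        h = memory.new_zeros(B, self.decoder.hidden_size)
        c = torch.zeros_like(h)
        frame = memory.new_zeros(B, N_MELS)              # initial <GO> frame
        mels = []
        for _ in range(n_frames):
            # Dot-product attention over encoder memory (a simplification
            # of the paper's location-sensitive attention).
            scores = torch.bmm(memory, self.attn_proj(h).unsqueeze(2))
            context = (torch.softmax(scores, dim=1) * memory).sum(dim=1)
            h, c = self.decoder(torch.cat([frame, context], dim=1), (h, c))
            frame = self.mel_out(h)
            mels.append(frame)
        return torch.stack(mels, dim=1)                  # (B, n_frames, 80)

class VocoderStub(nn.Module):
    """Stand-in for the modified WaveNet vocoder: mels -> waveform samples."""
    def __init__(self, hop=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_MELS, 256), nn.Tanh(),
                                 nn.Linear(256, hop))

    def forward(self, mels):                             # (B, T, 80)
        return self.net(mels).reshape(mels.size(0), -1)  # (B, T * hop)

text = torch.randint(0, 128, (1, 20))                    # dummy character ids
mels = FeaturePredictionNet()(text, n_frames=50)
audio = VocoderStub()(mels)
print(mels.shape, audio.shape)  # (1, 50, 80) and (1, 12800)
```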

AM/FM DAFx

    In this work we explore audio effects based on the manipulation of an estimated AM/FM decomposition of the input signal, followed by resynthesis. The framework is based on an incoherent, monocomponent decomposition. Contrary to reports that discourage the use of this simple scheme, our results show that the artefacts introduced in the resulting audio are acceptable, and in some cases not even noticeable. Useful and musically interesting effects were obtained in this study, illustrated with audio samples that accompany the text. We also make available Octave code for future experiments, along with new Csound opcodes for real-time implementations.
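    As a rough illustration of the decompose-manipulate-resynthesize loop described above, here is a short Python sketch that estimates AM and FM tracks from the Hilbert-transform analytic signal, a standard incoherent monocomponent estimator, and rebuilds the signal after modifying them. It is only a stand-in for the authors' Octave code and Csound opcodes; every function name and parameter in it is an assumption.

```python
# Minimal AM/FM decompose -> manipulate -> resynthesize sketch (assumed names).
import numpy as np
from scipy.signal import hilbert

def am_fm_decompose(x, sr):
    """Return amplitude envelope (AM) and instantaneous frequency (FM, Hz)."""
    z = hilbert(x)                           # analytic signal
    am = np.abs(z)                           # instantaneous amplitude
    phase = np.unwrap(np.angle(z))
    fm = np.diff(phase) * sr / (2 * np.pi)   # instantaneous frequency
    return am, fm

def am_fm_resynth(am, fm, sr):
    """Rebuild a signal from (possibly modified) AM and FM tracks.
    The absolute initial phase is discarded, which is inaudible here."""
    phase = 2 * np.pi * np.cumsum(fm) / sr
    return am[:-1] * np.cos(phase)           # fm is one sample shorter than am

sr = 16000
t = np.arange(sr) / sr
# Test tone: 440 Hz carrier with a 3 Hz tremolo envelope.
x = np.sin(2 * np.pi * 440 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))

am, fm = am_fm_decompose(x, sr)
# Example effect: shift the carrier up a fifth while keeping the AM envelope.
y = am_fm_resynth(am, fm * 1.5, sr)
```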

    A silent speech system based on permanent magnet articulography and direct synthesis

    In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies.
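    The mapping stage can be illustrated with a small sketch. Note the swap: the paper learns a mixture of factor analysers, but scikit-learn has no MFA implementation, so the stand-in below fits a joint full-covariance Gaussian mixture over paired [PMA; acoustic] frames and converts unseen PMA frames via the conditional mean, a closely related latent-variable regression. Dimensions, names, and data are synthetic placeholders.

```python
# Joint-GMM conditional-mean mapping as a stand-in for the paper's MFA.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

D_PMA, D_AC = 9, 25  # placeholder sensor / acoustic-feature dimensions

def fit_joint_gmm(pma, acoustic, k=8):
    """Fit a joint GMM over stacked [PMA; acoustic] frame pairs."""
    return GaussianMixture(n_components=k, covariance_type='full',
                           random_state=0).fit(np.hstack([pma, acoustic]))

def pma_to_acoustic(gmm, pma):
    """Map PMA frames to acoustic frames via E[acoustic | pma]."""
    n = len(pma)
    out = np.zeros((n, D_AC))
    resp = np.zeros((n, gmm.n_components))
    conds = []
    for k in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[k, :D_PMA], gmm.means_[k, D_PMA:]
        S = gmm.covariances_[k]
        Sxx, Sxy = S[:D_PMA, :D_PMA], S[:D_PMA, D_PMA:]
        # Responsibility of component k given the PMA part only.
        resp[:, k] = gmm.weights_[k] * multivariate_normal.pdf(pma, mu_x, Sxx)
        # Per-component conditional mean of the acoustic part.
        conds.append(mu_y + (pma - mu_x) @ np.linalg.solve(Sxx, Sxy))
    resp /= resp.sum(axis=1, keepdims=True)
    for k in range(gmm.n_components):
        out += resp[:, [k]] * conds[k]
    return out

# Synthetic stand-in data: 2000 parallel PMA/acoustic frames.
rng = np.random.default_rng(0)
pma = rng.normal(size=(2000, D_PMA))
acoustic = pma @ rng.normal(size=(D_PMA, D_AC)) \
    + 0.1 * rng.normal(size=(2000, D_AC))

gmm = fit_joint_gmm(pma, acoustic)
pred = pma_to_acoustic(gmm, pma[:5])   # (5, 25) predicted acoustic frames
```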