13,677 research outputs found
Sampling-based speech parameter generation using moment-matching networks
This paper presents sampling-based speech parameter generation using
moment-matching networks for Deep Neural Network (DNN)-based speech synthesis.
Although people never produce exactly the same speech even if we try to express
the same linguistic and para-linguistic information, typical statistical speech
synthesis produces completely the same speech, i.e., there is no
inter-utterance variation in synthetic speech. To give synthetic speech natural
inter-utterance variation, this paper builds DNN acoustic models that make it
possible to randomly sample speech parameters. The DNNs are trained so that
they make the moments of generated speech parameters close to those of natural
speech parameters. Since the variation of speech parameters is compressed into
a low-dimensional simple prior noise vector, our algorithm has lower
computation cost than direct sampling of speech parameters. As the first step
towards generating synthetic speech that has natural inter-utterance variation,
this paper investigates whether or not the proposed sampling-based generation
deteriorates synthetic speech quality. In evaluation, we compare speech quality
of conventional maximum likelihood-based generation and proposed sampling-based
generation. The result demonstrates the proposed generation causes no
degradation in speech quality.Comment: Submitted to INTERSPEECH 201
Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis
This paper proposes a forward attention method for the sequenceto- sequence
acoustic modeling of speech synthesis. This method is motivated by the nature
of the monotonic alignment from phone sequences to acoustic sequences. Only the
alignment paths that satisfy the monotonic condition are taken into
consideration at each decoder timestep. The modified attention probabilities at
each timestep are computed recursively using a forward algorithm. A transition
agent for forward attention is further proposed, which helps the attention
mechanism to make decisions whether to move forward or stay at each decoder
timestep. Experimental results show that the proposed forward attention method
achieves faster convergence speed and higher stability than the baseline
attention method. Besides, the method of forward attention with transition
agent can also help improve the naturalness of synthetic speech and control the
speed of synthetic speech effectively.Comment: 5 pages, 3 figures, 2 tables. Published in IEEE International
Conference on Acoustics, Speech and Signal Processing 2018 (ICASSP2018
- …