4 research outputs found
Nondestructive Testing (NDT)
The aim of this book is to collect the newest contributions by eminent authors in the field of NDT-SHM, both at the material and structure scale. It therefore provides novel insight at experimental and numerical levels on the application of NDT to a wide variety of materials (concrete, steel, masonry, composites, etc.) in the field of Civil Engineering and Architecture
Prosody generation for text-to-speech synthesis
The absence of convincing intonation makes current parametric speech
synthesis systems sound dull and lifeless, even when trained on expressive
speech data. Typically, these systems use regression techniques to predict the
fundamental frequency (F0) frame-by-frame. This approach leads to overlysmooth
pitch contours and fails to construct an appropriate prosodic structure
across the full utterance. In order to capture and reproduce larger-scale
pitch patterns, we propose a template-based approach for automatic F0 generation,
where per-syllable pitch-contour templates (from a small, automatically
learned set) are predicted by a recurrent neural network (RNN). The use of
syllable templates mitigates the over-smoothing problem and is able to reproduce
pitch patterns observed in the data. The use of an RNN, paired with connectionist
temporal classification (CTC), enables the prediction of structure in
the pitch contour spanning the entire utterance. This novel F0 prediction system
is used alongside separate LSTMs for predicting phone durations and the
other acoustic features, to construct a complete text-to-speech system. Later,
we investigate the benefits of including long-range dependencies in duration
prediction at frame-level using uni-directional recurrent neural networks.
Since prosody is a supra-segmental property, we consider an alternate approach
to intonation generation which exploits long-term dependencies of
F0 by effective modelling of linguistic features using recurrent neural networks.
For this purpose, we propose a hierarchical encoder-decoder and
multi-resolution parallel encoder where the encoder takes word and higher
level linguistic features at the input and upsamples them to phone-level
through a series of hidden layers and is integrated into a Hybrid system which
is then submitted to Blizzard challenge workshop. We then highlight some of
the issues in current approaches and a plan for future directions of investigation
is outlined along with on-going work