Modelling Pitch Accent Types for Polish Speech Synthesis
We describe a Polish prosody modelling module for the Festival
speech synthesis system. The module uses classification and regression
trees for accent type prediction and a linear regression
technique for F0 contour generation. We present the
techniques used to overcome problems with the only
available data. We demonstrate how improvements
were achieved by the use of a modified F0 stylisation, accent
type clustering and language specific features. Results of a formal
perception study show a significant preference for the new
intonation model over the original one.
Prosody generation for text-to-speech synthesis
The absence of convincing intonation makes current parametric speech
synthesis systems sound dull and lifeless, even when trained on expressive
speech data. Typically, these systems use regression techniques to predict the
fundamental frequency (F0) frame-by-frame. This approach leads to overly smooth
pitch contours and fails to construct an appropriate prosodic structure
across the full utterance. In order to capture and reproduce larger-scale
pitch patterns, we propose a template-based approach for automatic F0 generation,
where per-syllable pitch-contour templates (from a small, automatically
learned set) are predicted by a recurrent neural network (RNN). The use of
syllable templates mitigates the over-smoothing problem and is able to reproduce
pitch patterns observed in the data. The use of an RNN, paired with connectionist
temporal classification (CTC), enables the prediction of structure in
the pitch contour spanning the entire utterance. This novel F0 prediction system
is used alongside separate LSTMs for predicting phone durations and the
other acoustic features, to construct a complete text-to-speech system. Later,
we investigate the benefits of including long-range dependencies in duration
prediction at frame-level using uni-directional recurrent neural networks.
Since prosody is a supra-segmental property, we consider an alternate approach
to intonation generation which exploits long-term dependencies of
F0 by effective modelling of linguistic features using recurrent neural networks.
For this purpose, we propose a hierarchical encoder-decoder and a
multi-resolution parallel encoder, where the encoder takes word-level and
higher-level linguistic features as input and upsamples them to phone level
through a series of hidden layers. The encoder is integrated into a hybrid
system, which is then submitted to the Blizzard Challenge workshop. We then
highlight some of the issues in current approaches and outline a plan for
future directions of investigation, along with ongoing work.
Perception of nonnative tonal contrasts by Mandarin-English and English-Mandarin sequential bilinguals
This study examined the role of acquisition order and crosslinguistic similarity in influencing transfer at the initial stage of perceptually acquiring a tonal third language (L3). Perception of tones in Yoruba and Thai was tested in adult sequential bilinguals representing three different first (L1) and second language (L2) backgrounds: L1 Mandarin-L2 English (MEBs), L1 English-L2 Mandarin (EMBs), and L1 English-L2 intonational/non-tonal (EIBs). MEBs outperformed EMBs and EIBs in discriminating L3 tonal contrasts in both languages, while EMBs showed a small advantage over EIBs on Yoruba. All groups showed better overall discrimination in Thai than Yoruba, but group differences were more robust in Yoruba. MEBs' and EMBs' poor discrimination of certain L3 contrasts was further reflected in the L3 tones being perceived as similar to the same Mandarin tone; however, EIBs, with no knowledge of Mandarin, showed many of the same similarity judgments. These findings thus suggest that L1 tonal experience has a particularly facilitative effect in L3 tone perception, but there is also a facilitative effect of L2 tonal experience. Further, crosslinguistic perceptual similarity between L1/L2 and L3 tones, as well as acoustic similarity between different L3 tones, play a significant role at this early stage of L3 tone acquisition.