1,929 research outputs found

    Mandarin Singing Voice Synthesis Based on Harmonic Plus Noise Model and Singing Expression Analysis

    The purpose of this study is to investigate how humans interpret musical scores expressively, and then design machines that sing like humans. We consider six factors that have a strong influence on the expression of human singing. The factors are related to the acoustic, phonetic, and musical features of a real singing signal. Given real singing voices recorded following the MIDI scores and lyrics, our analysis module can extract the expression parameters from the real singing signals semi-automatically. The expression parameters are used to control the singing voice synthesis (SVS) system for Mandarin Chinese, which is based on the harmonic plus noise model (HNM). The results of perceptual experiments show that integrating the expression factors into the SVS system yields a notable improvement in perceptual naturalness, clearness, and expressiveness. By one-to-one mapping of the real singing signal and expression controls to the synthesizer, our SVS system can simulate the interpretation of a real singer with the timbre of a speaker. Comment: 8 pages, technical report

    Prosodic challenges faced by English speakers reading Mandarin

    This study compares the prosodic characteristics of L2-Mandarin as spoken by L1-English speakers using L1-Mandarin utterances. The acoustic correlates examined include individual tonal realizations, interactions of tones in sequence, durational features and intensity envelopes. L2-Mandarin users realize the contour tones RISE and FALL with both rising and falling pitch, and produce the second tone of disyllabic words with more varied pitch. L2-users employ larger vowel durations, syllable durations and larger variation over vowel intervals in sequential pairs than L1-Mandarin users. Both user groups show similar intensity envelopes. Implications of this study include tailoring language training programs that counterbalance L1 influences.

    Unifying Amplitude and Phase Analysis: A Compositional Data Approach to Functional Multivariate Mixed-Effects Modeling of Mandarin Chinese

    Mandarin Chinese is characterized by being a tonal language; the pitch (or F0) of its utterances carries considerable linguistic information. However, speech samples from different individuals are subject to changes in amplitude and phase which must be accounted for in any analysis which attempts to provide a linguistically meaningful description of the language. A joint model for amplitude, phase and duration is presented which combines elements from Functional Data Analysis, Compositional Data Analysis and Linear Mixed Effects Models. By decomposing functions via a functional principal component analysis, and connecting registration functions to compositional data analysis, a joint multivariate mixed effect model can be formulated which gives insights into the relationship between the different modes of variation as well as their dependence on linguistic and non-linguistic covariates. The model is applied to the COSPRO-1 data set, a comprehensive database of spoken Taiwanese Mandarin, containing approximately 50 thousand phonetically diverse sample F0 contours (syllables), and reveals that phonetic information is jointly carried by both amplitude and phase variation. Comment: 49 pages, 13 figures, small changes to discussion

    Automatic prosodic variations modelling for language and dialect discrimination

    This paper addresses the problem of modelling prosody for language identification. The aim is to create a system that can be used prior to any linguistic work to show if prosodic differences among languages or dialects can be automatically determined. In previous papers, we defined a prosodic unit, the pseudo-syllable. Rhythmic modelling has proven the relevance of the pseudo-syllable unit for automatic language identification. In this paper, we propose to model the prosodic variations, that is to say model sequences of prosodic units. This is achieved by the separation of phrase and accentual components of intonation. We propose an independent coding of those components on differentiated scales of duration. Short-term and long-term language-dependent sequences of labels are modelled by n-gram models. The performance of the system is demonstrated by experiments on read speech and evaluated by experiments on spontaneous speech. Finally, an experiment is described on the discrimination of Arabic dialects, for which there is a lack of linguistic studies, notably on prosodic comparisons. We show that our system is able to clearly identify the dialectal areas, leading to the hypothesis that those dialects have prosodic differences.

    The influence of pitch contour on Mandarin speakers' perception of English stress

    Previous studies on L2 stress perception have mainly focused on words in isolation or in single intonational contexts. This paper reports on a study exploring the influence of different intonation contours, falling (declarative) and rising (yes/no question), on nonnative speakers' stress identification. The study compared the perception of stress position in English words by native speakers of Mandarin, a tone language, and English, a stress language. The results showed that Mandarin speakers exhibited misperception of stress position when high tones did not coincide with the stressed syllable. As a control condition, native English speakers also displayed misperception of stress, but to a lesser extent in the condition of initial stress. Tonal transfer and asymmetrical cue usage are believed to be responsible for the perceptual differences.