    Assessing respiratory contributions to f0 declination in German across varying speech tasks and respiratory demands

    Many past studies have sought to determine the factors that affect f0 declination and its physiological underpinnings. This study assessed the relation between respiration and f0 declination by means of simultaneous acoustic and respiratory recordings of read and spontaneous speech from speakers of German. Within each Intonational Phrase, we analysed the effect of the number of syllables and of voiceless obstruents. Both factors could influence the slope of either f0 declination or rib cage movement. If respiration and f0 declination are physiologically related, their relationship might also be modulated by one or both factors. Our results show consistently for both speech tasks that the slope of rib cage movement is not related to f0 declination when length and consonant content vary. Furthermore, f0 slopes are generally shallower in spontaneous than in read speech. Finally, although a higher number of voiceless obstruents yielded greater rib cage compression, it did not affect f0 declination. These results suggest that although f0 declination occurs in many languages, it might not have a purely physiological origin in breathing, but rather reflects cognitive processing that allows speakers to look ahead when planning their utterances.
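    The abstract compares per-phrase slopes of f0 and of rib cage movement but does not say how the slopes were computed. A minimal sketch, assuming an ordinary least-squares line fitted over each Intonational Phrase and f0 expressed in semitones relative to a 100 Hz reference (both choices are assumptions, as are the function names `hz_to_semitones`, `ols_slope` and `phrase_slopes`):

```python
import math

def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert f0 values in Hz to semitones relative to ref_hz (100 Hz is an assumed reference)."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]

def ols_slope(t, y):
    """Ordinary least-squares slope of y against time t (units of y per second)."""
    n = len(t)
    if n != len(y) or n < 2:
        raise ValueError("need two or more paired samples")
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("time values must not all be equal")
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    return sxy / sxx

def phrase_slopes(f0_times_s, f0_hz, rib_times_s, ribcage):
    """Hypothetical helper: (f0 slope in st/s, rib cage slope in units/s) for one phrase.

    f0 and rib cage get separate time stamps because f0 is undefined in
    voiceless stretches, whereas the respiratory signal is continuous.
    """
    return (ols_slope(f0_times_s, hz_to_semitones(f0_hz)),
            ols_slope(rib_times_s, ribcage))
```

    A negative f0 slope indicates declination; collecting both slopes for every phrase gives the paired values whose relationship the study tests across phrase lengths and consonant content.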

    Sentence boundary detection in Chinese broadcast news using conditional random fields and prosodic features

    In this paper, we explore the use of prosodic features for sentence boundary detection in Chinese broadcast news. The prosodic features include speaker turn, music, pause duration, pitch, energy and speaking rate. Specifically, to account for the effect of Chinese lexical tones on the pitch trajectory, we propose tone-normalized pitch features. Experiments using decision trees demonstrate that the tone-normalized pitch features show superior performance for sentence boundary detection in Chinese broadcast news. Furthermore, feature combination achieves a clear performance improvement through intuitive feature-interaction rules formed in the decision tree. Pause duration and a tone-normalized pitch feature account for most of the feature usage in the best-performing decision tree. Index Terms: sentence boundary detection, sentence segmentation, speech prosody, rich transcription
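    The abstract does not define its tone normalization. One plausible reading, sketched here purely as an assumption, is to z-score each syllable's log-f0 within its own lexical tone category, so that a high-level tone and a low tone become comparable, and then measure the normalized pitch jump across a candidate boundary. The names `tone_normalize` and `pitch_reset` are hypothetical:

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

def tone_normalize(syllables):
    """Z-score each syllable's log-f0 within its own tone category.

    syllables: list of (tone, mean_f0_hz) pairs from one speaker, where tone
    is a lexical tone label (e.g. 1-4, or 0 for the neutral tone).
    """
    by_tone = defaultdict(list)
    for tone, f0 in syllables:
        by_tone[tone].append(math.log(f0))
    stats = {}
    for tone, values in by_tone.items():
        sd = pstdev(values)
        # Guard against a zero spread (e.g. a tone seen only once).
        stats[tone] = (mean(values), sd if sd > 0 else 1.0)
    return [(math.log(f0) - stats[tone][0]) / stats[tone][1] for tone, f0 in syllables]

def pitch_reset(normalized, boundary):
    """Normalized pitch jump across a candidate boundary just before index `boundary`."""
    return normalized[boundary] - normalized[boundary - 1]
```

    A large positive reset after a pause is the kind of cue a decision tree could combine with pause duration when deciding whether a sentence boundary falls at that point.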

    Cross-Linguistic Comparison of the Pitch and Temporal Profiles between L1 and Chinese L2 Speakers of Spanish

    Cross-linguistic studies between intonational languages suggest that there is a universal trend in pitch and temporal characteristics during the L2 learning process. We extend these hypotheses to Chinese learners of Peninsular Spanish, a new pairing of a tone language and a non-tone language. Using six pitch and temporal metrics, we examine how Chinese learners' pitch and temporal profiles deviate from those of L1 speakers and explore the factors that may contribute to these L2 speech deviations. A Discourse Completion Task was used to elicit five question types from 37 participants divided into three language groups. Consistent with previous literature, our study shows that Chinese L2 learners had a compressed pitch span (at both the utterance and syllable levels) and reduced pitch variability, as well as a strong reduction in pitch change rate, speech rate, and articulation rate compared to L1 Spanish speakers. Most pitch and temporal deviations in L2 Spanish intonation are closely linked to psychological and cognitive attributes rather than determined by physiological factors or L1 tonal transfer. Moreover, a lack of prosodic knowledge of the target intonation patterns for the different question types may also prevent L2 learners from approaching a native-like pitch and temporal profile.
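    The six metrics are not defined in the abstract. As a hedged illustration, here are common textbook definitions of four of them, assumed rather than taken from the paper: pitch span in semitones, pitch variability as the standard deviation of f0 in semitones, speech rate (syllables per second including pauses) and articulation rate (syllables per second excluding pauses). The 100 Hz reference and all function names are assumptions:

```python
import math
from statistics import pstdev

def pitch_span_st(f0_hz):
    """Pitch span in semitones: 12 * log2(max f0 / min f0)."""
    return 12.0 * math.log2(max(f0_hz) / min(f0_hz))

def pitch_variability_st(f0_hz, ref_hz=100.0):
    """Standard deviation of f0 in semitones relative to ref_hz (assumed reference)."""
    return pstdev([12.0 * math.log2(f / ref_hz) for f in f0_hz])

def speech_rate(n_syllables, total_s):
    """Syllables per second over the whole utterance, silent pauses included."""
    return n_syllables / total_s

def articulation_rate(n_syllables, total_s, pause_s):
    """Syllables per second with silent pause time removed."""
    if pause_s >= total_s:
        raise ValueError("pause time must be shorter than total time")
    return n_syllables / (total_s - pause_s)
```

    Because pitch span and variability are in semitones, they are independent of a speaker's overall register, which matters when comparing groups whose average f0 differs. Keeping speech rate and articulation rate separate shows whether slower L2 speech comes from pausing or from slower articulation itself.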

    Juncture prosody across languages: Similar production but dissimilar perception

    How do speakers of languages with different intonation systems produce and perceive prosodic junctures in sentences with identical structural ambiguity? Native speakers of English and of Mandarin produced potentially ambiguous sentences with a prosodic juncture either earlier in the utterance (e.g., “He gave her # dog biscuits,” “他给她 # 狗饼干”), or later (e.g., “He gave her dog # biscuits,” “他给她狗 # 饼干”). These production data showed that prosodic disambiguation is realised very similarly in the two languages, despite some differences in the degree to which individual juncture cues (e.g., pausing) were favoured. In perception experiments with a new disambiguation task, requiring speeded responses to select the correct meaning for structurally ambiguous sentences, language differences in disambiguation response time appeared: Mandarin speakers correctly disambiguated sentences with earlier juncture faster than those with later juncture, while English speakers showed the reverse. Mandarin speakers with L2 English did not show their native-language response time pattern when they heard the English ambiguous sentences. Thus, even with identical structural ambiguity and identically cued production, prosodic juncture perception across languages can differ.

    Juncture prosody across languages: similar production but dissimilar perception

    How do speakers of languages with different intonation systems produce and perceive prosodic junctures in sentences with identical structural ambiguity? Native speakers of English and of Mandarin produced potentially ambiguous sentences with a prosodic juncture either earlier in the utterance (e.g., “He gave her # dog biscuits,” “他给她 # 狗饼干”), or later (e.g., “He gave her dog # biscuits,” “他给她狗 # 饼干”). These production data showed that prosodic disambiguation is realized very similarly in the two languages, despite some differences in the degree to which individual juncture cues (e.g., pausing) were favoured. In perception experiments with a new disambiguation task, requiring speeded responses to select the correct meaning for structurally ambiguous sentences, language differences in disambiguation response time appeared: Mandarin speakers correctly disambiguated sentences with earlier juncture faster than those with later juncture, while English speakers showed the reverse. Mandarin speakers also showed higher levels of accuracy in disambiguation compared to English speakers, indicating language-specific differences in the extent to which prosodic cues are used. However, Mandarin, but not English, speakers showed a decrease in accuracy when pausing cues were removed. Thus, even with high similarity in both structural ambiguity and production cues, prosodic juncture perception across languages can differ.

    Paragraph-based Prosodic Cues for Speech Synthesis Applications

    Paper presented at: Speech Prosody 2016; 2016 May 31-June 3; Boston (MA, USA).
    Speech synthesis has improved in both expressiveness and voice quality in recent years. However, achieving full expressiveness in long, multi-sentence synthesized discourse is still a challenge, since speech synthesizers do not take into account the prosodic differences that have been observed in discourse units such as paragraphs. The current study validates and extends previous work by analyzing the prosody of paragraph units in a large and diverse corpus of TED Talks using automatically extracted F0, intensity and timing features. In addition, a series of classification experiments was performed to identify which features are consistently used to distinguish paragraph breaks. The results show significant differences in prosody related to paragraph position. Moreover, the classification experiments show that boundary features such as pause duration and differences in F0 and intensity levels are the most consistent cues to paragraph boundaries. This suggests that these features should be taken into account when generating spoken discourse in order to improve naturalness and expressiveness.
    Part of this work has received funding from the EU's Horizon 2020 Research and Innovation Programme under GA H2020-RIA-645012. The first author is partially funded by the Spanish Ministry of Economy and Competitiveness through the Juan de la Cierva program and a José Castillejo mobility grant.
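    The boundary features named above (pause duration and differences in F0 and intensity across a break) can be sketched as follows. This assumes each sentence has time stamps plus voiced F0 and intensity samples in time order, and compares the first and last `k` samples on either side; the `Sentence` type, `boundary_features` and the default window of five samples are all hypothetical choices, not the paper's extraction method:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Sentence:
    start_s: float
    end_s: float
    f0_hz: list[float]         # voiced F0 samples in time order (assumed non-empty)
    intensity_db: list[float]  # intensity samples in time order (assumed non-empty)

def boundary_features(prev: Sentence, nxt: Sentence, k: int = 5) -> dict:
    """Hypothetical features for the break between two adjacent sentences.

    Positive F0/intensity differences mean the next sentence starts higher or
    louder than the previous one ended, i.e. a reset across the break.
    """
    return {
        "pause_s": nxt.start_s - prev.end_s,
        "f0_diff_hz": mean(nxt.f0_hz[:k]) - mean(prev.f0_hz[-k:]),
        "intensity_diff_db": mean(nxt.intensity_db[:k]) - mean(prev.intensity_db[-k:]),
    }
```

    Feature dictionaries of this kind, one per sentence break, are what a classifier would receive when learning to separate paragraph breaks from ordinary sentence breaks.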

    Integrating lexical and prosodic features for automatic paragraph segmentation

    Spoken documents, such as podcasts or lectures, are a growing presence in everyday life. Being able to automatically identify their discourse structure is an important step toward understanding what a spoken document is about. Moreover, finer-grained units, such as paragraphs, are highly desirable for presenting and analyzing spoken content. However, little work has been done on discourse-based speech segmentation below the level of broad topics. To examine how discourse transitions are cued in speech, we investigate automatic paragraph segmentation of TED talks using lexical and prosodic features. Experiments using Support Vector Machines, AdaBoost, and Neural Networks show that models using supra-sentential prosodic features and induced cue words perform better than those based on the type of lexical cohesion measures often used in broad topic segmentation. Moreover, combining a wide range of individually weak lexical and prosodic predictors improves performance, and modelling contextual information using recurrent neural networks outperforms other approaches by a large margin. Our best results come from late fusion methods that integrate representations generated by separate lexical and prosodic models while allowing interactions between these feature streams rather than treating them as independent information sources. Application to ASR output shows that adding prosodic features, particularly using late fusion, can significantly mitigate the drop in performance caused by transcription errors.
    The second author was funded by the EU's Horizon 2020 Research and Innovation Programme under GA H2020-RIA-645012 and by the Spanish Ministry of Economy and Competitiveness through the Juan de la Cierva program. The other authors were funded by the University of Edinburgh.

    Segmentation and Intonation in Childhood Apraxia of Speech

    Childhood apraxia of speech (CAS) is a motor speech disorder that affects the programming of spatial and temporal parameters of speech patterns and is characterized by sound distortions, segmented units, and deficits in lexical stress. Children with CAS show notably longer intervals between speech segments and within syllables than children with phonological impairments or typically developing children. This segmentation may affect prosody at the lexical level. Prosody also includes declination of the fundamental frequency (F0) and reset at the intonational level, which affect the intelligibility of speech production. This study assessed segmentation and intonational effects on prosody across entire utterances for 11 children with CAS and 10 typically developing (TD) children aged 5-11 years. Acoustic analyses of real and nonword multisyllabic words, paired with a carrier phrase of 3-4 words, were conducted for the average inter-segment duration between and within words (ms) and the average slope of F0. Stimuli were generated from Treating Establishment of Motor Program Organization (TEMPO), which targets motor speech errors in CAS (Miller et al., 2018). The current study provides a TD comparison to the CAS group from the TEMPO study prior to treatment. Results showed that CAS participants produced significantly longer inter-segment durations both between and within words. A correlation analysis showed a strong positive relationship between inter-segment duration and the number of words in the sentence. F0 declination was found over utterances for both groups, with no detectable difference between groups. F0 change over target words likewise showed no notable declination differences between groups. Further correlation testing suggests that as between-word duration increases, the F0 regression slope over utterances flattens.
Comparing speech production patterns in children with CAS and TD children pre- and post-treatment will better establish treatment efficacy in improving the communication of children with CAS. Further data from additional participants will better differentiate prosodic intonation between TD children and children with CAS.
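    The abstract does not name its correlation statistic; a minimal sketch, assuming Pearson's r computed over per-utterance values, is shown below. Since declination slopes are negative, "the slope flattens as between-word duration increases" corresponds to a positive r between duration and slope. The function name `pearson_r` is illustrative only:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two or more paired values")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

    For example, with hypothetical between-word durations of 50, 100, 150 and 200 ms and F0 slopes of -4, -3, -2.5 and -1 st/s for the same utterances, the slopes move toward zero as durations grow, so `pearson_r` returns a positive value.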