53 research outputs found

    Sentence boundary detection in Chinese broadcast news using conditional random fields and prosodic features

    In this paper, we explore the use of prosodic features in sentence boundary detection in Chinese broadcast news. The prosodic features include speaker turn, music, pause duration, pitch, energy and speaking rate. Specifically, considering the Chinese tonal effects in pitch trajectory, we propose to use tone-normalized pitch features. Experiments using decision trees demonstrate that the tone-normalized pitch features show superior performance in sentence boundary detection in Chinese broadcast news. Furthermore, feature combination yields a clear performance improvement through intuitive feature-interaction rules formed in the decision tree. Pause duration and a tone-normalized pitch feature account for most of the feature usage in the best-performing decision tree. Index Terms: sentence boundary detection, sentence segmentation, speech prosody, rich transcription

    Assessing respiratory contributions to f0 declination in German across varying speech tasks and respiratory demands

    Many past studies have sought to determine the factors that affect f0 declination, and the physiological underpinnings of the phenomenon. This study assessed the relation between respiration and f0 declination by means of simultaneous acoustic and respiratory recordings from read and spontaneous speech from speakers of German. Within the respective Intonational Phrase unit, we analysed the effect of the number of syllables and voiceless obstruents. Both factors could influence the slope of either f0 declination or rib cage movement. If respiration and f0 declination are related physiologically, their relationship might also be modulated by either one or both factors. Our results show consistently for both speech tasks that the slope of the rib cage movement is not related to f0 declination when length and consonant content vary. Furthermore, f0 slopes are generally shallower in spontaneous than in read speech. Finally, although a higher number of voiceless obstruents yielded a greater rib cage compression, it did not affect f0 declination. These results suggest that although f0 declination occurs in many languages, it might not have a purely physiological origin in breathing, but rather reflects cognitive processing which allows speakers to look ahead when planning their utterances.

    Investigating the tonal system of Plastic Mandarin: a cross-varietal comparison

    The city of Changsha, Hunan Province, China has seen an increase in the use of Mandarin in the past decade, overshadowing the local non-Mandarin variety, Changsha. A new variety, “Plastic Mandarin”, mostly spoken by millennials and younger generations, has emerged. It is defined in this thesis as a non-standard Mandarin accent that features in the speech of young urban residents in Changsha and that has crystallised over the past few decades. This thesis presents a detailed phonetic investigation of the tonal system of Plastic Mandarin through a cross-varietal comparative approach, divided into two main streams: citation tones and neutral tones in context. The defining characteristic of the citation tone system for Plastic Mandarin is established first: a mid-level tone, a low to mid rising tone, a low falling tone, and a high rising tone. By comparing the citation tones of the three varieties that coexist in the city of Changsha, the thesis provides acoustic evidence that Plastic Mandarin may arise when Mandarin tones adapt the pitch pattern of some corresponding Changsha tones. In addition to citation tones, this thesis disentangles the sources of variability in the syllable duration and f0 contour of speech sequences containing neutral tone syllables, i.e. those that do not have any of the four canonical lexical tones and are often overlooked in prior studies of tones. The data show that f0 contours converge at the end of two consecutive neutral tone syllables at a low pitch in both Mandarin varieties. This suggests that a neutral tone or a sequence of consecutive neutral tones tends to be associated with a low pitch target, despite the varying f0 shapes largely predicted by the preceding lexical tone. The thesis proposes a probabilistic target-approaching model for Mandarin tones in connected speech, in which pitch targets may be fewer than the number of syllables.
    While the phonetic realisation of the four lexical tones in Plastic Mandarin is consistently different from that in Standard Mandarin, the pitch target of neutral tone syllables tends to remain constant in this process of Mandarin variation and change, which may be attributed to the stable transfer of prosodic structure.

    Cross-Linguistic Comparison of the Pitch and Temporal Profiles between L1 and Chinese L2 Speakers of Spanish

    Cross-linguistic studies between intonational languages suggest that there is a universal trend during the L2 learning process regarding pitch and temporal characteristics. We extend these hypotheses to Chinese learners of Peninsular Spanish, a new pairing of tone and non-tone languages. Using six pitch and temporal metrics, we examine how Chinese learners’ pitch and temporal profiles deviated from those of L1 native speakers and explore the factors that may contribute to L2 speech deviations. The Discourse Completion Task was conducted to elicit five question types produced by 37 participants, who were divided into three language groups. Consistent with previous literature, our study shows that Chinese L2 learners had a compression of pitch span (at both the utterance and syllable levels) and pitch variability, as well as a strong reduction of pitch change rate, speech rate, and articulation rate compared to L1 Spanish speakers. Most pitch and temporal deviations in L2 Spanish intonation are closely linked to psychological and cognitive attributes rather than being determined by physiological factors or L1 tonal transfer. Moreover, the lack of prosodic knowledge of the target intonation patterns concerning the different question types may also hinder L2 learners from approaching a native-like pitch and temporal profile.

    Juncture prosody across languages: Similar production but dissimilar perception

    How do speakers of languages with different intonation systems produce and perceive prosodic junctures in sentences with identical structural ambiguity? Native speakers of English and of Mandarin produced potentially ambiguous sentences with a prosodic juncture either earlier in the utterance (e.g., “He gave her # dog biscuits,” “他给她 # 狗饼干”), or later (e.g., “He gave her dog # biscuits,” “他给她狗 # 饼干”). These production data showed that prosodic disambiguation is realised very similarly in the two languages, despite some differences in the degree to which individual juncture cues (e.g., pausing) were favoured. In perception experiments with a new disambiguation task, requiring speeded responses to select the correct meaning for structurally ambiguous sentences, language differences in disambiguation response time appeared: Mandarin speakers correctly disambiguated sentences with earlier juncture faster than those with later juncture, while English speakers showed the reverse. Mandarin speakers with L2 English did not show their native-language response time pattern when they heard the English ambiguous sentences. Thus, even with identical structural ambiguity and identically cued production, prosodic juncture perception across languages can differ.

    Juncture prosody across languages: similar production but dissimilar perception

    How do speakers of languages with different intonation systems produce and perceive prosodic junctures in sentences with identical structural ambiguity? Native speakers of English and of Mandarin produced potentially ambiguous sentences with a prosodic juncture either earlier in the utterance (e.g., “He gave her # dog biscuits,” “他给她 # 狗饼干”), or later (e.g., “He gave her dog # biscuits,” “他给她狗 # 饼干”). These production data showed that prosodic disambiguation is realized very similarly in the two languages, despite some differences in the degree to which individual juncture cues (e.g., pausing) were favoured. In perception experiments with a new disambiguation task, requiring speeded responses to select the correct meaning for structurally ambiguous sentences, language differences in disambiguation response time appeared: Mandarin speakers correctly disambiguated sentences with earlier juncture faster than those with later juncture, while English speakers showed the reverse. Mandarin speakers also showed higher levels of accuracy in disambiguation compared to English speakers, indicating language-specific differences in the extent to which prosodic cues are used. However, Mandarin, but not English, speakers showed a decrease in accuracy when pausing cues were removed. Thus even with high similarity in both structural ambiguity and production cues, prosodic juncture perception across languages can differ

    Paragraph-based Prosodic Cues for Speech Synthesis Applications

    Paper presented at: Speech Prosody 2016; 2016 May 31-June 3; Boston (MA, USA). Speech synthesis has improved in both expressiveness and voice quality in recent years. However, obtaining full expressiveness when dealing with large multi-sentential synthesized discourse is still a challenge, since speech synthesizers do not take into account the prosodic differences that have been observed in discourse units such as paragraphs. The current study validates and extends previous work by analyzing the prosody of paragraph units in a large and diverse corpus of TED Talks using automatically extracted F0, intensity and timing features. In addition, a series of classification experiments was performed in order to identify which features are consistently used to distinguish paragraph breaks. The results show significant differences in prosody related to paragraph position. Moreover, the classification experiments show that boundary features such as pause duration and differences in F0 and intensity levels are the most consistent cues in marking paragraph boundaries. This suggests that these features should be taken into account when generating spoken discourse in order to improve naturalness and expressiveness. Part of this work has received funding from the EU’s Horizon 2020 Research and Innovation Programme under the GA H2020-RIA-645012. The first author is partially funded by the Spanish Ministry of Economy and Competitivity through the Juan de la Cierva program and a José Castillejo mobility grant.

    Integrating lexical and prosodic features for automatic paragraph segmentation

    Spoken documents, such as podcasts or lectures, are a growing presence in everyday life. Being able to automatically identify their discourse structure is an important step to understanding what a spoken document is about. Moreover, finer-grained units, such as paragraphs, are highly desirable for presenting and analyzing spoken content. However, little work has been done on discourse-based speech segmentation below the level of broad topics. In order to examine how discourse transitions are cued in speech, we investigate automatic paragraph segmentation of TED talks using lexical and prosodic features. Experiments using Support Vector Machines, AdaBoost, and Neural Networks show that models using supra-sentential prosodic features and induced cue words perform better than those based on the type of lexical cohesion measures often used in broad topic segmentation. Moreover, combining a wide range of individually weak lexical and prosodic predictors improves performance, and modelling contextual information using recurrent neural networks outperforms other approaches by a large margin. Our best results come from using late fusion methods that integrate representations generated by separate lexical and prosodic models while allowing interactions between these feature streams rather than treating them as independent information sources. Application to ASR outputs shows that adding prosodic features, particularly using late fusion, can significantly ameliorate decreases in performance due to transcription errors. The second author was funded by the EU’s Horizon 2020 Research and Innovation Programme under the GA H2020-RIA-645012 and the Spanish Ministry of Economy and Competitivity Juan de la Cierva program. The other authors were funded by the University of Edinburgh.