Using Prosody to Classify Discourse Relations
Paper presented at: The 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), held in Stockholm, Sweden, 20-24 August 2017.
This work aims to explore the correlation between the discourse structure of a spoken monologue and its prosody by predicting discourse relations from different prosodic attributes. For this purpose, a corpus of semi-spontaneous monologues in English has been automatically annotated according to the Rhetorical
Structure Theory, which models coherence in text via rhetorical relations. From corresponding audio files, prosodic features such as pitch, intensity, and speech rate have been extracted from different contexts of a relation. Supervised classification tasks using Support Vector Machines have been performed to find relationships between prosodic features and rhetorical relations. Preliminary results show that intensity combined with other features extracted from intra- and intersegmental environments is the feature with the highest predictability for a discourse relation. The prediction of rhetorical relations from prosodic features and their combinations is straightforwardly applicable to several tasks such as speech understanding or generation. Moreover, the knowledge of how rhetorical relations should be marked in terms of prosody will serve as a basis to improve speech synthesis applications and make voices sound more natural and expressive.
This work is part of the KRISTINA project, which has received funding from the European Union’s Horizon 2020 Research
and Innovation Programme under the Grant Agreement number 645012. The second author is partially funded by the
Spanish Ministry of Economy, Industry and Competitiveness through the Ramón y Cajal program. The third and fourth authors
are partially funded by ANPCYT PICT 2014-1561, and the Air Force Office of Scientific Research, Air Force Materiel
Command, USAF under Award No. FA9550-15-1-0055.
Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for
the automatic segmentation of speech into topically coherent units. We propose
two methods for combining lexical and prosodic information using hidden Markov
models and decision trees. Lexical information is obtained from a speech
recognizer, and prosodic features are extracted automatically from speech
waveforms. We evaluate our approach on the Broadcast News corpus, using the
DARPA-TDT evaluation metrics. Results show that the prosodic model alone is
competitive with word-based segmentation methods. Furthermore, we achieve a
significant reduction in error by combining the prosodic and word-based
knowledge sources.
Comment: 27 pages, 8 figures
The acquisition of English L2 prosody by Italian native speakers: experimental data and pedagogical implications
This paper investigates Yes-No question intonation patterns in English L2, Italian L1, and
English L1. The aim is to test the hypothesis that L2 learners may show different
acquisition strategies for different dimensions of intonation, and particularly the
phonological and phonetic components. The study analyses the nuclear intonation
contours of 4 target English words and 4 comparable Italian words consisting of sonorant
segments, stressed on the semi-final or final syllable, and occurring in Yes-No questions
in sentence-final position (e.g., Will you attend the memorial?, Hai sentito la Melania?).
The words were contained in mini-dialogues of question-answer pairs, and read 5 times
by 4 Italian speakers (Padova area, North-East Italy) and 3 English female speakers
(London area, UK). The results show that: 1) different intonation patterns may be used to
realize the same grammatical function; 2) different developmental processes are at work,
including transfer of L1 categories and the acquisition of L2 phonological categories.
These results suggest that the phonetic dimension of L2 intonation may be more difficult
to learn than the phonological one.