Prosody-Based Automatic Segmentation of Speech into Sentences and Topics
A crucial step in processing speech audio data for information extraction,
topic detection, or browsing/playback is to segment the input into sentence and
topic units. Speech segmentation is challenging, since the cues typically
present for segmenting text (headers, paragraphs, punctuation) are absent in
spoken language. We investigate the use of prosody (information gleaned from
the timing and melody of speech) for these tasks. Using decision tree and
hidden Markov modeling techniques, we combine prosodic cues with word-based
approaches, and evaluate performance on two speech corpora, Broadcast News and
Switchboard. Results show that the prosodic model alone performs on par with,
or better than, word-based statistical language models -- for both true and
automatically recognized words in news speech. The prosodic model achieves
comparable performance with significantly less training data, and requires no
hand-labeling of prosodic events. Across tasks and corpora, we obtain a
significant improvement over word-only models using a probabilistic combination
of prosodic and lexical information. Inspection reveals that the prosodic
models capture language-independent boundary indicators described in the
literature. Finally, cue usage is task and corpus dependent. For example, pause
and pitch features are highly informative for segmenting news speech, whereas
pause, duration and word-based cues dominate for natural conversation.
Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2),
Special Issue on Accessing Information in Spoken Audio, September 200
Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information
In conversational speech, the acoustic signal provides cues that help
listeners disambiguate difficult parses. For automatically parsing spoken
utterances, we introduce a model that integrates transcribed text and
acoustic-prosodic features using a convolutional neural network over energy and
pitch trajectories coupled with an attention-based recurrent neural network
that accepts text and prosodic features. We find that different types of
acoustic-prosodic features are individually helpful, and together give
statistically significant improvements in parse and disfluency detection F1
scores over a strong text-only baseline. For this study with known sentence
boundaries, error analyses show that the main benefit of acoustic-prosodic
features is in sentences with disfluencies, attachment decisions are most
improved, and transcription errors obscure gains from prosody.
Comment: Accepted in NAACL HLT 201
The emergence of prosody in linguistic theory
Prosody is a distinctive characteristic of the production of sounds. Human speech is particularly marked by prosody, which serves various functions in different areas of linguistics (e.g. phonology, morphology, sociolinguistics). The importance of prosody in human language has been known since the very early periods of modern civilisation. Both Western and Eastern traditions placed great emphasis on the proper practice of prosodic rhymes and rhythms in the use of language, whether for analysing grammar or for praying to God or any other superior spirit. Subsequent developments in linguistics have revealed the central role played by prosody in determining the innate grammar of human language. This paper briefly discusses the evolution of thought on prosody and its current standing in the field of linguistics.
peer-reviewed
Prosodic description: An introduction for fieldworkers
This article provides an introductory tutorial on prosodic features such as tone and accent for researchers working on little-known languages. It specifically addresses the needs of non-specialists and thus does not presuppose knowledge of the phonetics and phonology of prosodic features. Instead, it intends to introduce the uninitiated reader to a field often shied away from because of its (in part real, but in part also just imagined) complexities. It consists of a concise overview of the basic phonetic phenomena (section 2) and the major categories and problems of their functional and phonological analysis (sections 3 and 4). Section 5 gives practical advice for documenting and analyzing prosodic features in the field.
National Foreign Language Resource Center
When we fail to question in Japanese
When we pay close attention to the prosody of Wh-questions in Japanese, we discover many novel and interesting empirical puzzles that would require us to devise a much finer syntactic component of grammar. This paper addresses the issues that pose problems for such an elaborated grammar, and offers solutions that appeal to the information structure and sentence processing involved in the interpretation of interrogative and focus constructions.
The restricted access of information structure to syntax : a minority report
This paper sketches the view that syntax does not directly interact with information structure. Therefore, syntactic data are of little help when one wants to narrow down the interpretation of terms such as “focus”, “topic”, etc.
Language identification with suprasegmental cues: A study based on speech resynthesis
This paper proposes a new experimental paradigm to explore the discriminability of languages, a question which is crucial to the child born in a bilingual environment. This paradigm employs the speech resynthesis technique, enabling the experimenter to preserve or degrade acoustic cues such as phonotactics, syllabic rhythm or intonation from natural utterances. English and Japanese sentences were resynthesized, preserving broad phonotactics, rhythm and intonation (Condition 1), rhythm and intonation (Condition 2), intonation only (Condition 3), or rhythm only (Condition 4). The findings support the notion that syllabic rhythm is a necessary and sufficient cue for French adult subjects to discriminate English from Japanese sentences. The results are consistent with previous research using low-pass filtered speech, as well as with phonological theories predicting rhythmic differences between languages. Thus, the new methodology proposed appears to be well-suited to study language discrimination. Applications for other domains of psycholinguistic research and for automatic language identification are considered.