    Prosody-Based Automatic Segmentation of Speech into Sentences and Topics

    A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging because the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models, for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task- and corpus-dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration and word-based cues dominate for natural conversation.
    Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2), Special Issue on Accessing Information in Spoken Audio, September 200
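    To make the idea of a "probabilistic combination of prosodic and lexical information" concrete, the sketch below linearly interpolates, at each word gap, a boundary posterior from a prosodic model with one from a word-based model, then thresholds the result. This is a minimal sketch under stated assumptions: the helper name `combine_posteriors`, the interpolation weight `lam`, the 0.5 threshold, and the example posteriors are all hypothetical. It is not the paper's exact model, which also integrates the two sources within an HMM.

```python
from typing import List


def combine_posteriors(p_prosody: List[float], p_lexical: List[float],
                       lam: float = 0.5, threshold: float = 0.5) -> List[bool]:
    """Hypothetical helper: decide, per word gap, whether a boundary occurs.

    p_prosody[i] and p_lexical[i] are assumed to be P(boundary after word i)
    from a prosodic model (e.g. a decision tree) and a word-based model.
    """
    if len(p_prosody) != len(p_lexical):
        raise ValueError("posterior sequences must have equal length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    decisions = []
    for pp, pl in zip(p_prosody, p_lexical):
        # Linear interpolation: lam weights the prosodic estimate,
        # (1 - lam) weights the lexical estimate.
        p = lam * pp + (1.0 - lam) * pl
        decisions.append(p > threshold)
    return decisions


# Illustrative (made-up) posteriors for six word gaps.
prosody = [0.1, 0.05, 0.9, 0.2, 0.1, 0.95]
lexical = [0.2, 0.1, 0.6, 0.3, 0.05, 0.9]
print(combine_posteriors(prosody, lexical, lam=0.6))
```

    In practice `lam` would be tuned on held-out data; setting it to 1.0 or 0.0 reduces the combination to the prosody-only or word-only model, respectively.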

    Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information

    In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic features. We find that different types of acoustic-prosodic features are individually helpful, and together give statistically significant improvements in parse and disfluency detection F1 scores over a strong text-only baseline. In this study, which assumes known sentence boundaries, error analyses show that the main benefit of acoustic-prosodic features is in sentences with disfluencies, that attachment decisions are most improved, and that transcription errors obscure gains from prosody.
    Comment: Accepted in NAACL HLT 201
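    The core operation behind "a convolutional neural network over energy and pitch trajectories" is a 1-D convolution across time followed by pooling, which turns a variable-length sequence of frames into a fixed-size feature vector. The sketch below shows that generic step in plain Python. It is an illustration only: the function name `conv1d_maxpool`, the hand-set filter weights, and the toy frame values are hypothetical, whereas a trained network would learn its filters and its exact architecture is not described here.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one time frame of features, here [pitch, energy]


def conv1d_maxpool(frames: List[Frame], filters: List[List[Frame]]) -> List[float]:
    """Slide each filter over the frame sequence, apply ReLU, max-pool over time.

    frames:  T frames, each a vector of F features.
    filters: K filters, each a list of W frames of F weights.
    Returns one pooled value per filter, so the output length is K
    regardless of T (T must be at least each filter's width W).
    """
    pooled = []
    for filt in filters:
        width = len(filt)
        if len(frames) < width:
            raise ValueError("sequence shorter than filter width")
        best = float("-inf")
        for start in range(len(frames) - width + 1):
            # Dot product between the filter and the current window of frames.
            score = sum(
                w * x
                for filt_frame, frame in zip(filt, frames[start:start + width])
                for w, x in zip(filt_frame, frame)
            )
            # ReLU, then keep the running maximum (max-pooling over time).
            best = max(best, max(score, 0.0))
        pooled.append(best)
    return pooled


# Toy normalized [pitch, energy] frames for one word (made-up values).
frames = [[1.0, 0.5], [2.0, 0.5], [4.0, 1.0], [2.0, 0.0]]
filters = [
    [[-1.0, 0.0], [1.0, 0.0]],  # width 2: responds to rising pitch
    [[0.0, 1.0]],               # width 1: responds to high energy
]
print(conv1d_maxpool(frames, filters))
```

    Max-pooling is what makes the output size independent of how many frames a word spans, so the pooled vector can be concatenated with a word embedding before it is fed to a recurrent network.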

    The emergence of prosody in linguistic theory

    Prosody is a distinctive property of sound production. In human speech, prosody serves various functions across different areas of linguistics (e.g., phonology, morphology, sociolinguistics). The importance of prosody in human language has been recognized since the early periods of modern civilisation. Both Western and Eastern traditions placed great emphasis on the proper practice of prosodic rhyme and rhythm in language use, whether for analysing grammar or for praying to God or another higher spirit. Subsequent developments in linguistics have revealed the central role played by prosody in determining the innate grammar of human language. This paper briefly discusses the evolution of thought on prosody and its current standing in linguistics.
    peer-reviewed

    Prosodic description: An introduction for fieldworkers

    This article provides an introductory tutorial on prosodic features such as tone and accent for researchers working on little-known languages. It specifically addresses the needs of non-specialists and thus does not presuppose knowledge of the phonetics and phonology of prosodic features. Its aim is to introduce the uninitiated reader to a field that is often avoided because of its complexities, some real and some merely imagined. It consists of a concise overview of the basic phonetic phenomena (section 2) and the major categories and problems of their functional and phonological analysis (sections 3 and 4). Section 5 gives practical advice for documenting and analyzing prosodic features in the field.
    National Foreign Language Resource Center

    When we fail to question in Japanese

    Close attention to the prosody of Wh-questions in Japanese reveals many novel empirical puzzles that would require a much finer-grained syntactic component of grammar. This paper addresses issues that pose problems for such an elaborated grammar and offers solutions that appeal to the information structure and sentence processing involved in interpreting interrogative and focus constructions.

    The restricted access of information structure to syntax: a minority report

    This paper sketches the view that syntax does not directly interact with information structure. Consequently, syntactic data are of little help in narrowing down the interpretation of terms such as “focus” and “topic”.

    Language identification with suprasegmental cues: A study based on speech resynthesis

    This paper proposes a new experimental paradigm to explore the discriminability of languages, a question that is crucial for a child born into a bilingual environment. The paradigm employs speech resynthesis, which lets the experimenter preserve or degrade acoustic cues such as phonotactics, syllabic rhythm or intonation in natural utterances. English and Japanese sentences were resynthesized, preserving broad phonotactics, rhythm and intonation (Condition 1), rhythm and intonation (Condition 2), intonation only (Condition 3), or rhythm only (Condition 4). The findings support the notion that syllabic rhythm is a necessary and sufficient cue for French adult subjects to discriminate English from Japanese sentences. The results are consistent with previous research using low-pass filtered speech, as well as with phonological theories predicting rhythmic differences between languages. The proposed methodology therefore appears well suited to studying language discrimination. Applications to other domains of psycholinguistic research and to automatic language identification are considered.