1,021 research outputs found

    Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information

    In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features, using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic features. We find that different types of acoustic-prosodic features are individually helpful, and that together they give statistically significant improvements in parse and disfluency-detection F1 scores over a strong text-only baseline. For this study with known sentence boundaries, error analyses show that the main benefit of acoustic-prosodic features lies in sentences with disfluencies, that attachment decisions are most improved, and that transcription errors obscure gains from prosody.
    Comment: Accepted in NAACL HLT 201
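    The architecture described above can be caricatured in a few lines: a 1-D convolution over a word's pitch and energy trajectories is max-pooled into a fixed-size vector and concatenated with the word's lexical features before being passed to the sentence-level network. The sketch below is a hypothetical illustration, not the authors' implementation; the kernels, trajectories, and feature values are invented, and the real model learns its filters and feeds an attention-based RNN.

```python
# Hypothetical sketch: 1-D convolution + max-pooling over prosodic
# trajectories, concatenated with lexical features. All numbers are
# invented for illustration.

def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation over a trajectory."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def prosody_vector(pitch, energy, kernels):
    """Convolve each trajectory with each kernel and max-pool,
    yielding one fixed-size feature vector per word."""
    return [max(conv1d(traj, kern))
            for traj in (pitch, energy)
            for kern in kernels]

# Toy frame-level trajectories for a single word.
pitch = [120.0, 125.0, 133.0, 140.0, 138.0]
energy = [0.2, 0.5, 0.9, 0.7, 0.3]
kernels = [[0.5, 0.5], [1.0, -1.0]]  # smoothing and local-slope filters

vec = prosody_vector(pitch, energy, kernels)
text_feats = [0.1, 0.3]      # stand-in for the word's lexical embedding
combined = text_feats + vec  # would feed the attention-based RNN
print(combined)
```

    Max-pooling makes the vector length independent of how many acoustic frames a word spans, which is what lets frame-level prosody align with word-level text features.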

    Unsupervised syntactic chunking with acoustic cues: Computational models for prosodic bootstrapping

    Learning to group words into phrases without supervision is a hard task for NLP systems, but infants routinely accomplish it. We hypothesize that infants use acoustic cues to prosody, which NLP systems typically ignore. To evaluate the utility of prosodic information for phrase discovery, we present an HMM-based unsupervised chunker that learns from only transcribed words and raw acoustic correlates to prosody. Unlike previous work on unsupervised parsing and chunking, we use neither gold-standard part-of-speech tags nor punctuation in the input. Evaluated on the Switchboard corpus, our model outperforms several baselines that exploit either lexical or prosodic information alone, and, despite producing a flat structure, performs competitively with a state-of-the-art unsupervised lexicalized parser, with a substantial advantage in precision. Our results support the hypothesis that acoustic-prosodic cues provide useful evidence about syntactic phrases for language-learning infants.
    10 page(s)
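    As a rough illustration of the decoding side of such a chunker, the sketch below runs Viterbi over two chunk states (B = chunk-initial, I = chunk-internal) with a single discretized acoustic cue: the presence of a prosodic break before a word. All probabilities are invented placeholders; the paper's model also conditions on the words themselves and learns its parameters without supervision, neither of which is shown here.

```python
# Illustrative sketch (not the paper's model): Viterbi decoding for an
# HMM chunker whose states mark chunk positions (B = chunk-initial,
# I = chunk-internal). Observation i is 1 if a prosodic break was
# heard before word i. All probabilities are made-up placeholders.

import math

STATES = ("B", "I")
trans = {("B", "B"): 0.3, ("B", "I"): 0.7,
         ("I", "B"): 0.5, ("I", "I"): 0.5}
start = {"B": 0.9, "I": 0.1}
p_break = {"B": 0.8, "I": 0.2}  # breaks tend to precede chunk-initial words

def emit(state, brk):
    """Emission probability of the acoustic cue given the state."""
    return p_break[state] if brk else 1.0 - p_break[state]

def viterbi(breaks):
    """Most likely state sequence for a sequence of break observations."""
    v = [{s: math.log(start[s]) + math.log(emit(s, breaks[0]))
          for s in STATES}]
    back = []
    for obs in breaks[1:]:
        col, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: v[-1][p] + math.log(trans[(p, s)]))
            col[s] = v[-1][best] + math.log(trans[(best, s)]) + math.log(emit(s, obs))
            ptr[s] = best
        v.append(col)
        back.append(ptr)
    last = max(STATES, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# A break before words 0 and 3 suggests chunks starting at those words.
print(viterbi([1, 0, 0, 1, 0]))  # → ['B', 'I', 'I', 'B', 'I']
```

    In the actual unsupervised setting these parameters would be re-estimated (e.g., with EM) rather than hand-set, but the decoding step is the same.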

    Punctuation in Quoted Speech

    Quoted speech is often set off by punctuation marks, in particular quotation marks. Thus, it might seem that quotation marks would be extremely useful in identifying these structures in texts. Unfortunately, the situation is not quite so clear. In this work, I will argue that quotation marks are not adequate for either identifying or constraining the syntax of quoted speech. More useful information comes from the presence of a quoting verb, which is either a verb of saying or a punctual verb, and from the presence of other punctuation marks, usually commas. Using a lexicalized grammar, we can license most quoting clauses as text adjuncts. A distinction will be made not between direct and indirect quoted speech, but rather between adjunct and non-adjunct quoting clauses.
    Comment: 11 pages, 11 ps figures, Proceedings of SIGPARSE 96 - Punctuation in Computational Linguistics

    Incorporating Punctuation Into the Sentence Grammar: A Lexicalized Tree Adjoining Grammar Perspective

    Punctuation helps us to structure, and thus to understand, texts. Many uses of punctuation straddle the line between syntax and discourse, because they serve to combine multiple propositions within a single orthographic sentence. They allow us to insert discourse-level relations at the level of a single sentence. Just as people make use of information from punctuation in processing what they read, computers can use information from punctuation in processing texts automatically. Most current natural language processing systems fail to take punctuation into account at all, losing a valuable source of information about the text. Those which do mostly do so in a superficial way, again failing to fully exploit the information conveyed by punctuation. To be able to make use of such information in a computational system, we must first characterize its uses and find a suitable representation for encoding them. The work here focuses on extending a syntactic grammar to handle phenomena occurring within a single sentence which have punctuation as an integral component. Punctuation marks are treated as full-fledged lexical items in a Lexicalized Tree Adjoining Grammar, which is an extremely well-suited formalism for encoding punctuation in the sentence grammar. Each mark anchors its own elementary trees and imposes constraints on the surrounding lexical items. I have analyzed data representing a wide variety of constructions, and added treatments of them to the large English grammar which is part of the XTAG system. The advantages of using LTAG are that its elementary units are structured trees of a suitable size for stating the constraints we are interested in, and that the derivation histories it produces contain information the discourse grammar will need about which elementary units have been used and how they have been combined.
I also consider in detail a few particularly interesting constructions where the sentence and discourse grammars meet: appositives, reported speech, and uses of parentheses. My results confirm that punctuation can be used in analyzing sentences to increase the coverage of the grammar, reduce the ambiguity of certain word sequences, and facilitate discourse-level processing of the texts
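    To make the LTAG treatment concrete, here is a toy sketch (not the XTAG grammar itself) of a comma-anchored auxiliary tree for appositives being adjoined into a sentence tree. The tuple encoding, node labels, and the simplified adjunction operation are illustrative assumptions; real TAG adjunction also handles adjoining constraints and multiple adjunction sites.

```python
# Toy illustration: an appositive auxiliary tree anchored by its two
# commas, adjoined at an NP node. Trees are (label, children...) tuples;
# the foot node is labeled "NP*". Not the XTAG grammar's actual trees.

def adjoin(tree, aux, target="NP"):
    """Adjoin `aux` at each node labeled `target` (here exactly one):
    that node's subtree moves to the auxiliary tree's foot (`NP*`)."""
    label, *children = tree
    if label == target:
        return plug_foot(aux, tree)
    return (label, *(adjoin(c, aux, target) if isinstance(c, tuple) else c
                     for c in children))

def plug_foot(aux, subtree):
    """Replace the foot node of the auxiliary tree with `subtree`."""
    label, *children = aux
    if label == "NP*":
        return subtree
    return (label, *(plug_foot(c, subtree) if isinstance(c, tuple) else c
                     for c in children))

def yield_of(tree):
    """Read off the terminal string of a tree."""
    label, *children = tree
    return " ".join(yield_of(c) if isinstance(c, tuple) else c
                    for c in children)

# The two commas are the lexical anchors of the appositive tree.
appositive = ("NP", ("NP*",), (",", ","),
              ("NP", ("DT", "a"), ("NN", "beagle")), (",", ","))
sentence = ("S", ("NP", ("DT", "the"), ("NN", "dog")),
            ("VP", ("VBD", "barked")))

print(yield_of(adjoin(sentence, appositive)))
# → the dog , a beagle , barked
```

    The point of the example is the one the abstract makes: the commas are not decoration but lexical anchors, so the grammar itself licenses where the appositive (and its punctuation) may appear.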

    A New Paradigm for Punctuation

    This is a comprehensive study of punctuation, particularly the uses to which it has been put as writing developed over the centuries and as it gradually evolved from an aid to oral delivery into a device for texts that were read silently. The sudden need for standardization of punctuation that came with the start of printing spawned a small amount of interest in determining its purpose, but most works after printing began were devoted mainly to helping people use punctuation rather than to discovering why it was being used. Gradually, two main views on its purpose developed: punctuation was being used for rhetorical purposes, or it was needed to reveal the grammar in writing. These views are still largely in place. The community of linguists took little notice of writing until the last few centuries, and even less notice of punctuation. The result was that few studies were done on the underlying purpose of punctuation until the twentieth century, and even those were few and far between, most of them occurring only in the last thirty years. This study argues that neither rhetoric nor grammar is directly the basis for punctuation. Rather, punctuation responds to a schema that determines the order of words in spoken and written English, and it is therefore unquestionably a linguistic concept. The special uses of the features of punctuation are discussed, as well as some anomalies in its use, some ideas for further studies, and some ideas for improving the teaching of punctuation

    Correlation between phonetic factors and linguistic events regarding a prosodic pattern of European Portuguese: a practical proposal

    In this article a prosodic model for European Portuguese (henceforth EP) based on a linguistic approach is described. It was developed in the scope of the Antigona Project, an electronic-commerce system using a speech interface (Speech-to-Text plus Text-to-Speech, the latter based on a time-concatenation technique) for the EP language. The purpose of our work is to contribute practical strategies for improving synthetic speech quality and naturalness with respect to prosodic processing. It is also our goal to show that syntactic structures strongly determine prosodic patterns in EP. It is also important to emphasize the pragmatic commercial objective of this system, which is selling a product. This type of application therefore deals with a specific vocabulary, displayed in predictable syntactic constructions and sentences, which makes prosodic contours and focus predictable. This study was carried out in close articulation between the engineering experience and tools and the linguistic approach. We believe that this work represents an important achievement for future research on synthetic speech processing, in particular for EP. Moreover, it can be applied to other Romance languages, given their syntactic resemblances

    The Interplay Of Syntactic Parsing Strategies And Prosodic Phrase Lengths In Processing Turkish Sentences

    Many experiments have shown that the prosody (rhythm and melody) with which a sentence is uttered can provide a listener with cues to its syntactic structure (Lehiste, 1973, and since). A few studies have observed in addition that an inappropriate prosodic contour can mislead the syntactic parsing routines, resulting in a prosody-induced garden-path. These include, among others, Speer et al. (1996) and Kjelgaard and Speer (1999) for English. The studies by Speer et al. and Kjelgaard and Speer (SKS) showed that misplaced prosodic cues caused more processing difficulty in sentences with early closure of a clause (EC syntax) than in ones with late closure of a clause (LC syntax). One possible explanation for these results is that when prosody is misleading about the syntactic structure, the parser may ignore it and resort to a syntactic Late Closure strategy, as it does in reading where there is no overt prosodic boundary to inform the parser about the syntactic structure of the sentence. Augurzky's (2006) observation of an LC syntax advantage for prosody-syntax mismatch conditions in her investigation of German relative clause attachment ambiguities provides support for this explanation. An alternative explanation considers the possibility that constituent lengths could have influenced the perceived informativeness of overt prosodic cues in these studies, as proposed in the Rational Speaker Hypothesis of Clifton et al. (2002, 2006). The Rational Speaker Hypothesis (RSH) maintains that prosodic breaks flanking shorter constituents are taken more seriously as indicators of syntactic structure than prosodic breaks flanking longer constituents, because the former cannot be justified as motivated by optimal length considerations. To test these two alternative hypotheses, four listening experiments were conducted.
There was an additional reading experiment preceding the listening experiments to explore potential effects of the Late Closure strategy and constituent lengths in reading where there is no overt prosody. In all cases the target materials were temporarily ambiguous Turkish sentences which could be morphologically resolved as either LC or EC syntactic constructions. Constituent lengths were systematically manipulated in all target materials, such that the length-optimal prosodic phrasing was associated with LC syntax in one condition, and with EC syntax in the other. Experiment 1 employed a missing morpheme task developed for this study. In the missing morpheme task, underscores (length-averaged) replaced the disambiguating morphemes and participants had to insert them as they read the sentences aloud. Results revealed significant effects of phrase lengths in readers' syntactic interpretations as indicated by the morphemes they inserted and the prosodic breaks they produced. Experiments 2A and 2B employed an end-of-sentence 'got it' task (Frazier et al., 1983), in which participants listened to spoken sentences and indicated after each one whether they understood or did not understand it. Sentences in Experiment 2A had a phrase length distribution similar to the SKS English materials. Experiment 2B manipulated lengths in reverse. The stimuli had cooperating, conflicting or neutral prosody. Response time data supported an interplay of both syntactic Late Closure and RSH. Thus it was concluded that constituent lengths can indeed have a significant effect on listeners' parsing decisions, in addition to the familiar syntactic parsing biases and prosodic influences. Experiments 3A and 3B used a lexical probe version of the phoneme restoration paradigm employed by Stoyneshka et al. (2010). In the phoneme restoration paradigm, the disambiguating phonemes (in the verb, in these materials) are replaced with noise (in this study, pink noise).
In the lexical probe version of this paradigm (developed for this study) participants listened to the sentences with LC, EC or neutral prosody, and at the end of the sentence they were presented with a visual probe (one of the two possible disambiguating verbs, complete with all phonemes) that was congruent, incongruent, or compatible with the prosody of the sentence they had heard. Their task was to respond to the visual probe either 'yes' (i.e., 'I heard this word in the sentence I have just listened to') or 'no' (i.e., 'I didn't hear this word'). Response time to the probe word indirectly taps which of the disambiguating morphemes on the verb the listener mentally supplies when it has been replaced by noise. The materials for Experiments 3A and 3B were identical to those used in Experiments 2A and 2B respectively except that the disambiguating phonemes were noise-replaced. Results of Experiments 3A and 3B showed that listeners were highly sensitive to the sentential prosody as revealed by their phoneme restoration responses and response time data, confirming Stoyneshka et al.'s findings establishing the reliability of the phoneme restoration paradigm in investigating effects of prosody in ambiguity resolution. Response time data showed a pattern similar to what SKS observed for English (except for one condition in Experiment 3A, with incongruent probes): despite the phrase length reversal in Experiment 3B, there was no influence of phrase length distribution on ambiguity resolution. This has a natural explanation in light of the difference between the 'got it' task, with disambiguating morphology within the sentence stimulus, and the phoneme restoration task, in which the listener can project onto the verb whatever morphology is compatible with the heard prosody. LC and EC were processed equally well for congruent probes, and there was an LC advantage in the incongruent and compatible probe conditions.
Overall results support the hypothesis that syntactic Late Closure becomes evident in listening when prosody is absent or misleading, and also that phrase lengths can play a significant role

    Computational Approaches to the Syntax–Prosody Interface: Using Prosody to Improve Parsing

    Prosody has strong ties with syntax, since prosody can be used to resolve some syntactic ambiguities. Syntactic ambiguities have been shown to negatively impact automatic syntactic parsing, hence there is reason to believe that prosodic information can help improve parsing. This dissertation considers a number of approaches that aim to computationally examine the relationship between prosody and syntax in natural languages, while also addressing the role of syntactic phrase length, with the ultimate goal of using prosody to improve parsing. Chapter 2 examines the effect of syntactic phrase length on prosody in doubly center-embedded sentences in French. Data collected in a previous study were reanalyzed using native-speaker judgment and automatic methods (forced alignment). Results demonstrate prosodic splitting behavior similar to that found in English, in contradiction to the original study's findings. Chapter 3 presents a number of studies examining whether syntactic ambiguity can yield different prosodic patterns, allowing humans and/or computers to resolve the ambiguity. In an experimental study, humans disambiguated sentences with prepositional-phrase (PP) attachment ambiguity with 49% accuracy when the sentences were presented as text, and with 63% accuracy when they were presented as audio. Machine learning on the same data yielded an accuracy of 63-73%. A corpus study on the Switchboard corpus used both prosodic breaks and phrase lengths to predict the attachment, with an accuracy of 63.5% for PP-attachment sentences and 71.2% for relative-clause attachment. Chapter 4 aims to identify aspects of syntax that relate to prosody and to use these in combination with prosodic cues to improve parsing. The aspects identified (dependency configurations) are based on dependency structure, reflecting the relative head location of two consecutive words, and are used as syntactic features in an ensemble system based on recurrent neural networks to score parse hypotheses and select the most likely parse for a given sentence.
Using syntactic features alone, the system achieved an improvement of 1.1% absolute in Unlabelled Attachment Score (UAS) on the test set, above the best parser in the ensemble, while using syntactic features combined with prosodic features (pauses and normalized duration) led to a further improvement of 0.4% absolute. The results achieved demonstrate the relationship between syntax, syntactic phrase length, and prosody, and indicate the ability and future potential of prosody to resolve ambiguity and improve parsing
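    The reranking step can be pictured with a deliberately simplified sketch: each parse hypothesis carries a few syntactic and prosodic feature values, and the system picks the hypothesis with the best combined score. The feature names ("head_dist", "break_match") and weights below are invented for illustration; the dissertation's system scores dependency configurations with recurrent neural networks rather than a fixed linear model.

```python
# Simplified sketch (not the dissertation's RNN ensemble): rerank parse
# hypotheses by a weighted score over syntactic and prosodic features.
# Feature values and weights are invented placeholders.

def score(hypothesis, weights):
    """Dot product of a hypothesis's features with the model weights."""
    return sum(weights[k] * v for k, v in hypothesis["features"].items())

# Two candidate parses for one sentence. "break_match" marks whether a
# prosodic break aligns with a proposed phrase boundary; "head_dist"
# penalizes long head-dependent distances.
hypotheses = [
    {"parse": "high attachment", "features": {"head_dist": -3, "break_match": 1}},
    {"parse": "low attachment",  "features": {"head_dist": -1, "break_match": 0}},
]
weights = {"head_dist": 0.2, "break_match": 1.0}

best = max(hypotheses, key=lambda h: score(h, weights))
print(best["parse"])  # → high attachment
```

    The example reflects the ensemble idea in miniature: prosodic evidence (the matching break) can outweigh a purely syntactic preference and flip the selected parse.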

    Inner voice experiences during processing of direct and indirect speech

    In this chapter, we review a new body of research on language processing, focusing particularly on the distinction between direct speech (e.g., Mary said, “This dress is absolutely beautiful!”) and indirect speech (e.g., Mary said that the dress was absolutely beautiful). First, we will discuss an important pragmatic distinction between the two reporting styles and highlight the consequences of this distinction for prosodic processing. While direct speech provides vivid demonstrations of the reported speech act (informing recipients about how something was said by another speaker), indirect speech is more descriptive of what was said by the reported speaker. This is clearly reflected in differential prosodic contours for the two reporting styles during speaking: Direct speech is typically delivered with a more variable and expressive prosody, whereas indirect speech tends to be used in combination with a more neutral and less expressive prosody. Next, we will introduce recent evidence in support of an “inner voice” during language comprehension, especially during silent reading of direct speech quotations. We present and discuss a coherent stream of research using a wide range of methods, including speech analysis, functional magnetic resonance imaging (fMRI), and eye-tracking. The findings are discussed in relation to overt (or ‘explicit’) prosodic characteristics that are likely to be observed when direct and indirect speech are used in spoken utterances (such as during oral reading). Indeed, the research we review here makes a convincing case for the hypothesis that recipients spontaneously activate voice-related mental representations during silent reading, and that such an “inner voice” is particularly pronounced when reading direct speech quotations (and much less so for indirect speech). 
The corresponding brain activation patterns, as well as correlations between silent and oral reading data, furthermore suggest that this “inner voice” during silent reading is related to the supra-segmental and temporal characteristics of actual speech. For ease of comparison, we shall dub this phenomenon of an “inner voice” (particularly during silent reading of direct speech) simulated implicit prosody to distinguish it from default implicit prosody that is commonly discussed in relation to syntactic ambiguity resolution. In the final part of this chapter, we will attempt to specify the relation between simulated and default implicit prosody. Based on the existing empirical data and our own theoretical conclusions, we will discuss the similarities and discrepancies between the two not necessarily mutually exclusive terms. We hope that our discussion will motivate a new surge of interdisciplinary research that will not only extend our knowledge of prosodic processes during reading, but could potentially unify the two phenomena in a single theoretical framework