703 research outputs found

    PADS Restoration and Its Importance in Reading Comprehension and Meaning Representation

    Punctuation Restoration Improves Structure Understanding without Supervision

    Unsupervised learning objectives like language modeling and de-noising play a significant part in producing pre-trained models that perform various downstream applications from natural language understanding to conversational tasks. However, despite impressive generative capabilities of recent large language models, their abilities to capture syntactic or semantic structure within text lag behind. We hypothesize that the mismatch between linguistic performance and competence in machines is attributable to insufficient transfer of linguistic structure knowledge to computational systems with currently popular pre-training objectives. We show that punctuation restoration as a learning objective improves in- and out-of-distribution performance on structure-related tasks like named entity recognition, open information extraction, chunking, and part-of-speech tagging. Punctuation restoration is an effective learning objective that can improve structure understanding and yield more robust structure-aware representations of natural language. Comment: 10 pages, 1 figure, 6 tables
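To make the idea of punctuation restoration as a training objective concrete, here is a minimal sketch of how unlabeled text might be turned into training pairs. The abstract does not say how the task is formulated, so the token-level labeling scheme, the lowercasing step, the label names, and the function name `make_restoration_example` are all assumptions made for illustration.

```python
# Hypothetical label set: only three trailing marks are handled in this sketch.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def make_restoration_example(text):
    """Strip punctuation from raw text and record what followed each word.

    Returns (words, labels): the model would see `words` and be trained
    to predict `labels`, so no human annotation is needed.
    """
    words, labels = [], []
    for token in text.split():
        label = "O"  # "O" means no punctuation follows this word
        if token[-1] in PUNCT_LABELS:
            label = PUNCT_LABELS[token[-1]]
            token = token[:-1]
        if not token:
            continue  # the token was a bare punctuation mark
        words.append(token.lower())  # assumption: casing is removed as well
        labels.append(label)
    return words, labels


words, labels = make_restoration_example("Yes, it works. Does it?")
for word, label in zip(words, labels):
    print(f"{word}\t{label}")
```

Because the labels come from the text itself, any punctuated corpus can supply training data under this scheme.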

    Speakerly: A Voice-based Writing Assistant for Text Composition

    We present Speakerly, a new real-time voice-based writing assistance system that helps users with text composition across various use cases such as emails, instant messages, and notes. The user can interact with the system through instructions or dictation, and the system generates a well-formatted and coherent document. We describe the system architecture and detail how we address the various challenges while building and deploying such a system at scale. More specifically, our system uses a combination of small, task-specific models as well as pre-trained language models for fast and effective text composition while supporting a variety of input modes for better usability. Comment: Accepted at EMNLP 2023 Industry Track

    Finding structure in language

    Since the Chomskian revolution, it has become apparent that natural language is richly structured, being naturally represented hierarchically, and requiring complex context-sensitive rules to define regularities over these representations. It is widely assumed that the richness of the posited structure has strong nativist implications for mechanisms which might learn natural language, since it seemed unlikely that such structures could be derived directly from the observation of linguistic data (Chomsky 1965). This thesis investigates the hypothesis that simple statistics of a large, noisy, unlabelled corpus of natural language can be exploited to discover some of the structure which exists in natural language automatically. The strategy is to initially assume no knowledge of the structures present in natural language, save that they might be found by analysing statistical regularities which pertain between a word and the words which typically surround it in the corpus. To achieve this, various statistical methods are applied to define similarity between statistical distributions, and to infer a structure for a domain given knowledge of the similarities which pertain within it. Using these tools, it is shown that it is possible to form a hierarchical classification of many domains, including words in natural language. When this is done, it is shown that all the major syntactic categories can be obtained, and the classification is both relatively complete, and very much in accord with a standard linguistic conception of how words are classified in natural language. Once this has been done, the categorisation derived is used as the basis of a similar classification of short sequences of words. If these are analysed in a similar way, then several syntactic categories can be derived. These include simple noun phrases, various tensed forms of verbs, and simple prepositional phrases. Once this has been done, the same technique can be applied one level higher, and at this level simple sentences and verb phrases, as well as more complicated noun phrases and prepositional phrases, are shown to be derivable.

    The Interplay Of Syntactic Parsing Strategies And Prosodic Phrase Lengths In Processing Turkish Sentences

    Many experiments have shown that the prosody (rhythm and melody) with which a sentence is uttered can provide a listener with cues to its syntactic structure (Lehiste, 1973, and since). A few studies have observed in addition that an inappropriate prosodic contour can mislead the syntactic parsing routines, resulting in a prosody-induced garden-path. These include, among others, Speer et al. (1996) and Kjelgaard and Speer (1999) for English. The studies by Speer et al. and Kjelgaard and Speer (SKS) showed that misplaced prosodic cues caused more processing difficulty in sentences with early closure of a clause (EC syntax) than in ones with late closure of a clause (LC syntax). One possible explanation for these results is that when prosody is misleading about the syntactic structure, the parser may ignore it and resort to a syntactic Late Closure strategy, as it does in reading where there is no overt prosodic boundary to inform the parser about the syntactic structure of the sentence. Augurzky's (2006) observation of an LC syntax advantage for prosody-syntax mismatch conditions in her investigation of German relative clause attachment ambiguities provides support for this explanation. An alternative explanation considers the possibility that constituent lengths could have influenced the perceived informativeness of overt prosodic cues in these studies, as proposed in the Rational Speaker Hypothesis of Clifton et al. (2002, 2006). The Rational Speaker Hypothesis (RSH) maintains that prosodic breaks flanking shorter constituents are taken more seriously as indicators of syntactic structure than prosodic breaks flanking longer constituents, because the former cannot be justified as motivated by optimal length considerations. To test these two alternative hypotheses, four listening experiments were conducted. There was an additional reading experiment preceding the listening experiments to explore potential effects of the Late Closure strategy and constituent lengths in reading where there is no overt prosody. In all cases the target materials were temporarily ambiguous Turkish sentences which could be morphologically resolved as either LC or EC syntactic constructions. Constituent lengths were systematically manipulated in all target materials, such that the length-optimal prosodic phrasing was associated with LC syntax in one condition, and with EC syntax in the other. Experiment 1 employed a missing morpheme task developed for this study. In the missing morpheme task, underscores (length-averaged) replaced the disambiguating morphemes and participants had to insert them as they read the sentences aloud. Results revealed significant effects of phrase lengths in readers' syntactic interpretations as indicated by the morphemes they inserted and the prosodic breaks they produced. Experiments 2A and 2B employed an end-of-sentence 'got it' task (Frazier et al., 1983), in which participants listened to spoken sentences and indicated after each one whether they understood or did not understand it. Sentences in Experiment 2A had phrase length distribution similar to the SKS English materials. Experiment 2B manipulated lengths in reverse. The stimuli had cooperating, conflicting or neutral prosody. Response time data supported an interplay of both syntactic Late Closure and RSH. Thus it was concluded that constituent lengths can indeed have a significant effect on listeners' parsing decisions, in addition to the familiar syntactic parsing biases and prosodic influences. Experiments 3A and 3B used a lexical probe version of the phoneme restoration paradigm employed by Stoyneshka et al. (2010). In the phoneme restoration paradigm, the disambiguating phonemes (in the verb, in these materials) are replaced with noise (in this study, pink noise). In the lexical probe version of this paradigm (developed for this study) participants listened to the sentences with LC, EC or neutral prosody, and at the end of the sentence they were presented with a visual probe (one of the two possible disambiguating verbs, complete with all phonemes) that was congruent or incongruent or compatible with the prosody of the sentence they had heard. Their task was to respond to the visual probe either 'yes' (i.e., 'I heard this word in the sentence I have just listened to') or 'no' (i.e., 'I didn't hear this word'). Response time to the probe word indirectly taps which of the disambiguating morphemes on the verb the listener mentally supplies when it has been replaced by noise. The materials for Experiments 3A and 3B were identical to those used in Experiments 2A and 2B respectively except that the disambiguating phonemes were noise-replaced. Results of Experiments 3A and 3B showed that listeners were highly sensitive to the sentential prosody as revealed by their phoneme restoration responses and response time data, confirming Stoyneshka et al.'s findings establishing the reliability of the phoneme restoration paradigm in investigating effects of prosody in ambiguity resolution. Response time data showed a pattern similar to what SKS observed for English (except for one condition in Experiment 3A, with incongruent probes): despite the phrase length reversal in Experiment 3B, there was no influence of phrase length distribution on ambiguity resolution. This has a natural explanation in light of the difference between the 'got it' task with disambiguating morphology within the sentence stimulus, and the phoneme restoration task in which the listener can project onto the verb whatever morphology is compatible with the heard prosody. LC and EC were processed equally well for congruent probes, and there was an LC advantage in the incongruent and compatible probe conditions. Overall results support the hypothesis that syntactic Late Closure becomes evident in listening when prosody is absent or misleading, and also that phrase lengths can play a significant role.

    Problems in Evaluating Grammatical Error Detection Systems

    Many evaluation issues for grammatical error detection have previously been overlooked, making it hard to draw meaningful comparisons between different approaches, even when they are evaluated on the same corpus. To begin with, the three-way contingency between a writer's sentence, the annotator's correction, and the system's output makes evaluation more complex than in some other NLP tasks, which we address by presenting an intuitive evaluation scheme. Of particular importance to error detection is the skew of the data (the low frequency of errors as compared to non-errors), which distorts some traditional measures of performance and limits their usefulness, leading us to recommend the reporting of raw measurements (true positives, false negatives, false positives, true negatives). Other issues that are particularly vexing for error detection focus on defining these raw measurements: specifying the size or scope of an error, properly treating errors as graded rather than discrete phenomena, and counting non-errors. We discuss recommendations for best practices with regard to reporting the results of system evaluation for these cases, recommendations which depend upon making clear one's assumptions and applications for error detection. By highlighting the problems with current error detection evaluation, the field will be better able to move forward.
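The point about skewed data can be illustrated with a small worked example. The sketch below is not taken from the paper. It assumes a hypothetical corpus of 1000 tokens with 20 errors and a degenerate system that never flags anything, and the function names are illustrative.

```python
def confusion_counts(gold, predicted):
    """Return raw TP/FP/FN/TN counts for binary error flags (True = error)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    return {
        "tp": sum(1 for g, p in pairs if g and p),
        "fp": sum(1 for g, p in pairs if not g and p),
        "fn": sum(1 for g, p in pairs if g and not p),
        "tn": sum(1 for g, p in pairs if not g and not p),
    }


def accuracy(c):
    """Share of all tokens labelled correctly."""
    return (c["tp"] + c["tn"]) / sum(c.values())


def recall(c):
    """Share of true errors that the system found."""
    found_or_missed = c["tp"] + c["fn"]
    return c["tp"] / found_or_missed if found_or_missed else 0.0


# Hypothetical skewed corpus: 20 errors among 1000 tokens.
gold = [True] * 20 + [False] * 980
# A system that never flags an error.
silent_system = [False] * 1000

counts = confusion_counts(gold, silent_system)
print(counts)            # raw measurements
print(accuracy(counts))  # high despite finding no errors
print(recall(counts))    # zero: every error was missed
```

Here accuracy comes out at 0.98 even though the system detects nothing, which is why the raw counts are more informative than a single summary score on data like this.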

    Computational Approaches to the Syntax–Prosody Interface: Using Prosody to Improve Parsing

    Prosody has strong ties with syntax, since prosody can be used to resolve some syntactic ambiguities. Syntactic ambiguities have been shown to negatively impact automatic syntactic parsing, hence there is reason to believe that prosodic information can help improve parsing. This dissertation considers a number of approaches that aim to computationally examine the relationship between prosody and syntax of natural languages, while also addressing the role of syntactic phrase length, with the ultimate goal of using prosody to improve parsing. Chapter 2 examines the effect of syntactic phrase length on prosody in double center embedded sentences in French. Data collected in a previous study were reanalyzed using native speaker judgment and automatic methods (forced alignment). Results demonstrate prosodic splitting behavior similar to that in English, in contradiction to the original study’s findings. Chapter 3 presents a number of studies examining whether syntactic ambiguity can yield different prosodic patterns, allowing humans and/or computers to resolve the ambiguity. In an experimental study, humans disambiguated sentences with prepositional phrase (PP) attachment ambiguity with 49% accuracy when presented as text, and 63% when presented as audio. Machine learning on the same data yielded an accuracy of 63-73%. A corpus study on the Switchboard corpus used both prosodic breaks and phrase lengths to predict the attachment, with an accuracy of 63.5% for PP-attachment sentences, and 71.2% for relative clause attachment. Chapter 4 aims to identify aspects of syntax that relate to prosody and use these in combination with prosodic cues to improve parsing. The aspects identified (dependency configurations) are based on dependency structure, reflecting the relative head location of two consecutive words, and are used as syntactic features in an ensemble system based on Recurrent Neural Networks, to score parse hypotheses and select the most likely parse for a given sentence. Using syntactic features alone, the system achieved an improvement of 1.1% absolute in Unlabelled Attachment Score (UAS) on the test set, above the best parser in the ensemble, while using syntactic features combined with prosodic features (pauses and normalized duration) led to a further improvement of 0.4% absolute. The results achieved demonstrate the relationship between syntax, syntactic phrase length, and prosody, and indicate the ability and future potential of prosody to resolve ambiguity and improve parsing.
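For readers unfamiliar with the metric, Unlabelled Attachment Score is the fraction of words whose predicted syntactic head matches the gold head, ignoring dependency labels. The sketch below shows that standard definition on a single PP-attachment sentence. The head indices are hypothetical and chosen only for illustration; they are not data from the dissertation.

```python
def unlabelled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head index equals the gold head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("head sequences must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(1 for g, p in zip(gold_heads, predicted_heads) if g == p)
    return correct / len(gold_heads)


# "She saw the man with the telescope"
# Heads are 1-based token positions, 0 marks the root (hypothetical annotation).
# Gold reading: "telescope" attaches to the verb "saw" (token 2).
gold = [2, 0, 4, 2, 7, 7, 2]
# Predicted reading: "telescope" attaches to the noun "man" (token 4).
predicted = [2, 0, 4, 2, 7, 7, 4]

print(unlabelled_attachment_score(gold, predicted))
```

A single wrong attachment in a seven-word sentence already costs about 14 points on that sentence, so small absolute gains over a whole test set can reflect many resolved ambiguities.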

    The Punctator's World: A Discursion (Part Five)

    This, the fifth in a series on the history and ambitions of punctuation, describes the first vigorous manifestation of logical pointing. In an enlightened atmosphere of book reading and language consciousness, it was discerned that the shapes of sentences and their working parts were better delineated when punctuated syntactically.

    ENHANCING EFFECTIVENESS OF DEMOCRATIC REPRESENTATION. CONSTITUENCIES AND EQUALITY OF THE VOTE WITHIN DIFFERENT ELECTORAL SYSTEMS AND FORMS OF GOVERNMENT

    This special issue of ConsultaOnline delves into the theme of enhancing the effectiveness of democratic representation, districts, and equality of the vote within different electoral systems and forms of government. Under the editorial guidance of Lorenzo Spadacini, the issue features diverse perspectives on electoral constituency delineation, tackling gerrymandering and minority representation, exploring proportional representation and multi-member districts, and delving into the realm of direct democracy. With contributions from scholars and experts in the field, the issue offers a comprehensive examination of key issues and potential reforms in electoral processes in various countries such as Italy, the United States, Canada, and Germany.