21 research outputs found

    Anaphora and Discourse Structure

    We argue in this paper that many common adverbial phrases generally taken to signal a discourse relation between syntactically connected units within discourse structure instead work anaphorically to contribute relational meaning, with only indirect dependence on discourse structure. This allows a simpler discourse structure to provide scaffolding for compositional semantics, and reveals multiple ways in which the relational meaning conveyed by adverbial connectives can interact with that associated with discourse structure. We conclude by sketching out a lexicalised grammar for discourse that facilitates discourse interpretation as a product of compositional rules, anaphor resolution and inference. Comment: 45 pages, 17 figures. Revised resubmission to Computational Linguistics.

    A Crowdsourced Corpus of Multiple Judgments and Disagreement on Anaphoric Interpretation

    We present a corpus of anaphoric information (coreference) crowdsourced through a game-with-a-purpose. The corpus, containing annotations for about 108,000 markables, is one of the largest corpora for coreference for English, and one of the largest crowdsourced NLP corpora, but its main feature is the large number of judgments per markable: 20 on average, and over 2.2M in total. This characteristic makes the corpus a unique resource for the study of disagreements on anaphoric interpretation. A second distinctive feature is its rich annotation scheme, covering singletons, expletives, and split-antecedent plurals. Finally, the corpus also comes with labels inferred using a recently proposed probabilistic model of annotation for coreference. The labels are of high quality and make it possible to successfully train a state-of-the-art coreference resolver, including training on singletons and non-referring expressions. The annotation model can also propose more than one label, or no label, for a markable, thus serving as a baseline method for automatically identifying ambiguous markables. A preliminary analysis of the results is presented.

    Accounting for Discourse Relations: Constituency and Dependency

    At the start of my career, I had the good fortune of working wit

    Discourse Structure and Computation: Past, Present and Future

    The discourse properties of text have long been recognized as critical to language technology, and over the past 40 years, our understanding of and ability to exploit the discourse properties of text has grown in many ways. This essay briefly recounts these developments, the technology they employ, the applications they support, and the new challenges that each subsequent development has raised. We conclude with the challenges faced by our current understanding of discourse, and the applications that meeting these challenges will promote.

    Linguistics parameters for zero anaphora resolution

    Master's dissertation, Natural Language Processing and Human Language Technology, Univ. do Algarve, 2009. This dissertation describes and proposes a set of linguistically motivated rules for zero anaphora resolution in the context of a natural language processing chain developed for Portuguese. Some languages, like Portuguese, allow noun phrase (NP) deletion (or zeroing) in several syntactic contexts in order to avoid the redundancy that would result from repetition of previously mentioned words. The co-reference relation between the zeroed element and its antecedent (or previous mention) in the discourse is here called zero anaphora (Mitkov, 2002). In Computational Linguistics, zero anaphora resolution may be viewed as a subtask of anaphora resolution and has an essential role in various Natural Language Processing applications such as information extraction, automatic abstracting, dialog systems, machine translation and question answering. The main goal of this dissertation is to describe the grammatical rules imposing subject NP deletion and referential constraints in Brazilian Portuguese, in order to allow a correct identification of the antecedent of the deleted subject NP. Some of these rules were then formalized into the Xerox Incremental Parser or XIP (Ait-Mokhtar et al., 2002: 121-144) in order to constitute a module of the Portuguese grammar (Mamede et al. 2010) developed at the Spoken Language Laboratory (L2F). Using this rule-based approach we expected to improve the performance of the Portuguese grammar, namely by producing better dependency structures with (reconstructed) zeroed NPs for the syntactic-semantic interface.
    Because of the complexity of the task, the scope of this dissertation had to be limited to: (a) subject NP deletion; (b) within sentence boundaries; and (c) with an explicit antecedent. In addition, (d) rules were formalized based solely on the results of the shallow parser (or chunks), that is, with minimal syntactic (and no semantic) knowledge. A corpus of different text genres was manually annotated for zero anaphors and other zero-shaped, usually indefinite, subjects. The rule-based approach is evaluated and results are presented and discussed.

    L2 Influence on L1 : Chinese subject realisation in Chinese-English bilinguals

    This study aims to investigate the influence of the second language (L2) on the use of the first language (L1) in late bilinguals within an L1 dominant environment. Cross-linguistic influence (Kellerman & Smith, 1986) has usually been studied in the forward direction: how bilinguals’ L1 influences the acquisition and use of their L2. The other direction (i.e., the influence of L2 on L1), on the other hand, has not been sufficiently investigated. The current study looks at Chinese-speaking learners who acquire their L2 English through instruction in an L1 dominant environment. It does so by examining ‘subject realisation’, an area where Chinese and English exhibit substantial typological contrasts, since Chinese allows both overt and null arguments under certain discourse-pragmatic conditions, whereas subjects in English are, under most circumstances, obligatorily expressed (Huang, 1984). It is then hypothesized that long-term learning and regular use of English as L2 would increase the use of overt subjects in the bilingual’s first language, i.e., Chinese, with the consequent use of fewer null subjects in their L1. In addition, following Grosjean (1998), the interaction between the bilingual’s two languages is expected to be stronger when bilinguals produce language in the so-called ‘bilingual mode’, i.e., when both languages are highly activated, than in a ‘monolingual mode’, i.e., when only one language is predominantly activated. This ‘language mode’ factor leads naturally to a further hypothesis: fewer null subjects are realised in speech produced by Chinese-English bilinguals within a bilingual mode compared to a monolingual mode.

    Towards a Typology of Narrative Frustration

    Through imaginative engagement, readers of fiction become, to an extraordinary extent, the narrator’s ‘children’: they often submit themselves to the narrator’s authority without reserve. But precisely because of that, readers are deeply at a loss when their trust is betrayed. This underscores a core function of fiction, namely to evoke emotional response in the reader. In this paper, we hypothesize how a reader’s imaginative engagement can be subjected to narrative frustration due to processing or moral complexity. The types of narrative frustration we consider differ in terms of their sources, and their emotional and behavioral impacts on the reader. Here, we break down these frustrations into their component parts, in an effort to better characterize the different classes of frustrations. We propose that frustrations arise from different combinations of local uncertainty, moral clash and global uncertainty. These sources of frustration in turn explain the reader’s emotional response and their consequent reading behavior as they imaginatively engage with fiction.

    Adverse Drug Event Detection, Causality Inference, Patient Communication and Translational Research

    Adverse drug events (ADEs) are injuries resulting from a medical intervention related to a drug. ADEs are responsible for nearly 20% of all the adverse events that occur in hospitalized patients. ADEs have been shown to increase the cost of health care and the length of stays in hospital. Therefore, detecting and preventing ADEs for pharmacovigilance is an important task that can improve the quality of health care and reduce the cost in a hospital setting. In this dissertation, we focus on the development of ADEtector, a system that identifies ADEs and medication information from electronic medical records and the FDA Adverse Event Reporting System reports. The ADEtector system employs novel natural language processing approaches for ADE detection and provides a user interface to display ADE information. The ADEtector employs machine learning techniques to automatically process the narrative text and identify the adverse event (AE) and medication entities that appear in that narrative text. The system will analyze the entities recognized to infer the causal relation that exists between AEs and medications by automating the elements of the Naranjo score using knowledge- and rule-based approaches. The Naranjo Adverse Drug Reaction Probability Scale is a validated tool for finding the causality of a drug-induced adverse event or ADE. The scale calculates the likelihood of an adverse event related to drugs based on a list of weighted questions. The ADEtector also presents the user with evidence for ADEs by extracting figures that contain ADE-related information from biomedical literature. A brief summary is generated for each extracted figure to help users better comprehend it.
    The ADEtector also helps patients better understand the narrative text by recognizing complex medical jargon and abbreviations that appear in the text and providing definitions and explanations for them from external knowledge resources. This system could help clinicians and researchers discover novel ADEs and drug relations and also hypothesize new research questions within the ADE domain.
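
    The weighted-question scoring that the Naranjo scale performs can be illustrated with a minimal sketch. This is not ADEtector's implementation, which the abstract does not describe. The question keys below are hypothetical names chosen for illustration, and the weights and category cut-offs follow the scale as it is commonly published, so they should be checked against the original instrument before any real use.

```python
from typing import Dict, Optional

# (yes_weight, no_weight) per question; "do not know" scores 0.
# Keys are hypothetical labels; weights are assumed from the commonly
# published form of the Naranjo scale, not taken from the abstract.
NARANJO_WEIGHTS = {
    "previous_conclusive_reports": (1, 0),
    "event_after_drug_given": (2, -1),
    "improved_on_withdrawal": (1, 0),
    "reappeared_on_rechallenge": (2, -1),
    "alternative_causes": (-1, 2),
    "reaction_on_placebo": (-1, 1),
    "toxic_drug_level_detected": (1, 0),
    "dose_response": (1, 0),
    "similar_reaction_before": (1, 0),
    "objective_evidence": (1, 0),
}


def naranjo_score(answers: Dict[str, Optional[bool]]) -> int:
    """Sum the weights; True = yes, False = no, None or missing = do not know."""
    total = 0
    for question, (yes_weight, no_weight) in NARANJO_WEIGHTS.items():
        answer = answers.get(question)
        if answer is True:
            total += yes_weight
        elif answer is False:
            total += no_weight
    return total


def naranjo_category(score: int) -> str:
    """Map a total score to the usual causality category."""
    if score >= 9:
        return "definite"
    if score >= 5:
        return "probable"
    if score >= 1:
        return "possible"
    return "doubtful"
```

    A system automating the scale would have to fill the `answers` dictionary from extracted evidence; unanswered questions simply contribute zero.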