12 research outputs found

    Cross-lingual RST Discourse Parsing

    Discourse parsing is an integral part of understanding information flow and argumentative structure in documents. Most previous research has focused on inducing and evaluating models from the English RST Discourse Treebank. However, discourse treebanks for other languages exist, including Spanish, German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same underlying linguistic theory, but differ slightly in the way documents are annotated. In this paper, we present (a) a new discourse parser which is simpler than, yet competitive with (significantly better on 2/3 metrics), the state of the art for English, (b) a harmonization of discourse treebanks across languages, enabling us to present (c) what to the best of our knowledge are the first experiments on cross-lingual discourse parsing. Comment: To be published in EACL 2017, 13 pages

    Adaptation of discourse parsing models for the Portuguese language

    Discourse parsing in Portuguese has two critical limitations. The first is that the task has been explored using only symbolic approaches, i.e., using manually extracted lexical patterns. The second is related to the domain of the lexical patterns, which were extracted through the analysis of a corpus of academic texts, generating many domain-specific patterns. For English, many approaches have been explored using machine learning with features based on a prominent lexicon-syntax notion of dominance sets. In this paper, two such approaches were adapted to Portuguese; on the task of rhetorical relation identification, they outperform the baselines and previous work for Portuguese. São Paulo Research Foundation (FAPESP) (grant 2014/11632-0); Natural Sciences and Engineering Research Council of Canada; University of Toronto

    Improving discourse structure identification

    Rhetorical Structure Theory (Mann et al. 1988), a popular approach for analyzing discourse coherence, suggests that coherent text can be placed into a hierarchical organization of clauses. Identification of a text’s rhetorical structure through automatic discourse analysis is a crucial element for many of today’s Natural Language Processing tasks, but no adequate tool is available. The current state-of-the-art discourse parser, SPADE (Soricut et al. 2003), is limited to parsing discourse within a single sentence. HILDA (Hernault et al. 2010) extends the parsing abilities of SPADE to the document level, but with a decrease in performance. This study achieved document-level discourse parsing without sacrificing performance. Provided that the text was already segmented into elementary discourse units, the task of discourse parsing was separated into three steps: structuring, nuclearity labeling, and relation labeling. An algorithm was developed for classifying relation existence, nuclearity, and relation label that improved upon previous methods. New features were explored for all three steps to maintain state-of-the-art performance when parsing at the document level.
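    The three steps named in this abstract (structuring, nuclearity labeling, relation labeling) could be arranged as a greedy bottom-up loop over adjacent spans. The sketch below is not the study's algorithm: the greedy merge strategy, the `Node` class, and the three toy scoring functions are all assumptions made for illustration, standing in for trained classifiers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """A span of text in a (hypothetical) binary discourse tree."""
    text: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    nuclearity: Optional[str] = None  # e.g. "NS", "SN", "NN"
    relation: Optional[str] = None    # e.g. "Cause", "Elaboration"


def parse(
    edus: List[str],
    structure_score: Callable[[Node, Node], float],
    label_nuclearity: Callable[[Node, Node], str],
    label_relation: Callable[[Node, Node], str],
) -> Node:
    """Greedily merge adjacent spans until a single tree remains."""
    if not edus:
        raise ValueError("at least one elementary discourse unit is required")
    nodes = [Node(text=e) for e in edus]
    while len(nodes) > 1:
        # Step 1 (structuring): pick the adjacent pair most likely to be related.
        i = max(range(len(nodes) - 1),
                key=lambda k: structure_score(nodes[k], nodes[k + 1]))
        left, right = nodes[i], nodes[i + 1]
        merged = Node(
            text=left.text + " " + right.text,
            left=left,
            right=right,
            # Step 2 (nuclearity labeling) and step 3 (relation labeling).
            nuclearity=label_nuclearity(left, right),
            relation=label_relation(left, right),
        )
        nodes[i:i + 2] = [merged]
    return nodes[0]


# Toy stand-ins for trained classifiers (assumptions, for demonstration only).
def toy_structure_score(a: Node, b: Node) -> float:
    return 1.0 if b.text.startswith("because") else 0.0


def toy_nuclearity(a: Node, b: Node) -> str:
    return "NS"


def toy_relation(a: Node, b: Node) -> str:
    return "Cause" if b.text.startswith("because") else "Elaboration"


edus = ["The game was cancelled", "because it rained", "so we stayed home"]
tree = parse(edus, toy_structure_score, toy_nuclearity, toy_relation)
```

    With these toy scorers, the first two units are merged first and the third is attached at the root; real systems would replace each function with a learned model.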

    Adverse Drug Event Detection, Causality Inference, Patient Communication and Translational Research

    Adverse drug events (ADEs) are injuries resulting from a medical intervention related to a drug. ADEs are responsible for nearly 20% of all the adverse events that occur in hospitalized patients. ADEs have been shown to increase the cost of health care and the length of hospital stays. Therefore, detecting and preventing ADEs for pharmacovigilance is an important task that can improve the quality of health care and reduce costs in a hospital setting. In this dissertation, we focus on the development of ADEtector, a system that identifies ADEs and medication information from electronic medical records and FDA Adverse Event Reporting System reports. The ADEtector system employs novel natural language processing approaches for ADE detection and provides a user interface to display ADE information. ADEtector employs machine learning techniques to automatically process the narrative text and identify the adverse event (AE) and medication entities that appear in it. The system analyzes the recognized entities to infer the causal relation between AEs and medications by automating the elements of the Naranjo score using knowledge-based and rule-based approaches. The Naranjo Adverse Drug Reaction Probability Scale is a validated tool for assessing the causality of a drug-induced adverse event or ADE. The scale calculates the likelihood that an adverse event is related to a drug based on a list of weighted questions. ADEtector also presents the user with evidence for ADEs by extracting figures that contain ADE-related information from the biomedical literature. A brief summary is generated for each extracted figure to help users better comprehend it and the ADE information it conveys.
    ADEtector also helps patients better understand the narrative text by recognizing complex medical jargon and abbreviations that appear in the text and providing definitions and explanations for them from external knowledge resources. This system could help clinicians and researchers discover novel ADEs and drug relations and formulate new research questions within the ADE domain.
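    A minimal sketch of the final step of the Naranjo scale: summing the weighted answers and mapping the total to a causality category using the commonly cited cut-offs (9 or more: definite; 5 to 8: probable; 1 to 4: possible; 0 or less: doubtful). How ADEtector derives each item score is not described in this abstract, so the per-item scores are taken as given inputs, and the function name `naranjo_category` is hypothetical.

```python
from typing import Iterable


def naranjo_category(item_scores: Iterable[int]) -> str:
    """Map per-question Naranjo scores to a causality category.

    `item_scores` holds one already-weighted score per question;
    deriving those scores from text is outside this sketch.
    """
    total = sum(item_scores)
    if total >= 9:
        return "definite"
    if total >= 5:
        return "probable"
    if total >= 1:
        return "possible"
    return "doubtful"


# Example with made-up item scores (illustrative only, not real patient data).
example_category = naranjo_category([1, 2, 1, 0, 2, 0, 0, 0, 0, 0])
```

    Keeping the thresholding separate from the item scoring means the rule-based or knowledge-based components that answer each question can change without touching the category mapping.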

    Social talk capabilities for dialogue systems

    Small talk capabilities are an important but very challenging extension to dialogue systems. Small talk (or “social talk”) refers to a kind of conversation that focuses not on the exchange of information but on the negotiation of social roles and situations. The goal of this thesis is to provide knowledge, processes and structures that can be used by dialogue systems to satisfactorily participate in social conversations. For this purpose the thesis presents research in the areas of natural-language understanding, dialogue management and error handling. Nine new models of social talk based on a data analysis of small talk conversations are described. The functionally motivated and content-abstract models can be used for small talk conversations on various topics. The basic elements of the models are dialogue acts for social talk, newly developed on the basis of social science theory. The thesis also presents some conversation strategies for the treatment of so-called “out-of-domain” (OoD) utterances that can be used to avoid errors in the input understanding of dialogue systems. Additionally, the thesis describes a new extension to dialogue management that flexibly manages interwoven dialogue threads. The small talk models as well as the strategies for handling OoD utterances are encoded as computational dialogue threads.
