3,462 research outputs found

    Natural language processing meets business:algorithms for mining meaning from corporate texts

    Get PDF

    Natural language processing meets business:algorithms for mining meaning from corporate texts

    Get PDF

    Detecting Frames and Causal Relationships in Climate Change Related Text Databases Based on Semantic Features

    Get PDF
    abstract: The subliminal impact of framing of social, political and environmental issues such as climate change has been studied for decades in political science and communications research. Media framing offers an “interpretative package" for average citizens on how to make sense of climate change and its consequences to their livelihoods, how to deal with its negative impacts, and which mitigation or adaptation policies to support. A line of related work has used bag of words and word-level features to detect frames automatically in text. Such works face limitations since standard keyword based features may not generalize well to accommodate surface variations in text when different keywords are used for similar concepts. This thesis develops a unique type of textual features that generalize triplets extracted from text, by clustering them into high-level concepts. These concepts are utilized as features to detect frames in text. Compared to uni-gram and bi-gram based models, classification and clustering using generalized concepts yield better discriminating features and a higher classification accuracy with a 12% boost (i.e. from 74% to 83% F-measure) and 0.91 clustering purity for Frame/Non-Frame detection. The automatic discovery of complex causal chains among interlinked events and their participating actors has not yet been thoroughly studied. Previous studies related to extracting causal relationships from text were based on laborious and incomplete hand-developed lists of explicit causal verbs, such as “causes" and “results in." Such approaches result in limited recall because standard causal verbs may not generalize well to accommodate surface variations in texts when different keywords and phrases are used to express similar causal effects. Therefore, I present a system that utilizes generalized concepts to extract causal relationships. The proposed algorithms overcome surface variations in written expressions of causal relationships and discover the domino effects between climate events and human security. This semi-supervised approach alleviates the need for labor intensive keyword list development and annotated datasets. Experimental evaluations by domain experts achieve an average precision of 82%. Qualitative assessments of causal chains show that results are consistent with the 2014 IPCC report illuminating causal mechanisms underlying the linkages between climatic stresses and social instability.Dissertation/ThesisDoctoral Dissertation Computer Science 201

    Extracting Temporal and Causal Relations between Events

    Full text link
    Structured information resulting from temporal information processing is crucial for a variety of natural language processing tasks, for instance to generate timeline summarization of events from news documents, or to answer temporal/causal-related questions about some events. In this thesis we present a framework for an integrated temporal and causal relation extraction system. We first develop a robust extraction component for each type of relations, i.e. temporal order and causality. We then combine the two extraction components into an integrated relation extraction system, CATENA---CAusal and Temporal relation Extraction from NAtural language texts---, by utilizing the presumption about event precedence in causality, that causing events must happened BEFORE resulting events. Several resources and techniques to improve our relation extraction systems are also discussed, including word embeddings and training data expansion. Finally, we report our adaptation efforts of temporal information processing for languages other than English, namely Italian and Indonesian.Comment: PhD Thesi

    Causal graph extraction from news: a comparative study of time-series causality learning techniques

    Get PDF
    Causal graph extraction from news has the potential to aid in the understanding of complex scenarios. In particular, it can help explain and predict events, as well as conjecture about possible cause-effect connections. However, limited work has addressed the problem of large-scale extraction of causal graphs from news articles. This article presents a novel framework for extracting causal graphs from digital text media. The framework relies on topic-relevant variables representing terms and ongoing events that are selected from a domain under analysis by applying specially developed information retrieval and natural language processing methods. Events are represented as event-phrase embeddings, which make it possible to group similar events into semantically cohesive clusters. A time series of the selected variables is given as input to a causal structure learning techniques to learn a causal graph associated with the topic that is being examined. The complete framework is applied to the New York Times dataset, which covers news for a period of 246 months (roughly 20 years), and is illustrated through a case study. An initial evaluation based on synthetic data is carried out to gain insight into the most effective time-series causality learning techniques. This evaluation comprises a systematic analysis of nine state-of-the-art causal structure learning techniques and two novel ensemble methods derived from the most effective techniques. Subsequently, the complete framework based on the most promising causal structure learning technique is evaluated with domain experts in a real-world scenario through the use of the presented case study. The proposed analysis offers valuable insights into the problems of identifying topic-relevant variables from large volumes of news and learning causal graphs from time seriesFil: Maisonnave, Mariano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Dalhousie University Halifax; CanadáFil: Delbianco, Fernando Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Matemática Bahía Blanca. Universidad Nacional del Sur. Departamento de Matemática. Instituto de Matemática Bahía Blanca; ArgentinaFil: Tohmé, Fernando Abel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Matemática Bahía Blanca. Universidad Nacional del Sur. Departamento de Matemática. Instituto de Matemática Bahía Blanca; ArgentinaFil: Milios, Evangelos. Dalhousie University Halifax; CanadáFil: Maguitman, Ana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentin

    Leveraging graph-based semantic annotation for the identification of cause-effect relations

    Get PDF
    This research is related to language article in Indonesia that discuss about causality relationship research used as public health surveillance information monitoring system. Utilization of this research is suitability of feature selection, phrase annotation, paragraph annotation, medical element annotation and graph-based semantic annotation. Evaluation of system performance is done by intrinsic approach using the Naive Bayes Multinomial method. The results obtained sequentially for recall, precision and f-measure are 0.924, 0.905, and 0.910

    The biomedical discourse relation bank

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Identification of discourse relations, such as causal and contrastive relations, between situations mentioned in text is an important task for biomedical text-mining. A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical discourse processing. However, little effort has been made to develop such an annotated resource.</p> <p>Results</p> <p>We have developed the Biomedical Discourse Relation Bank (BioDRB), in which we have annotated explicit and implicit discourse relations in 24 open-access full-text biomedical articles from the GENIA corpus. Guidelines for the annotation were adapted from the Penn Discourse TreeBank (PDTB), which has discourse relations annotated over open-domain news articles. We introduced new conventions and modifications to the sense classification. We report reliable inter-annotator agreement of over 80% for all sub-tasks. Experiments for identifying the sense of explicit discourse connectives show the connective itself as a highly reliable indicator for coarse sense classification (accuracy 90.9% and F1 score 0.89). These results are comparable to results obtained with the same classifier on the PDTB data. With more refined sense classification, there is degradation in performance (accuracy 69.2% and F1 score 0.28), mainly due to sparsity in the data. The size of the corpus was found to be sufficient for identifying the sense of explicit connectives, with classifier performance stabilizing at about 1900 training instances. Finally, the classifier performs poorly when trained on PDTB and tested on BioDRB (accuracy 54.5% and F1 score 0.57).</p> <p>Conclusion</p> <p>Our work shows that discourse relations can be reliably annotated in biomedical text. Coarse sense disambiguation of explicit connectives can be done with high reliability by using just the connective as a feature, but more refined sense classification requires either richer features or more annotated data. The poor performance of a classifier trained in the open domain and tested in the biomedical domain suggests significant differences in the semantic usage of connectives across these domains, and provides robust evidence for a biomedical sublanguage for discourse and the need to develop a specialized biomedical discourse annotated corpus. The results of our cross-domain experiments are consistent with related work on identifying connectives in BioDRB.</p
    • …
    corecore