53,132 research outputs found

    Sequence Mining and Pattern Analysis in Drilling Reports with Deep Natural Language Processing

    Full text link
    Drilling activities in the oil and gas industry have been reported over decades for thousands of wells on a daily basis, yet the analysis of this text at large-scale for information retrieval, sequence mining, and pattern analysis is very challenging. Drilling reports contain interpretations written by drillers from noting measurements in downhole sensors and surface equipment, and can be used for operation optimization and accident mitigation. In this initial work, a methodology is proposed for automatic classification of sentences written in drilling reports into three relevant labels (EVENT, SYMPTOM and ACTION) for hundreds of wells in an actual field. Some of the main challenges in the text corpus were overcome, which include the high frequency of technical symbols, mistyping/abbreviation of technical terms, and the presence of incomplete sentences in the drilling reports. We obtain state-of-the-art classification accuracy within this technical language and illustrate advanced queries enabled by the tool.Comment: 7 pages, 14 figures, technical repor

    Feature extraction and classification of spam emails

    Get PDF

    Rude waiter but mouthwatering pastries! An exploratory study into Dutch aspect-based sentiment analysis

    Get PDF
    The fine-grained task of automatically detecting all sentiment expressions within a given document and the aspects to which they refer is known as aspect-based sentiment analysis. In this paper we present the first full aspect-based sentiment analysis pipeline for Dutch and apply it to customer reviews. To this purpose, we collected reviews from two different domains, i.e. restaurant and smartphone reviews. Both corpora have been manually annotated using newly developed guidelines that comply to standard practices in the field. For our experimental pipeline we perceive aspect-based sentiment analysis as a task consisting of three main subtasks which have to be tackled incrementally: aspect term extraction, aspect category classification and polarity classification. First experiments on our Dutch restaurant corpus reveal that this is indeed a feasible approach that yields promising results

    Parsing Argumentation Structures in Persuasive Essays

    Full text link
    In this article, we present a novel approach for parsing argumentation structures. We identify argument components using sequence labeling at the token level and apply a new joint model for detecting argumentation structures. The proposed model globally optimizes argument component types and argumentative relations using integer linear programming. We show that our model considerably improves the performance of base classifiers and significantly outperforms challenging heuristic baselines. Moreover, we introduce a novel corpus of persuasive essays annotated with argumentation structures. We show that our annotation scheme and annotation guidelines successfully guide human annotators to substantial agreement. This corpus and the annotation guidelines are freely available for ensuring reproducibility and to encourage future research in computational argumentation.Comment: Under review in Computational Linguistics. First submission: 26 October 2015. Revised submission: 15 July 201
    corecore