703 research outputs found

    Microfilm and Microfacsimile Publications

    Get PDF
    published or submitted for publicatio

    Parsing Argumentation Structures in Persuasive Essays

    Full text link
    In this article, we present a novel approach for parsing argumentation structures. We identify argument components using sequence labeling at the token level and apply a new joint model for detecting argumentation structures. The proposed model globally optimizes argument component types and argumentative relations using integer linear programming. We show that our model considerably improves the performance of base classifiers and significantly outperforms challenging heuristic baselines. Moreover, we introduce a novel corpus of persuasive essays annotated with argumentation structures. We show that our annotation scheme and annotation guidelines successfully guide human annotators to substantial agreement. This corpus and the annotation guidelines are freely available for ensuring reproducibility and to encourage future research in computational argumentation.Comment: Under review in Computational Linguistics. First submission: 26 October 2015. Revised submission: 15 July 201

    Normalization of Dutch user-generated content

    Get PDF
    Abstract This paper describes a phrase-based machine translation approach to normalize Dutch user-generated content (UGC). We compiled a corpus of three different social media genres (text messages, message board posts and tweets) to have a sample of this recent domain. We describe the various characteristics of this noisy text material and explain how it has been manually normalized using newly developed guidelines. For the automatic normalization task we focus on text messages, and find that a cascaded SMT system where a token-based module is followed by a translation at the character level gives the best word error rate reduction. After these initial experiments, we investigate the system's robustness on the complete domain of UGC by testing it on the other two social media genres, and find that the cascaded approach performs best on these genres as well. To our knowledge, we deliver the first proof-of-concept system for Dutch UGC normalization, which can serve as a baseline for future work

    On the performance of phonetic algorithms in microtext normalization

    Get PDF
    User–generated content published on microblogging social networks constitutes a priceless source of information. However, microtexts usually deviate from the standard lexical and grammatical rules of the language, thus making its processing by traditional intelligent systems very difficult. As an answer, microtext normalization consists in transforming those non–standard microtexts into standard well–written texts as a preprocessing step, allowing traditional approaches to continue with their usual processing. Given the importance of phonetic phenomena in non–standard text formation, an essential element of the knowledge base of a normalizer would be the phonetic rules that encode these phenomena, which can be found in the so–called phonetic algorithms. In this work we experiment with a wide range of phonetic algorithms for the English language. The aim of this study is to determine the best phonetic algorithms within the context of candidate generation for microtext normalization. In other words, we intend to find those algorithms that taking as input non–standard terms to be normalized allow us to obtain as output the smallest possible sets of normalization candidates which still contain the corresponding target standard words. As it will be stated, the choice of the phonetic algorithm will depend heavily on the capabilities of the candidate selection mechanism which we usually find at the end of a microtext normalization pipeline. The faster it can make the right choices among big enough sets of candidates, the more we can sacrifice on the precision of the phonetic algorithms in favour of coverage in order to increase the overall performance of the normalization systemAgencia Estatal de Investigación | Ref. TIN2017-85160-C2-1-RAgencia Estatal de Investigación | Ref. TIN2017-85160-C2-2-RMinisterio de Economía y Competitividad | Ref. FFI2014-51978-C2-1-RMinisterio de Economía y Competitividad | Ref. FFI2014-51978-C2-2-RXunta de Galicia | Ref. ED431D-2017/12Xunta de Galicia | Ref. ED431B2017/01Xunta de Galicia | Ref. ED431D R2016/046Ministerio de Economía y Competitividad | Ref. BES-2015-07376
    • …
    corecore