    Argumentation Mining in User-Generated Web Discourse

    The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in three ways. (i) We deal with actual Web data and take up the challenges posed by the variety of registers, multiple domains, and unrestricted, noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and the argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source code, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task. Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17

    Neural End-to-End Learning for Computational Argumentation Mining

    We investigate neural techniques for end-to-end computational argumentation mining (AM). We frame AM both as a token-based dependency parsing problem and as a token-based sequence tagging problem, including a multi-task learning setup. Contrary to models that operate at the argument component level, we find that framing AM as dependency parsing leads to subpar performance. In contrast, less complex (local) tagging models based on BiLSTMs perform robustly across classification scenarios and are able to capture the long-range dependencies inherent to the AM problem. Moreover, we find that jointly learning 'natural' subtasks in a multi-task learning setup improves performance. Comment: To be published at ACL 201
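
Framing AM as token-based sequence tagging means that every token receives one label saying whether it begins, continues, or lies outside an argument component. A minimal sketch, assuming a BIO scheme with the hypothetical component types `Claim` and `Premise` and the hypothetical helpers `spans_to_bio` and `bio_to_spans`; the paper's actual label set (which may also encode relations or stance) is not reproduced here.

```python
from typing import List, Optional, Tuple

# A span is (start_token, end_token_exclusive, component_type).
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert argument-component spans into one BIO tag per token."""
    tags = ["O"] * num_tokens  # tokens outside any component
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the component
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the component
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into spans (lenient: a stray I-X opens a new span)."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # still inside the current component
        if start is not None:
            spans.append((start, i, label))  # close the open component
            start, label = None, None
        if tag != "O":
            start, label = i, tag[2:]  # B-X or a stray I-X opens a component
    return spans
```

A BiLSTM tagger of the kind the abstract describes would be trained to predict the `spans_to_bio` labels token by token; at inference time, `bio_to_spans` turns its predicted tags back into components. Treating a stray `I-` tag as the start of a new component is one common lenient convention, not the only one.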

    Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!

    Argumentation mining (AM) requires the identification of complex discourse structures and has recently been applied successfully in monolingual settings. In this work, we show that the existing resources are, however, not adequate for assessing cross-lingual AM, due to their heterogeneity or lack of complexity. We therefore create suitable parallel corpora by translating (with human and machine translation) a popular AM dataset of persuasive student essays into German, French, Spanish, and Chinese. We then compare (i) annotation projection and (ii) direct transfer based on bilingual word embeddings as strategies for cross-lingual AM, finding that the former performs considerably better and almost eliminates the loss from cross-lingual transfer. Moreover, we find that annotation projection works equally well with costly human translations and cheap machine translations. Our code and data are available at http://github.com/UKPLab/coling2018-xling_argument_mining. Comment: Accepted at Coling 201
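
Annotation projection transfers labels from a source text to its translation through word alignments. A minimal sketch, assuming alignments are given as `(source_index, target_index)` pairs from some external aligner (the abstract does not name one) and that each labelled source span maps to the smallest contiguous target span covering its aligned tokens; `project_spans` is a hypothetical helper, and the authors' exact projection rules may differ.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def project_spans(src_spans: List[Span], alignment: List[Tuple[int, int]]) -> List[Span]:
    """Project labelled source spans onto a target sentence via word alignments.

    Each source span becomes the smallest contiguous target span that covers
    every target token aligned to it; spans with no aligned tokens are dropped.
    """
    # Index the alignment: source token -> set of aligned target tokens.
    src_to_tgt: Dict[int, Set[int]] = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, set()).add(t)

    projected: List[Span] = []
    for start, end, label in src_spans:
        targets: Set[int] = set()
        for s in range(start, end):
            targets |= src_to_tgt.get(s, set())
        if targets:
            projected.append((min(targets), max(targets) + 1, label))
    return projected
```

Taking the covering span tolerates word-order changes between languages (for instance, English "We should ban cars" against German "Wir sollten Autos verbieten", where the last two words swap), at the cost of possibly absorbing unaligned tokens in between. Overlapping projected spans are not resolved in this sketch.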

    Analyzing collaborative learning processes automatically

    In this article, we describe an emerging area of text classification research focused on analyzing collaborative learning processes, both from a broad perspective and, more specifically, in terms of a publicly available tool set called TagHelper tools. Analyzing the many pedagogically valuable facets of learners’ interactions is a time-consuming and effortful process. Adapting and applying recent text classification technologies to automate the analysis of these highly valued collaborative learning processes would make it less arduous to obtain insights from corpus data. This endeavor could also substantially improve online instruction, both by providing teachers and facilitators with reports about the groups they moderate and by triggering context-sensitive collaborative learning support on an as-needed basis. We report on an interdisciplinary research project that has investigated the effectiveness of applying text classification technology to a large CSCL corpus that human coders analyzed using a theory-based multidimensional coding scheme. We report promising results and include an in-depth discussion of issues such as reliability, validity, and efficiency that should be considered when deciding whether to adopt a new technology such as TagHelper tools. One major technical contribution of this work is a demonstration that a key part of making text classification technology effective for this purpose is designing and building linguistic pattern detectors, otherwise known as features, that can be extracted reliably from texts and that have high predictive power for the categories of discourse actions that interest the CSCL community.
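
The idea of features as linguistic pattern detectors can be illustrated with a tiny extractor that turns each utterance into binary pattern features plus unigram features, which any standard classifier could consume. The pattern names and regular expressions here, and the helper `extract_features`, are hypothetical examples, not TagHelper's actual feature set.

```python
import re
from typing import Dict

# Hypothetical pattern detectors for illustration only (not TagHelper's features).
PATTERNS: Dict[str, re.Pattern] = {
    "question": re.compile(r"\?\s*$"),
    "agreement": re.compile(r"\b(i agree|good point|exactly)\b", re.IGNORECASE),
    "causal_marker": re.compile(r"\b(because|since|therefore)\b", re.IGNORECASE),
    "first_person": re.compile(r"\b(i|me|my)\b", re.IGNORECASE),
}


def extract_features(utterance: str) -> Dict[str, int]:
    """Map an utterance to binary features: 1 if a pattern fires, plus unigrams."""
    features = {name: int(bool(p.search(utterance))) for name, p in PATTERNS.items()}
    # Bag-of-words unigram indicators as a simple baseline alongside the patterns.
    for token in re.findall(r"[a-z']+", utterance.lower()):
        features[f"unigram={token}"] = 1
    return features
```

Hand-written detectors like these encode coding-scheme knowledge that plain unigrams miss; whether a given detector is reliable and predictive is exactly what the evaluation described in the abstract would need to establish.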

    Annotating topics, stance, argumentativeness and claims in Dutch social media comments: a pilot study

    One of the major challenges currently facing the field of argumentation mining is the lack of consensus on how to analyse argumentative user-generated texts such as online comments. The annotation guidelines used to create labelled corpora rarely justify their choice of a particular theoretical basis. This pilot study reports on the annotation of a corpus of 100 Dutch user comments posted in response to politically themed news articles on Facebook. The annotation covers topic and aspect labelling, stance labelling, argumentativeness detection and claim identification. Our inter-annotator agreement (IAA) study reports substantial agreement for argumentativeness detection (Fleiss’ kappa 0.76) and moderate agreement for claim labelling (Fleiss’ kappa 0.45). We provide a clear justification of the theories and definitions underlying the design of our guidelines. Our analysis of the annotations signals the need to adjust our guidelines to allow for missing context information and to define the concept of argumentativeness in connection with stance. Our annotated corpus and associated guidelines are publicly available.
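
Fleiss' kappa, the agreement measure reported in this abstract, compares the observed agreement among raters on each item with the agreement expected by chance from the overall category proportions: kappa = (P_bar - P_e) / (1 - P_e). A minimal sketch, assuming every comment is rated by the same number of annotators and the ratings have been tallied into a count matrix; `fleiss_kappa` is a hypothetical helper name, and the study's own annotation data are not reproduced here.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a count matrix: counts[i][j] is the number of raters
    who assigned item i to category j. Every item must have the same number
    of raters, and at least two."""
    if not counts:
        raise ValueError("need at least one item")
    num_items = len(counts)
    n = sum(counts[0])  # raters per item
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # Observed agreement per item: proportion of rater pairs that agree.
    p_items = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_items) / num_items

    # Chance agreement from the overall proportion of each category.
    num_categories = len(counts[0])
    p_cats = [sum(row[j] for row in counts) / (num_items * n) for j in range(num_categories)]
    p_e = sum(p * p for p in p_cats)
    if p_e == 1:
        raise ValueError("all ratings fall into one category; kappa is undefined")

    return (p_bar - p_e) / (1 - p_e)
```

For example, `fleiss_kappa([[3, 0], [0, 3]])` (two items, three raters each, unanimous on each item but in different categories) returns 1.0, while values near 0 indicate agreement no better than chance.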

    The written production of argumentative and dissertation text: a didactic project based on Bakhtin's philosophy

    This article presents theoretical and practical research related to a project carried out in 2015 in two schools, one public and one private, in the city of Birigui, State of São Paulo, Brazil. The main goal of the study was to analyze the pedagogical project and the teachers’ activities oriented towards teaching and learning argumentative and dissertation texts. The methodology comprised (i) visiting both schools, (ii) describing ongoing school practices, with a focus on the teaching of argumentative and dissertation texts, (iii) proposing a teaching activity to improve this learning, (iv) putting this activity into practice, and (v) analyzing the results. The theoretical framework was Bakhtinian philosophy (BAKHTIN, 2013; 2006a; 2006b; 2006c; 2006d; 2010; 2013; VOLOSHINOV, 1986). This approach was chosen because it treats the text not merely as an amalgamated set of words, phrases and paragraphs, but as a structure of meaning in which linguistic forms, ideologies and discursive stance come together. Finally, the results show that argumentative texts support the development of students’ argumentative competence and skill, that is, their ability to argue and organize ideas in a communicative situation.

    An Exploratory Application of Rhetorical Structure Theory to Detect Coherence Errors in L2 English Writing: Possible Implications for Automated Writing Evaluation Software

    This paper presents an initial attempt to examine whether Rhetorical Structure Theory (RST) (Mann & Thompson, 1988) can be fruitfully applied to detecting the coherence errors made by Taiwanese low-intermediate learners of English. This investigation is warranted for three reasons. First, other methods for bottom-up coherence analysis have proved ineffective (e.g., Watson Todd et al., 2007). Second, this research provides a preliminary categorization of the coherence errors made by first language (L1) Chinese learners of English. Third, second language discourse errors in general have received little attention in applied linguistics research. The data are 45 written samples from the LTTC English Learner Corpus, a Taiwanese learner corpus of English currently under construction. The rationale of this study is that diagrams violating some of the rules of RST diagram formation point to coherence errors. No reliability test has been conducted, since this work is at an initial stage; the study is therefore exploratory and its results preliminary. Results are discussed in terms of the practicality of using this method to detect coherence errors, their possible consequences for claims about a typical inductive content order in the writing of L1 Chinese learners of English, and their potential implications for Automated Writing Evaluation (AWE) software, since discourse organization is one of the essay characteristics this software assesses. In particular, the paper discusses the extent to which the kinds of errors detected through the RST analysis match those located by Criterion (Burstein, Chodorow, & Leacock, 2004), a well-known AWE program from Educational Testing Service (ETS).

    Detection of the central unit in two different genres and languages: a preliminary study in Brazilian Portuguese and Basque

    The aim of this paper is to present the development of a rule-based automatic detector that determines the main idea, or most pertinent discourse unit, in two very different languages, Basque and Brazilian Portuguese, and in two distinct genres, scientific abstracts and argumentative answers. The central unit (CU) can help in understanding texts in terms of their relational discourse structure, and it can be applied to Natural Language Processing (NLP) tasks such as automatic summarization, question-answering systems or sentiment analysis. In the argumentative answer genre, identifying the CU is an essential step towards an automatic evaluator for this genre. The theoretical background of the paper is Mann and Thompson’s (1988) Rhetorical Structure Theory (RST), proceeding from discourse segmentation to CU annotation. Results show that CUs are detected automatically with similar results across languages and genres, although there is room for improvement.