544 research outputs found
Argumentation models and their use in corpus annotation: practice, prospects, and challenges
The study of argumentation is transversal to several research domains, from philosophy to linguistics, from the law to computer science and artificial intelligence. In discourse analysis, several distinct models have been proposed to harness argumentation, each with a different focus or aim. To analyze the use of argumentation in natural language, several corpora annotation efforts have been carried out, with a more or less explicit grounding on one of such theoretical argumentation models. In fact, given the recent growing interest in argument mining applications, argument-annotated corpora are crucial to train machine learning models in a supervised way. However, the proliferation of such corpora has led to a wide disparity in the granularity of the argument annotations employed. In this paper, we review the most relevant theoretical argumentation models, after which we survey argument annotation projects closely following those theoretical models. We also highlight the main simplifications that are often introduced in practice. Furthermore, we glimpse other annotation efforts that are not so theoretically grounded but instead follow a shallower approach. It turns out that most argument annotation projects make their own assumptions and simplifications, both in terms of the textual genre they focus on and in terms of adapting the adopted theoretical argumentation model for their own agenda. Issues of compatibility among argument-annotated corpora are discussed by looking at the problem from a syntactical, semantic, and practical perspective. Finally, we discuss current and prospective applications of models that take advantage of argument-annotated corpora
Argumentation Mining in User-Generated Web Discourse
The goal of argumentation mining, an evolving research field in computational
linguistics, is to design methods capable of analyzing people's argumentation.
In this article, we go beyond the state of the art in several ways. (i) We deal
with actual Web data and take up the challenges given by the variety of
registers, multiple domains, and unrestricted noisy user-generated Web
discourse. (ii) We bridge the gap between normative argumentation theories and
argumentation phenomena encountered in actual data by adapting an argumentation
model tested in an extensive annotation study. (iii) We create a new gold
standard corpus (90k tokens in 340 documents) and experiment with several
machine learning methods to identify argument components. We offer the data,
source codes, and annotation guidelines to the community under free licenses.
Our findings show that argumentation mining in user-generated Web discourse is
a feasible but challenging task.Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in
User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17
Analyzing the Persuasive Effect of Style in News Editorial Argumentation
News editorials argue about political issues in order to challenge or reinforce the stance of readers with different ideologies. Previous research has investigated such persuasive effects for argumentative content. In contrast, this paper studies how important the style of news editorials is to achieve persuasion. To this end, we first compare content- and style-oriented classifiers on editorials from the liberal NYTimes with ideology-specific effect annotations. We find that conservative readers are resistant to NYTimes style, but on liberals, style even has more impact than content. Focusing on liberals, we then cluster the leads, bodies, and endings of editorials, in order to learn about writing style patterns of effective argumentation
Discourse Relations and Evaluation
We examine the role of discourse relations (relations between propositions) in the interpretation of evaluative or opinion words. Through a combination of Rhetorical Structure Theory or RST (Mann & Thompson, 1988) and Appraisal Theory (Martin & White, 2005), we analyze how different discourse relations modify the evaluative content of opinion words, and what impact the nucleus-satellite structure in RST has on the evaluation. We conduct a corpus study, examining and annotating over 3,000 evaluative words in 50 movie reviews in the SFU Review Corpus (Taboada, 2008) with respect to five parameters: word category (nouns, verbs, adjectives or adverbs), prior polarity (positive, negative or neutral), RST structure (both nucleus-satellite status and relation type) and change of polarity as a result of being part of a discourse relation (Intensify, Downtone, Reversal or No Change). Results show that relations such as Concession, Elaboration, Evaluation, Evidence and Restatement most frequently intensify the polarity of the opinion words, although the majority of evaluative words (about 70%) do not undergo changes in their polarity because of the relations they are a part of. We also find that most opinion words (about 70%) are positioned in the nucleus, confirming a hypothesis in the literature, that nuclei are the most important units when extracting evaluation automatically
Recommended from our members
Computational Models of Argument Structure and Argument Quality for Understanding Misinformation
With the continuing spread of misinformation and disinformation online, it is of increasing importance to develop combating mechanisms at scale in the form of automated systems that can find checkworthy information, detect fallacious argumentation of online content, retrieve relevant evidence from authoritative sources and analyze the veracity of claims given the retrieved evidence. The robustness and applicability of these systems depend on the availability of annotated resources to train machine learning models in a supervised fashion, as well as machine learning models that capture patterns beyond domain-specific lexical clues or genre-specific stylistic insights. In this thesis, we investigate the role of models for argument structure and argument quality in improving tasks relevant to fact-checking and furthering our understanding of misinformation and disinformation. We contribute to argumentation mining, misinformation detection, and fact-checking by releasing multiple annotated datasets, developing unified models across datasets and task formulations, and analyzing the vulnerabilities of such models in adversarial settings.
We start by studying the argument structure's role in two downstream tasks related to fact-checking. As it is essential to differentiate factual knowledge from opinionated text, we develop a model for detecting the type of news articles (factual or opinionated) using highly transferable argumentation-based features. We also show the potential of argumentation features to predict the checkworthiness of information in news articles and provide the first multi-layer annotated corpus for argumentation and fact-checking.
We then study qualitative aspects of arguments through models for fallacy recognition. To understand the reasoning behind checkworthiness and the relation of argumentative fallacies to fake content, we develop an annotation scheme of fallacies in fact-checked content and investigate avenues for automating the detection of such fallacies considering single- and multi-dataset training. Using instruction-based prompting, we introduce a unified model for recognizing twenty-eight fallacies across five fallacy datasets. We also use this model to explain the checkworthiness of statements in two domains.
Next, we show our models for end-to-end fact-checking of statements that include finding the relevant evidence document and sentence from a collection of documents and then predicting the veracity of the given statements using the retrieved evidence. We also analyze the robustness of end-to-end fact extraction and verification by generating adversarial statements and addressing areas for improvements for models under adversarial attacks. Finally, we show that evidence-based verification is essential for fine-grained claim verification by modeling the human-provided justifications with the gold veracity labels
La estructuración temática en inglés y español: anotación contrastiva de un corpus bilingüe para aplicaciones lingüísticas y computacionales
Tesis inédita de la Universidad Complutense de Madrid, Facultad de Filología, Departamento de Filología Inglesa, leída el 04-12-2015Thematization is recognized as a fundamental phenomenon in the construction of messages and texts by di erent linguistic schools. This location within a text privileges the elements that guide the reader in the orientation and interpretation of discourse at di erent levels. Thematizing a linguistic unit by locating it in the rst-initial position of a clause, paragraph, or text, confers upon it a special status: a signal of the organizational strategy which characterizes di erent text types playing a role as a variable in the distinction of registers, text types and genres. However, in spite of the importance of the study of thematization for message and textual structuring, to date there are no linguistic studies that have undertook the task of validating its aspects in a comparative manner, either for linguistic or computational purposes. This study, therefore, lls a research gap by implementing a methodology based on contrastive corpus annotation, which allows to empirically validate aspects of the phenomenon of Thematization in English and Spanish, it also seeks to develop a bilingual English-Spanish comparable corpus of newspaper texts automatically annotated with thematic features at clausal and discourse levels. The empirically validated categories (Thematic Field and its elements: Textual Theme, Interpersonal Theme, PreHead and Head) are used to annotate a larger corpus of three newspaper genres news reports, editorials and letters to the editor in terms of thematic choices. This characterization, reveals interesting results, such as the use of genre-speci c strategies in thematic position. In addition, the thesis investigates the possibility to automate the annotation of thematic features in the bilingual corpus through the development of a set of JAVA rules implemented in GATE. It also shows the e cacy of this method in comparison with the manual annotation results...Depto. de Estudios Ingleses: Lingüística y LiteraturaFac. de FilologíaTRUEunpu
- …