Mining arguments in scientific abstracts: Application to argumentative quality assessment
Argument mining consists in the automatic identification of argumentative structures in natural language, a task that has been recognized as particularly challenging in the scientific domain. In this work we propose SciARG, a new annotation scheme, and apply it to the identification of argumentative units and relations in abstracts in two scientific disciplines: computational linguistics and biomedicine, which allows us to assess the applicability of our scheme to different knowledge fields. We use our annotated corpus to train and evaluate argument mining models in various experimental settings, including single and multi-task learning. We investigate the possibility of leveraging existing annotations, including discourse relations and rhetorical roles of sentences, to improve the performance of argument mining models. In particular, we explore the potential offered by a sequential transfer-learning approach in which supplementary training tasks are used to fine-tune pre-trained parameter-rich language models. Finally, we analyze the practical usability of the automatically extracted components and relations for the prediction of argumentative quality dimensions of scientific abstracts.
Agencia Nacional de Investigación e Innovación; Ministerio de Economía, Industria y Competitividad (España)
General Versus Specific Sentences: Automatic Identification and Application to Analysis of News Summaries
In this paper, we introduce the task of identifying general and specific sentences in news articles. Instead of embarking on a new annotation effort to obtain data for the task, we explore the possibility of leveraging existing large corpora annotated with discourse information to train a classifier. We introduce several classes of features that capture lexical and syntactic information, as well as word specificity and polarity. We then use the classifier to analyze the distribution of general and specific sentences in human and machine summaries of news articles. We discover that while all types of summaries tend to be more specific than the original documents, human abstracts contain a more balanced mix of general and specific sentences but automatic summaries are overwhelmingly specific. Our findings give strong evidence for the need for a new task in (abstractive) summarization: identification and generation of general sentences.
Argumentation Mining in User-Generated Web Discourse
The goal of argumentation mining, an evolving research field in computational
linguistics, is to design methods capable of analyzing people's argumentation.
In this article, we go beyond the state of the art in several ways. (i) We deal
with actual Web data and take up the challenges given by the variety of
registers, multiple domains, and unrestricted noisy user-generated Web
discourse. (ii) We bridge the gap between normative argumentation theories and
argumentation phenomena encountered in actual data by adapting an argumentation
model tested in an extensive annotation study. (iii) We create a new gold
standard corpus (90k tokens in 340 documents) and experiment with several
machine learning methods to identify argument components. We offer the data,
source codes, and annotation guidelines to the community under free licenses.
Our findings show that argumentation mining in user-generated Web discourse is
a feasible but challenging task.
Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in
User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17
Domain Agnostic Real-Valued Specificity Prediction
Sentence specificity quantifies the level of detail in a sentence,
characterizing the organization of information in discourse. While this
information is useful for many downstream applications, specificity prediction
systems predict very coarse labels (binary or ternary) and are trained on and
tailored toward specific domains (e.g., news). The goal of this work is to
generalize specificity prediction to domains where no labeled data is available
and output more nuanced real-valued specificity ratings.
We present an unsupervised domain adaptation system for sentence specificity
prediction, specifically designed to output real-valued estimates from binary
training labels. To calibrate the values of these predictions appropriately, we
regularize the posterior distribution of the labels towards a reference
distribution. We show that our framework generalizes well to three different
domains, with a 50% to 68% reduction in mean absolute error relative to the
current state-of-the-art system trained for news sentence specificity. We also
demonstrate the potential of our work in improving the quality and
informativeness of dialogue generation systems.
Comment: AAAI 2019 camera read