831 research outputs found

Crowdsourcing Argumentation Structures in Chinese Hotel Reviews

Author: Gao Yang
Geng Shiqiang
Li Mengxue
Liu Haijing
Wang Hao
Publication date: 04/05/2017

Argumentation mining aims at automatically extracting the premises-claim discourse structures in natural language texts. There is a great demand for argumentation corpora for customer reviews. However, due to the controversial nature of the argumentation annotation task, there exist very few large-scale argumentation corpora for customer reviews. In this work, we take the novel approach of using crowdsourcing to collect argumentation annotations in Chinese hotel reviews. As the first Chinese argumentation dataset, our corpus includes 4814 argument component annotations and 411 argument relation annotations, and its annotation quality is comparable to that of some widely used argumentation corpora in other languages. Comment: 6 pages, 3 figures. This article has been submitted to "The 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC2017)".

arXiv.org e-Print Archive

Crossref

Examples and Specifications that Prove a Point: Identifying Elaborative and Argumentative Discourse Relations

Author: Demberg Vera
Scholman Merel C.J.
Publication venue: University of Illinois at Chicago Library
Publication date: 19/07/2017

Examples and specifications occur frequently in text, but not much is known about how they function in discourse and how readers interpret them. Looking at how they’re annotated in existing discourse corpora, we find that annotators often disagree on these types of relations; specifically, there is disagreement about whether these relations are elaborative (additive) or argumentative (pragmatic causal). To investigate how readers interpret examples and specifications, we conducted a crowdsourced discourse annotation study. The results show that these relations can indeed have two functions: they can be used to both illustrate/specify a situation and serve as an argument for a claim. These findings suggest that examples and specifications can have multiple simultaneous readings. We discuss the implications of these results for discourse annotation.

University of Illinois at Chicago: Journals@UIC

Dialogue & Discourse (E-Journal - Universität Bielefeld)

Automated Mining of Leaderboards for Empirical AI Research

Author: Auer Sören
D’Souza Jennifer
Kabongo Salomon
Ke Hao-Ren
Lee Chei Sian
Sugiyama Kazunari
Publication venue: New York, NY : Springer
Publication date: 01/01/2021

With the rapid growth of research publications, empowering scientists to maintain an overview of scientific progress is of paramount importance. In this regard, the leaderboards facet of information organization provides an overview of the state of the art by aggregating empirical results from various studies addressing the same research challenge. Crowdsourcing efforts such as PapersWithCode, among others, are devoted to the construction of leaderboards, predominantly for various subdomains in Artificial Intelligence. Leaderboards provide machine-readable scholarly knowledge that has proven to be directly useful for scientists to keep track of research progress; their construction could be greatly expedited with automated text mining. This study presents a comprehensive approach for generating leaderboards for knowledge-graph-based scholarly information organization. Specifically, we investigate the problem of automated leaderboard construction using state-of-the-art transformer models, viz. BERT, SciBERT, and XLNet. Our analysis reveals an optimal approach that significantly outperforms existing baselines for the task, with evaluation scores above 90% in F1. This, in turn, offers new state-of-the-art results for leaderboard extraction. As a result, a vast share of empirical AI research can be organized in next-generation digital libraries as knowledge graphs.

Institutionelles Repositorium der Leibniz Universität Hannover

TA-COS 2016 : First workshop on text analytics for cybersecurity and online safety : Proceedings

Author: De Pauw Guy
Desmet Bart
Lefever Els
Verhoeven Ben
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2016

Ghent University Academic Bibliography

Archivsystem Ask23

Annotating Argument Schemes

Author: Lawrence John
Reed Chris
Visser Jacky
Wagemans Jean
Walton Douglas
Publication venue: Springer Science and Business Media LLC
Publication date: 01/03/2021

University of Dundee Online Publications

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Statistical Parsing by Machine Learning from a Classical Arabic Treebank

Author: Dukes Kais
Publication venue: University of Leeds
Publication date: 01/09/2013

Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic. Classical Arabic has been studied in depth by grammarians for over a thousand years using a traditional grammar known as i’rāb (إعراب). Using this grammar to develop a representation for parsing is challenging, as it describes syntax using a hybrid of phrase-structure and dependency relations. This work aims to advance the state-of-the-art for hybrid parsing by introducing a formal representation for annotation and a resource for machine learning. The main contributions are the first treebank for Classical Arabic and the first statistical dependency-based parser in any language for ellipsis, dropped pronouns and hybrid representations. A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph transformations, resulting in an 87.47% F1-score. This is compared to an integrated parsing model with an F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is better suited to Classical Arabic. The Quran was chosen for annotation as a large body of work exists providing detailed syntactic analysis. Volunteer crowdsourcing is used for annotation in combination with expert supervision. A practical result of the annotation effort is the corpus website: http://corpus.quran.com, an educational resource with over two million users per year.

White Rose E-theses Online

Resources and benchmark corpora for hate speech detection: a systematic review

Author: Basile Valerio
Bosco Cristina
Patti Viviana
Poletto Fabio
Sanguinetti Manuela
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

Institutional Research Information System University of Turin

Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design

Author: Dagan Ido
Demberg Vera
Pyatkin Valentina
Scholman Merel C. J.
Tsarfaty Reut
Yung Frances
Publication date: 01/01/2023

Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of lay annotators. For this purpose we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations’ ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relation senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.

Utrecht University Repository