2,911 research outputs found
Document controversy classification based on the Wikipedia category structure
Dispute and controversy are parts of our culture and cannot be omitted on the Internet (where it becomes more anonymous). There have been many studies on controversy, especially on social networks such as Wikipedia. This free on-line encyclopedia has become a very popular data source among many researchers studying behavior or natural language processing. This paper presents using the category structure of Wikipedia to determine the controversy of a single article. This is the first part of the proposed system for classification of topic controversy score for any given text
Joint RNN Model for Argument Component Boundary Detection
Argument Component Boundary Detection (ACBD) is an important sub-task in
argumentation mining; it aims at identifying the word sequences that constitute
argument components, and is usually considered as the first sub-task in the
argumentation mining pipeline. Existing ACBD methods heavily depend on
task-specific knowledge, and require considerable human efforts on
feature-engineering. To tackle these problems, in this work, we formulate ACBD
as a sequence labeling problem and propose a variety of Recurrent Neural
Network (RNN) based methods, which do not use domain specific or handcrafted
features beyond the relative position of the sentence in the document. In
particular, we propose a novel joint RNN model that can predict whether
sentences are argumentative or not, and use the predicted results to more
precisely detect the argument component boundaries. We evaluate our techniques
on two corpora from two different genres; results suggest that our joint RNN
model obtain the state-of-the-art performance on both datasets.Comment: 6 pages, 3 figures, submitted to IEEE SMC 201
MythQA: Query-Based Large-Scale Check-Worthy Claim Detection through Multi-Answer Open-Domain Question Answering
Check-worthy claim detection aims at providing plausible misinformation to
downstream fact-checking systems or human experts to check. This is a crucial
step toward accelerating the fact-checking process. Many efforts have been put
into how to identify check-worthy claims from a small scale of pre-collected
claims, but how to efficiently detect check-worthy claims directly from a
large-scale information source, such as Twitter, remains underexplored. To fill
this gap, we introduce MythQA, a new multi-answer open-domain question
answering(QA) task that involves contradictory stance mining for query-based
large-scale check-worthy claim detection. The idea behind this is that
contradictory claims are a strong indicator of misinformation that merits
scrutiny by the appropriate authorities. To study this task, we construct
TweetMythQA, an evaluation dataset containing 522 factoid multi-answer
questions based on controversial topics. Each question is annotated with
multiple answers. Moreover, we collect relevant tweets for each distinct
answer, then classify them into three categories: "Supporting", "Refuting", and
"Neutral". In total, we annotated 5.3K tweets. Contradictory evidence is
collected for all answers in the dataset. Finally, we present a baseline system
for MythQA and evaluate existing NLP models for each system component using the
TweetMythQA dataset. We provide initial benchmarks and identify key challenges
for future models to improve upon. Code and data are available at:
https://github.com/TonyBY/Myth-QAComment: Accepted by SIGIR 202
Recommended from our members
Controversy Analysis and Detection
Seeking information on a controversial topic is often a complex task. Alerting users about controversial search results can encourage critical literacy, promote healthy civic discourse and counteract the filter bubble effect, and therefore would be a useful feature in a search engine or browser extension. Additionally, presenting information to the user about the different stances or sides of the debate can help her navigate the landscape of search results beyond a simple list of 10 links . This thesis has made strides in the emerging niche of controversy detection and analysis. The body of work in this thesis revolves around two themes: computational models of controversy, and controversies occurring in neighborhoods of topics. Our broad contributions are: (1) Presenting a theoretical framework for modeling controversy as contention among populations; (2) Constructing the first automated approach to detecting controversy on the web, using a KNN classifier that maps from the web to similar Wikipedia articles; and (3) Proposing a novel controversy detection in Wikipedia by employing a stacked model using a combination of link structure and similarity. We conclude this work by discussing the challenging technical, societal and ethical implications of this emerging research area and proposing avenues for future work
Corpus Wide Argument Mining -- a Working Solution
One of the main tasks in argument mining is the retrieval of argumentative
content pertaining to a given topic. Most previous work addressed this task by
retrieving a relatively small number of relevant documents as the initial
source for such content. This line of research yielded moderate success, which
is of limited use in a real-world system. Furthermore, for such a system to
yield a comprehensive set of relevant arguments, over a wide range of topics,
it requires leveraging a large and diverse corpus in an appropriate manner.
Here we present a first end-to-end high-precision, corpus-wide argument mining
system. This is made possible by combining sentence-level queries over an
appropriate indexing of a very large corpus of newspaper articles, with an
iterative annotation scheme. This scheme addresses the inherent label bias in
the data and pinpoints the regions of the sample space whose manual labeling is
required to obtain high-precision among top-ranked candidates
A Wikipedia Literature Review
This paper was originally designed as a literature review for a doctoral
dissertation focusing on Wikipedia. This exposition gives the structure of
Wikipedia and the latest trends in Wikipedia research
- …