29 research outputs found
Recommended from our members
Content Selection for Effective Counter-Argument Generation
The information ecosystem of social media has resulted in an abundance of opinions on political topics and current events. In order to encourage better discussions, it is important to promote high-quality responses and relegate low-quality ones.
We thus focus on automatically analyzing and generating counter-arguments in response to posts on social media with the goal of providing effective responses.
This thesis is composed of three parts. In the first part, we conduct an analysis of arguments. Specifically, we first annotate discussions from Reddit for aspects of arguments and then analyze them for their persuasive impact. Then we present approaches to identify the argumentative structure of these discussions and predict the persuasiveness of an argument. We evaluate each component independently using automatic or manual evaluations and show significant improvement in each.
In the second part, we leverage our discoveries from our analysis in the process of generating counter-arguments. We develop two approaches in the retrieve-and-edit framework, where we obtain content using methods created during our analysis of arguments, among others, and then modify the content using techniques from natural language generation. In the first approach, we develop an approach to retrieve counter-arguments by annotating a dataset for stance and building models for stance prediction. Then we use our approaches from our analysis of arguments to extract persuasive argumentative content before modifying non-content phrases for coherence. In contrast, in the second approach we create a dataset and models for modifying content -- making semantic edits to a claim to have a contrasting stance. We evaluate our approaches using intrinsic automatic evaluation of our predictive models and an overall human evaluation of our generated output.
Finally, in the third part, we discuss the semantic challenges of argumentation that we need to solve in order to make progress in the understanding of arguments. To clarify, we develop new methods for identifying two types of semantic relations -- causality and veracity. For causality, we build a distant-labeled dataset of causal relations using lexical indicators and then we leverage features from those indicators to build predictive models. For veracity, we build new models to retrieve evidence given a claim and predict whether the claim is supported by that evidence. We also develop a new dataset for veracity to illuminate the areas that need progress. We evaluate these approaches using automated and manual techniques and obtain significant improvement over strong baselines.
Finally, we apply these techniques to claims in the domain of household electricity consumption, mining claims using our methods for causal relations and then verifying their truthfulness
On the Role of Images for Analyzing Claims in Social Media
Fake news is a severe problem in social media. In this paper, we present an empirical study on visual, textual, and multimodal models for the tasks of claim, claim check-worthiness, and conspiracy detection, all of which are related to fake news detection. Recent work suggests that images are more influential than text and often appear alongside fake text. To this end, several multimodal models have been proposed in recent years that use images along with text to detect fake news on social media sites like Twitter. However, the role of images is not well understood for claim detection, specifically using transformer-based textual and multimodal models. We investigate state-of-the-art models for images, text (Transformer-based), and multimodal information for four different datasets across two languages to understand the role of images in the task of claim and conspiracy detection
Overview of the CLAIMSCAN-2023: Uncovering Truth in Social Media through Claim Detection and Identification of Claim Spans
A significant increase in content creation and information exchange has been
made possible by the quick development of online social media platforms, which
has been very advantageous. However, these platforms have also become a haven
for those who disseminate false information, propaganda, and fake news. Claims
are essential in forming our perceptions of the world, but sadly, they are
frequently used to trick people by those who spread false information. To
address this problem, social media giants employ content moderators to filter
out fake news from the actual world. However, the sheer volume of information
makes it difficult to identify fake news effectively. Therefore, it has become
crucial to automatically identify social media posts that make such claims,
check their veracity, and differentiate between credible and false claims. In
response, we presented CLAIMSCAN in the 2023 Forum for Information Retrieval
Evaluation (FIRE'2023). The primary objectives centered on two crucial tasks:
Task A, determining whether a social media post constitutes a claim, and Task
B, precisely identifying the words or phrases within the post that form the
claim. Task A received 40 registrations, demonstrating a strong interest and
engagement in this timely challenge. Meanwhile, Task B attracted participation
from 28 teams, highlighting its significance in the digital era of
misinformation
Lost in Translation, Found in Spans: Identifying Claims in Multilingual Social Media
Claim span identification (CSI) is an important step in fact-checking
pipelines, aiming to identify text segments that contain a checkworthy claim or
assertion in a social media post. Despite its importance to journalists and
human fact-checkers, it remains a severely understudied problem, and the scarce
research on this topic so far has only focused on English. Here we aim to
bridge this gap by creating a novel dataset, X-CLAIM, consisting of 7K
real-world claims collected from numerous social media platforms in five Indian
languages and English. We report strong baselines with state-of-the-art
encoder-only language models (e.g., XLM-R) and we demonstrate the benefits of
training on multiple languages over alternative cross-lingual transfer methods
such as zero-shot transfer, or training on translated data, from a
high-resource language such as English. We evaluate generative large language
models from the GPT series using prompting methods on the X-CLAIM dataset and
we find that they underperform the smaller encoder-only language models for
low-resource languages.Comment: EMNLP 2023 (main