Understanding misinformation on Twitter in the context of controversial issues
Social media is slowly supplementing, or even replacing, traditional media outlets such as television, newspapers, and radio. However, social media presents drawbacks when it comes to circulating information, including the spread of false information, rumors, and fake news. At least three main factors create these drawbacks: the filter bubble effect, misinformation, and information overload. These factors make gathering accurate and credible information online very challenging, which in turn may affect public trust in online information. These issues become even more challenging when the topic under discussion is controversial. In this thesis, four main controversial topics are studied, each from a different domain. This variation of domains gives a broad view of how misinformation manifests in social media, and how it manifests differently across domains.
This thesis aims to understand misinformation in the context of discussions of controversial issues, both by examining how misinformation is manifested in social media and by examining people's opinions towards these issues. Three aspects of a tweet are studied: 1) the user sharing the information, 2) the information source shared, and 3) whether specific linguistic cues can help in assessing the credibility of information on social media. Finally, the web application TweetChecker is used to allow online users to gain a more in-depth understanding of the discussions about five controversial health issues. The results and recommendations of this study can be used to build solutions to the problem of trustworthiness of user-generated content on different social media platforms, especially for controversial issues.
A Clustering Algorithm for Early Prediction of Controversial Reddit Posts
Social curation platforms like Reddit are rich with user interactions such as comments, upvotes, and downvotes. Predicting these interactions before they happen is an interesting computational challenge and can be used for a variety of tasks, ranging from content moderation to personality prediction. Given the vast amount of information posted on these sites, it's important to develop models that can simplify this prediction task. In this paper, we present a simple clustering algorithm that helps predict the controversiality of a Reddit post using the user's profile information, their past contributions on Reddit, and the sentiment expressed in their post. On average, introducing the cluster to the prediction task improved the accuracy of the prediction by over 20 percent, with F1 scores of 0.95 (micro) and 0.7 (macro). The classifier performs better than a majority predictor. The results also show that the overwhelming majority of users are inactive and, when they do post, they post non-controversial content.
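The cluster-as-feature idea can be illustrated with a minimal sketch. This is not the paper's implementation: the user features, data distribution, and two-cluster setup below are hypothetical stand-ins. Users are clustered by posting activity, and a per-cluster majority vote stands in for the downstream controversiality predictor that would consume the cluster id alongside the other features.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical user records: (posts_per_month, mean_post_sentiment, is_controversial).
# Most users are inactive and non-controversial, mirroring the paper's observation.
users = [(random.gauss(2, 1), random.gauss(0.2, 0.1), 0) for _ in range(90)]
users += [(random.gauss(30, 5), random.gauss(-0.4, 0.1), 1) for _ in range(10)]

def kmeans_1d(values, iters=20):
    """Tiny two-cluster 1-D k-means over posting activity."""
    cents = [min(values), max(values)]
    for _ in range(iters):
        groups = [[], []]
        for v in values:
            groups[0 if abs(v - cents[0]) <= abs(v - cents[1]) else 1].append(v)
        cents = [sum(g) / len(g) if g else c for g, c in zip(groups, cents)]
    return sorted(cents)

activity = [u[0] for u in users]
cents = kmeans_1d(activity)
cluster = [0 if abs(a - cents[0]) <= abs(a - cents[1]) else 1 for a in activity]

# A per-cluster majority vote stands in for the full classifier that would
# consume the cluster id alongside profile and sentiment features.
majority = {c: Counter(u[2] for u, cl in zip(users, cluster) if cl == c).most_common(1)[0][0]
            for c in set(cluster)}
preds = [majority[c] for c in cluster]
accuracy = sum(p == u[2] for p, u in zip(preds, users)) / len(users)
```

On data this cleanly separable the cluster label alone is almost fully predictive; the paper's point is that even on real, noisy Reddit data the cluster feature adds measurable accuracy on top of the other features.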
Capturing stance dynamics in social media: open challenges and research directions
Social media platforms provide a goldmine for mining public opinion on issues of wide societal interest and impact. Opinion mining can be operationalised by capturing and aggregating the stance of individual social media posts as supporting, opposing, or being neutral towards the issue at hand. While most prior work in stance detection has investigated datasets that cover short periods of time, interest in longitudinal datasets has recently increased. Evolving dynamics in the linguistic and behavioural patterns observed in new data require adapting stance detection systems to deal with the changes. In this survey paper, we investigate the intersection between computational linguistics and the temporal evolution of human communication in digital media. We perform a critical review of emerging research on these dynamics, exploring the semantic and pragmatic factors that impact linguistic data in general, and stance in particular. We then discuss current directions in capturing stance dynamics in social media, examine the challenges encountered, and identify open problems and future directions along three key dimensions: utterance, context, and influence.
Adult readers evaluating the credibility of social media posts: Prior belief consistency and source's expertise matter most
The present study investigates the role of source characteristics, the quality of evidence, and readers' prior beliefs about the topic in adult readers' credibility evaluations of short health-related social media posts. The researchers designed content for posts concerning five health topics by manipulating the source characteristics (source's expertise, gender, and ethnicity), the accuracy of the claims, and the quality of evidence (research evidence, testimony, consensus, and personal experience). Accurate and inaccurate social media posts varying in the other manipulated aspects were then programmatically generated. Crowdworkers (N = 844) recruited from two platforms were asked to evaluate the credibility of up to ten social media posts, resulting in 8,380 evaluations. Before the credibility evaluation, participants' prior beliefs on the topics of the posts were assessed. The results showed that prior belief consistency and the source's expertise affected the perceived credibility of the accurate and inaccurate social media posts the most, after controlling for the topic of the post and the crowdworking platform. In contrast, the quality of evidence supporting the health claim mattered relatively little, and the source's gender and ethnicity had no effect. The results are discussed in terms of first- and second-hand evaluation strategies.
Stance classification of Twitter debates: The encryption debate as a use case
Social media have enabled a revolution in user-generated content. They allow users to connect, build community, produce and share content, and publish opinions. To better understand online users' attitudes and opinions, we use stance classification, a relatively new and challenging approach that deepens opinion mining by classifying a user's stance in a debate. Our use case is tweets related to the spring 2016 debate over the FBI's request that Apple decrypt a user's iPhone. In this "encryption debate," public opinion was polarized between advocates for individual privacy and advocates for national security. We propose a machine learning approach to classifying stance in the debate, together with a topic classification, using lexical, syntactic, Twitter-specific, and argumentative features as predictors. Models trained on these feature sets showed significant increases in accuracy relative to the unigram baseline.
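The gap between a unigram baseline and a richer feature set can be sketched roughly as follows. The cue lists and feature names here are illustrative placeholders, not the paper's actual feature set:

```python
import re

def unigram_features(tweet):
    """Baseline: bag of lowercase word unigrams."""
    return {f"uni={w}": 1 for w in re.findall(r"[a-z']+", tweet.lower())}

def augmented_features(tweet):
    """Unigrams plus illustrative lexical, Twitter-specific, and
    argumentative cues (hypothetical stand-ins for the paper's features)."""
    feats = unigram_features(tweet)
    feats["num_hashtags"] = tweet.count("#")   # Twitter-specific
    feats["num_mentions"] = tweet.count("@")   # Twitter-specific
    feats["has_url"] = int("http" in tweet)    # Twitter-specific
    feats["excl_marks"] = tweet.count("!")     # lexical intensity cue
    arg_cues = {"because", "therefore", "should", "must"}
    feats["arg_cues"] = sum(w in arg_cues for w in tweet.lower().split())
    return feats

feats = augmented_features("Apple must protect user privacy! #encryption @FBI http://t.co/x")
```

Either feature dictionary could then be vectorized and fed to any standard classifier; the reported gains came from training on the richer representation rather than unigrams alone.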
Sketching the vision of the Web of Debates
The exchange of comments, opinions, and arguments in blogs, forums, social media, wikis, and review websites has transformed the Web into a modern agora, a virtual place where all types of debates take place. This wealth of information remains mostly unexploited: due to its textual form, such information is difficult to automatically process and analyse in order to validate, evaluate, compare, combine with other types of information, and make actionable. Recent research in Machine Learning, Natural Language Processing, and Computational Argumentation has provided some solutions, which still cannot fully capture important aspects of online debates, such as various forms of unsound reasoning, arguments that do not follow a standard structure, information that is not explicitly expressed, and non-logical argumentation methods. Tackling these challenges would add immense value, as it would allow the well-intentioned user to search for, navigate through, and analyse online opinions and arguments, obtaining a better picture of the various debates. Ultimately, it may lead to increased participation of Web users in democratic, dialogical interchange of arguments, more informed decisions by professionals and decision-makers, as well as easier identification of biased, misleading, or deceptive arguments. This paper presents the vision of the Web of Debates, a more human-centered version of the Web, which aims to unlock the potential of the abundance of argumentative information that currently exists online, offering its users a new generation of argument-based web services and tools that are tailored to their real needs.
Credibility analysis of textual claims with explainable evidence
Despite being a vast resource of valuable information, the Web has been polluted by the spread of false claims. Increasing hoaxes, fake news, and misleading information on the Web have given rise to many fact-checking websites that manually assess these doubtful claims. However, the rapid speed and large scale of misinformation spread have become the bottleneck for manual verification. This calls for credibility assessment tools that can automate this verification process. Prior works in this domain make strong assumptions about the structure of the claims and the communities where they are made. Most importantly, black-box techniques proposed in prior works lack the ability to explain why a certain statement is deemed credible or not. To address these limitations, this dissertation proposes a general framework for automated credibility assessment that does not make any assumption about the structure or origin of the claims. Specifically, we propose a feature-based model, which automatically retrieves relevant articles about the given claim and assesses its credibility by capturing the mutual interaction between the language style of the relevant articles, their stance towards the claim, and the trustworthiness of the underlying web sources. We further enhance our credibility assessment approach and propose a neural-network-based model. Unlike the feature-based model, this model does not rely on feature engineering and external lexicons. Both our models make their assessments interpretable by extracting explainable evidence from judiciously selected web sources.
We utilize our models to develop a Web interface, CredEye, which enables users to automatically assess the credibility of a textual claim and drill down into the assessment by browsing through judiciously and automatically selected evidence snippets. In addition, we study the problem of stance classification and propose a neural-network-based model for predicting the stance of diverse user perspectives regarding controversial claims. Given a controversial claim and a user comment, our stance classification model predicts whether the user comment supports or opposes the claim.
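The mutual interaction the feature-based model captures (language style, stance towards the claim, and source trustworthiness) can be caricatured as a weighted aggregation. The weighting scheme below is a hypothetical sketch, not the dissertation's learned model:

```python
def claim_credibility(articles):
    """Aggregate per-article evidence into a claim credibility score.

    Each article dict carries three signals (all hypothetical scales):
      stance: +1.0 (supports the claim) .. -1.0 (refutes it)
      style:  0..1 objectivity score from a language-style classifier
      trust:  0..1 trustworthiness of the underlying web source
    Returns a score in [-1, 1]; positive leans credible.
    """
    weighted = sum(a["stance"] * a["style"] * a["trust"] for a in articles)
    total = sum(a["style"] * a["trust"] for a in articles)
    return weighted / total if total else 0.0

# Toy evidence: one objective, trusted supporting article outweighs
# one less objective, less trusted refuting article.
articles = [{"stance": 1.0, "style": 0.9, "trust": 0.8},
            {"stance": -1.0, "style": 0.5, "trust": 0.3}]
score = claim_credibility(articles)
```

In the actual models these interactions are learned jointly rather than fixed by hand, and each aggregated article doubles as an explainable evidence snippet for the verdict.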
Annotating Student Talk in Text-based Classroom Discussions
Classroom discussions in English Language Arts have a positive effect on students' reading, writing, and reasoning skills. Although prior work has largely focused on teacher talk and student-teacher interactions, we focus on three theoretically-motivated aspects of high-quality student talk: argumentation, specificity, and knowledge domain. We introduce an annotation scheme, then show that the scheme can be used to produce reliable annotations and that the annotations are predictive of discussion quality. We also highlight opportunities provided by our scheme for education and natural language processing research.
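Annotation reliability for a scheme like this is typically measured with a chance-corrected agreement statistic. A minimal Cohen's kappa for two annotators could look like the sketch below; the label set is illustrative, not the scheme's actual categories:

```python
def cohens_kappa(ann_a, ann_b):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(ann_a) == len(ann_b) and ann_a
    n = len(ann_a)
    observed = sum(x == y for x, y in zip(ann_a, ann_b)) / n
    labels = set(ann_a) | set(ann_b)
    expected = sum((ann_a.count(l) / n) * (ann_b.count(l) / n) for l in labels)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

# Each dimension (argumentation, specificity, knowledge domain) would be
# checked separately; here, one toy dimension with hypothetical labels.
a = ["claim", "evidence", "claim", "warrant"]
b = ["claim", "evidence", "warrant", "warrant"]
kappa = cohens_kappa(a, b)
```

Values near 1 indicate agreement well above chance; "reliable annotations" in practice means kappa (or a multi-annotator analogue such as Krippendorff's alpha) clears a conventional threshold on each dimension.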