752 research outputs found
Simple open stance classification for rumour analysis
Stance classification determines the attitude, or stance, in a (typically short) text. The task has powerful applications, such as the detection of fake news or the automatic extraction of attitudes toward entities or events in the media. This paper describes a surprisingly simple and efficient classification approach to open stance classification in Twitter, for rumour and veracity classification. The approach profits from a novel set of automatically identifiable problem-specific features, which significantly boost classifier accuracy and achieve above state-of-the-art results on recent benchmark datasets. This calls into question the value of using complex, sophisticated models for stance classification without first doing informed feature extraction.
QMUL-SDS at CheckThat! 2020: Determining COVID-19 Tweet Check-Worthiness Using an Enhanced CT-BERT with Numeric Expressions
This paper describes the participation of the QMUL-SDS team for Task 1 of the
CLEF 2020 CheckThat! shared task. The purpose of this task is to determine the
check-worthiness of tweets about COVID-19 to identify and prioritise tweets
that need fact-checking. The overarching aim is to further support ongoing
efforts to protect the public from fake news and help people find reliable
information. We describe and analyse the results of our submissions. We show
that a CNN using COVID-Twitter-BERT (CT-BERT) enhanced with numeric expressions
can effectively boost performance from baseline results. We also show results
of training data augmentation with rumours on other topics. Our best system
ranked fourth in the task with encouraging outcomes showing potential for
improved results in the future.
COVID-19 and Arabic Twitter: How can Arab World Governments and Public Health Organizations Learn from Social Media?
In March 2020, the World Health Organization announced the COVID-19 outbreak as a pandemic. Most previous social media related research has been on English tweets and COVID-19. In this study, we collect approximately 1 million Arabic tweets from the Twitter streaming API related to COVID-19. Focussing on outcomes that we believe will be useful for Public Health Organizations, we analyse them in three different ways: identifying the topics discussed during the period, detecting rumours, and predicting the source of the tweets. We use the k-means algorithm for the first goal with k=5. The topics discussed can be grouped as follows: COVID-19 statistics, prayers for God, COVID-19 locations, advice and education for prevention, and advertising. We sample 2000 tweets and label them manually as false information, correct information, or unrelated. Then, we apply three different machine learning algorithms, Logistic Regression, Support Vector Classification, and Naïve Bayes, with two sets of features: a word frequency approach and word embeddings. We find that Machine Learning classifiers are able to correctly identify the rumour-related tweets with 84% accuracy. We also try to predict the source of the rumour-related tweets using our previous model, which classifies tweets into five categories: academic, media, government, health professional, and public. Around 60% of the rumour-related tweets are classified as written by health professionals and academics.
Rumor Stance Classification in Online Social Networks: A Survey on the State-of-the-Art, Prospects, and Future Challenges
The emergence of the Internet as a ubiquitous technology has facilitated the
rapid evolution of social media as the leading virtual platform for
communication, content sharing, and information dissemination. In spite of
revolutionizing the way news used to be delivered to people, this technology
has also brought along with itself inevitable demerits. One such drawback is
the spread of rumors facilitated by social media platforms, which may provoke
doubt and fear among people. Therefore, the need to debunk rumors before they
spread widely has become all the more essential. Over the years, many studies
have been conducted to develop effective rumor verification systems. One aspect
of such studies focuses on rumor stance classification, which concerns the task
of utilizing users' viewpoints about a rumorous post to better predict the
veracity of a rumor. Relying on users' stances in the rumor verification task has
gained great importance, for it has shown significant improvements in model
performance. In this paper, we conduct a comprehensive literature review on
rumor stance classification in complex social networks. In particular, we
present a thorough description of the approaches and mark the top performances.
Moreover, we introduce multiple datasets available for this purpose and
highlight their limitations. Finally, some challenges and future directions are
discussed to stimulate further relevant research efforts. Comment: 13 pages, 2 figures, journa
Will-they-won't-they: A very large dataset for stance detection on twitter
We present a new challenging stance detection dataset, called Will-They-Won’t-They (WT-WT), which contains 51,284 tweets in English, making it by far the largest available dataset of the type. All the annotations are carried out by experts; therefore, the dataset constitutes a high-quality and reliable benchmark for future research in stance detection. Our experiments with a wide range of recent state-of-the-art stance detection systems show that the dataset poses a strong challenge to existing models in this domain. Keynes Fund, Cambridg
Context-Aware Message-Level Rumour Detection with Weak Supervision
Social media has become the main source of all sorts of information beyond a communication medium. Its intrinsic nature can allow a continuous and massive flow of misinformation to make a severe impact worldwide. In particular, rumours emerge unexpectedly and spread quickly. It is challenging to track down their origins and stop their propagation. One of the best solutions to this is to identify rumour-mongering messages as early as possible, which is commonly referred to as "Early Rumour Detection (ERD)". This dissertation focuses on researching ERD on social media by exploiting weak supervision and contextual information. Weak supervision is a branch of ML where noisy and less precise sources (e.g. data patterns) are leveraged to learn limited high-quality labelled data (Ratner et al., 2017). This is intended to reduce the cost and increase the efficiency of the hand-labelling of large-scale data. This thesis aims to study whether identifying rumours before they go viral is possible and to develop an architecture for ERD at the individual post level. To this end, it first explores major bottlenecks of current ERD. It also uncovers a research gap between system design and its applications in the real world, which has received less attention from the ERD research community. One bottleneck is limited labelled data. Weakly supervised methods to augment limited labelled training data for ERD are introduced. The other bottleneck is enormous amounts of noisy data. A framework unifying burst detection based on temporal signals and burst summarisation is investigated to identify potential rumours (i.e. input to rumour detection models) by filtering out uninformative messages. Finally, a novel method which jointly learns rumour sources and their contexts (i.e. conversational threads) for ERD is proposed. An extensive evaluation setting for ERD systems is also introduced.
Stance Classification for Rumour Analysis in Twitter: Exploiting Affective Information and Conversation Structure
Analysing how people react to rumours associated with news in social media is
an important task to prevent the spreading of misinformation, which is nowadays
widely recognized as a dangerous tendency. In social media conversations, users
show different stances and attitudes towards rumourous stories. Some users take
a definite stance, supporting or denying the rumour at issue, while others just
comment on it, or ask for additional evidence related to the veracity of the
rumour. Along these lines, a new shared task has been proposed at SemEval-2017 (Task
8, SubTask A), which is focused on rumour stance classification in English
tweets. The goal is to predict user stance towards emerging rumours in Twitter,
in terms of supporting, denying, querying, or commenting on the original rumour,
looking at the conversation threads originated by the rumour. This paper
describes a new approach to this task, where the use of conversation-based and
affective-based features, covering different facets of affect, has been
explored. Our classification model outperforms the best-performing systems for
stance classification at SemEval-2017 Task 8, showing the effectiveness of the
feature set proposed. Comment: To appear in Proceedings of the 2nd International Workshop on Rumours
and Deception in Social Media (RDSM), co-located with CIKM 2018, Turin,
Italy, October 201