A Retrospective Analysis of the Fake News Challenge Stance Detection Task
The 2017 Fake News Challenge Stage 1 (FNC-1) shared task addressed a stance
classification task as a crucial first step towards detecting fake news. To
date, there is no in-depth analysis paper to critically discuss FNC-1's
experimental setup, reproduce the results, and draw conclusions for
next-generation stance classification methods. In this paper, we provide such
an in-depth analysis for the three top-performing systems. We first find that
FNC-1's proposed evaluation metric favors the majority class, which can be
easily classified, and thus overestimates the true discriminative power of the
methods. Therefore, we propose a new F1-based metric yielding a changed system
ranking. Next, we compare the features and architectures used, which leads to a
novel feature-rich stacked LSTM model that performs on par with the best
systems, but is superior in predicting minority classes. To understand the
methods' ability to generalize, we derive a new dataset and perform both
in-domain and cross-domain experiments. Our qualitative and quantitative study
helps interpret the original FNC-1 scores and understand which features help
improve performance and why. Our new dataset and all source code used during
the reproduction study are publicly available for future research.
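
The abstract's point about a metric that favors the majority class can be illustrated with a minimal sketch. It compares plain accuracy with macro-averaged F1, which is one common F1-based choice and not necessarily the exact metric the authors propose. The label names and the toy gold/predicted lists are assumptions made up for illustration, not FNC-1 data.

```python
# Toy comparison: accuracy vs. macro-averaged F1 on an imbalanced label set.
# Labels and data are hypothetical, chosen only to show the effect.

def accuracy(gold, pred):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)

LABELS = ["unrelated", "discuss", "agree", "disagree"]  # illustrative names

# Seven majority-class items and one item for each minority class.
gold = ["unrelated"] * 7 + ["discuss", "agree", "disagree"]
# A trivial system that always predicts the majority class.
pred = ["unrelated"] * 10

acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred, LABELS)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

In this toy setting the always-majority system looks strong under accuracy but scores poorly under macro F1, because the three minority classes each contribute an F1 of zero.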
Topic-Specific Sentiment Analysis Can Help Identify Political Ideology
Ideological leanings of an individual can often be gauged by the sentiment
one expresses about different issues. We propose a simple framework that
represents a political ideology as a distribution of sentiment polarities
towards a set of topics. This representation can then be used to detect
ideological leanings of documents (speeches, news articles, etc.) based on the
sentiments expressed towards different topics. Experiments performed using a
widely used dataset show the promise of our proposed approach that achieves
comparable performance to other methods despite being much simpler and more
interpretable. Comment: Presented at EMNLP Workshop on Computational Approaches to
Subjectivity, Sentiment & Social Media Analysis, 201
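
The idea of representing an ideology as a distribution of sentiment polarities over topics can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' framework: the topic names, the ideology labels `ideology_A` and `ideology_B`, the polarity scores, and the use of cosine similarity for matching are all hypothetical.

```python
import math

def sentiment_profile(scored_mentions, topics):
    """Average polarity per topic.

    scored_mentions: list of (topic, polarity) pairs, polarity in [-1, 1].
    Topics with no mentions get a neutral 0.0.
    """
    totals = {t: 0.0 for t in topics}
    counts = {t: 0 for t in topics}
    for topic, polarity in scored_mentions:
        if topic in totals:
            totals[topic] += polarity
            counts[topic] += 1
    return [totals[t] / counts[t] if counts[t] else 0.0 for t in topics]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def closest_ideology(doc_profile, ideology_profiles):
    """Name of the ideology whose profile is most similar to the document's."""
    return max(ideology_profiles,
               key=lambda name: cosine(doc_profile, ideology_profiles[name]))

# Hypothetical topics and reference profiles (made-up numbers).
TOPICS = ["taxes", "healthcare", "immigration"]
IDEOLOGIES = {
    "ideology_A": [0.8, -0.6, 0.1],
    "ideology_B": [-0.7, 0.9, 0.2],
}

# Hypothetical output of a topic-level sentiment scorer for one document.
mentions = [("taxes", -0.5), ("healthcare", 0.7), ("taxes", -0.9)]

profile = sentiment_profile(mentions, TOPICS)
predicted = closest_ideology(profile, IDEOLOGIES)
print(profile, predicted)
```

Because each dimension of the profile corresponds to a named topic, a prediction can be traced back to the topics that drove it, which is the kind of interpretability the abstract emphasizes.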
Understanding the Roots of Radicalisation on Twitter
In an increasingly digital world, identifying signs of online extremism sits at the top of the priority list for counter-extremist agencies. Researchers and governments are investing in the creation of advanced information technologies to identify and counter extremism through intelligent large-scale analysis of online data. However, to the best of our knowledge, these technologies are neither based on, nor do they take advantage of, the existing theories and studies of radicalisation. In this paper we propose a computational approach for detecting and predicting the radicalisation influence a user is exposed to, grounded in the notion of "roots of radicalisation" from social science models. This approach has been applied to analyse and compare the radicalisation level of 112 pro-ISIS vs. 112 "general" Twitter users. Our results show the effectiveness of our proposed algorithms in detecting and predicting radicalisation influence, obtaining up to 0.9 F1 measure for detection and between 0.7 and 0.8 precision for prediction. While this is an initial attempt towards the effective combination of social and computational perspectives, more work is needed to bridge these disciplines, and to build on their strengths to target the problem of online radicalisation.
Topology Analysis of International Networks Based on Debates in the United Nations
In complex, high dimensional and unstructured data it is often difficult to
extract meaningful patterns. This is especially the case when dealing with
textual data. Recent studies in machine learning, information theory and
network science have developed several novel instruments to extract the
semantics of unstructured data, and harness it to build a network of relations.
Such approaches serve as an efficient tool for dimensionality reduction and
pattern detection. This paper applies semantic network science to extract
ideological proximity in the international arena, by focusing on the data from
General Debates in the UN General Assembly on topics of high salience to the
international community. The UN General Debate corpus (UNGDC) covers all high-level
debates in the UN General Assembly from 1970 to 2014, covering all UN member
states. The research proceeds in three main steps. First, Latent Dirichlet
Allocation (LDA) is used to extract the topics of the UN speeches, and
therefore semantic information. Each country is then assigned a vector
specifying the exposure to each of the topics identified. This intermediate
output is then used to construct a network of countries based on
information-theoretic metrics, where the links capture similar vectorial patterns
in the topic distributions. The topology of the networks is then analyzed through network
properties like density, path length and clustering. Finally, we identify
specific topological features of our networks using the map equation framework
to detect communities in our networks of countries.
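
The middle steps of this pipeline (topic vectors per country, information-theoretic links, network density) can be sketched as below. This is a rough illustration under stated assumptions: the abstract does not say which information-theoretic metric is used, so Jensen-Shannon divergence is swapped in as one plausible choice; the country names `X`, `Y`, `Z`, the topic vectors, and the threshold are made up; and the LDA and map equation steps are omitted.

```python
import math
from itertools import combinations

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def build_network(topic_vectors, threshold):
    """Link two countries when their topic distributions are close enough."""
    edges = set()
    for a, b in combinations(sorted(topic_vectors), 2):
        if js_divergence(topic_vectors[a], topic_vectors[b]) <= threshold:
            edges.add((a, b))
    return edges

def density(n_nodes, n_edges):
    """Share of possible undirected links that are present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return n_edges / possible if possible else 0.0

# Hypothetical per-country topic distributions, standing in for LDA output.
topic_vectors = {
    "X": [0.7, 0.2, 0.1],
    "Y": [0.6, 0.3, 0.1],
    "Z": [0.1, 0.1, 0.8],
}

edges = build_network(topic_vectors, threshold=0.1)  # threshold is arbitrary
net_density = density(len(topic_vectors), len(edges))
print(sorted(edges), round(net_density, 3))
```

With these toy vectors only X and Y are linked, since both emphasize the first topic while Z concentrates on the third. Path length, clustering, and community detection would then be computed on the resulting graph.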