
    A Retrospective Analysis of the Fake News Challenge Stance Detection Task

    The 2017 Fake News Challenge Stage 1 (FNC-1) shared task addressed a stance classification task as a crucial first step towards detecting fake news. To date, there is no in-depth analysis paper to critically discuss FNC-1's experimental setup, reproduce the results, and draw conclusions for next-generation stance classification methods. In this paper, we provide such an in-depth analysis for the three top-performing systems. We first find that FNC-1's proposed evaluation metric favors the majority class, which can be easily classified, and thus overestimates the true discriminative power of the methods. Therefore, we propose a new F1-based metric yielding a changed system ranking. Next, we compare the features and architectures used, which leads to a novel feature-rich stacked LSTM model that performs on par with the best systems, but is superior in predicting minority classes. To understand the methods' ability to generalize, we derive a new dataset and perform both in-domain and cross-domain experiments. Our qualitative and quantitative study helps interpret the original FNC-1 scores and understand which features help improve performance and why. Our new dataset and all source code used during the reproduction study are publicly available for future research.
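
    The point about majority-class bias can be illustrated with a small sketch. It assumes macro-averaged F1 as one possible F1-based metric and uses plain accuracy as a stand-in for a majority-sensitive score; neither is claimed to be the exact metric from the paper or from FNC-1, and the labels and predictions below are made up for illustration.

```python
def per_class_f1(gold, pred, label):
    """F1 score for a single class label (0.0 if the class is never predicted correctly)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the classes present in the gold labels."""
    labels = sorted(set(gold))
    return sum(per_class_f1(gold, pred, label) for label in labels) / len(labels)


def accuracy(gold, pred):
    """Fraction of instances labelled correctly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical toy data: 8 of 10 instances belong to the majority class.
gold = ["unrelated"] * 8 + ["agree", "disagree"]
# A degenerate system that always predicts the majority class.
pred = ["unrelated"] * 10

print(f"accuracy: {accuracy(gold, pred):.3f}")
print(f"macro F1: {macro_f1(gold, pred):.3f}")
```

    On this toy data the always-majority system looks strong under accuracy but weak under macro F1, because the two minority classes each contribute an F1 of zero to the average.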

    Topic-Specific Sentiment Analysis Can Help Identify Political Ideology

    Ideological leanings of an individual can often be gauged by the sentiment one expresses about different issues. We propose a simple framework that represents a political ideology as a distribution of sentiment polarities towards a set of topics. This representation can then be used to detect ideological leanings of documents (speeches, news articles, etc.) based on the sentiments expressed towards different topics. Experiments performed using a widely used dataset show the promise of our proposed approach, which achieves comparable performance to other methods despite being much simpler and more interpretable. Comment: Presented at EMNLP Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, 201
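
    One way to picture the representation described above is the following sketch. It assumes each document has already been reduced to a per-topic sentiment score in [-1, 1], summarises an ideology as the mean sentiment per topic, and assigns a new document to the closest profile. The function names, topics, scores, and the distance measure are all hypothetical; the abstract does not specify how the authors build or compare these distributions.

```python
import math


def ideology_profile(docs):
    """Mean sentiment per topic over a list of {topic: sentiment} documents."""
    totals, counts = {}, {}
    for doc in docs:
        for topic, sentiment in doc.items():
            totals[topic] = totals.get(topic, 0.0) + sentiment
            counts[topic] = counts.get(topic, 0) + 1
    return {topic: totals[topic] / counts[topic] for topic in totals}


def distance(doc, profile):
    """Root-mean-square difference over the topics shared by doc and profile."""
    shared = [topic for topic in doc if topic in profile]
    if not shared:
        return math.inf
    squared = sum((doc[t] - profile[t]) ** 2 for t in shared)
    return math.sqrt(squared / len(shared))


def classify(doc, profiles):
    """Return the name of the ideology profile closest to the document."""
    return min(profiles, key=lambda name: distance(doc, profiles[name]))


# Hypothetical training documents for two made-up ideologies, "A" and "B".
profiles = {
    "A": ideology_profile([
        {"taxes": -0.8, "healthcare": 0.6},
        {"taxes": -0.6, "healthcare": 0.8},
    ]),
    "B": ideology_profile([
        {"taxes": 0.5, "healthcare": -0.5},
        {"taxes": 0.7, "healthcare": -0.3},
    ]),
}

new_doc = {"taxes": -0.5, "healthcare": 0.4}
print(classify(new_doc, profiles))
```

    The profile is directly readable as "sentiment towards each topic", which is the kind of interpretability the abstract refers to.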

    Understanding the Roots of Radicalisation on Twitter

    In an increasingly digital world, identifying signs of online extremism sits at the top of the priority list for counter-extremist agencies. Researchers and governments are investing in the creation of advanced information technologies to identify and counter extremism through intelligent large-scale analysis of online data. However, to the best of our knowledge, these technologies are neither based on, nor do they take advantage of, the existing theories and studies of radicalisation. In this paper we propose a computational approach for detecting and predicting the radicalisation influence a user is exposed to, grounded in the notion of "roots of radicalisation" from social science models. This approach has been applied to analyse and compare the radicalisation level of 112 pro-ISIS vs. 112 "general" Twitter users. Our results show the effectiveness of our proposed algorithms in detecting and predicting radicalisation influence, obtaining up to 0.9 F1 measure for detection and between 0.7 and 0.8 precision for prediction. While this is an initial attempt towards the effective combination of social and computational perspectives, more work is needed to bridge these disciplines, and to build on their strengths to target the problem of online radicalisation.

    Topology Analysis of International Networks Based on Debates in the United Nations

    In complex, high-dimensional and unstructured data it is often difficult to extract meaningful patterns. This is especially the case when dealing with textual data. Recent studies in machine learning, information theory and network science have developed several novel instruments to extract the semantics of unstructured data and harness it to build a network of relations. Such approaches serve as an efficient tool for dimensionality reduction and pattern detection. This paper applies semantic network science to extract ideological proximity in the international arena, focusing on data from the General Debates in the UN General Assembly on topics of high salience to the international community. The UN General Debate corpus (UNGDC) covers all high-level debates in the UN General Assembly from 1970 to 2014 and includes all UN member states. The research proceeds in three main steps. First, Latent Dirichlet Allocation (LDA) is used to extract the topics of the UN speeches, and therefore semantic information. Each country is then assigned a vector specifying its exposure to each of the topics identified. This intermediate output is then used to construct a network of countries based on information-theoretic metrics, where the links capture similar vectorial patterns in the topic distributions. The topology of the networks is then analyzed through network properties such as density, path length and clustering. Finally, we identify specific topological features of our networks using the map equation framework to detect communities in our networks of countries.
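
    The middle steps of this pipeline (topic vectors, then a similarity network, then topology measures) can be sketched as follows. The sketch assumes Jensen-Shannon divergence as the information-theoretic metric and a fixed threshold for drawing links; both are illustrative assumptions, since the abstract does not name the metric or the linking rule. The country names and topic vectors are made up, and the LDA and map equation steps are not shown.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def build_network(topic_vectors, threshold):
    """Link two countries when their topic distributions diverge by less than threshold."""
    edges = set()
    for a, b in combinations(sorted(topic_vectors), 2):
        if js_divergence(topic_vectors[a], topic_vectors[b]) < threshold:
            edges.add((a, b))
    return edges


def density(n_nodes, edges):
    """Share of possible undirected links that are actually present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0


# Hypothetical per-country topic exposure vectors (each sums to 1).
topic_vectors = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.6, 0.3, 0.1],
    "C": [0.1, 0.2, 0.7],
    "D": [0.1, 0.3, 0.6],
}

edges = build_network(topic_vectors, threshold=0.1)
print(sorted(edges))
print(round(density(len(topic_vectors), edges), 3))
```

    With these toy vectors, only the pairs with similar topic profiles are linked, so the network splits into two small groups; a community detection method would then operate on a network of this kind.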