11 research outputs found

    A weakly supervised Bayesian model for violence detection in social media

    Get PDF
    Social streams have proven to be the most up-to-date and inclusive information on current events. In this paper we propose a novel probabilistic modelling framework, called violence detection model (VDM), which enables the identification of text containing violent content and extraction of violence-related topics over social media data. The proposed VDM model does not require any labeled corpora for training, instead, it only needs the incorporation of word prior knowledge which captures whether a word indicates violence or not. We propose a novel approach of deriving word prior knowledge using the relative entropy measurement of words based on the intuition that low entropy words are indicative of semantically coherent topics and therefore more informative, while high entropy words indicates words whose usage is more topical diverse and therefore less informative. Our proposed VDM model has been evaluated on the TREC Microblog 2011 dataset to identify topics related to violence. Experimental results show that deriving word priors using our proposed relative entropy method is more effective than the widely-used information gain method. Moreover, VDM gives higher violence classification results and produces more coherent violence-related topics compared to a few competitive baselines

    Linked knowledge sources for topic classification of microposts: a semantic graph-based approach

    Get PDF
    Short text messages, a.k.a microposts (e.g., tweets), have proven to be an effective channel for revealing information about trends and events, ranging from those related to disaster (e.g., Hurricane Sandy) to those related to violence (e.g., Egyptian revolution). Being informed about such events as they occur could be extremely important to authorities and emergency professionals by allowing such parties to immediately respond. In this work we study the problem of topic classification (TC) of microposts, which aims to automatically classify short messages based on the subject(s) discussed in them. The accurate TC of microposts however is a challenging task since the limited number of tokens in a post often implies a lack of sufficient contextual information. In order to provide contextual information to microposts, we present and evaluate several graph structures surrounding concepts present in linked knowledge sources (KSs). Traditional TC techniques enrich the content of microposts with features extracted only from the microposts content. In contrast our approach relies on the generation of different weighted semantic meta-graphs extracted from linked KSs. We introduce a new semantic graph, called category meta-graph. This novel meta-graph provides a more fine grained categorisation of concepts providing a set of novel semantic features. Our findings show that such category meta-graph features effectively improve the performance of a topic classifier of microposts. Furthermore our goal is also to understand which semantic feature contributes to the performance of a topic classifier. For this reason we propose an approach for automatic estimation of accuracy loss of a topic classifier on new, unseen microposts. We introduce and evaluate novel topic similarity measures, which capture the similarity between the KS documents and microposts at a conceptual level, considering the enriched representation of these documents. Extensive evaluation in the context of Emergency Response (ER) and Violence Detection (VD) revealed that our approach outperforms previous approaches using single KS without linked data and Twitter data only up to 31.4% in terms of F1 measure. Our main findings indicate that the new category graph contains useful information for TC and achieves comparable results to previously used semantic graphs. Furthermore our results also indicate that the accuracy of a topic classifier can be accurately predicted using the enhanced text representation, outperforming previous approaches considering content-based similarity measures

    Understanding the Roots of Radicalisation on Twitter

    Get PDF
    In an increasingly digital world, identifying signs of online extremism sits at the top of the priority list for counter-extremist agencies. Researchers and governments are investing in the creation of advanced information technologies to identify and counter extremism through intelligent large-scale analysis of online data. However, to the best of our knowledge, these technologies are neither based on, nor do they take advantage of, the existing theories and studies of radicalisation. In this paper we propose a computational approach for detecting and predicting the radicalisation influence a user is exposed to, grounded on the notion of ’roots of radicalisation’ from social science models. This approach has been applied to analyse and compare the radicalisation level of 112 pro-ISIS vs.112 “general" Twitter users. Our results show the effectiveness of our proposed algorithms in detecting and predicting radicalisation influence, obtaining up to 0.9 F-1 measure for detection and between 0.7 and 0.8 precision for prediction. While this is an initial attempt towards the effective combination of social and computational perspectives, more work is needed to bridge these disciplines, and to build on their strengths to target the problem of online radicalisation

    Making sense of microposts (#MSM2013) concept extraction challenge

    Get PDF
    Microposts are small fragments of social media content that have been published using a lightweight paradigm (e.g. Tweets, Facebook likes, foursquare check-ins). Microposts have been used for a variety of applications (e.g., sentiment analysis, opinion mining, trend analysis), by gleaning useful information, often using third-party concept extraction tools. There has been very large uptake of such tools in the last few years, along with the creation and adoption of new methods for concept extraction. However, the evaluation of such efforts has been largely consigned to document corpora (e.g. news articles), questioning the suitability of concept extraction tools and methods for Micropost data. This report describes the Making Sense of Microposts Workshop (#MSM2013) Concept Extraction Challenge, hosted in conjunction with the 2013 World Wide Web conference (WWW'13). The Challenge dataset comprised a manually annotated training corpus of Microposts and an unlabelled test corpus. Participants were set the task of engineering a concept extraction system for a defined set of concepts. Out of a total of 22 complete submissions 13 were accepted for presentation at the workshop; the submissions covered methods ranging from sequence mining algorithms for attribute extraction to part-of-speech tagging for Micropost cleaning and rule-based and discriminative models for token classification. In this report we describe the evaluation process and explain the performance of different approaches in different contexts

    What changed your mind : the roles of dynamic topics and discourse in argumentation process

    Get PDF
    In our world with full of uncertainty, debates and argumentation contribute to the progress of science and society. Despite of the in- creasing attention to characterize human arguments, most progress made so far focus on the debate outcome, largely ignoring the dynamic patterns in argumentation processes. This paper presents a study that automatically analyzes the key factors in argument persuasiveness, beyond simply predicting who will persuade whom. Specifically, we propose a novel neural model that is able to dynamically track the changes of latent topics and discourse in argumentative conversations, allowing the investigation of their roles in influencing the outcomes of persuasion. Extensive experiments have been conducted on argumentative conversations on both social media and supreme court. The results show that our model outperforms state-of-the-art models in identifying persuasive arguments via explicitly exploring dynamic factors of topic and discourse. We further analyze the effects of topics and discourse on persuasiveness, and find that they are both useful -- topics provide concrete evidence while superior discourse styles may bias participants, especially in social media arguments. In addition, we draw some findings from our empirical results, which will help people better engage in future persuasive conversations

    Representing, proving and sharing trustworthiness of web resources using veracity

    No full text
    The World Wide Web has evolved into a distributed network of web applications facilitating the publication of information on a large scale. Judging whether such information can be trusted is a difficult task for humans, often leading to blind trust. In this paper we present a model and the corresponding veracity ontology which allows trust to be placed in web content by web agents. Our approach differs from current work by allowing the trustworthiness of web content to be securely distributed across arbitrary domains and asserted through the provision of machine-readable proofs (i.e. by citing another piece of information, or stating the credentials of the user/agent). We provide a detailed scenario as motivation for our work and demonstrate how the ontology can be used
    corecore