1,131 research outputs found

    Solutions to Detect and Analyze Online Radicalization : A Survey

    Online radicalization (also called cyber-terrorism, extremism, cyber-racism, or cyber-hate) is widespread and has become a major and growing concern to society, governments, and law enforcement agencies around the world. Research shows that various Internet platforms are being misused for malicious intent, among them YouTube (a popular video sharing website), Twitter (an online micro-blogging service), Facebook (a popular social networking website), online discussion forums, and the blogosphere. These platforms have a low barrier to publishing content, allow anonymity, provide exposure to millions of users, and make very quick and widespread diffusion of a message possible. They are being used to form hate groups and racist communities, spread extremist agendas, incite anger or violence, promote radicalization, recruit members, and create virtual organizations and communities. Automatic detection of online radicalization is a technically challenging problem because of the vast amount of data, unstructured and noisy user-generated content, dynamically changing content, and adversarial behavior. Several solutions have been proposed in the literature to combat and counter cyber-hate and cyber-extremism. In this survey, we review solutions to detect and analyze online radicalization. We review 40 papers published at 12 venues from June 2003 to November 2011. We present a novel classification scheme to classify these papers. We analyze these techniques, perform trend analysis, discuss limitations of existing techniques, and identify research gaps.

    Tweet-to-Act: Towards Tweet-Mining Framework for Extracting Terrorist Attack-related Information and Reporting

    The widespread popularity of social networking is leading to the adoption of Twitter as an information dissemination tool. Existing research has shown that information dissemination over Twitter has a much broader reach than traditional media and can be used for effective post-incident measures. People use informal language on Twitter, including acronyms, misspelled words, synonyms, transliteration, and ambiguous terms. This makes incident-related information extraction a non-trivial task. However, this information can be valuable for public safety organizations that need to respond in an emergency. This paper proposes an early event-related information extraction and reporting framework that monitors Twitter streams, synthesizes event-specific information, e.g., a terrorist attack, and alerts law enforcement, emergency services, and media outlets. Specifically, the proposed framework, Tweet-to-Act (T2A), employs word embedding to transform tweets into a vector space model and then utilizes the Word Mover’s Distance (WMD) to cluster tweets for the identification of incidents. To extract reliable and valuable information from a large dataset of short and informal tweets, the proposed framework employs sequence labeling with bidirectional Long Short-Term Memory based Recurrent Neural Networks (bLSTM-RNN). Extensive experimental results suggest that our proposed framework, T2A, outperforms other state-of-the-art methods that use vector space modeling and distance calculation techniques, e.g., Euclidean and Cosine distance. T2A achieves an accuracy of 96% and an F1-score of 86.2% on real-life datasets.
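To make the distance step in the abstract above more concrete, here is a minimal sketch of a relaxed Word Mover's Distance between two tokenized tweets. It is not the T2A implementation: it uses the well-known relaxed lower bound (each word travels to its nearest word in the other text) instead of solving the full transport problem, and the two-dimensional embeddings in `TOY_EMBEDDINGS`, the helper names, and the example tweets are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical 2-D word embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "fire": (1.0, 0.0),
    "blaze": (0.9, 0.1),
    "downtown": (0.0, 1.0),
    "centre": (0.1, 0.9),
    "football": (-1.0, 0.0),
    "match": (-0.9, -0.1),
}


def normalized_bow(tokens):
    """Map each token to its relative frequency in the text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def relaxed_wmd(tokens_a, tokens_b, embeddings):
    """Relaxed WMD: a lower bound on the exact Word Mover's Distance.

    Assumes both texts keep at least one token after dropping
    out-of-vocabulary words.
    """
    bow_a = normalized_bow([t for t in tokens_a if t in embeddings])
    bow_b = normalized_bow([t for t in tokens_b if t in embeddings])

    def one_way(source, target):
        # Every source word sends all of its mass to its nearest target word.
        return sum(
            weight * min(euclidean(embeddings[w], embeddings[v]) for v in target)
            for w, weight in source.items()
        )

    # The tighter of the two one-directional bounds.
    return max(one_way(bow_a, bow_b), one_way(bow_b, bow_a))


tweet_1 = ["fire", "downtown"]
tweet_2 = ["blaze", "centre"]
tweet_3 = ["football", "match"]

same_topic = relaxed_wmd(tweet_1, tweet_2, TOY_EMBEDDINGS)
other_topic = relaxed_wmd(tweet_1, tweet_3, TOY_EMBEDDINGS)
```

With these toy vectors, the two tweets that share no words but describe the same kind of event end up closer to each other than to the unrelated tweet, which is the property a clustering step would rely on.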

    Greedy methods for approximate graph matching with applications for social network analysis

    In this thesis, we study greedy algorithms for approximate sub-graph matching with attributed graphs. Such algorithms find one or multiple copies of a sub-graph pattern in a larger data graph through approximate matching. One intended application of sub-graph matching is in Social Network Analysis, for detecting potential terrorist groups from known terrorist activity patterns. We propose a new method for approximate sub-graph matching that uses degree information to reduce the search space within the incremental greedy search framework. In addition, we introduce the notion of a “seed” in the incremental greedy method, which aims to find a good initial partial match. Simulated data based on a terrorist profile database are used in our experiments, which compare the computational efficiency and matching accuracy of various methods. The experimental results suggest that, as the size of the data graph increases, the efficiency advantage of the degree-based method becomes more significant, while the degree-based method remains as accurate as incremental greedy. Using a “seed” significantly improves matching accuracy (at the cost of decreased efficiency) when the attribute values in the graphs are deceptively noisy. We have also investigated a method that expands a matched sub-graph in the data graph to include those nodes strongly connected to the current match.
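The general idea of a seeded, degree-filtered greedy search can be sketched as follows. This is an illustrative simplification and not the thesis's algorithm: it matches on structure only (node attributes are ignored), the scoring rule is an assumption, and the function name `greedy_degree_match` and both example graphs are hypothetical.

```python
from collections import deque


def greedy_degree_match(pattern, data, seed):
    """Grow a pattern-to-data node mapping greedily from one seed pair.

    pattern, data: dict mapping each node to the set of its neighbours.
    seed: (pattern_node, data_node) pair used as the initial partial match.
    """
    mapping = {seed[0]: seed[1]}
    used = {seed[1]}
    frontier = deque([seed[0]])

    while frontier:
        p = frontier.popleft()
        for q in sorted(pattern[p]):
            if q in mapping:
                continue
            # Degree filter: a data node with fewer neighbours than the
            # pattern node cannot host all of its edges, so skip it.
            candidates = [
                c for c in data[mapping[p]]
                if c not in used and len(data[c]) >= len(pattern[q])
            ]
            if not candidates:
                continue  # approximate matching: leave q unmatched for now

            def preserved_edges(c):
                # Pattern edges from q to matched nodes that c also has.
                return sum(
                    1 for r in pattern[q]
                    if r in mapping and mapping[r] in data[c]
                )

            best = max(sorted(candidates), key=preserved_edges)
            mapping[q] = best
            used.add(best)
            frontier.append(q)
    return mapping


# Pattern: a triangle. Data graph: a triangle 1-2-3 with a pendant node 4.
pattern = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
data = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}

match = greedy_degree_match(pattern, data, seed=("a", 1))
```

In this toy case the pendant node 4 is discarded by the degree filter before any scoring takes place, which is the kind of search-space reduction the abstract attributes to degree information.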

    Can we predict a riot? Disruptive event detection using Twitter

    In recent years, there has been increased interest in real-world event detection using publicly accessible data made available through Internet technology such as Twitter, Facebook, and YouTube. In these highly interactive systems, the general public are able to post real-time reactions to “real world” events, thereby acting as social sensors of terrestrial activity. Automatically detecting and categorizing events, particularly small-scale incidents, using streamed data is a non-trivial task but would be of high value to public safety organisations such as local police, who need to respond accordingly. To address this challenge, we present an end-to-end integrated event detection framework that comprises five main components: data collection, pre-processing, classification, online clustering, and summarization. The integration between classification and clustering enables events to be detected, as well as related smaller-scale “disruptive events,” smaller incidents that threaten social safety and security or could disrupt social order. We present an evaluation of the effectiveness of detecting events using a variety of features derived from Twitter posts, namely temporal, spatial, and textual content. We evaluate our framework on a large-scale, real-world dataset from Twitter. Furthermore, we apply our event detection system to a large corpus of tweets posted during the August 2011 riots in England. We use ground-truth data based on intelligence gathered by the London Metropolitan Police Service, which provides a record of actual terrestrial events and incidents during the riots, and show that our system can perform as well as terrestrial sources, and even better in some cases

    ON META-NETWORKS, DEEP LEARNING, TIME AND JIHADISM

    Jihadist terrorism represents a global threat to societies and a challenge for scientists interested in understanding its complexity. This complexity continuously calls for developments in terrorism research. Enhancing empirical knowledge of the phenomenon can potentially contribute to developing concrete real-world applications and, ultimately, to preventing societal harm. In light of these aspects, this work presents a novel methodological framework that integrates network science, mathematical modeling, and deep learning to shed light on jihadism at both the explanatory and predictive levels. Specifically, this dissertation compares and analyzes the world's most active jihadist terrorist organizations (i.e., the Islamic State, the Taliban, Al Qaeda, Boko Haram, and Al Shabaab) to investigate their behavioral patterns and forecast their future actions. Building upon a theoretical framework that relies on the spatial concentration of terrorist violence and the strategic perspective of terrorist behavior, this dissertation pursues three linked tasks, employing as many hybrid techniques. First, it explores the operational complexity of jihadist organizations using stochastic transition matrices and presents Normalized Transition Similarity, a novel coefficient of pairwise similarity between groups in terms of strategic behavior. Second, it investigates the presence of time-dependent dynamics in attack sequences using Hawkes point processes. Third, it integrates complex meta-networks and deep learning to rank and forecast the most probable future targets of the jihadist groups. Concerning the results, stochastic transition matrices show that terrorist groups possess a rich and complex repertoire of combinations in the use of weapons and targets. Furthermore, Hawkes models indicate the diffuse presence of self-excitability in attack sequences. Finally, forecasting models that exploit the flexibility of graph-derived time series and Long Short-Term Memory networks provide promising results in terms of correct predictions of the most likely terrorist targets. Overall, this research seeks to reveal how hidden abstract connections between events can be exploited to unveil jihadist mechanics and how memory-like processes (i.e., multiple non-random, parallel, and interconnected recurrent behaviors) might illuminate the way in which these groups act.
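As a rough illustration of the first task described above, a stochastic transition matrix can be estimated from an ordered sequence of categorical events by counting consecutive pairs and normalizing each row. The sketch below assumes a first-order, maximum-likelihood estimate; the function name and the placeholder labels "A", "B", "C" are hypothetical, and the dissertation's Normalized Transition Similarity coefficient is not reproduced here because the abstract does not define it.

```python
from collections import Counter, defaultdict


def transition_matrix(sequence):
    """Estimate first-order transition probabilities from a state sequence.

    Returns a nested dict: matrix[s][t] is the observed share of
    transitions leaving state s that go to state t.
    """
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1

    states = sorted(set(sequence))
    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        # A state seen only at the end of the sequence has no outgoing
        # transitions; its row is left as all zeros.
        matrix[s] = {
            t: (counts[s][t] / total if total else 0.0) for t in states
        }
    return matrix


# Placeholder event categories, in chronological order.
events = ["A", "B", "A", "A", "C", "A"]
matrix = transition_matrix(events)
```

Each non-empty row of `matrix` sums to one, so a row can be read as the empirical distribution over the next event category given the current one.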

    A survey of statistical network models

    Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, together with a host of more specialized professional network communities, have intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics. Comment: 96 pages, 14 figures, 333 references

    Latent Semantic Indexing (LSI) Based Distributed System and Search On Encrypted Data

    Latent semantic indexing (LSI) was initially introduced to overcome the issues of synonymy and polysemy in the traditional vector space model (VSM). LSI, however, has challenges of its own, mainly scalability. Although LSI was introduced in 1990, there have been few attempts to provide an efficient solution for it; most of the literature focuses on LSI’s applications rather than on improving the original algorithm. In this work we analyze the first framework to provide a scalable implementation of LSI and report its performance on the distributed environment of RAAD. The possibility of adopting LSI for searching over encrypted data is also investigated. The importance of that field stems from the need for cloud computing as an effective computing paradigm that provides affordable access to high computational power. Encryption is usually applied to prevent unauthorized access to the data (the host is assumed to be curious); however, this limits accessibility to the data, given that search over encrypted data has yet to catch up with the latest techniques adopted by the Information Retrieval (IR) community. In this work we propose a system that uses LSI for indexing and free-text queries for retrieval. The results show that the available LSI framework does scale to large datasets, although it had some limitations with respect to factors such as dictionary size and memory limit. When the exact settings of the baseline were replicated on RAAD, it performed relatively slower. This could be because RAAD uses a distributed file system or because of network latency. The results also show that the proposed system for applying LSI to encrypted data retrieved documents in the same order as the baseline (unencrypted data).

    Link Prediction in Complex Networks: A Survey

    Link prediction in complex networks has attracted increasing attention from both the physics and computer science communities. The algorithms can be used to extract missing information, identify spurious interactions, evaluate network evolving mechanisms, and so on. This article summarizes recent progress on link prediction algorithms, emphasizing the contributions from physical perspectives and approaches, such as the random-walk-based methods and the maximum likelihood methods. We also introduce three typical applications: reconstruction of networks, evaluation of network evolving mechanisms, and classification of partially labelled networks. Finally, we introduce some applications and outline future challenges of link prediction algorithms. Comment: 44 pages, 5 figures
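For readers new to the task, the sketch below shows the simplest family of link predictors: score every unconnected node pair by a local similarity index and rank the pairs. It uses the common-neighbours count and the resource-allocation index, which are standard baselines and not the random-walk or maximum likelihood methods the abstract highlights. The function name and the small example graph are hypothetical.

```python
from itertools import combinations


def rank_missing_links(adjacency):
    """Score each non-adjacent node pair with two local similarity indices.

    adjacency: dict mapping each node to the set of its neighbours
    (undirected graph). Returns a list of (pair, common_neighbours,
    resource_allocation) tuples, best resource-allocation score first.
    """
    scored = []
    for x, y in combinations(sorted(adjacency), 2):
        if y in adjacency[x]:
            continue  # already linked, nothing to predict
        shared = adjacency[x] & adjacency[y]
        common_neighbours = len(shared)
        # Resource allocation: each shared neighbour z contributes 1/degree(z),
        # so low-degree intermediaries count more than hubs.
        resource_allocation = sum(1.0 / len(adjacency[z]) for z in shared)
        scored.append(((x, y), common_neighbours, resource_allocation))
    scored.sort(key=lambda item: (-item[2], item[0]))
    return scored


# Undirected toy graph with edges a-b, a-c, b-c, b-d, c-d, d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

ranking = rank_missing_links(graph)
top_pair, top_cn, top_ra = ranking[0]
```

Pairs near the top of `ranking` are the candidates such a baseline would propose as missing or future links; more elaborate methods replace the scoring function but keep the same rank-and-evaluate structure.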