RiskTrack: a new approach for risk assessment of radicalisation based on social media data
Proceedings of the Workshop on Affective Computing and Context Awareness in Ambient Intelligence (AfCAI 2016)
Murcia, Spain, November 24-25, 2016

The RiskTrack project aims to help prevent terrorism through the
identification of online radicalisation. In line with the European Union's
priorities in this matter, the project has been designed to identify and
tackle the indicators that raise a red flag about which individuals or
communities are being radicalised and recruited to commit violent acts of
terrorism. The main goals of the project are therefore twofold. On the one
hand, it must identify the main features and characteristics that can be
used to evaluate a risk situation; to that end, a risk assessment
methodology will be designed that studies how to detect signs of
radicalisation (e.g., use of language, behavioural patterns in social
networks). On the other hand, these features will be tested and analysed
using advanced data mining methods, knowledge representation (semantic and
ontology engineering) and multilingual technologies. The innovative aspect
of the project is that it offers not just a risk assessment methodology but
also a tool built on that methodology, so that prosecutors, judges, law
enforcement and other actors can obtain tangible short-term results.

This work has been supported by the RiskTrack project: "Tracking tool based on
social media for risk assessment on radicalisation" under the EU Justice Action
Grant: JUST-2015-JCOO-AG-72318
Modeling Islamist Extremist Communications on Social Media using Contextual Dimensions: Religion, Ideology, and Hate
Terror attacks have been linked in part to online extremist content. Although
tens of thousands of Islamist extremism supporters consume such content, they
are a small fraction relative to peaceful Muslims. The efforts to contain the
ever-evolving extremism on social media platforms have remained inadequate and
mostly ineffective. Divergent extremist and mainstream contexts challenge
machine interpretation, with a particular threat to the precision of
classification algorithms. Our context-aware computational approach to the
analysis of extremist content on Twitter breaks down this persuasion process
into building blocks that acknowledge inherent ambiguity and sparsity that
likely challenge both manual and automated classification. We model this
process using a combination of three contextual dimensions -- religion,
ideology, and hate -- each elucidating a degree of radicalization and
highlighting independent features to render them computationally accessible. We
utilize domain-specific knowledge resources for each of these contextual
dimensions, such as the Qur'an for religion, the books of extremist ideologues and
preachers for political ideology, and a social media hate speech corpus for
hate. Our study makes three contributions to reliable analysis: (i) Development
of a computational approach rooted in the contextual dimensions of religion,
ideology, and hate that reflects strategies employed by online Islamist
extremist groups, (ii) An in-depth analysis of relevant tweet datasets with
respect to these dimensions to exclude likely mislabeled users, and (iii) A
framework for understanding online radicalization as a process to assist
counter-programming. Given the potentially significant social impact, we
evaluate the performance of our algorithms to minimize mislabeling, where our
approach outperforms a competitive baseline by 10.2% in precision.
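The idea of rendering each contextual dimension computationally accessible via its own knowledge resource can be illustrated with a minimal sketch. The lexicons and token-overlap scoring below are illustrative assumptions standing in for the Qur'an-derived, ideology, and hate-speech resources the abstract names; they are not the paper's actual method.

```python
# Hypothetical sketch: score a text independently along three contextual
# dimensions (religion, ideology, hate), each backed by its own lexicon.
# Toy lexicons and the overlap-fraction score are invented for illustration.

def dimension_score(tokens, lexicon):
    """Fraction of tokens that appear in the dimension's lexicon."""
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)

def contextual_features(text, lexicons):
    """Return one independent score per contextual dimension."""
    tokens = text.lower().split()
    return {dim: dimension_score(tokens, lex) for dim, lex in lexicons.items()}

# Stand-ins for the domain-specific knowledge resources.
lexicons = {
    "religion": {"prayer", "faith", "scripture"},
    "ideology": {"caliphate", "struggle"},
    "hate": {"enemy", "destroy"},
}

scores = contextual_features("faith and prayer guide us", lexicons)
# scores["religion"] is 0.4 (2 of 5 tokens matched); the other dimensions are 0.0
```

Keeping the dimensions as separate scores, rather than one pooled feature set, mirrors the abstract's point that each dimension elucidates a different degree of radicalization.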
A systematic survey of online data mining technology intended for law enforcement
As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment generating huge volumes of data. With manual inspection becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists that examines their techniques, applications and rigour. This article remedies this gap through a systematic mapping study of online data-mining literature that visibly targets law enforcement applications, using evidence-based survey-making practices to produce a replicable analysis that can be methodologically examined for deficiencies.
Classification of radical web text using a composite-based method
The spread of terrorism and extremist activities on the Internet has created the need for intelligence gathering via the Web and real-time monitoring of potential websites for extremist activities. However, manual classification of such content is practically difficult and time-consuming. In response to this challenge, an automated classification system called the Composite technique was developed: a computational framework that combines both the semantic and syntactic features of the textual content of a Web page. We implemented the framework on a set of extremist Web pages, a dataset that had been subjected to a manual classification process. We then built a classification model on the data using the J48 decision tree algorithm to measure how well each page could be classified into its appropriate class. Compared with other state-of-the-art methods, the classification result obtained from our method indicated a 96% overall success rate in classifying Web pages when matched against the manual classification.
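The pipeline the abstract describes — textual features of a Web page feeding a decision-tree classifier validated against manual labels — can be sketched as follows. This is a hedged illustration, not the paper's implementation: scikit-learn's `DecisionTreeClassifier` stands in for the J48 (C4.5) algorithm, and the four-document corpus is invented.

```python
# Minimal sketch: bag-of-words features from page text feeding a decision
# tree, in the spirit of the Composite technique. DecisionTreeClassifier
# is a stand-in for J48; the tiny labelled corpus is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

pages = [
    "join the violent struggle now",
    "attack the unbelievers",
    "community charity event this weekend",
    "local school fundraiser announcement",
]
labels = ["extremist", "extremist", "benign", "benign"]  # manual classification

model = make_pipeline(CountVectorizer(), DecisionTreeClassifier(random_state=0))
model.fit(pages, labels)

# Classify an unseen page; agreement with the manual labels on the training
# set plays the role of the paper's success-rate measure.
pred = model.predict(["announcement of a charity fundraiser"])[0]
train_accuracy = model.score(pages, labels)
```

A real composite system would concatenate semantic features (e.g. from an ontology or embedding) alongside the syntactic counts before the tree; the pipeline structure stays the same.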
Semantic feature reduction and hybrid feature selection for clustering of Arabic Web pages
In the literature, high-dimensional data reduces the efficiency of clustering algorithms. Clustering Arabic text is particularly challenging because its semantics require deep semantic processing. To overcome these problems, feature selection and reduction methods have become essential for selecting and identifying the appropriate features that reduce the high-dimensional space. A suitable design for feature selection and reduction is needed that yields a more relevant, meaningful and compact representation of Arabic texts to ease the clustering process. This research developed three methods for analysing the features of Arabic Web text. The first is a hybrid feature selection method that selects the informative term representation within Arabic Web pages; it incorporates three feature selection methods, Chi-square, Mutual Information and Term Frequency–Inverse Document Frequency, to build a hybrid model. The second is a latent document vectorization method that represents documents as probability distributions in the vector space; it overcomes the problems of high dimensionality by reducing the dimensional space. To extract the best features, two document vectorizers were implemented: the Bayesian vectorizer and the semantic vectorizer. The third is an Arabic semantic feature analysis used to improve the capability of Arabic Web analysis; it ensures a good design for the clustering method, optimizing clustering ability by overcoming the problems of term representation, semantic modeling and dimensional reduction. Different experiments were carried out with k-means clustering on two data sets. The methods reduced the high-dimensional data and identified the semantic features shared between similar Arabic Web pages grouped in one cluster.
These pages were clustered according to the semantic similarities between them, yielding a small Davies–Bouldin index and high accuracy. This study contributes to clustering research by developing three methods to identify the most relevant features of Arabic Web pages.
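The overall pipeline — term weighting, feature selection to shrink the dimensional space, k-means clustering, and evaluation with the Davies–Bouldin index — can be sketched with scikit-learn. This is a simplified, assumed reconstruction: an English toy corpus replaces the Arabic Web pages, heuristic pseudo-labels let chi-square rank terms (the paper's hybrid would combine Chi-square, Mutual Information and TF-IDF rankings), and the vectorization methods are reduced to plain TF-IDF.

```python
# Hedged sketch: TF-IDF weighting -> chi-square feature selection ->
# k-means clustering -> Davies-Bouldin evaluation. The corpus and
# pseudo-labels are invented; chi-square alone simplifies the paper's
# hybrid selection scheme.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import davies_bouldin_score

docs = [
    "stock market prices rise",
    "market shares fall on trading floor",
    "football match ends in draw",
    "striker scores in cup match",
]
# Chi-square needs class labels; a cheap heuristic supplies pseudo-labels
# here purely so the selector can rank terms (an assumption of this sketch).
pseudo_labels = [0, 0, 1, 1]

X = TfidfVectorizer().fit_transform(docs)
X_reduced = SelectKBest(chi2, k=5).fit_transform(X, pseudo_labels)  # shrink the space

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_reduced)
db = davies_bouldin_score(X_reduced.toarray(), km.labels_)  # lower is better
```

A lower Davies–Bouldin index means clusters are compact and well separated, which is exactly the criterion the abstract uses to judge that semantically similar pages landed in the same cluster.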