19 research outputs found

    Cloud-based textual analysis as a basis for document classification

    Growing trends in data mining and developments in machine learning have encouraged interest in analytical techniques that can contribute insights into data characteristics. The present paper describes an approach to textual analysis that generates extensive quantitative data on target documents, with output including frequency data on tokens, types, parts-of-speech and word n-grams. These analytical results enrich the available source data and have proven useful in several contexts as a basis for automating manual classification tasks. In the following, we introduce the Posit textual analysis toolset and detail its use in data enrichment as input to supervised learning tasks, including automating the identification of extremist Web content. Next, we describe the extension of this approach to the Arabic language. Thereafter, we recount the move of these analytical facilities from local operation to a Cloud-based service. This transition affords easy remote access for other researchers seeking to explore the application of such data enrichment to their own text-based data sets.
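    As a rough illustration of the kind of quantitative output described above (token, type and word n-gram frequencies used as enrichment features), the following Python sketch profiles a single document. It is not the Posit toolset itself; the tokenisation, feature names and toy document are assumptions for illustration only, and part-of-speech counts are omitted.

```python
# Minimal sketch of Posit-style quantitative text profiling (illustrative only;
# not the actual Posit toolset). Counts tokens, types and word n-grams so the
# resulting frequencies can serve as enrichment features for classifiers.
import re
from collections import Counter

def profile(text: str, n: int = 2) -> dict:
    """Return simple frequency features for one document."""
    tokens = re.findall(r"[a-z']+", text.lower())            # naive tokenisation
    types = set(tokens)
    ngrams = Counter(zip(*(tokens[i:] for i in range(n))))   # word n-grams
    return {
        "token_count": len(tokens),
        "type_count": len(types),
        "type_token_ratio": len(types) / max(len(tokens), 1),
        "top_tokens": Counter(tokens).most_common(5),
        "top_ngrams": ngrams.most_common(5),
    }

if __name__ == "__main__":
    print(profile("the cat sat on the mat and the cat slept"))
```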

    TENSOR: retrieval and analysis of heterogeneous online content for terrorist activity recognition

    The proliferation of terrorist-generated content online is a cause for concern as it goes hand in hand with the rise of radicalisation and violent extremism. Law enforcement agencies (LEAs) need powerful platforms to help stem the influence of such content. This article showcases the TENSOR project, which focusses on the early detection of online terrorist activities, radicalisation and recruitment. Operating under the H2020 Secure Societies Challenge, TENSOR aims to develop a terrorism intelligence platform for increasing the ability of LEAs to identify, gather and analyse terrorism-related online content. The mechanisms for tackling this challenge by bringing together LEAs, industry, researchers and legal experts are presented.

    Towards Designing a Multipurpose Cybercrime Intelligence Framework

    With the widespread reach of the Internet and the increasing popularity of social networks that provide prompt and easy communication, several criminal and radical groups have adopted these platforms as a medium of operation. Existing literature in the area of cybercrime intelligence focuses on several research questions and adopts multiple methods, using techniques such as social network analysis to address them. In this paper, we study the broad state-of-the-art research in cybercrime intelligence in order to identify existing research gaps. Our core aim is designing and developing a multipurpose framework that is able to fill these gaps using a wide range of techniques. We present an outline of a framework designed to aid law enforcement in detecting, analysing and making sense of cybercrime data.
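    As one illustration of the social network analysis techniques the surveyed literature relies on, the sketch below ranks accounts in a toy interaction graph by degree centrality using networkx. The graph, the account names and the choice of centrality measure are hypothetical; the framework itself is only outlined in the paper, not specified as code.

```python
# Illustrative sketch only: the kind of social network analysis cited above,
# shown with networkx on a hypothetical graph of account interactions.
import networkx as nx

edges = [  # hypothetical "who replies to whom" interactions
    ("acct_a", "acct_b"), ("acct_a", "acct_c"),
    ("acct_b", "acct_c"), ("acct_d", "acct_a"),
]
g = nx.DiGraph(edges)

# Rank accounts by degree centrality to surface potentially central actors.
centrality = nx.degree_centrality(g)
for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.2f}")
```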

    Alone together: Exploring community on an incel forum

    Incels, or involuntary celibates, are men who are angry and frustrated at their inability to find sexual or intimate partners. This anger has repeatedly resulted in violence against women. Because incels are a relatively new phenomenon, there are many gaps in our knowledge, including how, and to what extent, incel forums function as online communities. The current study begins to fill this lacuna by qualitatively analyzing the incels.co forum to understand how community is created through online discourse. Both inductive and deductive thematic analyses were conducted on 17 threads (3400 posts). The results confirm that the incels.co forum functions as a community. Four themes in relation to community were found: The incel brotherhood; We can disagree, but you’re wrong; We are all coping here; and Will the real incel come forward. The four themes elucidate that incels most often exchange informational and emotional support.

    Semantic feature reduction and hybrid feature selection for clustering of Arabic Web pages

    In the literature, high-dimensional data reduces the efficiency of clustering algorithms. Clustering Arabic text is challenging because its semantics require deep semantic processing. To overcome these problems, feature selection and reduction methods have become essential for identifying the appropriate features and reducing the high-dimensional space. There is a need to develop a suitable design for feature selection and reduction methods that results in a more relevant, meaningful and reduced representation of Arabic texts to ease the clustering process. This research developed three different methods for analysing the features of Arabic Web text. The first method is based on hybrid feature selection that selects the informative term representation within the Arabic Web pages. It incorporates three different feature selection methods, known as Chi-square, Mutual Information and Term Frequency–Inverse Document Frequency, to build a hybrid model. The second method is a latent document vectorization method used to represent the documents as probability distributions in the vector space. It overcomes the problem of high dimensionality by reducing the dimensional space. To extract the best features, two document vectorizer methods have been implemented, known as the Bayesian vectorizer and the semantic vectorizer. The third method is an Arabic semantic feature analysis used to improve the capability of Arabic Web analysis. It ensures a good design for the clustering method to optimize clustering ability when analysing these Web pages. This is done by overcoming the problems of term representation, semantic modelling and dimensional reduction. Different experiments were carried out with k-means clustering on two different data sets. The methods provided solutions to reduce high-dimensional data and to identify the semantic features shared between similar Arabic Web pages that are grouped together in one cluster. These pages were clustered according to the semantic similarities between them, yielding a small Davies–Bouldin index and high accuracy. This study contributed to research in clustering algorithms by developing three methods to identify the most relevant features of Arabic Web pages.
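    To make the overall pipeline shape concrete (term weighting, dimensionality reduction, k-means clustering and Davies–Bouldin evaluation), here is a minimal scikit-learn sketch. It does not reproduce the hybrid Chi-square/Mutual Information selection, the Bayesian or semantic vectorizers, or the Arabic semantic analysis described above; the toy documents and parameter choices are assumptions.

```python
# Minimal sketch of the pipeline shape only: TF-IDF weighting, dimensionality
# reduction, k-means clustering and Davies-Bouldin evaluation on toy documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

docs = [
    "economy market trade growth", "trade market inflation economy",
    "football match goal league", "league goal striker football",
]

tfidf = TfidfVectorizer().fit_transform(docs)                 # term weighting
reduced = TruncatedSVD(n_components=2).fit_transform(tfidf)   # reduce dimensions
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

# A lower Davies-Bouldin index indicates more compact, better separated clusters.
print("clusters:", labels.tolist())
print("Davies-Bouldin index:", davies_bouldin_score(reduced, labels))
```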

    Leading ethical leaders : higher education institutions, business schools and the sustainable development goals

    This volume provides unique and profound insights from within educational institutions in diverse regions of the world on how ‘learning outside’ and ‘learning inside’ can be holistically integrated, so that the sustainable development agenda does not remain static and programmatic but becomes a creative and permeable framework. The shared hope across the thirteen chapters, each a complete original essay on the theme, is to develop meaningful, interdisciplinary curricula and research projects which serve the human community as a whole. The editors' aim is directed towards a similarly valuable United Nations ideal: to advance knowledge in respect of the earth and the future generations who will inherit it.

    An integrated semantic-based framework for intelligent similarity measurement and clustering of microblogging posts

    Twitter, the most popular microblogging platform, is gaining rapid prominence as a source of information sharing and social awareness due to its popularity and massive user-generated content. This content supports applications such as tailoring advertisement campaigns, event detection, trend analysis, and prediction of micro-populations. These applications are generally conducted through cluster analysis of tweets to generate a more concise and organized representation of the massive raw tweets. Current approaches perform traditional cluster analysis using conventional proximity measures, such as Euclidean distance. However, the sheer volume, noise, and dynamism of Twitter impose challenges that hinder the efficacy of traditional clustering algorithms in detecting meaningful clusters within microblogging posts. The research presented in this thesis sets out to design and develop a novel short text semantic similarity (STSS) measure, named TREASURE, which captures the semantic and structural features of microblogging posts to intelligently predict their similarity. TREASURE is utilised in the development of an innovative semantic-based cluster analysis algorithm (SBCA) that contributes to generating more accurate and meaningful granularities within microblogging posts. The integrated semantic-based framework incorporating TREASURE and the SBCA algorithm tackles the problem of microblogging cluster analysis and contributes to a variety of natural language processing (NLP) and computational intelligence research. TREASURE utilises word embedding neural network (NN) models to capture the semantic relationships between words based on their co-occurrences in a corpus. Moreover, TREASURE analyses the morphological and lexical structure of tweets to predict syntactic similarities. An intrinsic evaluation of TREASURE was performed with reference to a reliable similarity benchmark generated through an experiment to gather human ratings on a Twitter political dataset. A further evaluation was performed with reference to the SemEval-2014 similarity benchmark in order to validate the generalizability of TREASURE. The intrinsic evaluation and statistical analysis demonstrated a strong positive linear correlation between TREASURE and human ratings for both benchmarks. Furthermore, TREASURE achieved a significantly higher correlation coefficient compared to existing state-of-the-art STSS measures. The SBCA algorithm incorporates TREASURE as the proximity measure. Unlike conventional partition-based clustering algorithms, the SBCA algorithm is fully unsupervised and determines the number of clusters dynamically rather than requiring it to be specified beforehand. Subjective evaluation criteria were employed to evaluate the SBCA algorithm with reference to the SemEval-2014 similarity benchmark. Furthermore, an experiment was conducted to produce a reliable multi-class benchmark on the European Referendum political domain, which was also utilised to evaluate the SBCA algorithm. The evaluation results provide evidence that the SBCA algorithm makes highly accurate combining and separation decisions and can generate pure clusters from microblogging posts. The contributions of this thesis to knowledge are mainly demonstrated as: 1) development of a novel STSS measure for microblogging posts (TREASURE); 2) development of a new SBCA algorithm that incorporates TREASURE to detect semantic themes in microblogs; 3) generation of a word embedding pre-trained model learned from a large corpus of political tweets; and 4) production of a reliable similarity-annotated benchmark and a reliable multi-class benchmark in the domain of politics.
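    A minimal sketch of the embedding-based similarity idea underlying an STSS measure of this kind is shown below: sentence vectors are formed by averaging word embeddings and compared with cosine similarity. TREASURE additionally models the morphological and lexical structure of tweets, which is omitted here; the tiny hand-made word vectors and example texts are purely illustrative.

```python
# Illustrative only: averaged-embedding sentence similarity with cosine distance.
# The word vectors below are tiny hand-made stand-ins for a pre-trained model.
import numpy as np

EMB = {  # hypothetical 3-dimensional word embeddings
    "vote":     np.array([0.90, 0.10, 0.00]),
    "election": np.array([0.80, 0.20, 0.10]),
    "ballot":   np.array([0.85, 0.15, 0.05]),
    "pizza":    np.array([0.00, 0.90, 0.30]),
}

def sentence_vector(tokens):
    """Average the embeddings of the tokens that have a known vector."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

t1 = "vote in the election".split()
t2 = "cast your ballot".split()
t3 = "pizza for dinner".split()

print(cosine(sentence_vector(t1), sentence_vector(t2)))  # semantically close
print(cosine(sentence_vector(t1), sentence_vector(t3)))  # semantically distant
```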

    Framing Covid-19: how fact-checking circulate on the Facebook far-right

    This research focuses on how fact-checking links circulate on Facebook groups/pages that also shared disinformation, particularly those affiliated with the far right. Through a three-step method that included content analysis, discursive analysis and social network analysis, we analyzed 860 public posts and found that: while fact-checking links do circulate in these groups, they tend to be framed as disinformation through posts on far-right ones, which we call “explicit framing”; the far-right groups tend to cluster around specific fact-checking links that are mostly shared without a framing text but whose theme supports their own ideological narrative, which we call “silent framing”; and both explicit and silent framing tend to happen through populist discourse connections.
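    To illustrate how such link-sharing patterns can be examined with social network analysis, the sketch below builds a hypothetical bipartite graph of groups and fact-checking links and projects it onto the groups, so that groups clustering around the same links become visible. The data, node names and use of networkx are assumptions for illustration, not the authors' actual method.

```python
# Illustrative only: a bipartite group-to-fact-check-link sharing network,
# projected onto groups so shared fact-checking links connect the groups.
import networkx as nx
from networkx.algorithms import bipartite

shares = [  # hypothetical (group/page, fact-checking link) pairs
    ("group_a", "factcheck_1"), ("group_b", "factcheck_1"),
    ("group_b", "factcheck_2"), ("group_c", "factcheck_2"),
    ("group_d", "factcheck_3"),
]
g = nx.Graph()
g.add_nodes_from({grp for grp, _ in shares}, bipartite=0)
g.add_nodes_from({lnk for _, lnk in shares}, bipartite=1)
g.add_edges_from(shares)

# Project onto groups: two groups are linked when they shared the same link,
# with the edge weight counting how many links they have in common.
groups = {n for n, d in g.nodes(data=True) if d["bipartite"] == 0}
projection = bipartite.weighted_projected_graph(g, groups)
print(sorted(projection.edges(data="weight")))
```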

    EU Bibliography

    A list of bibliographic references to selected articles in the field of European law and policy. This issue covers items from a wide range of academic and specialised periodicals published from November 2018 to October 2019. References are presented under 19 subject headings covering all activities of the European Union.