Search CORE

MapReduce based RDF assisted distributed SVM for high throughput spam filtering

Author: Caruana Godwin
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2013

This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. Electronic mail has become firmly embedded in our everyday lives. Billions of legitimate emails are sent every day. The well-established underlying infrastructure, its widespread availability and its ease of use have all acted as catalysts for this pervasive proliferation. Unfortunately, the same can be said of unsolicited bulk email, or spam. Various methods and enabling architectures are available to mitigate spam permeation. In this respect, this dissertation complements existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, and their respective strengths and weaknesses are critically assessed. Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet-scale, data-intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have proven effective. SVM training, however, is a computationally intensive process. In this dissertation, an M/R-based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing subsets of the training data across multiple participating computing nodes and processing them in parallel, the distributed SVM significantly reduces spam filter training time. To mitigate the accuracy degradation introduced by this approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this feedback loop raises the accuracy of the distributed SVM beyond that of its original sequential counterpart.
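
The partition-train-combine idea behind distributed SVM training can be illustrated with a minimal, single-machine sketch. It assumes a linear SVM fitted with the Pegasos stochastic sub-gradient method on each partition (the "map" step) and plain weight averaging (the "reduce" step). This is a swapped-in technique for illustration, not the SMO-based MRSMO algorithm or its RDF feedback loop, and all function names are hypothetical. Python's built-in `map` stands in for Hadoop, where each partition would be processed on a separate node.

```python
import random


def pegasos_train(partition, lam=0.01, epochs=20, seed=0):
    """Map step (hypothetical): fit a linear SVM to one data partition with the
    Pegasos stochastic sub-gradient method. Each sample is (features, label),
    label in {+1, -1}; append a constant 1.0 feature to learn a bias term."""
    rng = random.Random(seed)
    w = [0.0] * len(partition[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(partition, len(partition)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward the sample
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def average_models(models):
    """Reduce step (hypothetical): combine per-partition weights by averaging."""
    return [sum(ws) / len(models) for ws in zip(*models)]


def predict(w, x):
    """Classify as +1 (spam) or -1 (legitimate) by the sign of w . x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def split(data, k):
    """Round-robin split of the training set into k partitions."""
    return [data[i::k] for i in range(k)]


if __name__ == "__main__":
    # Toy 1-D data with a constant bias feature: positive values are "spam" (+1).
    data = [([v, 1.0], 1 if v > 0 else -1) for v in (-3, -2, -1.5, -1, 1, 1.5, 2, 3)]
    local_models = list(map(pegasos_train, split(data, 2)))  # "map" phase
    model = average_models(local_models)                       # "reduce" phase
    print(model, predict(model, [2.5, 1.0]), predict(model, [-2.5, 1.0]))
```

Averaging independently trained local models is the simplest possible combiner; it trades some accuracy for parallelism, which is the kind of degradation the feedback loop described above is meant to offset.
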
Effectively exploiting large-scale, ‘Cloud’ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires considering a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneity-aware task-to-node matching and allocation scheme, is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of its out-of-the-box Hadoop counterpart in a typical Cloud-based infrastructure. The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF-based end-user feedback.
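
Heterogeneity-aware task-to-node matching can be illustrated with a generic greedy heuristic: sort tasks by estimated size and give each one to the node on which it would finish earliest, given per-node speed estimates. This is a textbook longest-processing-time style heuristic assumed for illustration, not gSched's actual matching scheme; the task sizes, node speeds and names are hypothetical.

```python
def assign_tasks(task_sizes, node_speeds):
    """Greedy heterogeneity-aware assignment (hypothetical, not gSched).

    task_sizes:  {task_id: estimated work units}
    node_speeds: {node_id: work units processed per second}
    Returns ({node_id: [task_id, ...]}, makespan_in_seconds).
    """
    finish = {node: 0.0 for node in node_speeds}  # projected busy-until time
    plan = {node: [] for node in node_speeds}
    # Largest tasks first, so big tasks are not left for the end.
    for task, size in sorted(task_sizes.items(), key=lambda kv: kv[1], reverse=True):
        # Pick the node on which this task would complete soonest.
        best = min(node_speeds, key=lambda n: finish[n] + size / node_speeds[n])
        finish[best] += size / node_speeds[best]
        plan[best].append(task)
    return plan, max(finish.values())


if __name__ == "__main__":
    plan, makespan = assign_tasks({"t1": 8, "t2": 4, "t3": 4, "t4": 2},
                                  {"fast": 2.0, "slow": 1.0})
    print(plan, makespan)  # {'fast': ['t1', 't3'], 'slow': ['t2', 't4']} 6.0
```

A scheduler that ignores heterogeneity treats both nodes as equal; weighting each decision by node speed is what lets the faster node absorb the largest task here.
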

Brunel University Research Archive

Twitter Bots’ Detection with Benford’s Law and Machine Learning

Author: Bhosale Sanmesh
Di Troia Fabio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Online Social Networks (OSNs) have grown exponentially in terms of active users and have become an influential factor in the formation of public opinion. For this reason, the use of bots and botnets to spread misinformation on OSNs has become a widespread concern. Identifying bots and botnets on Twitter can require complex statistical methods that score a profile based on multiple features. Benford’s Law, or the Law of Anomalous Numbers, states that in many naturally occurring collections of numbers, the frequencies of the First Significant Leading Digit (FSLD) follow a particular pattern: they are unevenly distributed and decrease from 1 to 9. This principle can be applied to the first-degree egocentric network of a Twitter profile to assess its conformity to the law and, thus, classify it as a bot profile or a normal profile. This paper focuses on leveraging Benford’s Law in combination with various Machine Learning (ML) classifiers to identify bot profiles on Twitter. In addition, a comparison with other statistical methods is produced to confirm our classification results.
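
The conformity test described above can be sketched as follows. Benford's Law gives the expected probability of leading digit d as P(d) = log10(1 + 1/d), so P(1) ≈ 0.301 and P(9) ≈ 0.046. The sketch scores a list of positive integers (assumed here to be, for example, follower counts of the accounts in a profile's first-degree egocentric network) by the mean absolute deviation (MAD) from those probabilities. MAD is one common conformity measure and is an assumption here, as are the function names; the paper's exact features are not given in this abstract.

```python
import math
from collections import Counter


def benford_expected():
    """Benford first-digit probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """First significant digit of a positive integer."""
    return int(str(n)[0])


def benford_mad(values):
    """Mean absolute deviation of observed first-digit frequencies from Benford.
    Lower values mean closer conformity. Non-positive values are skipped."""
    digits = [first_digit(v) for v in values if v > 0]
    if not digits:
        raise ValueError("need at least one positive integer")
    counts = Counter(digits)
    expected = benford_expected()
    return sum(abs(counts[d] / len(digits) - expected[d]) for d in range(1, 10)) / 9


if __name__ == "__main__":
    # Powers of two are a classic Benford-conforming sequence.
    print(benford_mad([2 ** k for k in range(1, 1001)]))  # small deviation
    print(benford_mad([5] * 100))                          # large deviation
```

A per-profile score like this could then be passed, alongside other features, to an ML classifier.
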

SJSU ScholarWorks

Impact of Location Spoofing Attacks on Performance Prediction in Mobile Networks

Author: Chang Sang Yoon
Kanuri Nikhil Sai
Kim Jinoh
Kim Jonghyun
Park Younghee
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Performance prediction in wireless mobile networks is essential for diverse purposes in network management and operation. In particular, the position of a mobile device is crucial for estimating performance in the mobile communication setting. Given this importance, this paper investigates mobile communication performance based on the coordinate information of mobile devices. We analyze a recent 5G data collection and examine the feasibility of location-based performance prediction. Because location information is key to performance prediction, a relevant prediction rests on the basic assumption that the device coordinates provided are correct. Given this criticality, this paper also investigates the impact of position falsification on an ML-based performance predictor; the results reveal significant degradation of prediction performance under such attacks, suggesting the need for effective defense mechanisms against location spoofing threats.
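
To illustrate why falsified coordinates hurt a location-based predictor, here is a minimal sketch. It assumes a k-nearest-neighbour regressor over (x, y) positions and a synthetic throughput field that simply falls off with distance from a base station at the origin. Neither the model nor the data is the paper's 5G dataset or predictor, and all names are hypothetical.

```python
import math


def knn_predict(train, point, k=3):
    """Predict throughput at `point` as the mean of its k nearest training samples.
    train: list of ((x, y), throughput) pairs."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], point))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def spoof(point, dx, dy):
    """Hypothetical spoofing attack: report a position shifted by (dx, dy)."""
    return (point[0] + dx, point[1] + dy)


def mean_abs_error(train, queries, k=3):
    """Average |prediction - true throughput| over (reported_position, throughput) pairs."""
    return sum(abs(knn_predict(train, p, k) - t) for p, t in queries) / len(queries)


def synthetic_throughput(p):
    """Toy field (assumption): 100 units at the base station, minus 10 per unit distance."""
    return 100 - 10 * math.hypot(*p)


if __name__ == "__main__":
    train = [((x, y), synthetic_throughput((x, y)))
             for x in range(-5, 6) for y in range(-5, 6)]
    true_positions = [(0.2, 0.1), (-1.3, 2.4)]
    honest = [(p, synthetic_throughput(p)) for p in true_positions]
    # Same true throughput, but the predictor sees a falsified location.
    spoofed = [(spoof(p, 4, 4), synthetic_throughput(p)) for p in true_positions]
    print("honest MAE:", mean_abs_error(train, honest))
    print("spoofed MAE:", mean_abs_error(train, spoofed))
```

Because the predictor trusts the reported coordinates, a spoofed position silently redirects it to the neighbourhood of the fake location, so the error grows with the spoofing offset rather than with any model weakness.
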

SJSU ScholarWorks

A Blockchain-Based Retribution Mechanism for Collaborative Intrusion Detection

Author: Chang Sang Yoon
Fan Wenjun
Kumar Shubham
Park Younghee
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Collaborative intrusion detection approaches use detection signatures shared among collaborating participants to facilitate a coordinated defense. In the context of collaborative intrusion detection systems (CIDS), however, no research has focused on the efficiency of the shared detection signatures. Inefficient detection signatures waste not only IDS resources but also the processing capacity of the peer-to-peer (P2P) network. In this paper, we therefore propose a blockchain-based retribution mechanism that aims to incentivize participants to help verify the efficiency of detection signatures through distributed consensus. We implement a prototype using the Ethereum blockchain, which instantiates a token-based retribution mechanism and a smart-contract-enabled, voting-based distributed consensus. We conduct a number of experiments on the prototype, and the results demonstrate the effectiveness of the proposed approach.
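
The voting-based retribution idea can be sketched off-chain. This toy ledger assumes that each participant votes on whether a shared detection signature is efficient, that a strict majority decides (ties reject the signature), and that voters who side with the verdict are rewarded while the others forfeit a stake. The stake and reward amounts, the tie rule and all names are hypothetical; this is not the paper's Ethereum smart contract.

```python
from dataclasses import dataclass


@dataclass
class RetributionLedger:
    """Toy, off-chain sketch of token-based retribution with majority voting
    (hypothetical; not the paper's smart contract)."""
    balances: dict
    stake: int = 10   # tokens lost by a voter who disagrees with the verdict
    reward: int = 5   # tokens earned by a voter who agrees with the verdict

    def settle(self, votes: dict) -> bool:
        """votes maps participant -> True if they judge the signature efficient.
        Returns the verdict (strict majority of True votes) and updates balances."""
        yes = sum(1 for v in votes.values() if v)
        verdict = yes > len(votes) / 2
        for voter, vote in votes.items():
            if vote == verdict:
                self.balances[voter] += self.reward
            else:
                self.balances[voter] -= self.stake
        return verdict


if __name__ == "__main__":
    ledger = RetributionLedger({"a": 100, "b": 100, "c": 100})
    print(ledger.settle({"a": True, "b": True, "c": False}))  # True
    print(ledger.balances)  # {'a': 105, 'b': 105, 'c': 90}
```

Penalizing minority votes is what gives participants a reason to actually evaluate a signature rather than vote at random; on a blockchain, the same rule would be enforced by contract code rather than by a trusted ledger object.
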

SJSU ScholarWorks

Word Embeddings for Fake Malware Generation

Author: Di Troia Fabio
Tran Quang Duy
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Signature-based and anomaly-based techniques are the fundamental methods for detecting malware. However, in recent years this type of threat has become more complex and sophisticated, making these techniques less effective. For this reason, researchers have turned to state-of-the-art machine learning techniques to combat this information security threat. Nevertheless, despite the integration of machine learning models, a shortage of training data still prevents these models from performing at their peak. Generative models have previously proven highly effective at generating image-like data that closely match the actual data distribution. In this paper, we apply generative modeling to opcode sequences and aim to generate malware samples by taking advantage of the contextualized embeddings from BERT. We obtained promising results when differentiating between real and generated samples. We observe that generated malware has characteristics so similar to those of actual malware that the classifiers have difficulty distinguishing between the two, falsely identifying the generated malware as actual malware almost of the time.
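
The paper's generator relies on BERT contextualized embeddings, which would need a pretrained model to demonstrate. As a much simpler, swapped-in stand-in that only shows the general idea of learning from opcode sequences and sampling new ones, here is a bigram (first-order Markov) opcode generator. The opcode traces and function names are hypothetical.

```python
import random
from collections import defaultdict


def train_bigram(sequences):
    """Record every opcode observed immediately after each opcode."""
    successors = defaultdict(list)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            successors[cur].append(nxt)
    return successors


def generate(successors, start, length, seed=0):
    """Sample a synthetic opcode sequence by walking the bigram model.
    Stops early if the current opcode was never followed by anything."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length and successors.get(out[-1]):
        # Duplicated successors make frequent transitions more likely.
        out.append(rng.choice(successors[out[-1]]))
    return out


if __name__ == "__main__":
    traces = [["push", "mov", "call", "pop", "ret"],
              ["push", "mov", "add", "mov", "call", "ret"]]
    model = train_bigram(traces)
    print(generate(model, "push", 8))
```

A bigram model only sees one opcode of context, whereas contextualized embeddings condition on the whole surrounding sequence; that difference is why models like BERT can produce samples that classifiers find much harder to tell apart from real malware.
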

SJSU ScholarWorks

A Blockchain-Based Tamper-Resistant Logging Framework

Author: Austin Thomas H.
Di Troia Fabio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Since its introduction in Bitcoin, the blockchain has proven to be a versatile data structure. In its role as an immutable ledger, it has grown beyond its initial use in financial transactions to record a wide variety of other useful information. In this paper, we explore the application of the blockchain outside its traditional decentralized, financial domain. We show how, even with only a single “mining” node, a proof-of-work blockchain can be the cornerstone of a tamper-resistant logging framework. By attaching a proof-of-work to blocks of logging messages, we make it increasingly difficult for an attacker to modify those logs, even after fully compromising the system. Furthermore, we discuss various strategies an attacker might use to modify the logs without detection and show how effective those evasion techniques are against statistical analysis.
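
The proof-of-work logging idea can be sketched with a single local "miner": each block commits to a batch of log messages, the previous block's hash and a nonce, and the nonce is searched until the block's SHA-256 hash starts with a given number of hex zeros. Editing any logged message changes that block's hash, so an attacker would have to redo the work for that block and every later one. The block layout, the JSON serialization and the difficulty value are assumptions for illustration, not the paper's framework.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_digest(prev_hash, messages, nonce):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps({"prev": prev_hash, "msgs": messages, "nonce": nonce},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def mine_block(prev_hash, messages, difficulty=3):
    """Search for a nonce whose block hash starts with `difficulty` hex zeros."""
    nonce = 0
    while True:
        digest = block_digest(prev_hash, messages, nonce)
        if digest.startswith("0" * difficulty):
            return {"prev": prev_hash, "msgs": messages, "nonce": nonce, "hash": digest}
        nonce += 1


def append_logs(chain, messages, difficulty=3):
    """Mine a new block of log messages onto the end of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append(mine_block(prev, list(messages), difficulty))
    return chain


def verify_chain(chain, difficulty=3):
    """Recompute every hash; any edited message, nonce or link fails verification."""
    prev = GENESIS
    for block in chain:
        digest = block_digest(block["prev"], block["msgs"], block["nonce"])
        if (block["prev"] != prev or digest != block["hash"]
                or not digest.startswith("0" * difficulty)):
            return False
        prev = digest
    return True
```

Each additional hex zero of difficulty multiplies the expected mining work by 16, which is the knob that makes rewriting a long stretch of history increasingly expensive for an attacker.
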

SJSU ScholarWorks

Robustness of Image-Based Malware Analysis

Author: Di Troia Fabio
Stamp Mark
Tran Katrina
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

In previous work, “gist descriptor” features extracted from images have been used in malware classification problems and have shown promising results. In this research, we determine whether gist descriptors are robust to malware obfuscation techniques, compared with Convolutional Neural Networks (CNNs) trained directly on malware images. Using the Python Imaging Library (PIL), we create images from malware executables and from malware that we obfuscate. We conduct experiments comparing the classification of these images by a CNN against extracting gist descriptor features from the images for use in classification. For the gist descriptors, we consider a variety of classification algorithms, including k-nearest neighbors, random forest, support vector machine, and multi-layer perceptron. We find that gist descriptors are more robust than CNNs with respect to the obfuscation techniques we consider.
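
The executable-to-image step can be sketched without any imaging dependency: the raw bytes are read as 8-bit grayscale pixel values and wrapped into rows of a fixed width. The fixed default width of 256 and the zero padding of the last row are assumptions (width is often chosen based on file size). With PIL installed, the padded byte string could then be turned into an image with `Image.frombytes("L", (width, height), data)`.

```python
def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Map each byte of an executable to one 0-255 grayscale pixel, row by row.
    The last row is zero-padded to the full width."""
    if width <= 0:
        raise ValueError("width must be positive")
    padded = data + bytes(-len(data) % width)  # pad up to a multiple of width
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]
```

For a real sample one would pass the file contents, e.g. `bytes_to_grayscale(Path("sample.exe").read_bytes())`, where the path is hypothetical. Obfuscations that insert or reorder bytes shift this pixel layout, which is the kind of perturbation the robustness comparison above probes.
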

SJSU ScholarWorks

Recommender Systems for Online and Mobile Social Networks: A survey

Author: Campana Mattia Giovanni
Delmastro Franca
Publication venue: 'Elsevier BV'
Publication date: 28/06/2023

Recommender Systems (RS) currently represent a fundamental tool in online services, especially since the advent of Online Social Networks (OSN). In this setting, users generate huge amounts of content and can quickly be overloaded by useless information. At the same time, social media represent an important source of information for characterizing content and users' interests. RS can exploit this information to further personalize suggestions and improve the recommendation process. In this paper we present a survey of Recommender Systems designed and implemented for Online and Mobile Social Networks, highlighting how the use of social context information improves the recommendation task, and how standard algorithms must be enhanced and optimized to run in a fully distributed environment, such as opportunistic networks. We describe the advantages and drawbacks of these systems in terms of algorithms, target domains, evaluation metrics and performance evaluations. Finally, we present some open research challenges in this area.

arXiv.org e-Print Archive

An Artistic Perspective on Distributed Computer Networks. Creativity in Human-Machine Systems

Author: Gapsevicius Mindaugas
Publication venue: Goldsmiths, University of London

This thesis is written from an artistic perspective as a reflection on currently significant discussions in media theory, with a focus on the impact of technology on society. In mapping the boundaries of contemporary art, this research considers post-digital art the term best suited to describing current discourses in media theory. Drawing on artworks by Martin Howse & Jonathan Kemp (2001-2008), Maurizio Bolognini (Bolognini 1988-present), and myself (mi_ga 2006), among many others, this research defines post-digital art, which in turn is characterized by complex interactions between elements of different natures, such as the living and non-living, human and machine, art and science. In analyzing P2P networks, I highlight Milgram's (1967) idea of six degrees of separation, which, at least from a speculative point of view, is interesting for the implementation of human-machine concepts in future technological developments. From this perspective, I argue that computer networks could, in the future, have more potential for merging with society if developed similarly to the routing scheme implemented in the Freenet distributed information storage and retrieval system. The thesis then describes my own artwork, 0.30402944246776265, including two newly developed plugins for the Freenet storage system: the first plugin is constructed to realize the idea of interacting elements of different natures (in this case, the WWW and Freenet), while the second attempts to visualize data flow within the Freenet storage and retrieval system. Altogether, this thesis proposes that reconsidering distributed and self-organized information systems through an artistic and philosophical lens can open up a space for rethinking the current integration of society and technology.

Goldsmiths Research Online