    Unified framework for spam detection and risk assessment in short message communication media / Adewole Kayode Sakariyah

    Short message communication media (SMCM), such as mobile and microblogging social networks, have become an essential part of many people's daily routines. Despite their benefits, these media have become popular platforms for distributing spam content. Research on spam message and spam account detection in SMCM has received growing interest in recent years, mainly focusing on separate frameworks that identify either spam messages or spam accounts. Hundreds of published works on spam message and spam account detection aim to identify effective detection methods. While these studies have recently advanced, there are still areas to explore, particularly the introduction of a unified method that can detect spam messages and spam accounts within a single framework and identify the risk levels of spam accounts. The performance of existing content-based spam detection methods degrades for several reasons. For instance, unlike content posted on social networks such as Facebook and Renren, SMS and microblogging messages are limited in size and contain many domain-specific words, such as idioms and abbreviations. In addition, microblogging messages are unstructured and noisy. These distinctive characteristics pose challenges to existing approaches for spam message detection. Meanwhile, state-of-the-art solutions for spam account detection face various evasion tactics employed by intelligent spammers. Thus, there is a need to investigate features that can identify spam messages and spam accounts in SMCM. This study introduces a unified framework that can detect spam messages and spam accounts and assess the risk level of accounts. To achieve this aim, the study proposes a novel framework that combines three models: the Spam Account Detection Model (SADM), the Spam Message Detection Model (SMDM), and the Spam Risk Assessment Model (SRAM). 
A set of sixty-nine (69) features from five main categories was identified to develop the SADM. Additionally, eighteen (18) features were introduced to build the SMDM. The performance of ten (10) machine learning algorithms was evaluated to select the best classifier for both SADM and SMDM. A bio-inspired evolutionary search method was studied to identify the discriminating features for spam account detection. A model to estimate the risk levels of spam accounts was established using the Fuzzy Analytic Hierarchy Process. Four risk levels were employed, each with a corresponding response strategy that maps the risk level to a type of response. To assess the performance of the proposed framework, a four-stage evaluation study was undertaken. Given the promising results, a proof-of-concept study was conducted in an online assessment mode to demonstrate the applicability of the proposed framework. The results demonstrate that the proposed framework can detect spam messages and spam accounts and assess the risk level of spam accounts in SMCM.
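The abstract names the Fuzzy Analytic Hierarchy Process but not the variant, the criteria, or the thresholds used. As a rough illustration only, the sketch below derives crisp criterion weights with Buckley's geometric-mean Fuzzy AHP from a pairwise comparison matrix of triangular fuzzy numbers, then maps a weighted risk score to one of four levels and a response. The criteria, comparison values, level thresholds, response strategies, and function names (`buckley_weights`, `assess`) are all hypothetical and are not taken from the thesis.

```python
import math

# A triangular fuzzy number (TFN) is a tuple (l, m, u) with l <= m <= u.
ONE = (1.0, 1.0, 1.0)

def reciprocal(a):
    """Reciprocal of a TFN: (l, m, u) -> (1/u, 1/m, 1/l)."""
    l, m, u = a
    return (1.0 / u, 1.0 / m, 1.0 / l)

def buckley_weights(matrix):
    """Crisp criterion weights from a fuzzy pairwise comparison matrix (Buckley's geometric mean)."""
    n = len(matrix)
    # Row-wise fuzzy geometric mean, computed separately for the l, m and u components.
    geo = [tuple(math.prod(cell[k] for cell in row) ** (1.0 / n) for k in range(3))
           for row in matrix]
    sum_l = sum(g[0] for g in geo)
    sum_m = sum(g[1] for g in geo)
    sum_u = sum(g[2] for g in geo)
    # Fuzzy weight = row mean times the inverse of the total: (l / sum_u, m / sum_m, u / sum_l).
    fuzzy = [(g[0] / sum_u, g[1] / sum_m, g[2] / sum_l) for g in geo]
    # Centroid defuzzification, then normalisation so the weights sum to 1.
    crisp = [(l + m + u) / 3.0 for l, m, u in fuzzy]
    total = sum(crisp)
    return [c / total for c in crisp]

# Hypothetical four-level risk scale with hypothetical response strategies.
RISK_LEVELS = [
    (0.25, "Low", "monitor account"),
    (0.50, "Medium", "warn account owner"),
    (0.75, "High", "restrict posting"),
    (1.00, "Critical", "suspend account"),
]

def assess(scores, weights):
    """Weighted risk score (criterion scores in [0, 1]) mapped to (risk, level, response)."""
    risk = sum(w * s for w, s in zip(weights, scores))
    for upper, level, response in RISK_LEVELS:
        if risk <= upper:
            return risk, level, response
    # Guard against floating-point overshoot just above 1.0.
    return risk, RISK_LEVELS[-1][1], RISK_LEVELS[-1][2]

# Hypothetical criteria: spam-message ratio, URL ratio, follower/following imbalance.
A, B, C = (2.0, 3.0, 4.0), (4.0, 5.0, 6.0), (1.0, 2.0, 3.0)
matrix = [
    [ONE, A, B],
    [reciprocal(A), ONE, C],
    [reciprocal(B), reciprocal(C), ONE],
]
weights = buckley_weights(matrix)
print([round(w, 3) for w in weights])
print(assess([0.9, 0.6, 0.2], weights))
```

Buckley's method is used here because it needs only row-wise geometric means and a single defuzzification step; other Fuzzy AHP variants (such as extent analysis) would produce different weights from the same judgements.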

    SMSAD: a framework for spam message and spam account detection

    Short message communication media, such as mobile and microblogging social networks, have become attractive platforms for spammers to disseminate unsolicited content. However, the performance of traditional content-based spam detection methods degrades for several reasons. For instance, unlike content posted on social networks such as Facebook and Renren, SMS and microblogging messages are limited in size and contain many domain-specific words, such as idioms and abbreviations. In addition, microblogging messages are highly unstructured and noisy. These distinctive characteristics make it difficult for existing email spam detection models to identify spam effectively in short message communication media. Meanwhile, state-of-the-art solutions for social spam account detection face various evasion tactics employed by intelligent spammers. In this paper, a unified framework is proposed for both spam message and spam account detection. Four datasets were used in this study: two from the SMS spam message domain and two from the Twitter microblog. To identify a minimal set of features for spam account detection on Twitter, this paper studies a bio-inspired evolutionary search method. Using the evolutionary search algorithm, a compact spam account detection model is proposed and incorporated into the machine learning phase of the unified framework. The results of the experiments indicate that the proposed framework is promising for detecting both spam messages and spam accounts with a minimal number of features. © 2017, Springer Science+Business Media, LLC
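The paper does not specify the evolutionary operators or the classifier evaluated inside the search. The following is a minimal sketch of wrapper-style feature selection with a simple genetic algorithm (tournament selection, one-point crossover, bit-flip mutation, elitism), assuming a nearest-centroid classifier as a stand-in fitness evaluator and a small synthetic dataset in which only the first two features are informative. `evolve_feature_subset`, `nearest_centroid_accuracy`, the per-feature penalty, and all parameter values are hypothetical.

```python
import random

def nearest_centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier restricted to the selected features."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[i] for r in rows) / len(rows) for i in idx]
    correct = 0
    for x, t in zip(X, y):
        dist = {c: sum((x[i] - m) ** 2 for i, m in zip(idx, cen)) for c, cen in centroids.items()}
        if min(dist, key=dist.get) == t:
            correct += 1
    return correct / len(y)

def evolve_feature_subset(fitness, n_features, pop_size=20, generations=40,
                          mutation_rate=0.1, seed=0):
    """Genetic search over binary feature masks; returns the best mask seen and its fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in pop), key=lambda p: p[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0][0], scored[0][1][:]
        next_pop = [scored[0][1][:]]  # elitism: keep the fittest individual unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fitter of three random individuals becomes a parent.
            p1 = max(rng.sample(scored, 3), key=lambda p: p[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda p: p[0])[1]
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return best, best_fit

# Synthetic data: features 0 and 1 separate the classes, features 2-5 are pure noise.
data_rng = random.Random(42)
X, y = [], []
for label in (0, 1):
    for _ in range(30):
        informative = [data_rng.gauss(5.0 * label, 1.0) for _ in range(2)]
        noise = [data_rng.gauss(0.0, 3.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)

def fitness(mask):
    # Reward accuracy and penalise every selected feature to favour compact subsets.
    return nearest_centroid_accuracy(X, y, mask) - 0.01 * sum(mask)

mask, score = evolve_feature_subset(fitness, n_features=6)
print(mask, round(score, 3))
```

The per-feature penalty is what pushes the search toward a compact subset; without it, any mask that reaches perfect accuracy would score the same regardless of its size.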

    Twitter spam account detection based on clustering and classification methods

    The Twitter social network has gained popularity due to the increase in social activities of registered users. Twitter serves a dual function, acting both as a microblogging online social network (OSN) and as a news update platform. Recently, the growth in Twitter social interactions has attracted the attention of cybercriminals. Spammers have used Twitter to spread malicious messages, post phishing links, flood the network with fake accounts, and engage in other malicious activities. Detecting the network of spammers who engage in these activities is an important step toward identifying individual spam accounts. Researchers have proposed a number of approaches to identify groups of spammers; however, each of these approaches addresses a specific category of spammer. This paper proposes a different approach that detects spammers on Twitter based on the similarities that exist among spam accounts. A number of features were introduced to improve the performance of the three classification algorithms selected in this study. The proposed approach applied principal component analysis and a tuned K-means algorithm to cluster over 200,000 accounts, randomly selected from more than 2 million tweets, in order to detect clusters of spammers. Experimental results show that Random Forest achieved the highest accuracy of 96.30%, followed by multilayer perceptron with 96.00% and support vector machine with 95.60%. When the selected classifiers were evaluated under class imbalance, Random Forest also achieved the highest accuracy, precision, recall, and F-measure.
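The abstract does not say which account features were used or how K-means was tuned. With that caveat, here is a stdlib-only sketch of the described clustering pipeline: principal component analysis via power iteration with deflation, followed by Lloyd's K-means with a deterministic farthest-first initialisation and k fixed at 2. The synthetic "accounts", their four features, and the helper names (`pca_project`, `kmeans`, `sq_dist`) are hypothetical.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def pca_project(X, n_components, iters=200, seed=0):
    """Project rows of X onto the top principal components (power iteration with deflation)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    mean = [sum(r[j] for r in X) / n for j in range(d)]
    Xc = [[r[j] - mean[j] for j in range(d)] for r in X]  # centre each feature
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue associated with the component just found.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in Xc]

def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-first initialisation."""
    centroids = [points[0][:]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(far[:])
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids

# Hypothetical accounts with four numeric features on comparable scales.
rng = random.Random(1)
legit = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(20)]
spam = [[rng.gauss(6.0, 1.0) for _ in range(4)] for _ in range(20)]
X = legit + spam
Z = pca_project(X, n_components=2)
labels, _ = kmeans(Z, k=2)
print(labels)
```

In a real setting the features would typically be standardised before PCA and k would be selected by some tuning criterion; both steps are omitted here to keep the sketch short.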

    Malicious URLs Detection Using Data Streaming Algorithms

    With advances in technology and devices, data is now generated at an ever-increasing rate from a vast array of networks and devices, as well as from daily operations such as credit card transactions and mobile phone use. A data stream consists of sequential, real-time, continuous data in the form of an evolving stream. However, the traditional machine learning approach follows a batch learning model, in which labelled training data are given a priori to train a model using some machine learning algorithm. This technique requires all training samples to be available before learning begins. In this setting, training is mostly done offline owing to its high cost. Consequently, traditional batch learning suffers from serious drawbacks, such as poor scalability for real-time phishing website detection, because the model usually has to be retrained from scratch with new training samples. This paper therefore applies streaming algorithms to detect malicious URLs using selected online learners: Hoeffding Tree (HT), Naïve Bayes (NB), and OzaBag. Experimental results on two prominent phishing datasets show that OzaBag produced promising results in terms of accuracy, Kappa, and Kappa Temp on the dataset with a large number of samples, while HT and NB had the lowest prediction times, with accuracy and Kappa comparable to OzaBag, for real-time detection of phishing websites.
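To make the batch-versus-streaming contrast concrete, here is a minimal sketch of prequential (test-then-train) evaluation: each URL is classified before the model learns from it, and accuracy, Kappa, and Kappa Temporal (agreement measured against a baseline that repeats the previous label) are computed over the stream. The online learner is a simple categorical Naive Bayes written for illustration; it is not the MOA implementation, Hoeffding Tree and OzaBag are not shown, and the three binary URL features, the labelling rule, and the names `OnlineNaiveBayes` and `prequential` are hypothetical.

```python
import math
import random
from collections import defaultdict

class OnlineNaiveBayes:
    """Categorical Naive Bayes with Laplace smoothing, updated one example at a time."""

    def __init__(self):
        self.n = 0
        self.class_counts = defaultdict(int)
        self.value_counts = defaultdict(int)  # (class, feature index, value) -> count
        self.seen_values = defaultdict(set)   # feature index -> distinct values seen so far

    def predict(self, x):
        if self.n == 0:
            return None  # nothing learned yet
        best, best_score = None, -math.inf
        for c, cc in self.class_counts.items():
            score = math.log(cc / self.n)
            for i, v in enumerate(x):
                k = len(self.seen_values.get(i, ())) + 1  # +1 reserves mass for unseen values
                score += math.log((self.value_counts.get((c, i, v), 0) + 1) / (cc + k))
            if score > best_score:
                best, best_score = c, score
        return best

    def learn(self, x, y):
        self.n += 1
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.value_counts[(y, i, v)] += 1
            self.seen_values[i].add(v)

def _kappa(p0, pe):
    return (p0 - pe) / (1.0 - pe) if pe < 1.0 else 0.0

def prequential(stream, model):
    """Test-then-train evaluation returning (accuracy, Kappa, Kappa Temporal)."""
    y_true, y_pred = [], []
    prev, no_change_hits = None, 0
    for x, y in stream:
        pred = model.predict(x)
        if pred is not None and prev is not None:
            y_true.append(y)
            y_pred.append(pred)
            no_change_hits += prev == y  # baseline that repeats the previous true label
        model.learn(x, y)  # train only after testing on this example
        prev = y
    n = len(y_true)
    p0 = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement for Kappa: product of true and predicted class proportions.
    pc = sum(y_true.count(c) / n * y_pred.count(c) / n for c in set(y_true) | set(y_pred))
    return p0, _kappa(p0, pc), _kappa(p0, no_change_hits / n)

# Hypothetical binary URL features: (has_ip_address, many_dots, has_at_symbol).
rng = random.Random(7)
stream = []
for _ in range(500):
    x = tuple(rng.randint(0, 1) for _ in range(3))
    label = int(x[0] or x[2])  # toy rule: IP address or '@' means phishing
    if rng.random() < 0.05:    # 5% label noise
        label = 1 - label
    stream.append((x, label))

acc, kappa, kappa_temp = prequential(stream, OnlineNaiveBayes())
print(round(acc, 3), round(kappa, 3), round(kappa_temp, 3))
```

Because the model only updates counts for each new example, it never needs to be retrained from scratch, which is the scalability property the abstract attributes to streaming learners.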