581 research outputs found

    Integrated approach to detect spam in social media networks using hybrid features

    Online social networking sites are becoming more popular among Internet users, who spend time on sites such as Facebook, Twitter and LinkedIn. Online social networks are considered a useful tool for communicating and transmitting information. These platforms let users share information, opinions and ideas, make new friends, and create new friend groups. Social networking sites provide a large amount of technical information to users, and this information attracts cyber criminals who misuse it. Such users create their own accounts and spread harmful information to genuine users, for example by advertising products or sending malicious links. Spammer detection is a major problem on social networking sites today. Previous spam detection techniques use different sets of features to classify spam and non-spam users. In this paper we propose a hybrid approach that uses content-based and user-based features to identify spam on the Twitter network. In this hybrid approach we use a decision tree induction algorithm and a Bayesian network algorithm to construct a classification model. We have analysed the proposed technique on a Twitter dataset. Our analysis shows that the proposed methodology performs better than some existing techniques.

    Cascading Randomized Weighted Majority: A New Online Ensemble Learning Algorithm

    With the increasing volume of data in the world, the best approach for learning from this data is to exploit an online learning algorithm. Online ensemble methods are online algorithms which take advantage of an ensemble of classifiers to predict labels of data. Prediction with expert advice is a well-studied problem in the online ensemble learning literature. The Weighted Majority algorithm and the randomized weighted majority (RWM) algorithm are the best-known solutions to this problem, aiming to converge to the best expert. Since the best expert among a set of experts does not necessarily have the minimum error in all regions of the data space, defining specific regions and converging to the best expert in each of these regions will lead to a better result. In this paper, we aim to resolve this defect of RWM algorithms by proposing a novel online ensemble algorithm for the problem of prediction with expert advice. We propose a cascading version of RWM to achieve not only better experimental results but also a better error bound for sufficiently large datasets. Comment: 15 pages, 3 figures
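
    As a point of reference for the abstract above, the following is a minimal sketch of the standard randomized weighted majority (RWM) baseline, not the cascading variant the paper proposes. The function name, the penalty factor `beta`, and the toy data are assumptions made for illustration only.

```python
import random


def randomized_weighted_majority(expert_predictions, labels, beta=0.5, seed=0):
    """Run plain RWM over a sequence of rounds.

    expert_predictions: list of rounds; each round is a list with one
        prediction per expert.
    labels: the true label for each round.
    beta: multiplicative penalty in (0, 1) applied to wrong experts
        (an illustrative default, not taken from the paper).
    Returns (number of mistakes made by the learner, final weights).
    """
    rng = random.Random(seed)
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    mistakes = 0
    for preds, y in zip(expert_predictions, labels):
        # Follow one expert, sampled with probability proportional to its weight.
        chosen = rng.choices(range(n_experts), weights=weights, k=1)[0]
        if preds[chosen] != y:
            mistakes += 1
        # Shrink the weight of every expert that was wrong this round.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return mistakes, weights


# Toy data: expert 0 is always right, expert 1 is always wrong.
rounds = [[1, 0], [0, 1], [1, 0]]
truth = [1, 0, 1]
mistakes, weights = randomized_weighted_majority(rounds, truth)
# weights is [1.0, 0.125]: expert 1 was halved in each of the three rounds.
```

    Because the sampling probability of a repeatedly wrong expert decays geometrically, the learner's predictions concentrate on the expert with the fewest mistakes overall, which is the "converge to the best expert" behaviour the abstract refers to.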

    A Comprehensive Survey of Data Mining-based Fraud Detection Research

    This survey paper categorises, compares, and summarises almost all published technical and review articles on automated fraud detection from the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of data evidence collected within affected industries. Within the business context of mining the data to achieve higher cost savings, this research presents methods and techniques together with their problems. Compared to all related reviews on fraud detection, this survey covers many more technical articles and is the only one, to the best of our knowledge, which proposes alternative data and solutions from related domains. Comment: 14 pages

    An Effective Ensemble Approach for Spam Classification

    The annoyance of spam increasingly plagues both individuals and organizations. Spam classification is an important task for distinguishing spam from legitimate emails or addresses. This paper presents a neural network ensemble approach based on a specially designed cooperative coevolution paradigm. Each component network corresponds to a separate subpopulation, and all subpopulations are evolved simultaneously. The ensemble performance and the Q-statistic diversity measure are adopted as the objectives, and the component networks are evaluated using the multi-objective Pareto optimality measure. Experimental results illustrate that the proposed algorithm outperforms traditional ensemble methods on spam classification problems.
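
    The Q-statistic mentioned in the abstract above is a standard pairwise diversity measure for classifiers (Yule's Q). The sketch below computes it for two classifiers from their predictions; the function name and the toy data are assumptions for illustration, and the abstract does not say how the paper aggregates the measure across an ensemble.

```python
def q_statistic(pred_a, pred_b, labels):
    """Yule's Q for two classifiers.

    Q = (N11*N00 - N01*N10) / (N11*N00 + N01*N10), where
    N11 = both correct, N00 = both wrong,
    N10 = only A correct, N01 = only B correct.
    Q near 1 means the classifiers tend to be right and wrong together;
    Q near 0 or negative means they err on different samples.
    """
    n11 = n00 = n10 = n01 = 0
    for a, b, y in zip(pred_a, pred_b, labels):
        a_ok, b_ok = (a == y), (b == y)
        if a_ok and b_ok:
            n11 += 1
        elif not a_ok and not b_ok:
            n00 += 1
        elif a_ok:
            n10 += 1
        else:
            n01 += 1
    numerator = n11 * n00 - n01 * n10
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        raise ValueError("Q-statistic is undefined for these counts")
    return numerator / denominator


# Toy data: 1 = spam, 0 = legitimate.
labels = [1, 1, 0, 0, 1]
pred_a = [1, 1, 0, 1, 0]
pred_b = [1, 0, 0, 1, 1]
q = q_statistic(pred_a, pred_b, labels)  # counts: N11=2, N00=1, N10=1, N01=1
```

    With these counts the value is (2*1 - 1*1) / (2*1 + 1*1) = 1/3, indicating moderately correlated errors.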

    Spam Detection Using Machine Learning and Deep Learning

    Text messages are essential these days; however, spam texts have undermined this mode of communication. The compromised authenticity of such messages has given rise to several security breaches. Spam messages have been used to send malicious links that either harm the system or obtain information to the user's detriment. Spam SMS messages as well as emails have been used as media for attacks such as masquerading and smishing (a phishing attack through text messaging), and this has threatened both users and service providers. Given these waves of attacks, it is important to identify and remove spam messages. This dissertation explores the process of text classification from data input, to the embedded representation of words in vector form, and finally to the classification process. We applied different embedding methods to capture both the linguistic and semantic meanings of words. The static embedding methods used include Word to Vector (Word2Vec) and Global Vectors (GloVe), while for dynamic embedding, transfer learning with Bidirectional Encoder Representations from Transformers (BERT) was employed. For classification, both machine learning and deep learning techniques were used to build an efficient and sensitive classification model with good accuracy and a low false positive rate. Our results established that the combination of BERT for embedding and machine learning for classification produced better classification results than other combinations. With these results, we developed models that combine the self-feature extraction advantage of deep learning with the effective classification of machine learning. These models were tested on four datasets: the SMS Spam dataset, the Ling dataset, the Spam Assassin dataset and the Enron dataset. BERT+SVC (the hybrid model) produced the highest accuracy and the lowest false positive rate.
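
    The abstract above evaluates models by accuracy and false positive rate. As a minimal sketch of how these two metrics are conventionally computed for a spam filter, assuming "spam" is the positive class (the function name, label strings, and toy data are illustrative and not taken from the dissertation):

```python
def accuracy_and_fpr(y_true, y_pred, positive="spam"):
    """Return (accuracy, false positive rate) for binary predictions.

    accuracy = (TP + TN) / total
    FPR      = FP / (FP + TN), i.e. the share of legitimate messages
               that were wrongly flagged as spam.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "ham"]
acc, fpr = accuracy_and_fpr(y_true, y_pred)  # TP=1, FP=1, TN=2, FN=1
```

    Here accuracy is 3/5 and the false positive rate is 1/3. Reporting both matters for spam filtering because a model can reach high accuracy on an imbalanced dataset while still misclassifying many legitimate messages.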

    CLASSIFICATION OF CYBERSECURITY INCIDENTS IN NIGERIA USING MACHINE LEARNING METHODS

    Cybercrime has become more likely as a result of technological advancements and increased use of the internet and computer systems. As a result, there is an urgent need to develop effective methods of dealing with these cyber threats or incidents in order to identify and adequately combat the associated cybercrimes in Nigerian cyberspace. It is therefore desirable to build models that will enable the Nigeria Computer Emergency Response Team (ngCERT) and law enforcement agencies to gain valuable insights from the available data in order to detect, identify and efficiently classify the most prevalent cyber incidents within Nigerian cyberspace, and to predict future threats. This study applied machine learning methods to cybercrime incidents or threats recorded by ngCERT in order to build models that characterize cybercrime incidents in Nigeria, classify cybersecurity incidents by mode of attack, and identify the most prevalent incidents within Nigerian cyberspace. Seven different machine learning methods were used to build the classification and prediction models. The Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), K-Nearest Neighbor (KNN), Decision Tree (CART) and Random Forest (RF) algorithms were used to discover the relationships between the relevant attributes of the datasets and then classify the threats into several categories. The RF, CART, and KNN models were shown to be the most effective in classifying our data, with an accuracy score of 99% each, while the others had accuracy scores of 98% for SVM, 89% for NB, 88% for LR, and 88% for LDA. The results of our classification will therefore help organizations in Nigeria to understand the threats that could affect their assets.

    Design of Multi-View Based Email Classification for IoT Systems via Semi-Supervised Learning

    Suspicious emails, which aim to induce users to click and then redirect them to a phishing webpage, are a major threat to Internet of Things (IoT) security. To protect IoT systems, email classification is an essential mechanism for separating spam from legitimate emails. In the literature, most email classification approaches adopt supervised learning algorithms that require a large amount of labeled data for classifier training. However, data labeling is very time-consuming and expensive, so only a very small set of labeled data is available in practice, which greatly degrades the effectiveness of email classification. To mitigate this problem, we develop an email classification approach based on multi-view disagreement-based semi-supervised learning. The underlying idea is that a multi-view method can offer richer information for classification, which is often ignored in the literature, while semi-supervised learning can leverage both labeled and unlabeled data. In the evaluation, we investigate the performance of our proposed approach with datasets and in real network environments. Experimental results demonstrate that multi-view classification can achieve better performance than single-view classification, and that our approach can achieve better performance than existing similar algorithms.