
    A Survey of Existing E-mail Spam Filtering Methods Considering Machine Learning Techniques

    E-mail is one of the most secure media for online communication and for transferring data or messages over the web. As its popularity has grown, the volume of unsolicited messages has also increased rapidly. Different approaches exist to filter this data by automatically detecting and removing these unwanted messages. Email spam filtering techniques include knowledge-based techniques, clustering techniques, learning-based techniques, heuristic processes, and so on. This paper surveys existing email spam filtering systems that rely on Machine Learning Techniques (MLT) such as Naive Bayes, SVM, K-Nearest Neighbor, Bayes Additive Regression, KNN Tree, and rules. We present the classification, evaluation, and comparison of different email spam filtering systems and summarize the overall picture regarding the accuracy rates of the existing approaches.

    Support Vector Machine Algorithm for SMS Spam Classification in The Telecommunication Industry

    In recent years, we have witnessed a dramatic increase in the number of mobile users in the telecommunication industry. However, this has led to a drastic increase in the number of spam SMS messages. Short Message Service (SMS) is considered one of the most widely used forms of communication in telecommunication services. In reality, most users ignore the spam because of the lower rate of SMS and the limited number of spam classification tools. In this paper, we propose a Support Vector Machine (SVM) algorithm for SMS spam classification. SVM is considered one of the most effective data mining techniques. The proposed algorithm has been evaluated using a public dataset from the UCI Machine Learning Repository. Its performance is compared with three other data mining techniques: Naïve Bayes, Multinomial Naïve Bayes, and K-Nearest Neighbor with K = 1, 3, and 5. Based on measures such as higher accuracy, less processing time, highest kappa statistic, low error, and the lowest number of false positive instances, SVM outperforms the other classifiers and is the most accurate classifier for detecting and labeling spam messages, with an average accuracy of 98.9%. Comparing the error parameters overall, the highest error was found for KNN with K=3 and K=5, whereas the model with the least error is SVM, followed by Multinomial Naïve Bayes. Therefore, the proposed method can be used as a baseline for further comparison in SMS spam classification.
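The kappa statistic mentioned in the abstract above can be illustrated with a short sketch. This is not the paper's code: the function name `cohen_kappa` and the confusion-matrix counts are hypothetical, chosen only to show how Cohen's kappa corrects raw accuracy for agreement expected by chance.

```python
def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for a binary spam/ham confusion matrix.

    tp: spam predicted as spam, fp: ham predicted as spam,
    fn: spam predicted as ham,  tn: ham predicted as ham.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement is plain accuracy.
    observed = (tp + tn) / total

    # Chance agreement: probability that prediction and truth match
    # if they were independent with the same marginal frequencies.
    chance_spam = ((tp + fp) / total) * ((tp + fn) / total)
    chance_ham = ((fn + tn) / total) * ((fp + tn) / total)
    expected = chance_spam + chance_ham

    if expected == 1.0:
        # Degenerate case: only one class is present and predicted.
        return 0.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(cohen_kappa(tp=20, fp=5, fn=10, tn=15))
```

A kappa near 1 indicates agreement well beyond chance, while a value near 0 means the classifier does little better than guessing from the class frequencies, which matters on imbalanced spam corpora.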

    SMS Spam Filtering: Methods and Data

    Mobile or SMS spam is a real and growing problem primarily due to the availability of very cheap bulk pre-pay SMS packages and the fact that SMS engenders higher response rates as it is a trusted and personal service. SMS spam filtering is a relatively new task which inherits many issues and solutions from email spam filtering. However, it poses its own specific challenges. This paper motivates work on filtering SMS spam and reviews recent developments in SMS spam filtering. The paper also discusses the issues with data collection and availability for furthering research in this area, analyses a large corpus of SMS spam, and provides some initial benchmark results.

    A deep learning method for automatic SMS spam classification: Performance of learning algorithms on indigenous dataset

    SMS, one of the most popular and fast-growing GSM value-added services worldwide, has attracted unwanted SMS, also known as SMS spam. The effects of SMS spam are significant, as it affects both users and service providers and causes a massive gap in trust between the two parties. This article presents a deep learning model based on BiLSTM. Further, it compares our results with some state-of-the-art machine learning (ML) algorithms on two datasets: our newly collected dataset and the popular UCI SMS dataset. This study aims to evaluate the performance of diverse learning models and compare the results on the new expanded dataset (ExAIS_SMS) using the following metrics: true positive (TP), false positive (FP), F-measure, recall, precision, and overall accuracy. The average accuracy of the BiLSTM model was moderately better than that of some of the ML classifiers. The experimental results improved significantly over the ground truth results after effective fine-tuning of some of the parameters. The BiLSTM model attained an accuracy of 93.4% on the ExAIS_SMS dataset and 98.6% on the UCI dataset. Further comparison with the state-of-the-art ML classifiers gave accuracies of 89.64%, 91.11%, 88.24%, 75.76%, 80.24%, and 79.2% for Naive Bayes, BayesNet, SOM, decision tree, C4.5, and J48, respectively, on the ExAIS_SMS dataset. In conclusion, our proposed BiLSTM model showed significant improvement over traditional ML classifiers. To further validate the robustness of our model, we applied it to the UCI dataset, and our results showed optimal performance in classifying SMS spam messages based on accuracy, precision, recall, and F-measure.
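As a reminder of how the metrics listed in the abstract above relate to one another, here is a minimal sketch that derives precision, recall, F-measure, and accuracy from confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive common spam-filter metrics from confusion-matrix counts.

    Spam is treated as the positive class. Ratios with a zero
    denominator are reported as 0.0 rather than raising an error.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in classification_metrics(tp=40, fp=10, fn=20, tn=30).items():
        print(f"{name}: {value:.4f}")
```

Reporting precision and recall alongside accuracy is useful for spam filtering because a false positive (a legitimate message flagged as spam) usually costs more than a missed spam message.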

    Spam Detection Using Machine Learning and Deep Learning

    Text messages are essential these days; however, spam texts have contributed negatively to the success of this communication mode. The compromised authenticity of such messages has given rise to several security breaches. Spam messages have been used to send malicious links that either harm the system or obtain information detrimental to the user. Spam SMS messages as well as emails have been used as media for attacks such as masquerading and smishing (a phishing attack through text messaging), and this has threatened both users and service providers. Given these waves of attacks, it is important to identify and remove such spam messages. This dissertation explores the process of text classification from data input to embedded representation of the words in vector form and finally the classification process. We applied different embedding methods to capture both the linguistic and semantic meanings of words. The static embedding methods used include Word to Vector (Word2Vec) and Global Vectors (GloVe), while for dynamic embedding, transfer learning with Bidirectional Encoder Representations from Transformers (BERT) was employed. For classification, both machine learning and deep learning techniques were used to build an efficient and sensitive classification model with good accuracy and a low false positive rate. Our results established that the combination of BERT for embedding and machine learning for classification produced better classification results than other combinations. With these results, we developed models that combined the self-feature extraction advantage of deep learning and the effective classification of machine learning. These models were tested on four different datasets: the SMS Spam dataset, the Ling dataset, the Spam Assassin dataset, and the Enron dataset. BERT+SVC (hybrid model) produced the highest accuracy and the lowest false positive rate.

    Deep learning to filter SMS spam

    The popularity of short message service (SMS) has been growing over the last decade. For businesses, these text messages are more effective than even emails. This is because while 98% of mobile users read their SMS by the end of the day, about 80% of emails remain unopened. The popularity of SMS has also given rise to SMS spam, which refers to any irrelevant text messages delivered using mobile networks. They are severely annoying to users. Most existing research that has attempted to filter SMS spam has relied on manually identified features. Extending the current literature, this paper uses deep learning to classify Spam and Not-Spam text messages. Specifically, Convolutional Neural Network and Long Short-Term Memory models were employed. The proposed models were based on text data only and self-extracted the feature set. On a benchmark dataset consisting of 747 Spam and 4,827 Not-Spam text messages, a remarkable accuracy of 99.44% was achieved.
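To put the reported accuracy in context, the class counts quoted in the abstract above (747 Spam, 4,827 Not-Spam) imply a majority-class baseline that any classifier should beat. The sketch below computes it; the function name `majority_baseline_accuracy` is a hypothetical helper, and only the two counts come from the abstract.

```python
def majority_baseline_accuracy(class_counts: dict) -> float:
    """Accuracy of always predicting the most frequent class."""
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("no examples in the dataset")
    return max(class_counts.values()) / total


# Class counts as quoted in the abstract.
SMS_BENCHMARK_COUNTS = {"spam": 747, "not_spam": 4827}

if __name__ == "__main__":
    baseline = majority_baseline_accuracy(SMS_BENCHMARK_COUNTS)
    print(f"majority-class baseline accuracy: {baseline:.4f}")
```

Labeling every message Not-Spam already yields roughly 86.6% accuracy on this dataset, so accuracy figures on it are best read relative to that floor rather than to 50%.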