A deep learning method for automatic SMS spam classification: Performance of learning algorithms on indigenous dataset

Abstract

SMS, one of the most popular and fast-growing GSM value-added services worldwide, has attracted unwanted SMS, also known as SMS spam. The effects of SMS spam are significant as it affects both the users and the service providers, causing a massive gap in trust among both parties. This article presents a deep learning model based on BiLSTM. Further, it compares our results with some of the states of the art machine learning (ML) algorithm on two datasets: our newly collected dataset and the popular UCI SMS dataset. This study aims to evaluate the performance of diverse learning models and compare the result of the new dataset expanded (ExAIS_SMS) using the following metrics the true positive (TP), false positive (FP), F-measure, recall, precision, and overall accuracy. The average accuracy for the BiLSTSM model achieved moderately improved results compared to some of the ML classifiers. The experimental results achieved significant improvement from the ground truth results after effective fine-tuning of some of the parameters. The BiLSTM model using the ExAIS_SMS dataset attained an accuracy of 93.4% and 98.6% for UCI datasets. Further comparison of the two datasets on the state-of-the-art ML classifiers gave an accuracy of Naive Bayes, BayesNet, SOM, decision tree, C4.5, J48 is 89.64%, 91.11%, 88.24%, 75.76%, 80.24%, and 79.2% respectively for ExAIS_SMS datasets. In conclusion, our proposed BiLSTM model showed significant improvement over traditional ML classifiers. To further validate the robustness of our model, we applied the UCI datasets, and our results showed optimal performance while classifying SMS spam messages based on some metrics: accuracy, precision, recall, and F-measure.publishedVersio

    Similar works