2 research outputs found

    Spam Detection Using Machine Learning and Deep Learning

    Text messages are essential these days; however, spam texts have undermined this mode of communication. The compromised authenticity of such messages has given rise to several security breaches. Spam messages have been used to send malicious links that either harm the system or obtain information detrimental to the user. Spam SMS messages as well as emails have been used as media for attacks such as masquerading and smishing (a phishing attack through text messaging), and this has threatened both users and service providers. Given these waves of attacks, identifying and removing spam messages is important. This dissertation explores the process of text classification, from data input, to the embedded representation of words in vector form, and finally to the classification itself. We applied different embedding methods to capture both the linguistic and semantic meanings of words. The static embedding methods used include Word to Vector (Word2Vec) and Global Vectors (GloVe), while for dynamic embedding, transfer learning with Bidirectional Encoder Representations from Transformers (BERT) was employed. For classification, both machine learning and deep learning techniques were used to build an efficient and sensitive classification model with good accuracy and a low false positive rate. Our results established that the combination of BERT for embedding and machine learning for classification produced better classification results than other combinations. With these results, we developed models that combine the self-feature extraction advantage of deep learning with the effective classification of machine learning. These models were tested on four different datasets: the SMS Spam dataset, the Ling dataset, the Spam Assassin dataset and the Enron dataset. BERT+SVC (a hybrid model) produced the highest accuracy and the lowest false positive rate.
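The abstract above judges models by accuracy and false positive rate. As a minimal sketch of how those two figures are commonly computed for a spam filter, the Python below treats "spam" as the positive class. The function name `accuracy_and_fpr` and the toy labels are illustrative assumptions; they are not taken from the dissertation.

```python
def accuracy_and_fpr(y_true, y_pred, positive="spam"):
    """Return (accuracy, false positive rate) for a binary spam filter.

    A false positive is a legitimate message wrongly flagged as spam.
    Hypothetical helper for illustration, not the dissertation's code.
    """
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # FPR = FP / (FP + TN): the share of legitimate messages flagged as spam
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Toy labels (made up for the example)
y_true = ["spam", "spam", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "ham"]
acc, fpr = accuracy_and_fpr(y_true, y_pred)
```

A low false positive rate matters here because each false positive is a legitimate message hidden from the user, which is usually costlier than letting one spam message through.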

    An Empirical study of a simple naive Bayes classifier based on ranking functions

    Ranking functions provide an alternative way of modelling uncertainty. Much of the research in this area focuses on its theoretical and philosophical aspects. Approaches to solving practical problems involving uncertainty have been, by and large, dominated by probabilistic models of uncertainty. In this paper we investigate whether ranking functions can be used to solve practical problems in an uncertain domain. In particular, we look at the problem of identifying spam e-mails, one of the earliest success stories of probabilistic machine learning techniques. We show how the probabilistic naive Bayes classifier can easily be translated to one based on ranking functions, and present some experimental results that demonstrate its efficacy in correctly identifying spam e-mails. 8 pages.
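To make the translation from probabilities to ranks concrete, here is a minimal Python sketch, not the paper's implementation. In ranking theory a rank is a non-negative integer degree of disbelief, ranks of independent evidence add where probabilities multiply, and the preferred hypothesis has the lowest rank rather than the highest probability. The sketch rests on several labelled assumptions: ranks are estimated as rounded negative logarithms of Laplace-smoothed frequencies, only words present in a message contribute, and the names `train_ranks`, `classify`, `DOCS` and `LABELS` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_ranks(docs, labels, base=2.0):
    """Estimate integer ranks from labelled token lists.

    Assumption: rank = round(-log_base(smoothed frequency)); the abstract
    does not say how the paper derives its ranks.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        for word in set(tokens):
            word_counts[label][word] += 1
            vocab.add(word)

    def rank(p):
        return round(-math.log(p, base))

    total = len(labels)
    prior = {c: rank(class_counts[c] / total) for c in class_counts}
    lowest = min(prior.values())
    # Normalise so the most plausible class has rank 0
    prior = {c: r - lowest for c, r in prior.items()}

    cond = {}
    for c in class_counts:
        cond[c] = {}
        for word in vocab:
            p = (word_counts[c][word] + 1) / (class_counts[c] + 2)
            present, absent = rank(p), rank(1 - p)
            # Normalise so min(rank(present), rank(absent)) is 0
            cond[c][word] = present - min(present, absent)
    return prior, cond


def classify(tokens, prior, cond):
    """Pick the class with the lowest total rank (least disbelieved)."""
    scores = {
        c: prior[c] + sum(cond[c][w] for w in set(tokens) if w in cond[c])
        for c in prior
    }
    return min(scores, key=scores.get)


# Toy corpus (made up for the example)
DOCS = [
    ["win", "cash", "now"], ["free", "cash", "prize"], ["win", "free", "prize"],
    ["meeting", "at", "noon"], ["lunch", "at", "noon"], ["see", "you", "at", "lunch"],
]
LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]

prior, cond = train_ranks(DOCS, LABELS)
label = classify(["free", "cash"], prior, cond)
```

The structure mirrors naive Bayes step for step: the product of likelihoods becomes a sum of ranks, and the arg max over classes becomes an arg min.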