1,081 research outputs found

    Using online linear classifiers to filter spam Emails

    The performance of two online linear classifiers, the Perceptron and Littlestone’s Winnow, is explored on two anti-spam filtering benchmark corpora, PU1 and Ling-Spam. We study the performance for varying numbers of features, along with three different feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better when using IG or DF than when using Odds Ratio. It is further demonstrated that, when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of the training set. Winnow is shown to slightly outperform the Perceptron. It is also demonstrated that both of these online classifiers perform much better than a standard Naïve Bayes method. The theoretical and implementation computational complexity of these two classifiers is very low, and they are very easily updated adaptively. They outperform most of the published results, while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering.
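The two online update rules named in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes binary bag-of-words features, uses the basic positive-weight Winnow variant (often called Winnow2) with a threshold equal to the number of features, and the vocabulary, `TOY` data set and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (binary feature vector, label) where label 1 = spam and 0 = legitimate
Example = Tuple[Sequence[int], int]


def predict_perceptron(w: List[float], b: float, x: Sequence[int]) -> int:
    """Predict spam (1) when the linear score is strictly positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(data: List[Example], n_features: int,
                     epochs: int = 10) -> Tuple[List[float], float]:
    """Mistake-driven Perceptron: additive update on each misclassified example."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, y in data:
            if predict_perceptron(w, b, x) != y:
                sign = 1.0 if y == 1 else -1.0
                for i, xi in enumerate(x):
                    w[i] += sign * xi
                b += sign
    return w, b


def predict_winnow(w: List[float], theta: float, x: Sequence[int]) -> int:
    """Predict spam (1) when the weighted sum reaches the threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0


def train_winnow(data: List[Example], n_features: int, epochs: int = 10,
                 alpha: float = 2.0) -> Tuple[List[float], float]:
    """Basic Winnow: multiplicative promotion or demotion of active features."""
    w = [1.0] * n_features
    theta = float(n_features)  # assumed threshold choice for this sketch
    for _ in range(epochs):
        for x, y in data:
            if predict_winnow(w, theta, x) != y:
                # Promote on a missed spam, demote on a false alarm.
                factor = alpha if y == 1 else 1.0 / alpha
                for i, xi in enumerate(x):
                    if xi:
                        w[i] *= factor
    return w, theta


# Hypothetical vocabulary: ["free", "money", "meeting", "report"].
# A message is spam here if it contains "free" or "money".
TOY: List[Example] = [
    ([1, 0, 0, 0], 1), ([0, 1, 0, 0], 1), ([1, 1, 0, 0], 1),
    ([0, 0, 1, 0], 0), ([0, 0, 0, 1], 0), ([0, 0, 1, 1], 0),
    ([1, 0, 1, 0], 1), ([0, 1, 0, 1], 1),
]
```

Both learners touch the weights only when they make a mistake, which is why they are cheap to train and easy to update as new mail arrives. The Perceptron adds or subtracts the feature vector, while Winnow multiplies the weights of the active features.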

    Feature extraction and classification of spam emails

    Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning

    Learning-based pattern classifiers, including deep networks, have shown impressive performance in several application domains, ranging from computer vision to cybersecurity. However, it has also been shown that adversarial input perturbations carefully crafted either at training or at test time can easily subvert their predictions. The vulnerability of machine learning to such wild patterns (also referred to as adversarial examples), along with the design of suitable countermeasures, has been investigated in the research field of adversarial machine learning. In this work, we provide a thorough overview of the evolution of this research area over the last ten years and beyond, starting from pioneering, earlier work on the security of non-deep learning algorithms up to more recent work aimed at understanding the security properties of deep learning algorithms, in the context of computer vision and cybersecurity tasks. We report interesting connections between these apparently different lines of work, highlighting common misconceptions related to the security evaluation of machine-learning algorithms. We review the main threat models and attacks defined to this end, and discuss the main limitations of current work, along with the corresponding future challenges towards the design of more secure learning algorithms. Comment: Accepted for publication on Pattern Recognition, 201

    Security Evaluation of Support Vector Machines in Adversarial Environments

    Support Vector Machines (SVMs) are among the most popular classification techniques adopted in security applications like malware detection, intrusion detection, and spam filtering. However, if SVMs are to be incorporated in real-world security systems, they must be able to cope with attack patterns that can either mislead the learning algorithm (poisoning), evade detection (evasion), or gain information about their internal parameters (privacy breaches). The main contributions of this chapter are twofold. First, we introduce a formal general framework for the empirical evaluation of the security of machine-learning systems. Second, according to our framework, we demonstrate the feasibility of evasion, poisoning and privacy attacks against SVMs in real-world security problems. For each attack technique, we evaluate its impact and discuss whether (and how) it can be countered through an adversary-aware design of SVMs. Our experiments are easily reproducible thanks to open-source code that we have made available, together with all the employed datasets, on a public repository. Comment: 47 pages, 9 figures; chapter accepted into book 'Support Vector Machine Applications

    Investigation into the Application of Personality Insights and Language Tone Analysis in Spam Classification

    Due to its persistence, spam remains one of the biggest problems facing users and suppliers of email communication services. Machine learning techniques have been very successful at preventing many spam mails from arriving in user mailboxes; however, spam still accounts for over 50% of all emails sent. Despite this relative success, the economic cost of spam has been estimated as high as $50 billion in 2005 and more recently at $20 billion, so spam can still be considered a considerable problem. In essence, a spam email is a commercial communication trying to entice the receiver to take some positive action. This project uses the text from emails and creates personality insight and language tone scores through the use of IBM Watson’s Tone Analyzer API. Those scores are used to investigate whether the language used in emails can be transformed into useful features that can be used to correctly classify them as spam or genuine emails. During the course of this investigation, a range of machine learning techniques is applied. Where just the personality insight and language tone features are used in the model, some promising results were shown with one dataset. However, over all datasets the results were inconclusive with this model. Furthermore, it was found that in a model where these features were used in combination with a normalised term-frequency feature set, no real improvement in classification performance was shown.

    A Survey of Email Spam Filtering Methods

    E-mail is one of the most secure media for online communication and for transferring data or messages through the web. As its popularity has grown, the volume of unsolicited messages has also increased rapidly. To filter them, different approaches exist that automatically detect and remove these unwanted messages. There are several email spam filtering techniques, such as knowledge-based techniques, clustering techniques, learning-based techniques, heuristic processes and so on. This paper surveys existing email spam filtering systems based on Machine Learning Techniques (MLT) such as Naive Bayes, SVM, K-Nearest Neighbor, Bayes Additive Regression, KNN Tree, and rules. We present the classification, evaluation and comparison of different email spam filtering systems. Keywords: e-mail spam, spam filtering methods, machine learning technique, classification, SVM, AN

    Spam Detection Using Machine Learning

    Emails are essential to communication in the present century; however, spam emails have undermined the success of such communication. Studies have been conducted to classify messages in an effort to distinguish between ham and spam email by building an efficient and sensitive classification model with high accuracy and a low false positive rate. Regular rule-based classifiers have been overwhelmed and made less effective by the geometric growth in spam messages, hence the need to develop a more reliable and robust model. Classification methods employed include SVM (support vector machine), Bayesian, Naïve Bayes, Bayesian with Adaboost, and Naïve Bayes with Adaboost. For this project, however, the Bayesian classifier was employed, using the Python programming language to develop a classification model. Keywords: machine learning (ML), machine learning classifier, Naïve Bayes, SVM, Adaboost, spam classification, ham. DOI: 10.7176/CEIS/11-3-04 Publication date: May 31st 202
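As a rough illustration of the kind of Bayesian spam model this abstract mentions, the sketch below trains a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing in plain Python. It is an assumption-laden toy, not the authors' model: the whitespace tokenizer, the training messages in `TOY_DOCS`, and the function names `train_nb` and `classify_nb` are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train_nb(docs: List[Tuple[str, str]]) -> Model:
    """Count messages per class and token occurrences per class."""
    class_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = {}
    vocab: Set[str] = set()
    for text, label in docs:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in text.lower().split():  # naive whitespace tokenizer
            counts[tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def classify_nb(model: Model, text: str) -> str:
    """Return the label with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages, labelled "spam" or "ham".
TOY_DOCS = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]
```

With these four messages, `classify_nb(train_nb(TOY_DOCS), "free money prize")` selects the spam class because every token in the query was seen only in spam. A realistic evaluation would also report the false positive rate the abstract emphasises.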