1,366 research outputs found

    Spam image email filtering using K-NN and SVM

    The growing use of the internet has made electronic communication simple and fast, and email is the best-known example. Sending and receiving email is now a common means of communication, but it has brought with it the problem of spam. Spam messages are sent by unknown senders, for example as advertisements, and they hamper the use of the internet. Spammers have introduced a new technique of embedding the spam message in an image attached to the mail. In this paper, we propose a method based on a combination of SVM and k-NN. SVMs tend to take a long time to train on a large data set. If "redundant" patterns are identified and deleted in pre-processing, the training time can be reduced significantly. We propose a k-nearest neighbor (k-NN) based pattern selection method. The method tries to select the patterns that are near the decision boundary and that are correctly labeled. The basic idea is to find close neighbors to a query sample and train a local SVM that preserves the distance function on the collection of neighbors. Our experimental studies on a publicly available dataset (Dredze) show that results improve to approximately 98%.
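The pattern selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `knn_select`, the toy 1-D data, and the exact selection rule (keep a sample when its k nearest neighbors carry mixed labels and the majority of them agrees with the sample's own label) are assumptions made for illustration. Training the SVM on the selected subset is left out.

```python
import math
from collections import Counter

def knn_select(X, y, k=3):
    """Return indices of samples that look near the class boundary
    and whose label agrees with the majority of their neighbors.
    (Hypothetical helper; the selection rule is an assumption.)"""
    selected = []
    for i, xi in enumerate(X):
        # Distances from sample i to every other sample, nearest first.
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbor_labels = [y[j] for _, j in dists[:k]]
        counts = Counter(neighbor_labels)
        near_boundary = len(counts) > 1              # neighbors disagree
        majority_label = counts.most_common(1)[0][0]
        correctly_labeled = y[i] == majority_label   # own label matches majority
        if near_boundary and correctly_labeled:
            selected.append(i)
    return selected

# Toy 1-D data: class 0 on the left, class 1 on the right.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.2,), (5.2,), (6.2,), (7.2,)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected = knn_select(X, y, k=3)
print(selected)
```

On this toy data only the two samples on either side of the class gap are kept, so an SVM trained afterwards would see 2 samples instead of 8.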

    ReP-ETD: A Repetitive Preprocessing technique for Embedded Text Detection from images in spam emails

    Email service proves to be a convenient and powerful communication tool. As the internet continues to grow, the type of information available to users has shifted from text-only to multimedia-enriched. Embedded text in multimedia content is one of the prevalent means for delivering messages to content viewers. With the increasing importance of emails and the incursions of internet marketers, spam has become a major problem and has given rise to unwanted mails. Spammers are continuously adopting new techniques to evade detection. Image spam is one such technique, wherein text embedded within images carries the main information of the spam message instead of text-based spam. Currently, image spam is estimated to be roughly 50% of all spam traffic and is still on the rise, and is thus a serious research issue. Filtering mails is one of the popular approaches used to block spam mails. This work proposes a new model, ReP-ETD (Repetitive Pre-processing technique for Embedded Text Detection), for efficiently and accurately detecting spam in email images. The performance of the proposed ReP-ETD model has been evaluated across the identified parameters and compared with other existing models. The simulation results demonstrate the effectiveness of the proposed model.

    Image Spam Analysis

    Image spam is unsolicited bulk email, where the message is embedded in an image. This technique is used to evade text-based spam filters. In this research, we analyze and compare two novel approaches for detecting spam images. Our first approach focuses on the extraction of a broad set of image features and selection of an optimal subset using a Support Vector Machine (SVM). Our second approach is based on Principal Component Analysis (PCA), where we determine eigenvectors for a set of spam images and compute scores by projecting images onto the resulting eigenspace. Both approaches provide high accuracy with low computational complexity. Further, we develop a new spam image dataset that should prove valuable for improving image spam detection capabilities.
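The PCA-based approach in this abstract can be sketched as follows, assuming NumPy is available. The abstract does not say how the score is computed from the projection, so the rule used here (distance from the projected image to the nearest projected training image, lower meaning more spam-like) is an assumption, as are the function names `fit_eigenspace` and `score` and the synthetic 64-pixel "images".

```python
import numpy as np

def fit_eigenspace(train, n_components):
    """train: (n_images, n_pixels) array of flattened spam images."""
    mean = train.mean(axis=0)
    centered = train - mean
    # Rows of vt are the principal directions (covariance eigenvectors).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    weights = centered @ basis.T      # training images expressed in the eigenspace
    return mean, basis, weights

def score(image, mean, basis, weights):
    """Distance to the closest projected training image (assumed scoring rule)."""
    w = (image - mean) @ basis.T
    return float(np.min(np.linalg.norm(weights - w, axis=1)))

# Synthetic data: spam images are noisy copies of one template.
rng = np.random.default_rng(0)
base = rng.random(64)                 # stand-in for a flattened 8x8 spam template
train = base + 0.01 * rng.standard_normal((20, 64))
mean, basis, weights = fit_eigenspace(train, n_components=5)

spam_score = score(base + 0.01 * rng.standard_normal(64), mean, basis, weights)
ham_score = score(rng.random(64), mean, basis, weights)
print(spam_score, ham_score)
```

In practice a threshold on the score would be chosen on held-out data; the sketch only shows that an image resembling the training set projects closer to it than an unrelated one.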

    A Hybrid Spam Detection Method Based on Unstructured Datasets

    This document is the accepted manuscript version of the following article: Shao, Y., Trovati, M., Shi, Q. et al. Soft Comput (2017) 21: 233. The final publication is available at Springer via http://dx.doi.org/10.1007/s00500-015-1959-z. © Springer-Verlag Berlin Heidelberg 2015. The identification of non-genuine or malicious messages poses a variety of challenges due to the continuous changes in the techniques utilised by cyber-criminals. In this article, we propose a hybrid detection method based on a combination of image and text spam recognition techniques. In particular, the former is based on sparse representation-based classification, which focuses on the global and local image features, and a dictionary learning technique to achieve a spam and a ham sub-dictionary. On the other hand, the textual analysis is based on semantic properties of documents to assess the level of maliciousness. More specifically, we are able to distinguish between meta-spam and real spam. Experimental results show the accuracy and potential of our approach. Peer reviewed. Final Accepted Version.

    Hybrid GA-SVM for Efficient Feature Selection in E-mail Classification

    Feature selection is a problem of global combinatorial optimization in machine learning in which subsets of relevant features are selected to realize robust learning models. The inclusion of irrelevant and redundant features in the dataset can result in poor predictions and high computational overhead. Thus, selecting relevant feature subsets can help reduce the computational cost of feature measurement, speed up the learning process and improve model interpretability. The SVM classifier has proven inefficient on large e-mail datasets, where it fails to produce accurate classification results and consumes a lot of computational resources. In this study, a Genetic Algorithm-Support Vector Machine (GA-SVM) feature selection technique is developed to optimize the SVM classification parameters, the prediction accuracy and computation time. The SpamAssassin dataset was used to validate the performance of the proposed system. The hybrid GA-SVM showed remarkable improvements over SVM in terms of classification accuracy and computation time. Keywords: E-mail Classification, Feature-Selection, Genetic algorithm, Support Vector Machine
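Genetic-algorithm feature selection of the kind this abstract describes can be sketched with the standard library alone. To keep the example self-contained, a nearest-centroid classifier stands in for the SVM, so this is not GA-SVM itself. The fitness function (training accuracy minus a small penalty per selected feature), the GA settings, and all names are assumptions for illustration.

```python
import random

def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(y)

def fitness(X, y, mask):
    # Assumed trade-off: reward accuracy, penalize each selected feature.
    return centroid_accuracy(X, y, mask) - 0.01 * sum(mask)

def ga_select(X, y, generations=20, pop_size=12, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    # Start from the all-features mask plus random bit masks.
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(X, y, m), reverse=True)
        parents = population[: pop_size // 2]   # truncation selection
        children = [parents[0][:]]              # elitism: best mask survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = children
    return max(population, key=lambda m: fitness(X, y, m))

# Synthetic data: feature 0 is informative, features 1-3 are noise.
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[t + data_rng.gauss(0, 0.1)] + [data_rng.gauss(0, 1) for _ in range(3)] for t in y]
best = ga_select(X, y)
print(best, fitness(X, y, best))
```

Because the best mask of each generation is carried over unchanged, the returned mask never scores worse than using all features. Swapping `centroid_accuracy` for a cross-validated SVM score would bring the sketch closer to the setup the abstract describes.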