
    An innovative spam filtering model based on support vector machine

    Spam is commonly defined as unsolicited email, and the goal of spam categorization is to distinguish spam from legitimate email messages. Many researchers have tried to separate spam from legitimate email using machine learning algorithms based on statistical learning methods. In this paper, an innovative and intelligent spam filtering model is proposed based on the support vector machine (SVM). The model combines linear and nonlinear SVM techniques; linear SVM performs better when classifying text-based spam messages that share similar characteristics. The proposed model classifies both text-based and image-based email messages by selecting an appropriate kernel function to transform the information.
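
    The abstract's central idea is that one SVM trainer can serve different message types by changing only the kernel function. The sketch below illustrates that idea under stated assumptions; it is not the authors' model. Training uses kernelized Pegasos (a stochastic subgradient SVM solver without a bias term), and it visits examples cyclically rather than at random so that runs are deterministic. The feature vectors (hypothetical counts of the words "free" and "meeting"), `lam`, `gamma`, and all names are illustrative.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(a: Vector, b: Vector) -> float:
    """K(a, b) = <a, b>: the linear SVM case."""
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(gamma: float) -> Kernel:
    """K(a, b) = exp(-gamma * ||a - b||^2): a nonlinear (Gaussian) kernel."""
    def k(a: Vector, b: Vector) -> float:
        return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))
    return k


class KernelSVM:
    """Kernelized Pegasos SVM (no bias term); labels are +1 (spam) / -1 (ham)."""

    def __init__(self, kernel: Kernel, lam: float = 0.1, iterations: int = 200):
        self.kernel = kernel
        self.lam = lam              # regularization strength (illustrative value)
        self.iterations = iterations

    def _raw(self, x: Vector) -> float:
        # Sum of alpha_j * y_j * K(x_j, x) over examples that ever violated the margin.
        return sum(a * yj * self.kernel(xj, x)
                   for a, yj, xj in zip(self.alpha, self.y, self.X) if a)

    def fit(self, X: List[Vector], y: List[int]) -> "KernelSVM":
        self.X, self.y = [list(x) for x in X], list(y)
        self.alpha = [0] * len(self.X)
        for t in range(1, self.iterations + 1):
            i = (t - 1) % len(self.X)   # cyclic instead of random sampling
            margin = self.y[i] * self._raw(self.X[i]) / (self.lam * t)
            if margin < 1:              # hinge-loss violation: count this example
                self.alpha[i] += 1
        return self

    def decision(self, x: Vector) -> float:
        return self._raw(x) / (self.lam * self.iterations)

    def predict(self, x: Vector) -> int:
        return 1 if self.decision(x) >= 0 else -1


# Hypothetical features: [count of "free", count of "meeting"].
X = [[2, 0], [3, 0], [0, 2], [0, 3]]
y = [1, 1, -1, -1]
text_svm = KernelSVM(linear_kernel).fit(X, y)          # linear kernel
nonlinear_svm = KernelSVM(rbf_kernel(0.5)).fit(X, y)   # RBF kernel
print(text_svm.predict([5, 0]), nonlinear_svm.predict([0, 5]))  # prints: 1 -1
```

    Only the `kernel` argument differs between the two models. In the same way, a combined filter could route each kind of message to the kernel best suited to its features.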

    Dynamic feature selection for spam filtering using support vector machine

    Email categorization using (2+1)-tier classification algorithms

    In this paper we propose a spam filtering technique using a (2+1)-tier classification approach. The main focus is to reduce the false positive (FP) rate, which is considered an important research issue in spam filtering. In our approach, an email message is first classified by the first two tier classifiers, and their outputs are passed to the analyzer. When the two predictions are identical, the analyzer checks the labels of the output emails and sends them to the corresponding mailboxes. If the first two tier classifiers disagree, indicating a possible misclassification, the analyzer invokes the tier-3 classifier, which makes the final decision. This technique reduces the analysis complexity of our previous work. It has also been shown that the proposed technique performs better, reducing false positives and improving accuracy.
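
    The routing logic described above can be sketched as follows. This is a minimal illustration, not the authors' system. The keyword classifiers are hypothetical stand-ins for the trained tier classifiers, and the function and parameter names are assumptions.

```python
from typing import Callable, Tuple

Classifier = Callable[[str], str]  # returns "spam" or "ham"


def two_plus_one_tier(email: str, tier1: Classifier, tier2: Classifier,
                      tier3: Classifier) -> Tuple[str, bool]:
    """Return (label, tier3_used) for one email."""
    first, second = tier1(email), tier2(email)
    if first == second:
        # Identical prediction: the analyzer routes the email directly.
        return first, False
    # Disagreement: the analyzer invokes the tier-3 classifier for the final decision.
    return tier3(email), True


def keyword_classifier(*keywords: str, min_hits: int = 1) -> Classifier:
    """Hypothetical stand-in for a trained model: counts keyword hits."""
    def classify(email: str) -> str:
        words = email.lower().split()
        hits = sum(word in words for word in keywords)
        return "spam" if hits >= min_hits else "ham"
    return classify


tier1 = keyword_classifier("free")
tier2 = keyword_classifier("winner")
tier3 = keyword_classifier("free", "winner", "prize", min_hits=2)

for message in ["Free prize for the winner", "Free lunch at the meeting", "Agenda attached"]:
    print(message, "->", two_plus_one_tier(message, tier1, tier2, tier3))
```

    The tier-3 classifier runs only on the second message, where the first two tiers disagree. This is how the approach keeps analysis cost low for the common case.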

    Email classification using data reduction method

    Correctly classifying user emails in the presence of spam is an important research issue for anti-spam researchers. This paper presents an effective and efficient email classification technique based on a data filtering method. We introduce an innovative filtering technique that uses an instance selection method (ISM) to remove uninformative instances from the training data before classifying the test data. The objective of ISM is to identify which instances (examples, patterns) in an email corpus should be selected as representatives of the entire dataset without significant loss of information. We used the WEKA interface in our integrated classification model and tested diverse classification algorithms. Our empirical studies show significant gains in classification accuracy, together with a reduction in false positive instances.
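
    The abstract does not specify which instance selection method is used. As a stand-in, the sketch below implements Hart's condensed nearest neighbour (CNN), a classic instance selection algorithm. It keeps only the instances that the current subset misclassifies under 1-NN. The 2-D feature vectors are hypothetical, and the sketch is plain Python rather than WEKA.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(x: Vector, X: List[Vector], y: List[str], indices: List[int]) -> str:
    """1-nearest-neighbour label of x, using only the stored instances."""
    best = min(indices, key=lambda j: squared_distance(X[j], x))
    return y[best]


def condensed_nearest_neighbour(X: List[Vector], y: List[str]) -> List[int]:
    """Hart's CNN: keep only instances the current subset misclassifies."""
    store = [0]                     # start from the first instance
    changed = True
    while changed:                  # repeat passes until the subset is stable
        changed = False
        for i in range(len(X)):
            if i not in store and nearest_label(X[i], X, y, store) != y[i]:
                store.append(i)     # misclassified: keep it as a representative
                changed = True
    return store


# Hypothetical 2-D email features.
X = [[5, 5], [5, 6], [6, 5], [6, 6], [0, 0], [0, 1], [1, 0], [1, 1]]
y = ["spam"] * 4 + ["ham"] * 4
kept = condensed_nearest_neighbour(X, y)
print(kept)  # prints: [0, 4]
```

    In this example, two of the eight instances represent the whole training set, and 1-NN on that subset still labels every training instance correctly.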

    An innovative analyser for email classification based on grey list analysis

    In this paper we propose a new email classification technique based on grey list (GL) analysis of user emails. The technique analyses the output emails of an integrated model that uses multiple classifiers built on statistical learning algorithms. The GL is a list of classifier outputs that are considered neither true positive (TP) nor true negative (TN) but lie between the two. Much work has been done on filtering spam from legitimate email using classification algorithms, and substantial performance has been achieved with some false positive (FP) trade-off. In spam detection, however, false positives are sometimes unacceptable. The proposed technique provides a list of output emails, called the "grey list (GL)", to the analyser, which decides the status of these emails. It has been shown that the proposed technique performs much better than existing systems in terms of reducing FPs and improving accuracy.
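
    The abstract does not state how an email is placed on the grey list. The sketch below shows one possible rule, assuming each classifier outputs a spam probability. The probabilities are averaged, and emails whose average falls between two thresholds go to the grey list for the analyser. The `low` and `high` thresholds and the scores are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def grey_list_split(scores: Dict[str, List[float]], low: float = 0.2,
                    high: float = 0.8) -> Tuple[List[str], List[str], List[str]]:
    """Split emails into (spam, ham, grey) from per-classifier spam probabilities."""
    spam, ham, grey = [], [], []
    for email_id, probs in scores.items():
        avg = mean(probs)           # combine the classifiers' outputs
        if avg >= high:
            spam.append(email_id)   # confidently spam
        elif avg <= low:
            ham.append(email_id)    # confidently legitimate
        else:
            grey.append(email_id)   # in between: defer to the analyser
    return spam, ham, grey


# Hypothetical spam probabilities from three classifiers per email.
scores = {
    "e1": [0.90, 0.95, 0.85],
    "e2": [0.10, 0.05, 0.20],
    "e3": [0.90, 0.20, 0.50],
}
spam, ham, grey = grey_list_split(scores)
print("grey list:", grey)  # prints: grey list: ['e3']
```

    Widening the band between `low` and `high` sends more emails to the analyser. This trades manual review effort for fewer automatic false positives.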

    Computing with Granular Words

    Computational linguistics is a sub-field of artificial intelligence; it is an interdisciplinary field dealing with statistical and/or rule-based modeling of natural language from a computational perspective. Traditionally, fuzzy logic is used to handle fuzziness among single linguistic terms in documents. However, linguistic terms may involve other types of uncertainty. For instance, when different users search for ‘cheap hotel’ in a search engine, they may need distinct pieces of relevant hidden information, such as shopping, transportation, or weather. This research therefore focuses on studying granular words and developing new algorithms to process them, so that uncertainty can be handled globally. To describe granular words precisely, a new structure called the Granular Information Hyper Tree (GIHT) is constructed. Furthermore, several techniques are developed that apply computing with granular words to spam filtering and query recommendation. In simulations, the GIHT-Bayesian algorithm achieves more accurate spam filtering than the conventional Naive Bayes and SVM methods. When applied to a search engine, computing with granular words also produces better recommendation results, based on users' assessments.
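
    The abstract does not describe the internals of GIHT-Bayesian, so it is not sketched here. For reference, the sketch below shows the conventional Naive Bayes baseline it is compared against: a minimal multinomial Naive Bayes spam filter with Laplace (add-one) smoothing. The toy training messages are hypothetical, and ignoring out-of-vocabulary words is a design choice of this sketch.

```python
import math
from collections import Counter
from typing import Dict, List


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayesSpamFilter":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        self.n_docs = len(docs)
        return self

    def log_posterior(self, doc: str, c: str) -> float:
        """Unnormalized log P(c | doc) = log P(c) + sum of log P(word | c)."""
        total = sum(self.word_counts[c].values())
        score = math.log(self.doc_counts[c] / self.n_docs)  # log prior
        for word in doc.lower().split():
            if word in self.vocab:                          # ignore unseen words
                count = self.word_counts[c][word]
                score += math.log((count + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc: str) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


docs = ["win free prize now", "free money now",
        "meeting tomorrow at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
nb = NaiveBayesSpamFilter().fit(docs, labels)
print(nb.predict("free prize"), nb.predict("meeting tomorrow"))  # prints: spam ham
```

    Each word contributes independently to the score. This is the single-term independence assumption that a structure relating words to one another, such as GIHT, is positioned to go beyond.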