25 research outputs found

    Possibility Theory-Based Approach to Spam Email Detection

    Email spam detection : a symbiotic feature selection approach fostered by evolutionary computation

    Post-print version (prior to journal publication).
    Electronic mail (email) is now an essential communication service, widely used by most Internet users. One of the main problems affecting this service is the proliferation of unsolicited messages (usually called spam), which, despite the efforts of the research community, remains an inherent problem of this Internet service. This work proposes and explores a novel symbiotic feature selection approach that allows distinct collaborating users to exchange relevant features in order to improve the behavior of anti-spam filters. For this purpose, several Evolutionary Algorithms (EA) are explored as optimization engines able to enhance feature selection strategies within the anti-spam area. The proposed mechanisms are tested using a realistic incremental retraining evaluation procedure and a novel corpus based on the well-known Enron datasets mixed with recent spam data. The results show that the proposed symbiotic approach is competitive, with the added advantage of preserving end users' privacy.
    The work of P. Cortez and P. Sousa was funded by FEDER, through the program COMPETE and the Portuguese Foundation for Science and Technology (FCT), within the project FCOMP-01-0124-FEDER-022674.

    Active Multi-Field Learning for Spam Filtering

    Ubiquitous spam messages cause a serious waste of time and resources. This paper addresses the practical spam filtering problem and proposes a universal approach to combat various kinds of spam messages. The proposed active multi-field learning approach rests on two observations: 1) obtaining a label is costly for a real-world spam filter, which suggests active learning; and 2) different messages often share a similar multi-field text structure, which suggests multi-field learning. The multi-field learning framework combines the results predicted by multiple field classifiers using a novel compound weight, and each field classifier calculates the arithmetic average of multiple conditional probabilities predicted from feature strings according to a string-frequency index data structure. By comparing the current variance of the field classification results with the historical variance, the active learner evaluates the classification confidence and treats the more uncertain message as the more informative sample for which to request a label. The experimental results show that the proposed approach achieves state-of-the-art performance with greatly reduced label requirements in both email spam filtering and short-text spam filtering. Measured by the standard (1-ROCA)% metric, our active multi-field learning performance even exceeds the full-feedback performance of some advanced individual classification algorithms.
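
    The scoring and label-request logic described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not define the compound weight or the exact variance test, so the plain weighted average in `combine`, the "current variance greater than mean historical variance" rule, and all function and class names here are assumptions.

```python
from statistics import mean, pvariance


def field_score(feature_probs):
    """Arithmetic average of per-feature spam probabilities for one field.

    Returns a neutral 0.5 when the field has no known features (assumption).
    """
    return mean(feature_probs) if feature_probs else 0.5


def combine(field_scores, weights):
    """Weighted average of field scores.

    Stand-in for the paper's compound weight, which the abstract does not specify.
    """
    return sum(s * w for s, w in zip(field_scores, weights)) / sum(weights)


class ActiveLearner:
    """Requests a label when field classifiers disagree more than usual."""

    def __init__(self):
        self.history = []  # variances observed on earlier messages

    def should_request_label(self, field_scores):
        current = pvariance(field_scores)
        historical = mean(self.history) if self.history else 0.0
        self.history.append(current)
        # Assumed rule: higher-than-usual disagreement means an uncertain message.
        return current > historical


# Hypothetical message with two fields (for example, subject and body).
subject = field_score([0.9, 0.8, 0.7])
body = field_score([0.2, 0.4])
spam_score = combine([subject, body], weights=[1.0, 1.0])
```

    Keeping the per-field scores separate until the final combination is what makes the variance check possible: the spread across fields serves as the uncertainty signal.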

    An Empirical Study of Online Consumer Review Spam: A Design Science Approach

    Because of the sheer volume of consumer reviews posted to the Internet, a manual approach to detecting and analyzing fake reviews is not practical. However, automated detection of fake reviews is a very challenging research problem, given that fake reviews can look just like legitimate reviews. Guided by the design science research methodology, one of the main contributions of our research is the development of a novel methodology and an instantiation that can effectively detect untruthful consumer reviews. The results of our experiment confirm that the proposed methodology outperforms other well-known baseline methods for detecting untruthful reviews collected from amazon.com. Above all, the designed artifacts enable us to conduct an econometric analysis to examine the impact of fake reviews on product sales. To the best of our knowledge, this is the first empirical study conducted to analyze the economic impact of fake consumer reviews.