770 research outputs found

Symbiotic data mining for personalized spam filtering

Author: Cortez Paulo
Lopes Clotilde
Rio Miguel
Rocha Miguel
Sousa Pedro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2009

Unsolicited e-mail (spam) is a severe problem due to intrusion of privacy, online fraud, viruses and time spent reading unwanted messages. To address this issue, Collaborative Filtering (CF) and Content-Based Filtering (CBF) solutions have been adopted. We propose a new CBF-CF hybrid approach called Symbiotic Data Mining (SDM), which aggregates distinct local filters in order to improve filtering at a personalized level using collaboration while preserving privacy. We apply SDM to spam e-mail detection and compare it with a local CBF filter (i.e. Naive Bayes). Several experiments were conducted using a novel corpus based on the well-known Enron datasets mixed with recent spam. The results show that the symbiotic strategy is competitive in performance when compared to CBF and also more robust to contamination attacks. Funding: Fundação para a Ciência e a Tecnologia (FCT) - PTDC/EIA/64541/2006

Universidade do Minho: RepositoriUM

Crossref

A collaborative approach for spam detection

Author: Cortez Paulo
Machado Artur
Rio Miguel
Rocha Miguel
Sousa Pedro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

Electronic mail is nowadays one of the most important Internet networking services. However, there are still many challenges that should be faced in order to provide a better e-mail service quality, such as the growing dissemination of unsolicited e-mail (spam) over the Internet. This work aims to foster new research efforts giving ground to the development of novel collaborative approaches to deal with spam proliferation. Using the proposed system, which is able to complement other anti-spam solutions, end-users are allowed to share and combine spam filters in a flexible way, increasing the accuracy and resilience levels of anti-spam techniques.

Universidade do Minho: RepositoriUM

UCL Discovery

Evolutionary symbiotic feature selection for email spam detection

Author: Cortez Paulo
Rio Miguel
Rocha Miguel
Sousa Pedro
Vaz Rui Fernando Martins
Publication venue: 'Scitepress'
Publication date: 01/07/2012

This work presents a symbiotic filtering approach enabling the exchange of relevant word features among different users in order to improve local anti-spam filters. The local spam filtering is based on a Content-Based Filtering strategy, where word frequencies are fed into a Naive Bayes learner. Several Evolutionary Algorithms are explored for feature selection, including the proposed symbiotic exchange of the most relevant features among different users. The experiments were conducted using a novel corpus based on the well-known Enron datasets mixed with recent spam. The obtained results show that the symbiotic approach is competitive. Funding: Fundação para a Ciência e a Tecnologia (FCT) - FCOMP-01-0124-FEDER-022674COMPET

Universidade do Minho: RepositoriUM

UCL Discovery
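
The abstract above mentions a local content-based filter in which word frequencies are fed into a Naive Bayes learner. As a rough illustration of that general idea only, here is a minimal multinomial Naive Bayes sketch with Laplace smoothing. The class name `NaiveBayesSpamFilter`, its methods, the whitespace tokenizer and the toy messages are all assumptions made for this example; they are not taken from the paper, and the evolutionary feature selection it describes is not shown.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Hypothetical word-frequency Naive Bayes filter (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}
        self.vocab = set()

    def train(self, text, label):
        """Add one labelled message ('spam' or 'ham') to the counts."""
        words = text.lower().split()  # naive whitespace tokenizer
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def _log_score(self, words, label):
        # Log prior plus smoothed log likelihood of each known word.
        # Requires at least one training message for each label.
        total_docs = sum(self.doc_counts.values())
        log_prior = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        return log_prior + sum(
            math.log((self.word_counts[label][w] + self.alpha) / denom)
            for w in words
            if w in self.vocab  # words never seen in training are skipped
        )

    def spam_probability(self, text):
        """Return the posterior probability that the message is spam."""
        words = text.lower().split()
        s = self._log_score(words, "spam")
        h = self._log_score(words, "ham")
        m = max(s, h)  # subtract the max before exponentiating for stability
        es, eh = math.exp(s - m), math.exp(h - m)
        return es / (es + eh)


# Toy usage with made-up messages.
nb = NaiveBayesSpamFilter()
nb.train("win free money now", "spam")
nb.train("free prize claim now", "spam")
nb.train("meeting agenda for monday", "ham")
nb.train("project report attached", "ham")
print(nb.spam_probability("free money"))
print(nb.spam_probability("monday meeting"))
```

In a symbiotic setting as described in the abstract, users would exchange relevant word features rather than raw messages; how those features are selected and merged is specific to the paper and is left out here.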

Email spam detection : a symbiotic feature selection approach fostered by evolutionary computation

Author: De Jong K.
Guyon I.
Rio Miguel
Rocha Miguel
Cortez Paulo
Sousa Pedro
Vaz Rui
Schryen G.
Schwartz A.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/07/2013

Post-print version (prior to journal publication). Electronic mail (email) is nowadays an essential communication service, widely used by most Internet users. One of the main problems affecting this service is the proliferation of unsolicited messages (usually denoted by spam) which, despite the efforts made by the research community, still remains an inherent problem affecting this Internet service. In this perspective, this work proposes and explores the concept of a novel symbiotic feature selection approach allowing the exchange of relevant features among distinct collaborating users, in order to improve the behavior of anti-spam filters. For this purpose, several Evolutionary Algorithms (EA) are explored as optimization engines able to enhance feature selection strategies within the anti-spam area. The proposed mechanisms are tested using a realistic incremental retraining evaluation procedure and a novel corpus based on the well-known Enron datasets mixed with recent spam data. The obtained results show that the proposed symbiotic approach is competitive, with the added advantage of preserving end-users' privacy. The work of P. Cortez and P. Sousa was funded by FEDER, through the program COMPETE and the Portuguese Foundation for Science and Technology (FCT), within the project FCOMP-01-0124-FEDER-022674

Universidade do Minho: RepositoriUM

Crossref

UCL Discovery

Towards symbiotic spam e-mail filtering

Author: Cortez Paulo
Lopes Clotilde
Sousa Pedro
Publication date: 01/01/2010

This position paper discusses the use of symbiotic filtering, a novel distributed data mining approach that combines content-based and collaborative filtering for spam detection.

Universidade do Minho: RepositoriUM

Symbiotic filtering for spam email detection

Author: Cortez Paulo
Lopes Clotilde
Rio Miguel
Rocha Miguel
Sousa Pedro
Publication venue: 'Elsevier BV'
Publication date: 01/08/2011

This paper presents a novel spam filtering technique called Symbiotic Filtering (SF) that aggregates distinct local filters from several users to improve the overall performance of spam detection. SF is a hybrid approach combining some features from both Collaborative (CF) and Content-Based Filtering (CBF). It allows for the use of social networks to personalize and tailor the set of filters that serve as input to the filtering. A comparison is performed against the commonly used Naive Bayes CBF algorithm. Several experiments were held with the well-known Enron data, under both fixed and incremental symbiotic groups. We show that our system is competitive in performance and is robust against both dictionary and focused contamination attacks. Moreover, it can be implemented and deployed with little effort and low communication costs, while assuring privacy. Funding: Fundação para a Ciência e a Tecnologia (FCT) - bolsa PTDC/EIA/64541/200

Universidade do Minho: RepositoriUM

UCL Discovery
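
The Symbiotic Filtering abstract above describes aggregating distinct local filters from several users into a personalized decision. The abstract does not say how the aggregation is done, so the following is only a sketch under stated assumptions: each filter is modelled as a function returning a spam probability, and the user combines them with a weighted average. The names `keyword_filter` and `combine_filters`, the weights and the toy keyword filters are all hypothetical.

```python
from typing import Callable, Sequence

# A filter maps a message to a spam probability in [0, 1] (assumed interface).
Filter = Callable[[str], float]


def keyword_filter(keywords, hit=0.9, miss=0.1) -> Filter:
    """Build a toy local filter that flags messages containing any keyword."""
    kw = {k.lower() for k in keywords}

    def score(message: str) -> float:
        return hit if kw & set(message.lower().split()) else miss

    return score


def combine_filters(filters: Sequence[Filter],
                    weights: Sequence[float],
                    message: str) -> float:
    """Weighted average of filter outputs (an assumed aggregation rule)."""
    if not filters or len(filters) != len(weights):
        raise ValueError("need one weight per filter")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * f(message) for f, w in zip(filters, weights)) / total


# Toy usage: the user's own filter plus two filters shared by peers.
own = keyword_filter(["viagra"])
peer_a = keyword_filter(["lottery"])
peer_b = keyword_filter(["prize", "lottery"])

msg = "you won the lottery prize"
print(combine_filters([own, peer_a, peer_b], [1.0, 1.0, 1.0], msg))
print(combine_filters([own, peer_a, peer_b], [1.0, 0.0, 0.0], msg))
```

Only filter outputs are combined here, never message contents, which is one simple way to read the privacy claim in the abstract. Giving each user their own weights is likewise an assumed way to model personalization.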

Spam email filtering using network-level properties

Author: Correia André
Cortez Paulo
Rio Miguel
Rocha Miguel
Sousa Pedro
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Spam is a serious problem that affects email users (e.g. phishing attacks, viruses and time spent reading unwanted messages). We propose a novel spam email filtering approach based on network-level attributes (e.g. the IP sender geographic coordinates) that are more persistent in time when compared to message content. This approach was tested using two classifiers, Naive Bayes (NB) and Support Vector Machines (SVM), and compared against bag-of-words models and eight blacklists. Several experiments were held with recently collected legitimate (ham) and non-legitimate (spam) messages, in order to simulate distinct user profiles from two countries (USA and Portugal). Overall, the network-level based SVM model achieved the best discriminatory performance. Moreover, preliminary results suggest that such a method is more robust to phishing attacks. Funding: Fundação para a Ciência e a Tecnologia (FCT) - PTDC/EIA/64541/200

CiteSeerX

Universidade do Minho: RepositoriUM

MapReduce based RDF assisted distributed SVM for high throughput spam filtering

Author: Caruana Godwin
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2013

This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. Electronic mail has become cast and embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability as well as its ease of use have all acted as catalysts to such pervasive proliferation. Unfortunately, the same can be alleged about unsolicited bulk email, or rather spam. Various methods, as well as enabling architectures, are available to try to mitigate spam permeation. In this respect, this dissertation complements existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing respective strengths and weaknesses. Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet scale, data intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have been proven effective. SVM training is however a computationally intensive process. In this dissertation, an M/R based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart.
Effectively exploiting large scale, ‘Cloud’ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneity-aware task-to-node matching and allocation scheme, is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the-box Hadoop counterpart in a typical Cloud based infrastructure. The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF based, end user feedback.

Brunel University Research Archive