2 research outputs found
Improving the efficiency of spam filtering through cache architecture
Blacklists (BLs), also called Domain Name Systembased Blackhole List (DNSBLs) are the databases of known internet addresses used by the spammers to send out the spam mails. Mail servers use these lists to filter out the e-mails coming from different spam sources. In contrary, Whitelists (WLs) are the explicit list of senders from whom e-mail can be accepted or delivered. Mail Transport Agent (MTA) is usually configured to reject, challenge or flag the messages which have been sent from the sources listed on one or more DNSBLs and to allow the messages from the sources listed on the WLs. In this paper, we are demonstrating how the bandwidth (the overall requests and responses that need to go over the network) performance is improved by using local caches for BLs and WLs. The actual sender\u27s IP addresses are extracted from the e-mail log. These are then compared with the list in the local caches to find out if they should be accepted or not, before they are checked against the global DNSBLs by running \u27DNSBL queries\u27 (if required). Around three quarters of the e-mail sources have been observed to be filtered locally through caches with this method. Provision of local control over the lists and lower search (filtering) time are the other related benefits. © 2008 IEEE
Recommended from our members
MapReduce based RDF assisted distributed SVM for high throughput spam filtering
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel UniversityElectronic mail has become cast and embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability as well as its ease of use have all acted as catalysts to such pervasive proliferation. Unfortunately, the same can be alleged about unsolicited bulk email, or rather spam. Various methods, as well as enabling architectures are available to try to mitigate spam permeation. In this respect, this dissertation compliments existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing respective strengths and weaknesses.
Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet scale, data intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have been proven effective. SVM training is however a computationally intensive process. In this dissertation, a M/R based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart.
Effectively exploiting large scale, ‘Cloud’ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneous aware task to node matching and allocation scheme is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the box Hadoop counterpart in a typical Cloud based infrastructure.
The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF based, end user feedback. MapReduce based RDF Assisted Distributed SVM for High Throughput Spam Filterin