    Improving the efficiency of spam filtering through cache architecture

    Blacklists (BLs), also called Domain Name System-based Blackhole Lists (DNSBLs), are databases of known Internet addresses used by spammers to send out spam. Mail servers use these lists to filter out e-mails coming from known spam sources. In contrast, whitelists (WLs) are explicit lists of senders from whom e-mail can be accepted or delivered. A Mail Transport Agent (MTA) is usually configured to reject, challenge or flag messages sent from sources listed on one or more DNSBLs, and to allow messages from sources listed on the WLs. In this paper, we demonstrate how bandwidth performance (the total requests and responses that must travel over the network) is improved by using local caches for BLs and WLs. The actual senders' IP addresses are extracted from the e-mail log and compared against the lists in the local caches to decide whether they should be accepted, before being checked against the global DNSBLs by running 'DNSBL queries' (if required). Around three quarters of the e-mail sources were observed to be filtered locally through the caches with this method. Local control over the lists and shorter search (filtering) time are further benefits. © 2008 IEEE
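    The cache-first lookup described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `LocalListCache`, the placeholder zone `dnsbl.example.org`, and the choice to keep global DNSBL answers indefinitely (no TTL handling) are all assumptions. Only the query-name convention (reversed IPv4 octets prepended to the list's zone, with a successful resolution meaning "listed") reflects standard DNSBL practice.

```python
import ipaddress
import socket
from typing import Callable, Dict, Iterable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Return the DNS name queried for an IPv4 address: octets reversed, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the address
    return ".".join(reversed(octets)) + "." + zone


def dns_lookup(ip: str, zone: str = "dnsbl.example.org") -> bool:
    """Listed if the query name resolves; a resolution failure means not listed.
    The default zone is a placeholder, not a real DNSBL (assumption)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


class LocalListCache:
    """Hypothetical local WL/BL cache consulted before any global DNSBL query."""

    def __init__(self, whitelist: Iterable[str], blacklist: Iterable[str],
                 dnsbl_lookup: Callable[[str], bool] = dns_lookup):
        self.whitelist = set(whitelist)
        self.blacklist = set(blacklist)
        self.dnsbl_lookup = dnsbl_lookup
        self.dnsbl_results: Dict[str, bool] = {}  # cached global answers (no TTL, assumption)
        self.local_hits = 0
        self.remote_queries = 0

    def check(self, ip: str) -> str:
        """Return 'accept' or 'reject' for a sender IP taken from the mail log."""
        if ip in self.whitelist:
            self.local_hits += 1
            return "accept"
        if ip in self.blacklist:
            self.local_hits += 1
            return "reject"
        if ip in self.dnsbl_results:  # answered earlier by a global query
            self.local_hits += 1
            listed = self.dnsbl_results[ip]
        else:  # cache miss: one network round trip to the global DNSBL
            self.remote_queries += 1
            listed = self.dnsbl_lookup(ip)
            self.dnsbl_results[ip] = listed
        return "reject" if listed else "accept"
```

    Injecting `dnsbl_lookup` keeps the sketch testable without network access. Replaying the sender IPs from a mail log through `check` and comparing `local_hits` with `remote_queries` gives the kind of local-versus-global ratio the abstract reports; a production cache would also honor the DNS records' TTLs.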

    Possibility Theory-Based Approach to Spam Email Detection

    Spam filtering based on preference ranking

    As the number of spam messages received continues to grow exponentially, both Internet service providers (ISPs) and end users suffer. Without an efficient solution, the usability of email as a means of communication may be threatened. In this paper we present a filtering mechanism based on preference ranking that distinguishes spam from other email on the Internet. Preference ranking computes similarity values between nominated emails and user-specified spam emails, so that ISPs and end users can deal with spam at filtering points. We designed three filtering points to classify nominated emails as spam, unsure or legitimate. The mechanism can be applied both in middleware and on the client side. The experiments show that the preference-based filtering mechanism yields high precision, recall and TCR (total cost ratio) for spam emails.
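    The three-way decision and the evaluation metrics can be illustrated with a small sketch. It assumes bag-of-words cosine similarity as the similarity measure and two hypothetical thresholds (`spam_threshold`, `unsure_threshold`) that split scores into spam, unsure and legitimate bands; the paper's actual preference-ranking function and filtering points are not reproduced here. TCR follows its common definition: the number of spam messages divided by the weighted number of errors, where `lam` is the cost of flagging a legitimate message relative to missing a spam message.

```python
import math
from collections import Counter
from typing import Iterable


def tokens(text: str) -> Counter:
    """Bag-of-words token counts (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(email: str, spam_examples: Iterable[str],
             spam_threshold: float = 0.6, unsure_threshold: float = 0.3) -> str:
    """Score an email by its highest similarity to any user-specified spam example."""
    score = max((cosine(tokens(email), tokens(s)) for s in spam_examples), default=0.0)
    if score >= spam_threshold:
        return "spam"
    if score >= unsure_threshold:
        return "unsure"
    return "legitimate"


def spam_metrics(spam_caught: int, spam_missed: int, legit_flagged: int, lam: float = 1.0):
    """Precision, recall and TCR = n_spam / (lam * n_legit_to_spam + n_spam_to_legit)."""
    flagged = spam_caught + legit_flagged
    n_spam = spam_caught + spam_missed
    precision = spam_caught / flagged if flagged else 0.0
    recall = spam_caught / n_spam if n_spam else 0.0
    errors = lam * legit_flagged + spam_missed
    tcr = n_spam / errors if errors else math.inf
    return precision, recall, tcr
```

    How "unsure" emails enter the counts passed to `spam_metrics` is left to the caller, since the abstract does not say. A TCR above 1 means the filter is better than using no filter at all, and raising `lam` penalizes false positives more heavily.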

    Nonstationary regression with support vector machines

    In this work, we introduce a method for data analysis in nonstationary environments: time-adaptive support vector regression (TA-SVR). The proposed approach extends earlier work that was limited to classification problems. Focusing on time series applications, we show that TA-SVR can improve accuracy in several nonstationary data-analysis tasks, namely modelling and prediction, input relevance estimation, and reconstruction of a hidden forcing profile.

    Genres of Spam

    Spam is currently the dominant form of communication on the Internet, accounting for most e-mail traffic. Spam is a marketing device; it is also an expensive and time-consuming nuisance for industries as well as a major vehicle for serious Internet crime. While considerable research has focused on the technical aspects of spam, such as how it works and how it can be blocked, our research aims to better understand why it works. We explore how genre theory can contribute to our understanding of ‘spam’. Our study consists of two parts. The first part examined the content, form and specific features of spam and considered their manifest relationship to existing genres of communication. The second part focused on a detailed analysis of 111 Nigerian letters, a particularly noxious form of spam. Genre is generally considered useful because it makes communications more recognizable and understandable to recipients, helping readers process information. Our study suggests that spam is not a single genre but an adaptation of many recognizable print genres. With spam, genre operates at several levels and is often used to mask rather than reveal intent. The paper concludes that spam exploits genre by conforming to known forms while at the same time breaching those norms.

    A Personalized Spam Filtering Approach Utilizing Two Separately Trained Filters

    Towards eradication of SPAM: A study on intelligent adaptive SPAM filters

    As email usage continues to increase massively, SPAM (unsolicited bulk email) has continued to grow, because it is a very inexpensive method of advertising. These unwanted emails can cause serious problems by filling up the inbox and leaving no space for legitimate emails to pass through. Currently the only defense against SPAM is the use of SPAM filters. This thesis describes a novel SPAM filter, GetEmail5, along with its design rationale. To test the efficacy of GetEmail5, an experimental setup was created in which a commercial bulk email program sent SPAM and non-SPAM emails to the new filter. GetEmail5's efficiency and ability to detect SPAM were compared against two highly ranked commercial SPAM filters on different sets of emails: all-SPAM, all-non-SPAM and mixed sets, in both text and HTML formats. The results showed that GetEmail5 outperformed the two commercial SPAM filters in detecting SPAM and in reducing the user's involvement in categorizing incoming emails. This thesis demonstrates the design rationale for GetEmail5 and its greater effectiveness compared with the commercial SPAM filters tested.

    Spam prevention and classification methods

    This thesis reviews, on the basis of the literature, commonly used spam prevention methods, and also presents a complete system that applies them. The topics covered include DNS blacklists, collaborative techniques and greylisting. Content-based methods, in particular Bayesian classification and logistic regression analysis, are examined in more detail. The thesis also examines legislation restricting spamming and considers which means would lead to the best overall outcome. In the experimental part, logistic regression analysis and Bayesian classification are compared for spam detection in a realistic experimental setup, using a real email corpus as data. The main conclusions drawn from the experiments are that, in terms of classification results, detection based on logistic regression analysis would complement a spam filtering system excellently alongside a Bayesian classifier, but as a method it is too slow because of the I/O demands caused by database fetches. It is further observed that, in a system using learning-based spam detection, the data fed to the classifier may matter even more than the classification method used; ensuring its quality deserves particular effort in a multi-user spam filtering system, where all users' messages are classified on the basis of the same data.
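    As a point of reference for the Bayesian classification compared in the thesis, a generic multinomial naive Bayes spam filter with Laplace smoothing is sketched below. It is an illustration under stated assumptions (whitespace tokenization, the hypothetical class name `NaiveBayesSpamFilter`, the smoothing parameter `alpha`), not the thesis's implementation, and it does not cover the logistic regression side of the comparison.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Generic multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        """Add one labelled message; tokens are lowercased whitespace-split words."""
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def log_posterior(self, text: str, label: str) -> float:
        """Unnormalized log P(label | text); both labels need training data."""
        vocab_size = len(set().union(*self.word_counts.values()))
        log_p = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * vocab_size
        for word in text.lower().split():
            # Smoothing keeps words unseen in this class from zeroing out the score.
            log_p += math.log((counts[word] + self.alpha) / denom)
        return log_p

    def predict(self, text: str) -> str:
        return max(self.LABELS, key=lambda label: self.log_posterior(text, label))
```

    Summing log-probabilities rather than multiplying raw probabilities avoids floating-point underflow on long messages. The thesis's observation about training data applies directly here: every prediction depends only on `word_counts` and `doc_counts`, so mislabelled training messages shift every user's results in a shared multi-user setup.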

    Adaptive filtering of SPAM

    In this paper, we present a new spam filter that acts as an additional layer in the spam filtering process. The filter is based on what we call a representative vocabulary. Spam e-mails are divided into categories, each represented by a set of tokens that form a Representative Text (RT). Tokens are strings of characters (words, sentences, or sometimes meaningless strings of characters). The RT is used to compute a resemblance ratio with each incoming e-mail, and this ratio is used to decide whether the e-mail is spam. The filter was implemented and integrated into the Spamihilator software. Experimental results are presented.
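    One way to read the Representative Text idea is sketched below. The abstract does not define the resemblance ratio, so this sketch assumes it is the fraction of a category's RT tokens that occur in the incoming e-mail (matched as substrings, since tokens may be phrases or arbitrary strings), and that an e-mail is flagged as spam when its best ratio across categories reaches a hypothetical `threshold`. The function names `resemblance_ratio` and `rt_filter` are illustrative, not taken from the paper or from Spamihilator.

```python
from typing import Dict, Optional, Set, Tuple


def resemblance_ratio(email: str, rt_tokens: Set[str]) -> float:
    """Fraction of a category's Representative Text tokens found in the e-mail.
    Tokens may be words, phrases or arbitrary strings, so substring matching is used."""
    if not rt_tokens:
        return 0.0
    text = email.lower()
    hits = sum(1 for tok in rt_tokens if tok.lower() in text)
    return hits / len(rt_tokens)


def rt_filter(email: str, categories: Dict[str, Set[str]],
              threshold: float = 0.5) -> Tuple[bool, Optional[str], float]:
    """Return (is_spam, best_category, best_ratio) over all spam categories."""
    best_category: Optional[str] = None
    best_ratio = 0.0
    for name, rt_tokens in categories.items():
        ratio = resemblance_ratio(email, rt_tokens)
        if ratio > best_ratio:
            best_category, best_ratio = name, ratio
    return best_ratio >= threshold, best_category, best_ratio
```

    Because it returns the best-matching category as well as the decision, a filter like this can sit as an extra layer after other filters and report which kind of spam an e-mail resembles.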