"May I borrow Your Filter?" Exchanging Filters to Combat Spam in a Community

Author: Battiti Roberto
Cascella Roberto G.
Garg Anurag
Publication date: 01/11/2005

Leveraging social networks in computer systems can be effective in dealing with a number of trust and security issues. Spam is one such issue where the "wisdom of crowds" can be harnessed by mining the collective knowledge of ordinary individuals. In this paper, we present a mechanism through which members of a virtual community can exchange information to combat spam. Previous attempts at collaborative spam filtering have concentrated on digest-based indexing techniques to share digests or fingerprints of emails that are known to be spam. We take a different approach and allow users to share their spam filters instead, thus dramatically reducing the amount of traffic generated in the network. The resultant diversity in the filters and cooperation in a community allows it to respond to spam in an autonomic fashion. As a test case for exchanging filters we use the popular SpamAssassin spam filtering software and show that exchanging spam filters provides an alternative method to improve spam filtering performance.

Unitn-eprints Research

Spam

Author: de Freitas Sara
Levene Mark
Publication venue: Idea Group Reference (an imprint of Idea Group Inc.)
Publication date: 01/01/2005

With the advent of the electronic mail system in the 1970s, a new opportunity for direct marketing using unsolicited electronic mail became apparent. In 1978, Gary Thuerk compiled a list of those on the Arpanet and then sent out a huge mailing publicising Digital Equipment Corporation (DEC, now Compaq) systems. The reaction from the Defense Communications Agency (DCA), who ran Arpanet, was very negative, and it was this negative reaction that ensured that it was a long time before unsolicited e-mail was used again (Templeton, 2003). As long as the U.S. government controlled a major part of the backbone, most forms of commercial activity were forbidden (Hayes, 2003). However, in 1993, the Internet Network Information Center was privatized, and with no central government controls, spam, as it is now called, came into wider use. The term spam was taken from Monty Python's Flying Circus (a UK comedy group) and their comedy skit that featured the ironic spam song sung in praise of spam (luncheon meat), “spam, spam, spam, lovely spam”, and it came to mean mail that was unsolicited. Conversely, the term ham came to mean e-mail that was wanted. Brad Templeton, a UseNet pioneer and chair of the Electronic Frontier Foundation, has traced the first usage of the term spam back to MUDs (Multi User Dungeons), or real-time multi-person shared environments, and the MUD community. These groups introduced the term spam to the early chat rooms (Internet Relay Chats). The first major UseNet (the world’s largest online conferencing system) spam was sent in January 1994 and was a religious posting: “Global alert for all: Jesus is coming soon.” The term spam was more broadly popularised in April 1994, when two lawyers, Canter and Siegel from Arizona, posted a message that advertised their information and legal services for immigrants applying for the U.S. Green Card scheme. The message was posted to every newsgroup on UseNet, and after this incident, the term spam became synonymous with junk or unsolicited e-mail. Spam spread quickly among the UseNet groups, which were easy targets for spammers simply because the e-mail addresses of members were widely available (Templeton, 2003).

Research Repository

Birkbeck Institutional Research Online

Coventry University Pure Portal

A collaborative approach for spam detection

Author: Cortez Paulo
Machado Artur
Rio Miguel
Rocha Miguel
Sousa Pedro
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2010

Electronic mail is nowadays one of the most important Internet networking services. However, there are still many challenges that should be faced in order to provide a better e-mail service quality, such as the growing dissemination of unsolicited e-mail (spam) over the Internet. This work aims to foster new research efforts giving ground to the development of novel collaborative approaches to deal with spam proliferation. Using the proposed system, which is able to complement other anti-spam solutions, end-users are allowed to share and combine spam filters in a flexible way, increasing the accuracy and resilience levels of anti-spam techniques.

Universidade do Minho: RepositoriUM

UCL Discovery

Symbiotic filtering for spam email detection

Author: Cortez Paulo
Lopes Clotilde
Rio Miguel
Rocha Miguel
Sousa Pedro
Publication venue: Elsevier BV
Publication date: 01/08/2011

This paper presents a novel spam filtering technique called Symbiotic Filtering (SF) that aggregates distinct local filters from several users to improve the overall performance of spam detection. SF is a hybrid approach combining some features from both Collaborative (CF) and Content-Based Filtering (CBF). It allows for the use of social networks to personalize and tailor the set of filters that serve as input to the filtering. A comparison is performed against the commonly used Naive Bayes CBF algorithm. Several experiments were held with the well-known Enron data, under both fixed and incremental symbiotic groups. We show that our system is competitive in performance and is robust against both dictionary and focused contamination attacks. Moreover, it can be implemented and deployed with little effort and low communication costs, while assuring privacy. Funding: Fundação para a Ciência e a Tecnologia (FCT), grant PTDC/EIA/64541/200

Universidade do Minho: RepositoriUM

UCL Discovery

MapReduce based RDF assisted distributed SVM for high throughput spam filtering

Author: Caruana Godwin
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2013

This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. Electronic mail has become embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability as well as its ease of use have all acted as catalysts to such pervasive proliferation. Unfortunately, the same can be said of unsolicited bulk email, or rather spam. Various methods, as well as enabling architectures, are available to try to mitigate spam permeation. In this respect, this dissertation complements existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing respective strengths and weaknesses. Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet scale, data intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have been proven effective. SVM training is however a computationally intensive process. In this dissertation, an M/R based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart.
Effectively exploiting large scale, ‘Cloud’ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneous aware task to node matching and allocation scheme is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the-box Hadoop counterpart in a typical Cloud based infrastructure. The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF based, end user feedback.

Brunel University Research Archive

Symbiotic data mining for personalized spam filtering

Author: Cortez Paulo
Lopes Clotilde
Rio Miguel
Rocha Miguel
Sousa Pedro
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/09/2009

Unsolicited e-mail (spam) is a severe problem due to intrusion of privacy, online fraud, viruses and time spent reading unwanted messages. To solve this issue, Collaborative Filtering (CF) and Content-Based Filtering (CBF) solutions have been adopted. We propose a new CBF-CF hybrid approach called Symbiotic Data Mining (SDM), which aims at aggregating distinct local filters in order to improve filtering at a personalized level using collaboration while preserving privacy. We apply SDM to spam e-mail detection and compare it with a local CBF filter (i.e. Naive Bayes). Several experiments were conducted by using a novel corpus based on the well-known Enron datasets mixed with recent spam. The results show that the symbiotic strategy is competitive in performance when compared to CBF and also more robust to contamination attacks. Funding: Fundação para a Ciência e a Tecnologia (FCT), PTDC/EIA/64541/2006.

Universidade do Minho: RepositoriUM

Crossref

Data Leak Detection As a Service: Challenges and Solutions

Author: Shu Xiaokui
Yao Danfeng (Daphne)
Publication date: 01/01/2012

We describe a network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not require the data owner to reveal the content of the sensitive data. Instead, only a small number of specialized digests is needed. Our technique, referred to as the fuzzy fingerprint, can be used to detect accidental data leaks due to human errors or application flaws. The privacy-preserving feature of our algorithms minimizes the exposure of sensitive data and enables the data owner to safely delegate the detection to others. We describe how cloud providers can offer their customers data-leak detection as an add-on service with strong privacy guarantees. We perform extensive experimental evaluation on the privacy, efficiency, accuracy and noise tolerance of our techniques. Our evaluation results under various data-leak scenarios and setups show that our method can support accurate detection with a very small number of false alarms, even when the presentation of the data has been transformed. It also indicates that the detection accuracy does not degrade when partial digests are used. We further provide a quantifiable method to measure the privacy guarantee offered by our fuzzy fingerprint framework.

Computer Science Technical Reports @Virginia Tech

Context-aware collaborative data stream mining in ubiquitous devices

Author: Gaber M.
Gomes J.
Menasalvas E.
Sousa P.
Publication date: 29/10/2011

Portsmouth University Research Portal (Pure)