
    Camouflages and Token Manipulations: The Changing Faces of the Nigerian Fraudulent 419 Spammers

    The ineffectiveness of current spam filters against fraudulent (419) mail is not unrelated to spammers' use of good-word attacks, topic drift, parasitic spamming, the wrong categorization and recategorization of e-mails by e-mail clients, and, of course, the fuzzy factors of greed and gullibility on the part of recipients who respond to fraudulent spam offers. In this paper, we establish that mail token manipulation remains, above any other tactic, the most potent tool Nigerian scammers use to fool statistical spam filters. Besides our hope that uncovering this manipulative evidence will prove useful in future antispam research, our findings also make spam filter developers aware of the need to build robust modules into their antispam architectures that can deal with the identified camouflages.
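
    The abstract does not list the specific camouflages. As a hedged illustration of the kind of preprocessing module it calls for, the sketch below normalizes three assumed character-level token manipulations (leetspeak digits and symbols, separators inserted inside words, and letters spaced out one by one) before tokens reach a statistical filter. The substitution table `LEET_MAP` and all function names are hypothetical, not taken from the paper.

```python
import re
import unicodedata

# Hypothetical substitution table: digits/symbols often used in place of letters.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                          "5": "s", "7": "t", "@": "a", "$": "s"})


def normalize_token(token: str) -> str:
    """Undo some character-level camouflages in a single whitespace token."""
    # Fold compatibility forms (e.g. full-width letters) and accents to ASCII;
    # characters with no ASCII equivalent are dropped.
    folded = (unicodedata.normalize("NFKD", token)
              .encode("ascii", "ignore").decode("ascii").lower())
    if not any(c.isalpha() for c in folded):
        # Purely numeric tokens such as amounts keep their digits only.
        return re.sub(r"[^0-9]", "", folded)
    # Map leetspeak characters to letters, then strip inserted separators
    # such as the dots in "m.o.n.e.y" or the hyphens in "f-u-n-d-s".
    return re.sub(r"[^a-z]", "", folded.translate(LEET_MAP))


def _flush(run: list[str]) -> list[str]:
    """Emit a pending run of single-letter tokens, then clear it."""
    # Three or more spaced-out letters ("f u n d s") are rejoined into one word.
    words = ["".join(run)] if len(run) >= 3 else list(run)
    run.clear()
    return words


def normalize_text(text: str) -> list[str]:
    """Tokenize on whitespace and return de-camouflaged tokens."""
    out: list[str] = []
    run: list[str] = []  # consecutive single-letter tokens
    for raw in text.split():
        tok = normalize_token(raw)
        if not tok:
            continue
        if len(tok) == 1:
            run.append(tok)
        else:
            out.extend(_flush(run))
            out.append(tok)
    out.extend(_flush(run))
    return out


print(normalize_text("Send y0ur b@nk d-e-t-a-i-l-s for the f u n d s"))
# → ['send', 'your', 'bank', 'details', 'for', 'the', 'funds']
```

    Short legitimate words such as "a" or "I" survive unless they fall inside a run of spaced-out letters. Homoglyphs from other scripts (for example, Cyrillic letters) are simply dropped by the ASCII fold, which a production module would need to map instead.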

    Analyzing the Social Structure and Dynamics of E-mail and Spam in Massive Backbone Internet Traffic

    E-mail is probably the most popular application on the Internet, and everyday business and personal communication depends on it. Spam, or unsolicited e-mail, has been estimated to cost businesses significant amounts of money. However, our understanding of the network-level behavior of legitimate e-mail traffic, and of how it differs from spam traffic, is limited. In this study, we passively captured SMTP packets from a 10 Gbit/s Internet backbone link to construct a social network of e-mail users based on the e-mails they exchanged. This paper focuses on graph metrics that indicate various structural properties of e-mail networks and on how these properties evolve over time. It also examines the differences in the structural and temporal characteristics of spam and non-spam networks. Our analysis of the collected data reveals several differences between the behavior of spam and legitimate e-mail traffic; these can help us understand the behavior of spammers and give us the knowledge to statistically model spam traffic at the network level in order to complement current spam detection techniques. Comment: 15 pages, 20 figures, technical report
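
    The study's exact set of metrics is described in the paper itself. As a minimal sketch, assuming each captured SMTP transaction has been reduced to a (sender, recipients) pair, the code below builds the user graph and computes two common structural measures, out-degree and the local clustering coefficient. The function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(messages):
    """Build directed and undirected user graphs from (sender, recipients) pairs."""
    directed = defaultdict(set)    # sender -> set of recipients
    undirected = defaultdict(set)  # user -> set of correspondents
    for sender, recipients in messages:
        for rcpt in recipients:
            if rcpt == sender:
                continue  # ignore self-addressed mail
            directed[sender].add(rcpt)
            undirected[sender].add(rcpt)
            undirected[rcpt].add(sender)
    return directed, undirected


def out_degree(directed, user):
    """Number of distinct recipients a user has sent mail to."""
    return len(directed.get(user, ()))


def clustering_coefficient(undirected, user):
    """Fraction of pairs of the user's correspondents that also correspond."""
    nbrs = undirected.get(user, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in undirected[a])
    return 2.0 * links / (k * (k - 1))


mail = [
    ("alice", ["bob", "carol"]),
    ("bob", ["carol"]),
    ("spammer", ["u1", "u2", "u3", "u4"]),
]
directed, undirected = build_graph(mail)
for user in ("alice", "spammer"):
    print(user, out_degree(directed, user), clustering_coefficient(undirected, user))
```

    In the toy data, the hypothetical `spammer` node reaches four recipients who never write to each other, so its clustering coefficient is 0, while `alice`'s correspondents also exchange mail with each other, giving a coefficient of 1.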

    Learning from textual data streams for detecting email spam

    This master's thesis introduces a method for detecting e-mail spam by recasting the task as incremental learning over a time series. Common spam detection systems mainly use supervised learning methods (naive Bayesian classifiers, decision trees), whereas this thesis presents classification using data stream mining methods. For the training sets, we chose only attributes that contain no personal data and therefore do not require the consent of the sender or the recipient; these attributes come from the envelope part of the e-mail. Using algorithms for learning from data streams (VFDT, CVFDT), we treated the sequence of e-mail messages as a text data stream. We compared the results with traditional spam detection methods: the traditional methods achieved higher accuracy than the data stream learning algorithms, which are therefore not suitable for detecting e-mail spam.
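
    VFDT (and its drift-adapting variant CVFDT) decides when to split a leaf using the Hoeffding bound: after n examples, with probability 1 − δ the true mean of a variable with range R lies within ε = √(R² ln(1/δ) / (2n)) of its observed mean. The sketch below shows only this split test, not the full tree learner. The default δ and tie threshold are illustrative assumptions, and the gain values would come from whatever split heuristic (for example, information gain over envelope attributes) the learner uses.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 num_classes: int = 2, delta: float = 1e-7,
                 tie_threshold: float = 0.05) -> bool:
    """VFDT-style split test at a leaf that has seen n examples."""
    # Information gain over num_classes classes ranges over [0, log2(num_classes)].
    value_range = math.log2(num_classes)
    eps = hoeffding_bound(value_range, delta, n)
    # Split if the best attribute is reliably better than the runner-up,
    # or if epsilon is so small that the two are effectively tied.
    return (best_gain - second_gain) > eps or eps < tie_threshold


print(should_split(0.30, 0.10, n=1000))  # gap 0.20 vs epsilon ~0.09
print(should_split(0.30, 0.28, n=100))   # gap 0.02 vs epsilon ~0.28
```

    The tie threshold keeps the tree from waiting indefinitely when two attributes are nearly equally good; with spam/ham as the two classes, R = log2(2) = 1.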

    Spam Prevention and Classification Methods (Roskapostin torjunta- ja luokittelumenetelmät)

    This thesis reviews, on the basis of the literature, commonly used spam prevention methods, and also presents a complete system that applies them. The topics covered include DNS blacklists, collaborative techniques, and greylisting. Content-based methods, in particular Bayesian classification and logistic regression analysis, are examined in more detail. The thesis also examines legislation restricting spamming and considers which means would lead to the best overall outcome. The experimental part compares logistic regression analysis and Bayesian classification for spam detection in a realistic experimental setup, using a real e-mail corpus as data. The main conclusions drawn from the experiments are that, in terms of classification results, detection based on logistic regression analysis would be an excellent complement to a Bayesian classifier in a spam filtering system, but as a method it is too slow because of the I/O demands caused by database lookups. It is further noted that in a system based on learning spam detection, an even more important factor than the classification method may be the data fed to the classifier; ensuring its quality deserves particular effort in multi-user spam filtering systems, where the messages of all users are classified on the basis of the same data.
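
    The thesis compares logistic regression with Bayesian classification and attributes the slowness of its logistic regression setup to database lookups. As a minimal in-memory sketch, assuming binary bag-of-words features and plain stochastic gradient descent (both assumptions, not the thesis's configuration), an online logistic regression spam scorer might look like this. The class name and the toy training messages are hypothetical.

```python
import math
from collections import defaultdict


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Binary bag-of-words logistic regression trained one message at a time."""

    def __init__(self, learning_rate: float = 0.5):
        self.learning_rate = learning_rate
        self.weights = defaultdict(float)  # token -> weight
        self.bias = 0.0

    @staticmethod
    def features(text: str) -> set[str]:
        return set(text.lower().split())

    def predict_proba(self, text: str) -> float:
        """Estimated probability that the message is spam."""
        z = self.bias + sum(self.weights.get(t, 0.0) for t in self.features(text))
        return sigmoid(z)

    def update(self, text: str, is_spam: bool) -> None:
        """One SGD step on the log-loss for a single labeled message."""
        error = (1.0 if is_spam else 0.0) - self.predict_proba(text)
        self.bias += self.learning_rate * error
        for t in self.features(text):
            self.weights[t] += self.learning_rate * error


model = OnlineLogisticRegression()
training = [
    ("urgent wire transfer of funds", True),
    ("claim your lottery funds now", True),
    ("agenda for the team meeting", False),
    ("lunch meeting moved to noon", False),
]
for _ in range(20):
    for text, is_spam in training:
        model.update(text, is_spam)

print(round(model.predict_proba("urgent funds"), 3))
print(round(model.predict_proba("meeting agenda"), 3))
```

    Each update touches only the weights of tokens present in the message, so the per-message cost is small when weights are held in memory; if every weight had to be fetched from a database instead, each prediction would incur one lookup per token, which is consistent with the I/O bottleneck the thesis reports.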

    Evaluation of Email Spam Detection Techniques

    Email has become a vital form of communication among individuals and organizations in today's world. At the same time, it has become a threat to many users in the form of spam, also referred to as junk or unsolicited email. Most of the spam that users receive is commercial advertising, which usually carries computer viruses without any warning. Today, 95% of the email messages sent worldwide are believed to be spam, so it is essential to develop spam detection techniques. Various techniques exist to detect and filter spam, and of late the techniques developed have been deployed successfully to minimize these threats. This paper describes how current spam detection approaches identify and evaluate spam. Techniques have been developed based on Reputation, Origin, Words, Multimedia, Textual, Community, Rules, Hybrid, Machine learning, Fingerprint, Social networks, Protocols, Traffic analysis, OCR techniques, Low-level features, and many other approaches, all aimed at detecting and evaluating spam. Besides classifying email messages as spam or ham, this paper also demonstrates the effectiveness and accuracy of these spam detection techniques.

    Cognitive Spam Recognition Using Hadoop and Multicast-Update

    In today's world of exponentially growing technology, spam is a very common problem for Internet users. Spam not only degrades network performance but also wastes storage and time, causes general irritation, and presents a multitude of dangers: viruses, malware, spyware and consequent system failure, identity theft, and other cybercriminal activity. In this context, cognition offers a way to improve the performance of a distributed system. It lets the system learn what to do with different input types as classifications are made over time, and this learning increases its accuracy as time passes. Each system on its own can learn only so much, because of the limited sample of inputs it gets to process. In a network, however, we can ensure that every system knows the different kinds of inputs available and learns what to do with them at a higher success rate. Thus, distributing and combining this cognition across the components of the network improves the overall performance of the system. In this paper, we describe a method for making machines cognitively label spam using machine learning and the naive Bayesian approach. We also present two possible implementation methods, one using a MapReduce framework (Hadoop) and one using messages over a multicast-send based network, each with its own subtypes, pros, and cons. Finally, we present a comparative analysis of the two main methods and outline their usefulness in various scenarios.
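
    The paper's own implementation is not reproduced here. As a hedged sketch of the idea, the code below emulates the two phases in plain Python rather than through the Hadoop API: each node's mapper emits ((label, token), 1) pairs, a reducer sums them, partial counts from different nodes are merged (the step that spreads the learned "cognition" across the network), and a multinomial naive Bayes classifier with Laplace smoothing scores new messages. All function names and data are hypothetical.

```python
import math
from collections import Counter
from functools import reduce


def map_phase(labeled_messages):
    """Mapper: emit ((label, token), 1) per token and ((label, None), 1) per message."""
    for label, text in labeled_messages:
        yield (label, None), 1  # document count, used for class priors
        for token in text.lower().split():
            yield (label, token), 1


def reduce_phase(pairs):
    """Reducer: sum the values emitted for each key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def classify(counts, text, labels=("spam", "ham"), alpha=1.0):
    """Multinomial naive Bayes with Laplace smoothing over the merged counts."""
    vocab = {tok for (_, tok) in counts if tok is not None}
    total_docs = sum(counts[(label, None)] for label in labels)
    best_label, best_score = None, -math.inf
    for label in labels:
        token_total = sum(c for (lab, tok), c in counts.items()
                          if lab == label and tok is not None)
        score = math.log((counts[(label, None)] + alpha)
                         / (total_docs + alpha * len(labels)))
        for tok in text.lower().split():
            score += math.log((counts[(label, tok)] + alpha)
                              / (token_total + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Each node learns from its own messages, then the partial counts are merged.
node_a = reduce_phase(map_phase([("spam", "win cash now"), ("ham", "see you at lunch")]))
node_b = reduce_phase(map_phase([("spam", "cash prize win"), ("ham", "lunch at noon")]))
merged = reduce(lambda x, y: x + y, [node_a, node_b])

print(classify(merged, "win cash"))       # spam
print(classify(merged, "lunch at noon"))  # ham
```

    Because naive Bayes needs only additive counts, partial models from different nodes can be merged by simple addition, which is what makes both a MapReduce job and a multicast exchange of count updates natural fits.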

    Spam Filter Improvement Through Measurement

    This work supports the thesis that sound quantitative evaluation of spam filters leads to substantial improvement in the classification of email. To this end, new laboratory testing methods and datasets are introduced, and evidence is presented that their adoption at the Text REtrieval Conference (TREC) and elsewhere has improved the state of the art in spam filtering. While many of these improvements were discovered by others, the best-performing method known at this time, spam filter fusion, was demonstrated by the author. This work describes four principal dimensions of spam filter evaluation methodology and spam filter improvement. An initial study applies twelve open-source filter configurations in a laboratory environment, using a stream of 50,000 messages captured from a single recipient over eight months. The study measures the impact of user feedback and on-line learning on filter performance, using a methodology and measures that were released to the research community as the TREC Spam Filter Evaluation Toolkit. The toolkit served as the basis of the TREC Spam Track, which the author co-founded with Cormack. Besides evaluating a new application (email spam), the Spam Track addressed the issue of testing systems on both private and public data. While streams of private messages are the most realistic, they are hard to come by and cannot be shared with the research community as archival benchmarks. Using the toolkit, participant filters were evaluated on both kinds of data, and the differences were found not to substantially confound evaluation; as a result, public corpora were validated as research tools. Over the course of TREC and similar evaluation efforts, a dozen or more archival benchmarks, some private and some public, have become available. The toolkit and methodology have spawned improvements in the state of the art every year since their deployment in 2005.
    In 2005, 2006, and 2007, the Spam Track yielded new best-performing systems based on sequential compression models, orthogonal sparse bigram features, logistic regression, and support vector machines. Using the TREC participant filters, we develop and demonstrate methods for on-line filter fusion that outperform all other reported on-line personal spam filters.
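
    The work reports its own fusion methods, which are not reproduced here. As one simple illustration of on-line filter fusion, the sketch below combines each filter's spam probability by averaging log-odds and mapping the mean back to a probability. This particular rule, the clamping constant `eps`, and the function names are assumptions for illustration, not necessarily the best-performing method described in the work.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clamped away from 0 and 1 so the result stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fuse(spam_probabilities: list[float]) -> float:
    """Combine several filters' spam probabilities by averaging their log-odds."""
    mean = sum(logit(p) for p in spam_probabilities) / len(spam_probabilities)
    return 1.0 / (1.0 + math.exp(-mean))


print(round(fuse([0.99, 0.4, 0.6]), 2))  # → 0.82
```

    Averaging log-odds lets one confident filter outweigh several uncertain ones, unlike a majority vote over hard spam/ham labels: here a filter at 0.99 and two filters at 0.4 and 0.6 fuse to roughly 0.82, because the two uncertain scores cancel out.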

    How to accelerate your internet : a practical guide to bandwidth management and optimisation using open source software

    xiii, 298 p.: ill.; 24 cm. Electronic book. Access to sufficient Internet bandwidth enables worldwide electronic collaboration, access to informational resources, and rapid and effective communication, and grants membership in a global community. Therefore, bandwidth is probably the single most critical resource at the disposal of a modern organisation. The goal of this book is to provide practical information on how to gain the largest possible benefit from your connection to the Internet. By applying the monitoring and optimisation techniques discussed here, you can significantly improve the effectiveness of your network.

    Stable Regime, Historiography and Truth Commissions: A Case Study of Pashtun Tahafuz Movement of Pakistan

    This article discusses the Pashtun Tahafuz Movement's (PTM) demand for establishing a Truth and Reconciliation Commission (TRC) to realise the right to truth of victims of the war on terror in Pakistan. It highlights the tension among the right to truth, geopolitical considerations, and historiography in the pursuit of transitional justice under a stable regime. It argues that Pakistan is unlikely to establish a TRC because of its geopolitical considerations vis-à-vis Afghanistan. It also underscores, however, that the PTM as a pressure group could contribute greatly to realising several human-rights-based claims of the war victims if it disengages itself from the anti-Pakistan Afghan diaspora.