41 research outputs found
Camouflages and Token Manipulations-The Changing Faces of the Nigerian Fraudulent 419 Spammers
The inefficiencies of current spam filters against fraudulent (419) mails is not unrelated to the use by spammers of good-word
attacks, topic drifts, parasitic spamming, wrong categorization and recategorization of electronic mails by e-mail clients and of
course the fuzzy factors of greed and gullibility on the part of the recipients who responds to fraudulent spam mail offers. In this
paper, we establish that mail token manipulations remain, above any other tactics, the most potent tool used by Nigerian
scammers to fool statistical spam filters. While hoping that the uncovering of these manipulative evidences will prove useful in
future antispam research, our findings also sensitize spam filter developers on the need to inculcate within their antispam
architecture robust modules that can deal with the identified camouflages
Analyzing the Social Structure and Dynamics of E-mail and Spam in Massive Backbone Internet Traffic
E-mail is probably the most popular application on the Internet, with
everyday business and personal communications dependent on it. Spam or
unsolicited e-mail has been estimated to cost businesses significant amounts of
money. However, our understanding of the network-level behavior of legitimate
e-mail traffic and how it differs from spam traffic is limited. In this study,
we have passively captured SMTP packets from a 10 Gbit/s Internet backbone link
to construct a social network of e-mail users based on their exchanged e-mails.
The focus of this paper is on the graph metrics indicating various structural
properties of e-mail networks and how they evolve over time. This study also
looks into the differences in the structural and temporal characteristics of
spam and non-spam networks. Our analysis on the collected data allows us to
show several differences between the behavior of spam and legitimate e-mail
traffic, which can help us to understand the behavior of spammers and give us
the knowledge to statistically model spam traffic on the network-level in order
to complement current spam detection techniques.Comment: 15 pages, 20 figures, technical repor
Learning from textual data streams for detecting email spam
This master thesis introduces a method for the detecting email spam through the translation problem in incremental learning of the time series. Common spam detection systems mainly use methods of supervised learning (naive Bayesian classifier, decision trees), while in the master’s thesis presents the classification by using the methods of data stream mining.
For learning sets, we also choose the attributes that do not contain personal data and which are not required to obtain the consent of the sender or the recipient (attributes consist the envelope part of e-mail). With the help of algorithms for learning from data streams (VFDT, cVFDT) we used the electronic sequence of messages as text data stream. The results were compared with the traditional spam detection methods and they show that traditional spam detection methods have higher accuracy compared to algorithms for learning from data stream and therefore are not suitable for detecting email spam
Learning from textual data streams for detecting email spam
This master thesis introduces a method for the detecting email spam through the translation problem in incremental learning of the time series. Common spam detection systems mainly use methods of supervised learning (naive Bayesian classifier, decision trees), while in the master’s thesis presents the classification by using the methods of data stream mining.
For learning sets, we also choose the attributes that do not contain personal data and which are not required to obtain the consent of the sender or the recipient (attributes consist the envelope part of e-mail). With the help of algorithms for learning from data streams (VFDT, cVFDT) we used the electronic sequence of messages as text data stream. The results were compared with the traditional spam detection methods and they show that traditional spam detection methods have higher accuracy compared to algorithms for learning from data stream and therefore are not suitable for detecting email spam
Roskapostin torjunta- ja luokittelumenetelmät
Tässä tutkielmassa tutustutaan kirjallisuuden avulla yleisesti käytössä oleviin roskapostin torjuntamenetelmiin. Myös niitä soveltava järjestelmäkokonaisuus esitellään. Työssä käsitellään esimerkiksi mustat DNS-listat, kollaboratiivisia tekniikoita ja harmaalistaus. Sisältöpohjaisiin menetelmiin, erityisesti bayesiläiseen luokitteluun ja logistiseen regressioanalyysiin tutustutaan tarkemmin. Tutkielmassa perehdytään myös roskapostitusta rajoittavaan lainsäädäntöön ja pohditaan, minkälaisilla keinoilla päädyttäisiin kokonaisuuden kannalta parhaaseen lopputulokseen. Työn kokeellisessa osuudessa verrataan logistista regressioanalyysiä ja bayesiläistä luokittelua roskapostintunnistuksessa realistisella koeasetelmalla käyttäen aitoa sähköpostikorpusta aineistona. Tärkeimmät kokeisiin perustuvat johtopäätökset ovat, että logistiseen regressioanalyysiin pohjaava tunnistus täydentäisi luokittelutuloksen puolesta erinomaisesti roskapostintorjuntajärjestelmää bayesiläisen luokittelijan rinnalla, mutta menetelmänä se on liian hidas tietokantanoudoista johtuvan I/O-vaativuuden takia. Lisäksi todetaan, että jopa käytettyä luokittelumenetelmää tärkeämpi seikka oppivaa roskapostintunnistusta hyödyntävässä järjestelmässä saattaa olla luokittelijalle syötetty aineisto, jonka laadun varmistamiseen on syytä panostaa erityisesti monen käyttäjän roskapostintorjuntajärjestelmässä, jossa luokitellaan kaikkien käyttäjien viestit samaan aineistoon perustuen
Evaluation of Email Spam Detection Techniques
Email has become a vital form of communication among individuals and organizations in today’s world. However, simultaneously it became a threat to many users in the form of spam emails which are also referred as junk/unsolicited emails. Most of the spam emails received by the users are in the form of commercial advertising, which usually carry computer viruses without any notifications. Today, 95% of the email messages across the world are believed to be spam, therefore it is essential to develop spam detection techniques. There are different techniques to detect and filter the spam emails, but off recently all the developed techniques are being implemented successfully to minimize the threats. This paper describes how the current spam email detection approaches are determining and evaluating the problems. There are different types of techniques developed based on Reputation, Origin, Words, Multimedia, Textual, Community, Rules, Hybrid, Machine learning, Fingerprint, Social networks, Protocols, Traffic analysis, OCR techniques, Low-level features, and many other techniques. All these filtering techniques are developed to detect and evaluate spam emails. Along with classification of the email messages into spam or ham, this paper also demonstrates the effectiveness and accuracy of the spam detection techniques
Cognitive Spam Recognition Using Hadoop and Multicast-Update
In today's world of exponentially growing technology, spam is a very common issue faced by users on the internet. Spam not only hinders the performance of a network, but it also wastes space and time, and causes general irritation and presents a multitude of dangers - of viruses, malware, spyware and consequent system failure, identity theft, and other cyber criminal activity. In this context, cognition provides us with a method to help improve the performance of the distributed system. It enables the system to learn what it is supposed to do for different input types as different classifications are made over time and this learning helps it increase its accuracy as time passes. Each system on its own can only do so much learning, because of the limited sample set of inputs that it gets to process. However, in a network, we can make sure that every system knows the different kinds of inputs available and learns what it is supposed to do with a better success rate. Thus, distribution and combination of this cognition across different components of the network leads to an overall improvement in the performance of the system. In this paper, we describe a method to make machines cognitively label spam using Machine Learning and the Naive Bayesian approach. We also present two possible methods of implementation - using a MapReduce Framework (hadoop), and also using messages coupled with a multicast-send based network - with their own subtypes, and the pros and cons of each. We finally present a comparative analysis of the two main methods and provide a basic idea about the usefulness of the two in various different scenarios
Spam Filter Improvement Through Measurement
This work supports the thesis that sound quantitative evaluation for
spam filters leads to substantial improvement in the classification
of email. To this end, new laboratory testing methods and datasets
are introduced, and evidence is presented that their adoption at Text
REtrieval Conference (TREC)and elsewhere has led to an improvement in state of the art
spam filtering. While many of these improvements have been discovered
by others, the best-performing method known at this time -- spam filter
fusion -- was demonstrated by the author.
This work describes four principal dimensions of spam filter evaluation
methodology and spam filter improvement. An initial study investigates
the application of twelve open-source filter configurations in a laboratory
environment, using a stream of 50,000 messages captured from a single
recipient over eight months. The study measures the impact of user
feedback and on-line learning on filter performance using methodology
and measures which were released to the research community as the
TREC Spam Filter Evaluation Toolkit.
The toolkit was used as the basis of the TREC Spam Track, which the
author co-founded with Cormack. The Spam Track, in addition to evaluating
a new application (email spam), addressed the issue of testing systems
on both private and public data. While streams of private messages
are most realistic, they are not easy to come by and cannot be shared
with the research community as archival benchmarks. Using the toolkit,
participant filters were evaluated on both, and the differences found
not to substantially confound evaluation; as a result, public corpora
were validated as research tools. Over the course of TREC and similar
evaluation efforts, a dozen or more archival benchmarks --
some private and some public -- have become available.
The toolkit and methodology have spawned improvements in the state
of the art every year since its deployment in 2005. In 2005, 2006,
and 2007, the spam track yielded new best-performing systems based
on sequential compression models, orthogonal sparse bigram features,
logistic regression and support vector machines. Using the TREC participant
filters, we develop and demonstrate methods for on-line filter fusion
that outperform all other reported on-line personal spam filters
How to accelerate your internet : a practical guide to bandwidth management and optimisation using open source software
xiii, 298 p. : ill. ; 24 cm.Libro ElectrónicoAccess to sufficient Internet bandwidth enables worldwide electronic collaboration, access to informational resources, rapid and effective communication, and grants membership to a global community. Therefore, bandwidth is probably the single most critical resource at the disposal of a modern organisation.
The goal of this book is to provide practical information on how to gain the largest possible benefit from your connection to the Internet. By applying the monitoring and optimisation techniques discussed here, the effectiveness of your network can be significantly improved
Stable Regime, Historiography and Truth Commissions A Case Study of Pashtun Tahafuz Movement of Pakistan
This article discusses the Pashtun Tahafuz Movement's (PTM) demand for establishing a Truth and Reconciliation Commission (TRC) to facilitate the right to truth of victims of the war on terror in Pakistan. It highlights the tension among the right to truth, geopolitical considerations, and historiography in pursuit of transitional justice under a stable regime. It argues that Pakistan is not likely to establish a TRC due to its geopolitical considerations vis-a-vis Afghanistan. It, however, also underscores that PTM as a pressure group could contribute greatly to realising several human rights based right claims of the war victims, if it disengages itself from the anti-Pakistan Afghan diaspora