79 research outputs found
An approach to preventing spam using Access Codes with a combination of anti-spam mechanisms
Spam is becoming an increasingly severe problem for individuals, networks,
organisations and businesses. The losses caused by spam amount to billions of dollars every
year. Research shows that spam accounts for more than 80% of e-mails, with its growth
rate increasing every year. Spam is not limited to email; it has started affecting other
technologies such as VoIP, cellular and traditional telephony, and instant messaging services.
None of the existing approaches (legislative, collaborative, social-awareness and
technological), separately or in combination with other approaches, can prevent enough
spam to be deemed a solution to the spam problem.
The severity of the spam problem and the limitations of the state-of-the-art solutions
create a strong need for an efficient anti-spam mechanism that can prevent significant
volumes of spam without producing any false positives. This thesis proposes such a
mechanism, known as "Spam Prevention using Access Codes" (SPAC). SPAC targets spam
from two angles: it prevents/blocks spam, and it discourages spammers by making the
infrastructure environment very unpleasant for them.
In addition to the idea of Access Codes, SPAC combines the ideas behind some of the
key current technological anti-spam measures to increase effectiveness. What distinguishes
this work is that SPAC combines those ideas in a unique way, which enables it to acquire
the good features of a number of technological anti-spam approaches without inheriting
their drawbacks. Sybil attacks, dictionary attacks and address spoofing have no impact on
the performance of SPAC; in fact, SPAC handles these sorts of attacks in the same way
that it handles messages from unknown senders.
An application known as the "SPAC application" has been developed to test the
performance of the SPAC mechanism. The results obtained from various tests on the
SPAC application show that SPAC has a clear edge over existing technological anti-spam
approaches.
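The abstract does not specify how Access Codes are checked; a minimal sketch of the general idea, under the assumption that each recipient issues per-sender codes and that unknown or wrongly coded senders are diverted rather than delivered, might look like this (all names and the three-way outcome are hypothetical):

```python
# Hypothetical sketch of an access-code check. A recipient keeps a table
# of codes issued to known senders; mail from unknown senders (which also
# covers spoofed and dictionary-attack sources) is challenged rather than
# delivered, and a known sender with a wrong code is rejected.

def classify_message(recipient_codes, sender, supplied_code):
    """Return 'deliver', 'challenge' or 'reject' for an incoming message.

    recipient_codes: dict mapping known sender address -> issued code.
    """
    expected = recipient_codes.get(sender)
    if expected is None:
        # Unknown sender: treat spoofing and Sybil/dictionary attacks
        # exactly like any other stranger.
        return "challenge"
    if supplied_code == expected:
        return "deliver"
    return "reject"

codes = {"alice@example.org": "AX-7731"}
print(classify_message(codes, "alice@example.org", "AX-7731"))  # deliver
print(classify_message(codes, "bob@example.net", None))         # challenge
```

Note how this illustrates the claim above: an attack from a forged or unknown address falls into the same "challenge" path as ordinary unknown senders, so such attacks cannot degrade the mechanism's behaviour.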
Personal Email Spam Filtering with Minimal User Interaction
This thesis investigates ways to reduce or eliminate the need for user input to
learning-based personal email spam filters. Personal spam filters have been shown in
previous studies to yield superior effectiveness, at the cost of requiring extensive user training, which may be burdensome or impossible.
This work describes new approaches to the problem of building a personal
spam filter that requires minimal user feedback. An initial study investigates how well a personal filter can learn from sources of data other than the user's own messages. Our initial studies show that inter-user training yields substantially inferior results to
intra-user training using the best known methods. Moreover, contrary to previous
literature, we find that transfer learning degrades the performance of spam filters when the training and test sets belong to two different users or to different times.
We also adapt and modify a graph-based semi-supervised learning algorithm to
build a filter that can classify an entire inbox when trained on twenty or fewer user judgments.
Our experiments show that this approach compares well with previous techniques when
trained on as few as two training examples.
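The thesis's algorithm is not given in the abstract; the general flavour of graph-based semi-supervised classification from a handful of judgments can be sketched as simple label propagation over a message-similarity graph (the Jaccard similarity and the clamped-seed update rule here are illustrative choices, not the thesis's method):

```python
# Illustrative label propagation: a few labelled messages (seeds) spread
# their labels to unlabelled neighbours through a similarity graph.

def jaccard(a, b):
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def propagate(messages, seeds, rounds=10):
    """messages: list of texts; seeds: {index: +1 for spam, -1 for ham}."""
    n = len(messages)
    w = [[jaccard(messages[i], messages[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    score = [float(seeds.get(i, 0.0)) for i in range(n)]
    for _ in range(rounds):
        new = []
        for i in range(n):
            if i in seeds:                       # labelled nodes stay clamped
                new.append(float(seeds[i]))
            else:
                z = sum(w[i])
                new.append(sum(w[i][j] * score[j] for j in range(n)) / z
                           if z else 0.0)
        score = new
    return ["spam" if s > 0 else "ham" for s in score]

msgs = ["cheap pills buy now", "meeting notes attached",
        "buy cheap pills today", "notes from the meeting"]
print(propagate(msgs, {0: 1, 1: -1}))  # ['spam', 'ham', 'spam', 'ham']
```

With only two seed judgments, the two unlabelled messages are classified by their similarity to the labelled ones, which is the behaviour the abstract reports at small training sizes.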
We also present the toolkit we developed to perform privacy-preserving user studies
on spam filters. This toolkit allows researchers to evaluate, on real users' email boxes, any spam filter that conforms to a standard interface defined by TREC. Researchers have access only to the TREC-style result file, and not to any content of a user's email
stream.
To eliminate the need for user feedback entirely, we build a personal autonomous filter that learns exclusively from the output of a global spam filter. Our laboratory experiments show that learning filters requiring no user input can substantially
improve the results of open-source and industry-leading commercial filters that employ no user-specific training. We use our toolkit to validate the performance of the
autonomous filter in a user study.
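The autonomous-filter idea above can be sketched in a few lines, under simplifying assumptions: here a keyword rule stands in for the real global filter, and a naïve-Bayes-style learner stands in for the thesis's personal filter (both are illustrative, not the actual systems evaluated):

```python
# Sketch of an autonomous personal filter: it trains itself on the
# verdicts of an existing global filter, so the user never labels mail.
import math
from collections import Counter

def global_filter(text):          # stub standing in for a deployed filter
    return "spam" if "unsubscribe" in text else "ham"

class AutonomousFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def observe(self, text):
        """Learn from the global verdict; no user feedback involved."""
        label = global_filter(text)
        for tok in text.split():
            self.counts[label][tok] += 1
            self.totals[label] += 1

    def classify(self, text):
        def loglike(label):
            t, c = self.totals[label], self.counts[label]
            return sum(math.log((c[tok] + 1) / (t + 2)) for tok in text.split())
        return "spam" if loglike("spam") > loglike("ham") else "ham"

f = AutonomousFilter()
f.observe("unsubscribe cheap pills")
f.observe("project meeting agenda")
print(f.classify("cheap pills offer"))  # spam
```

The point of the sketch is the improvement mechanism: the learner generalises from vocabulary seen alongside the global verdicts, so it can catch spam ("cheap pills offer") that the global keyword rule itself would miss.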
Filtering Spam E-Mail from Mixed Arabic and English Messages: A Comparison of Machine Learning Techniques.
Spam is one of the main problems in email communication. Although the volume of non-English spam is increasing, little work has been done in this area. In the Arab world, for example, users receive spam written mostly in Arabic, in English, or in mixed Arabic and English. To filter this kind of message, this research applied several machine learning techniques, which many researchers have previously used to filter spam email. This study compared six supervised machine learning classifiers: maximum entropy, decision trees, artificial neural networks, naïve Bayes, support vector machines and k-nearest neighbour. The experiments suggested that words in Arabic messages should be stemmed before the classifiers are applied. In addition, in most cases, the experiments showed that classifiers using feature selection techniques can achieve comparable or better performance than filters that do not use them.
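The preprocessing step the experiments recommend, stemming Arabic tokens before classification, can be sketched as a light affix stripper; the affix lists below are a small hypothetical subset, not the stemmer used in the study:

```python
# Illustrative light stemmer for mixed Arabic/English email text: Arabic
# tokens are stripped of common prefixes/suffixes so inflected forms of
# a word map to the same classifier feature; English tokens pass through.

PREFIXES = ["وال", "بال", "كال", "فال", "ال"]   # e.g. the definite article
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية"]

def light_stem(token):
    for p in sorted(PREFIXES, key=len, reverse=True):
        if token.startswith(p) and len(token) - len(p) >= 2:
            token = token[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(s) and len(token) - len(s) >= 2:
            token = token[:-len(s)]
            break
    return token

def features(text):
    """Tokenise mixed-language text; stem only the Arabic tokens."""
    out = []
    for tok in text.lower().split():
        if any("\u0600" <= ch <= "\u06ff" for ch in tok):
            tok = light_stem(tok)
        out.append(tok)
    return out

print(light_stem("الكتاب"))  # كتاب ("the book" -> "book")
```

Any of the six classifiers compared in the study would then be trained on the output of `features` rather than on raw tokens.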
Detecting spam relays by SMTP traffic characteristics using an autonomous detection system
Spam emails are flooding the Internet, and research to prevent spam is an ongoing concern. In this thesis, SMTP traffic was collected from different sources in real networks and analysed to determine how the SMTP traffic characteristics of legitimate email clients, legitimate email servers and spam relays differ. It is found that SMTP traffic from legitimate and non-legitimate sites differs and that the two can be distinguished from each other. Based on this analysis, several methods for identifying spam relays in the network from their SMTP traffic characteristics are proposed. An autonomous combination system employing machine learning technologies was developed to identify spam relays in real time, before spam emails reach an end user, using SMTP traffic characteristics alone and never examining actual email content. A series of tests was conducted to evaluate the performance of this system, and the results show that it can identify spam relays with a high spam relay detection rate and an acceptable ratio of false positive errors.
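The content-free detection idea can be sketched as follows; the feature names, the session representation and the threshold rule standing in for the learned classifier are all illustrative assumptions, not the thesis's actual features or model:

```python
# Hedged sketch of content-free spam-relay detection: per-host SMTP
# session statistics are summarised into features, and a simple rule
# stands in for a trained classifier. No message content is touched.

def summarise(sessions):
    """sessions: list of dicts with 'rcpt_count' and 'rejected' fields."""
    n = len(sessions)
    fail = sum(1 for s in sessions if s["rejected"]) / n
    rcpt = sum(s["rcpt_count"] for s in sessions) / n
    return {"sessions": n, "fail_ratio": fail, "avg_rcpt": rcpt}

def looks_like_spam_relay(stats):
    # Spam relays tend to fan out to many recipients per session and to
    # hit many rejections (dictionary attacks on unknown mailboxes).
    return stats["fail_ratio"] > 0.3 and stats["avg_rcpt"] > 5

relay = summarise([{"rcpt_count": 40, "rejected": True},
                   {"rcpt_count": 25, "rejected": False}])
print(looks_like_spam_relay(relay))  # True
```

In a real system these statistics would be computed from observed TCP/SMTP flows and fed to the machine-learning component; the sketch only shows why traffic shape alone can separate relays from legitimate clients and servers.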
Computing with Granular Words
Computational linguistics is a sub-field of artificial intelligence; it is an interdisciplinary field dealing with statistical and/or rule-based modeling of natural language from a computational perspective. Traditionally, fuzzy logic is used to deal with fuzziness among single linguistic terms in documents. However, linguistic terms may involve other types of uncertainty. For instance, when different users search for 'cheap hotel' in a search engine, they may need distinct pieces of relevant hidden information, such as shopping, transportation or weather. Therefore, this research focuses on studying granular words and developing new algorithms that process them to deal with uncertainty globally. To describe granular words precisely, a new structure called the Granular Information Hyper Tree (GIHT) is constructed. Furthermore, several technologies are developed to support computing with granular words in spam filtering and query recommendation. Based on simulation results, the GIHT-Bayesian algorithm achieves a more accurate spam filtering rate than the conventional naïve Bayes and SVM methods, and computing with granular words also generates better recommendation results, as judged by users' assessments, when applied to a search engine.
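The GIHT itself is not specified in the abstract; a toy stand-in for the underlying idea, that one surface term expands into context-dependent granules of hidden information, might look like this (the hierarchy and its contents are entirely hypothetical):

```python
# Toy illustration of a granular word: the single surface term
# "cheap hotel" expands into different information granules depending on
# the user's context, which is the uncertainty the GIHT is built to model.

GRANULES = {
    "cheap hotel": {
        "shopping": ["outlet mall", "local market"],
        "transportation": ["metro station", "airport shuttle"],
        "weather": ["forecast", "rainfall"],
    }
}

def expand(term, context):
    """Return the granule list for a term under a given user context."""
    return GRANULES.get(term, {}).get(context, [])

print(expand("cheap hotel", "transportation"))
# ['metro station', 'airport shuttle']
```

A query recommender built on such a structure can suggest context-specific expansions of the same query string, which is the behaviour the abstract reports improving over flat, single-term treatment.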
Resolving FP-TP Conflict in Digest-Based Collaborative Spam Detection by Use of Negative Selection Algorithm
A well-known approach for collaborative spam filtering is to determine which emails belong to the same bulk, e.g. by exploiting their content similarity. This allows, after observing an initial portion of a bulk, bulkiness scores to be assigned to the remaining emails from the same bulk. It also allows individual evidence of spamminess to be joined, if such evidence is generated by collaborating filters or users for some of the emails from an initial portion of the bulk. Usually a database of previously observed emails or email digests is formed and queried upon receiving new emails. Previous evaluations [2,10] of the approach based on email digests that preserve content similarity indicate and partially demonstrate that there are ways to make the approach robust to increased obfuscation efforts by spammers. However, for the settings of the parameters that provide good matching between the emails from the same bulk, the unwanted random matching between ham emails and unrelated ham and spam emails stays rather high. This directly translates into a need for higher bulkiness thresholds in order to ensure low false positive (FP) detection of ham, which implies that larger initial parts of spam bulks will not be filtered, i.e. true positive (TP) detection will not be very high (the FP-TP conflict). In this paper we demonstrate how, by use of the negative selection algorithm, the unwanted random matching between unrelated emails may be decreased by at least an order of magnitude, while preserving the same good matching between the emails from the same bulk. We also show how this translates into at least an order of magnitude fewer undetected bulky spam emails, under the same ham miss-detection requirements.
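The digest-matching step and the role negative selection plays in it can be sketched under strong simplifying assumptions: here a digest is the set of hashed character 3-grams of a message, similarity is set overlap, and "negative selection" is reduced to discarding digest components that also occur in a ham self-set, so unrelated ham can no longer match by chance (the paper's actual digests and algorithm are more sophisticated):

```python
# Simplified digest-based bulk matching with a negative-selection step.
import hashlib

def digest(text):
    """Set of hashed character 3-grams (a toy similarity-preserving digest)."""
    return {int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16) % 4096
            for i in range(len(text) - 2)}

def overlap(d1, d2):
    return len(d1 & d2) / min(len(d1), len(d2))

def negative_select(d, ham_self_set):
    """Keep only components that never appear in the ham corpus."""
    return d - ham_self_set

ham = ["see you at lunch", "minutes of the meeting"]
self_set = set().union(*(digest(h) for h in ham))

# Two obfuscated copies from the same spam bulk still match strongly,
# both before and after negative selection removes ham-like components.
bulk_a = digest("win a free prize now!!")
bulk_b = digest("win a free prize now !")
a_sel = negative_select(bulk_a, self_set)
b_sel = negative_select(bulk_b, self_set)
print(overlap(a_sel, b_sel) > 0.5)
```

The FP-TP trade-off described above corresponds to the matching threshold: removing ham-matching components lets that threshold be lowered (catching more of each bulk) without raising random matches against ham.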
A Numerical Approach for Assigning a Reputation to Users of an IoT Framework
Nowadays, in the Internet of Things (IoT) society, the massive use of technological devices available to people makes it possible to collect a great deal of data describing the tastes, choices and behaviours of the users of services and tools. This information can be rearranged and interpreted in order to obtain a rating (i.e., an evaluation) of the subjects (i.e., users) interacting with specific objects (i.e., items). Generally, reputation systems are widely used to provide ratings for products, services, companies, digital content and people. Here, we focus on this issue, adopting a Collaborative Reputation System (CRS) to evaluate visitors' behaviour at a real cultural event. The results obtained, compared with those of other methods (i.e., classification), confirm the reliability and usefulness of CRSes for deeply understanding the dynamics of visiting styles.
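The paper's CRS is not detailed in the abstract; a minimal numerical sketch of the general task, turning a stream of interaction ratings into a running reputation score, could use an exponentially weighted average (the update rule, the neutral prior and the smoothing factor are illustrative assumptions, not the paper's method):

```python
# Minimal numerical reputation update: each interaction yields a rating
# in [0, 1], blended into a running score so recent behaviour counts more.

def update_reputation(current, rating, alpha=0.3):
    """Blend a new rating into the running reputation score."""
    return (1 - alpha) * current + alpha * rating

rep = 0.5                        # neutral prior for a new user
for r in [1.0, 0.8, 0.9]:        # three positively rated interactions
    rep = update_reputation(rep, r)
print(rep)
```

In a collaborative setting, the per-interaction ratings would themselves be aggregated from the evaluations of other users or sensors before being folded into each subject's score.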