79 research outputs found
An approach to preventing spam using Access Codes with a combination of anti-spam mechanisms
Spam is becoming an increasingly severe problem for individuals, networks,
organisations and businesses. The losses caused by spam amount to billions of dollars every
year. Research shows that spam accounts for more than 80% of e-mails, with its growth
rate increasing every year. Spam is not limited to email; it has started affecting other
technologies such as VoIP, cellular and traditional telephony, and instant messaging services.
None of the existing approaches (legislative, collaborative, social-awareness and
technological), separately or in combination with other approaches, can prevent enough
spam to be deemed a solution to the spam problem.
The severity of the spam problem and the limitations of the state-of-the-art solutions
create a strong need for an efficient anti-spam mechanism that can prevent significant
volumes of spam without producing any false positives. This thesis proposes such a
mechanism, known as "Spam Prevention using Access Codes" (SPAC). SPAC targets spam
from two angles: it prevents/blocks spam, and it discourages spammers by making the
infrastructure environment very unpleasant for them.
In addition to the idea of Access Codes, SPAC combines the ideas behind some of the
key current technological anti-spam measures to increase effectiveness. What distinguishes
this work is that SPAC combines those ideas in a unique way, which enables it to acquire
the good features of a number of technological anti-spam approaches without inheriting
their drawbacks. Sybil attacks, dictionary attacks and address spoofing have no impact on
the performance of SPAC; in fact, SPAC handles these sorts of attacks in the same way
that it handles messages from unknown senders.
An application known as the "SPAC application" has been developed to test the
performance of the SPAC mechanism. The results obtained from various tests on the
SPAC application show that SPAC has a clear edge over existing technological anti-spam
approaches.
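The abstract does not specify how Access Codes are checked; a minimal sketch of the general idea, under the assumption that each recipient issues per-sender codes and that unknown or wrongly coded senders are diverted rather than delivered, might look like this (all names and the three-way outcome are hypothetical):

```python
# Hypothetical sketch of an access-code check. A recipient keeps a table
# of codes issued to known senders; mail from unknown senders (which also
# covers spoofed and dictionary-attack sources) is challenged rather than
# delivered, and a known sender with a wrong code is rejected.

def classify_message(recipient_codes, sender, supplied_code):
    """Return 'deliver', 'challenge' or 'reject' for an incoming message.

    recipient_codes: dict mapping known sender address -> issued code.
    """
    expected = recipient_codes.get(sender)
    if expected is None:
        # Unknown sender: treat spoofing and Sybil/dictionary attacks
        # exactly like any other stranger.
        return "challenge"
    if supplied_code == expected:
        return "deliver"
    return "reject"

codes = {"alice@example.org": "AX-7731"}
print(classify_message(codes, "alice@example.org", "AX-7731"))  # deliver
print(classify_message(codes, "bob@example.net", None))         # challenge
```

Note how this illustrates the claim above: an attack from a forged or unknown address falls into the same "challenge" path as ordinary unknown senders, so such attacks cannot degrade the mechanism's behaviour.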
Personal Email Spam Filtering with Minimal User Interaction
This thesis investigates ways to reduce or eliminate the need for user input to
learning-based personal email spam filters. Personal spam filters have been shown in
previous studies to yield superior effectiveness, at the cost of requiring extensive user training, which may be burdensome or impossible.
This work describes new approaches to the problem of building a personal
spam filter that requires minimal user feedback. An initial study investigates how well a personal filter can learn from sources of data other than the user's own messages. Our initial studies show that inter-user training yields substantially inferior results to
intra-user training using the best known methods. Moreover, contrary to previous
literature, we find that transfer learning degrades the performance of spam filters when the training and test sets belong to two different users or to different times.
We also adapt and modify a graph-based semi-supervised learning algorithm to
build a filter that can classify an entire inbox when trained on twenty or fewer user judgments.
Our experiments show that this approach compares well with previous techniques when
trained on as few as two training examples.
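The thesis's algorithm is not given in the abstract; the general flavour of graph-based semi-supervised classification from a handful of judgments can be sketched as simple label propagation over a message-similarity graph (the Jaccard similarity and the clamped-seed update rule here are illustrative choices, not the thesis's method):

```python
# Illustrative label propagation: a few labelled messages (seeds) spread
# their labels to unlabelled neighbours through a similarity graph.

def jaccard(a, b):
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def propagate(messages, seeds, rounds=10):
    """messages: list of texts; seeds: {index: +1 for spam, -1 for ham}."""
    n = len(messages)
    w = [[jaccard(messages[i], messages[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    score = [float(seeds.get(i, 0.0)) for i in range(n)]
    for _ in range(rounds):
        new = []
        for i in range(n):
            if i in seeds:                       # labelled nodes stay clamped
                new.append(float(seeds[i]))
            else:
                z = sum(w[i])
                new.append(sum(w[i][j] * score[j] for j in range(n)) / z
                           if z else 0.0)
        score = new
    return ["spam" if s > 0 else "ham" for s in score]

msgs = ["cheap pills buy now", "meeting notes attached",
        "buy cheap pills today", "notes from the meeting"]
print(propagate(msgs, {0: 1, 1: -1}))  # ['spam', 'ham', 'spam', 'ham']
```

With only two seed judgments, the two unlabelled messages are classified by their similarity to the labelled ones, which is the behaviour the abstract reports at small training sizes.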
We also present the toolkit we developed to perform privacy-preserving user studies
on spam filters. This toolkit allows researchers to evaluate, on real users' email boxes, any spam filter that conforms to a standard interface defined by TREC. Researchers have access only to the TREC-style result file, and not to any content of a user's email
stream.
To eliminate the need for user feedback entirely, we build a personal autonomous filter that learns exclusively from the output of a global spam filter. Our laboratory experiments show that learning filters requiring no user input can substantially
improve the results of open-source and industry-leading commercial filters that employ no user-specific training. We use our toolkit to validate the performance of the
autonomous filter in a user study.
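The autonomous-filter idea above can be sketched in a few lines, under simplifying assumptions: here a keyword rule stands in for the real global filter, and a naïve-Bayes-style learner stands in for the thesis's personal filter (both are illustrative, not the actual systems evaluated):

```python
# Sketch of an autonomous personal filter: it trains itself on the
# verdicts of an existing global filter, so the user never labels mail.
import math
from collections import Counter

def global_filter(text):          # stub standing in for a deployed filter
    return "spam" if "unsubscribe" in text else "ham"

class AutonomousFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def observe(self, text):
        """Learn from the global verdict; no user feedback involved."""
        label = global_filter(text)
        for tok in text.split():
            self.counts[label][tok] += 1
            self.totals[label] += 1

    def classify(self, text):
        def loglike(label):
            t, c = self.totals[label], self.counts[label]
            return sum(math.log((c[tok] + 1) / (t + 2)) for tok in text.split())
        return "spam" if loglike("spam") > loglike("ham") else "ham"

f = AutonomousFilter()
f.observe("unsubscribe cheap pills")
f.observe("project meeting agenda")
print(f.classify("cheap pills offer"))  # spam
```

The point of the sketch is the improvement mechanism: the learner generalises from vocabulary seen alongside the global verdicts, so it can catch spam ("cheap pills offer") that the global keyword rule itself would miss.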
Filtering Spam E-Mail from Mixed Arabic and English Messages: A Comparison of Machine Learning Techniques.
Spam is one of the main problems in email communication. Although the volume of non-English spam is increasing, little work has been done in this area. In the Arab world, for example, users receive spam written mostly in Arabic, in English, or in mixed Arabic and English. To filter this kind of message, this research applied several machine learning techniques, which many researchers have previously used to filter spam email. This study compared six supervised machine learning classifiers: maximum entropy, decision trees, artificial neural networks, naïve Bayes, support vector machines and k-nearest neighbour. The experiments suggested that words in Arabic messages should be stemmed before the classifiers are applied. In addition, in most cases, the experiments showed that classifiers using feature selection techniques can achieve comparable or better performance than filters that do not use them.
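The preprocessing step the experiments recommend, stemming Arabic tokens before classification, can be sketched as a light affix stripper; the affix lists below are a small hypothetical subset, not the stemmer used in the study:

```python
# Illustrative light stemmer for mixed Arabic/English email text: Arabic
# tokens are stripped of common prefixes/suffixes so inflected forms of
# a word map to the same classifier feature; English tokens pass through.

PREFIXES = ["وال", "بال", "كال", "فال", "ال"]   # e.g. the definite article
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية"]

def light_stem(token):
    for p in sorted(PREFIXES, key=len, reverse=True):
        if token.startswith(p) and len(token) - len(p) >= 2:
            token = token[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(s) and len(token) - len(s) >= 2:
            token = token[:-len(s)]
            break
    return token

def features(text):
    """Tokenise mixed-language text; stem only the Arabic tokens."""
    out = []
    for tok in text.lower().split():
        if any("\u0600" <= ch <= "\u06ff" for ch in tok):
            tok = light_stem(tok)
        out.append(tok)
    return out

print(light_stem("الكتاب"))  # كتاب ("the book" -> "book")
```

Any of the six classifiers compared in the study would then be trained on the output of `features` rather than on raw tokens.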
Detecting spam relays by SMTP traffic characteristics using an autonomous detection system
Spam emails are flooding the Internet, and research to prevent spam is an ongoing concern. In this thesis, SMTP traffic was collected from different sources in real networks and analysed to determine how the SMTP traffic characteristics of legitimate email clients, legitimate email servers and spam relays differ. It is found that SMTP traffic from legitimate and non-legitimate sites differs and that the two can be distinguished from each other. Based on this analysis, several methods for identifying spam relays in the network from their SMTP traffic characteristics are proposed. An autonomous combination system employing machine learning technologies was developed to identify spam relays in real time, before spam emails reach an end user, using SMTP traffic characteristics alone and never examining actual email content. A series of tests was conducted to evaluate the performance of this system, and the results show that it can identify spam relays with a high spam relay detection rate and an acceptable ratio of false positive errors.
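The content-free detection idea can be sketched as follows; the feature names, the session representation and the threshold rule standing in for the learned classifier are all illustrative assumptions, not the thesis's actual features or model:

```python
# Hedged sketch of content-free spam-relay detection: per-host SMTP
# session statistics are summarised into features, and a simple rule
# stands in for a trained classifier. No message content is touched.

def summarise(sessions):
    """sessions: list of dicts with 'rcpt_count' and 'rejected' fields."""
    n = len(sessions)
    fail = sum(1 for s in sessions if s["rejected"]) / n
    rcpt = sum(s["rcpt_count"] for s in sessions) / n
    return {"sessions": n, "fail_ratio": fail, "avg_rcpt": rcpt}

def looks_like_spam_relay(stats):
    # Spam relays tend to fan out to many recipients per session and to
    # hit many rejections (dictionary attacks on unknown mailboxes).
    return stats["fail_ratio"] > 0.3 and stats["avg_rcpt"] > 5

relay = summarise([{"rcpt_count": 40, "rejected": True},
                   {"rcpt_count": 25, "rejected": False}])
print(looks_like_spam_relay(relay))  # True
```

In a real system these statistics would be computed from observed TCP/SMTP flows and fed to the machine-learning component; the sketch only shows why traffic shape alone can separate relays from legitimate clients and servers.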
Computing with Granular Words
Computational linguistics is a sub-field of artificial intelligence; it is an interdisciplinary field dealing with statistical and/or rule-based modeling of natural language from a computational perspective. Traditionally, fuzzy logic is used to deal with fuzziness among single linguistic terms in documents. However, linguistic terms may involve other types of uncertainty. For instance, when different users search for 'cheap hotel' in a search engine, they may need distinct pieces of relevant hidden information, such as shopping, transportation or weather. Therefore, this research focuses on studying granular words and developing new algorithms that process them to deal with uncertainty globally. To describe granular words precisely, a new structure called the Granular Information Hyper Tree (GIHT) is constructed. Furthermore, several technologies are developed to support computing with granular words in spam filtering and query recommendation. Based on simulation results, the GIHT-Bayesian algorithm achieves a more accurate spam filtering rate than the conventional naïve Bayes and SVM methods, and computing with granular words also generates better recommendation results, as judged by users' assessments, when applied to a search engine.
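The GIHT itself is not specified in the abstract; a toy stand-in for the underlying idea, that one surface term expands into context-dependent granules of hidden information, might look like this (the hierarchy and its contents are entirely hypothetical):

```python
# Toy illustration of a granular word: the single surface term
# "cheap hotel" expands into different information granules depending on
# the user's context, which is the uncertainty the GIHT is built to model.

GRANULES = {
    "cheap hotel": {
        "shopping": ["outlet mall", "local market"],
        "transportation": ["metro station", "airport shuttle"],
        "weather": ["forecast", "rainfall"],
    }
}

def expand(term, context):
    """Return the granule list for a term under a given user context."""
    return GRANULES.get(term, {}).get(context, [])

print(expand("cheap hotel", "transportation"))
# ['metro station', 'airport shuttle']
```

A query recommender built on such a structure can suggest context-specific expansions of the same query string, which is the behaviour the abstract reports improving over flat, single-term treatment.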
Resolving FP-TP Conflict in Digest-Based Collaborative Spam Detection by Use of Negative Selection Algorithm
A well-known approach for collaborative spam filtering is to determine which emails belong to the same bulk, e.g. by exploiting their content similarity. This allows, after observing an initial portion of a bulk, bulkiness scores to be assigned to the remaining emails from the same bulk. It also allows individual evidence of spamminess to be joined, if such evidence is generated by collaborating filters or users for some of the emails from an initial portion of the bulk. Usually a database of previously observed emails or email digests is formed and queried upon receiving new emails. Previous evaluations [2,10] of the approach based on email digests that preserve content similarity indicate and partially demonstrate that there are ways to make the approach robust to increased obfuscation efforts by spammers. However, for the settings of the parameters that provide good matching between the emails from the same bulk, the unwanted random matching between ham emails and unrelated ham and spam emails stays rather high. This directly translates into a need for higher bulkiness thresholds in order to ensure low false positive (FP) detection of ham, which implies that larger initial parts of spam bulks will not be filtered, i.e. true positive (TP) detection will not be very high (the FP-TP conflict). In this paper we demonstrate how, by use of the negative selection algorithm, the unwanted random matching between unrelated emails may be decreased by at least an order of magnitude, while preserving the same good matching between the emails from the same bulk. We also show how this translates into at least an order of magnitude fewer undetected bulky spam emails, under the same ham miss-detection requirements.
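The digest-matching step and the role negative selection plays in it can be sketched under strong simplifying assumptions: here a digest is the set of hashed character 3-grams of a message, similarity is set overlap, and "negative selection" is reduced to discarding digest components that also occur in a ham self-set, so unrelated ham can no longer match by chance (the paper's actual digests and algorithm are more sophisticated):

```python
# Simplified digest-based bulk matching with a negative-selection step.
import hashlib

def digest(text):
    """Set of hashed character 3-grams (a toy similarity-preserving digest)."""
    return {int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16) % 4096
            for i in range(len(text) - 2)}

def overlap(d1, d2):
    return len(d1 & d2) / min(len(d1), len(d2))

def negative_select(d, ham_self_set):
    """Keep only components that never appear in the ham corpus."""
    return d - ham_self_set

ham = ["see you at lunch", "minutes of the meeting"]
self_set = set().union(*(digest(h) for h in ham))

# Two obfuscated copies from the same spam bulk still match strongly,
# both before and after negative selection removes ham-like components.
bulk_a = digest("win a free prize now!!")
bulk_b = digest("win a free prize now !")
a_sel = negative_select(bulk_a, self_set)
b_sel = negative_select(bulk_b, self_set)
print(overlap(a_sel, b_sel) > 0.5)
```

The FP-TP trade-off described above corresponds to the matching threshold: removing ham-matching components lets that threshold be lowered (catching more of each bulk) without raising random matches against ham.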
A Numerical Approach for Assigning a Reputation to Users of an IoT Framework
Nowadays, in the Internet of Things (IoT) society, the massive use of technological devices available to people makes it possible to collect a great deal of data describing the tastes, choices and behaviours of the users of services and tools. This information can be rearranged and interpreted in order to obtain a rating (i.e., an evaluation) of the subjects (i.e., users) interacting with specific objects (i.e., items). Generally, reputation systems are widely used to provide ratings for products, services, companies, digital content and people. Here, we focus on this issue, adopting a Collaborative Reputation System (CRS) to evaluate visitors' behaviour at a real cultural event. The results obtained, compared with those of other methods (i.e., classification), confirm the reliability and usefulness of CRSes for deeply understanding the dynamics of visiting styles.
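The paper's CRS is not detailed in the abstract; a minimal numerical sketch of the general task, turning a stream of interaction ratings into a running reputation score, could use an exponentially weighted average (the update rule, the neutral prior and the smoothing factor are illustrative assumptions, not the paper's method):

```python
# Minimal numerical reputation update: each interaction yields a rating
# in [0, 1], blended into a running score so recent behaviour counts more.

def update_reputation(current, rating, alpha=0.3):
    """Blend a new rating into the running reputation score."""
    return (1 - alpha) * current + alpha * rating

rep = 0.5                        # neutral prior for a new user
for r in [1.0, 0.8, 0.9]:        # three positively rated interactions
    rep = update_reputation(rep, r)
print(rep)
```

In a collaborative setting, the per-interaction ratings would themselves be aggregated from the evaluations of other users or sensors before being folded into each subject's score.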