Machine Learning Approaches for Modeling Spammer Behavior
Spam is commonly defined as unsolicited or unwanted email messages on the
Internet that pose a potential threat to Internet security. Users spend
valuable time deleting spam emails. More importantly, ever-increasing spam
emails occupy server storage space and consume network bandwidth. Keyword-based
spam email filtering strategies will eventually become less successful at
modeling spammer behavior, as spammers constantly change their tricks to
circumvent these filters. The evasive tactics that spammers use are patterns,
and these patterns can be modeled to combat spam. This paper investigates the
possibility of modeling spammer behavioral patterns with well-known
classification algorithms such as the Naïve Bayesian classifier (Naïve Bayes),
Decision Tree Induction (DTI) and Support Vector Machines (SVMs). Preliminary
experimental results demonstrate a promising detection rate of around 92%,
which is a considerable improvement in performance over similar spammer
behavior modeling research.
Comment: 12 pages, 3 figures, 5 tables, Submitted to AIRS 201
Bot Spammer Detection in Twitter Using Tweet Similarity and Time Interval Entropy
The popularity of Twitter has attracted spammers who disseminate large amounts of spam messages. Preliminary studies have shown that most spam messages are produced automatically by bots. Therefore, bot spammer detection can significantly reduce the number of spam messages on Twitter. However, to the best of our knowledge, few studies have focused on detecting Twitter bot spammers. Thus, this paper proposes a novel approach to differentiate between bot spammer and legitimate user accounts using time interval entropy and tweet similarity. Timestamp collections are used to calculate the time interval entropy of each user. Uni-gram matching-based similarity is used to calculate tweet similarity. Datasets containing both normal and spammer accounts are crawled from Twitter. Experimental results show that legitimate users may post tweets as regularly as bot spammers do. Several legitimate users are also found to post similar tweets. Therefore, detecting bot spammers using only one of these features is less than optimal. However, the combination of both features gives a better classification result. The precision, recall, and F-measure of the proposed method reach 85.71%, 94.74% and 90% respectively. It outperforms the precision, recall, and F-measure of methods that use only time interval entropy or only tweet similarity.
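The two features named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how intervals are discretised or how uni-gram matching is scored, so the 60-second bins (`bin_seconds`), the Jaccard word overlap, and all function names below are assumptions made for illustration only.

```python
import math
from collections import Counter


def interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (in bits) of the binned gaps between consecutive posts.

    Assumption: gaps are discretised into fixed-width bins of `bin_seconds`.
    A low value means very regular posting, which is typical of automation.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:
        return 0.0
    counts = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def unigram_similarity(tweet_a, tweet_b):
    """Jaccard overlap of the two tweets' lowercase word sets (an assumed measure)."""
    words_a = set(tweet_a.lower().split())
    words_b = set(tweet_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def mean_pairwise_similarity(tweets):
    """Average uni-gram similarity over all distinct pairs of one account's tweets."""
    pairs = [(i, j) for i in range(len(tweets)) for j in range(i + 1, len(tweets))]
    if not pairs:
        return 0.0
    return sum(unigram_similarity(tweets[i], tweets[j]) for i, j in pairs) / len(pairs)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

An account posting exactly every ten minutes gets an entropy of zero, while irregular gaps raise it. Feeding the reported precision and recall into `f_measure(0.8571, 0.9474)` gives roughly 0.90, consistent with the figure quoted in the abstract.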
Making the Most of Tweet-Inherent Features for Social Spam Detection on Twitter
Social spam produces a great amount of noise on social media services such as
Twitter, which reduces the signal-to-noise ratio that both end users and data
mining applications observe. Existing techniques on social spam detection have
focused primarily on the identification of spam accounts by using extensive
historical and network-based data. In this paper we focus on the detection of
spam tweets, which optimises the amount of data that needs to be gathered by
relying only on tweet-inherent features. This enables the application of the
spam detection system to a large set of tweets in a timely fashion, potentially
applicable in a real-time or near real-time setting. Using two large
hand-labelled datasets of tweets containing spam, we study the suitability of
five classification algorithms and four different feature sets to the social
spam detection task. Our results show that, by using the limited set of
features readily available in a tweet, we can achieve encouraging results which
are competitive when compared against existing spammer detection systems that
make use of additional, costly user features. Our study is the first that
attempts to generalise conclusions on the optimal classifiers and sets of
features for social spam detection over different datasets.
Minimizing the Time of Spam Mail Detection by Relocating Filtering System to the Sender Mail Server
Unsolicited Bulk Emails (also known as spam) are undesirable emails sent to
a massive number of users. Spam emails consume network resources and cause
many security uncertainties. Our study indicates that the location where the
spam filter operates is an important parameter for preserving network resources.
Although there are many different methods to block spam emails, most program
developers only intend to block spam emails from being delivered to their
clients. In this paper, we introduce a new and efficient approach to
prevent spam emails from being transferred. The results show that if we focus
on developing a filtering method for spam emails in the sender mail server
rather than the receiver mail server, we can detect spam emails in the
shortest time and consequently avoid wasting network resources.
Comment: 10 pages, 7 figure
Spam - solutions and their problems
We analyze the success of filtering as a solution to the spam problem when used alone or concurrently with sender and/or receiver pricing. We find that filters alone may exacerbate the spam problem if the spammer attempts to evade them by sending multiple variants of the message to each consumer. Sender and receiver prices can effectively reduce or eliminate spam, either on their own or when used together with filtering. Finally, we discuss the implications for social welfare of using the different spam controls. Keywords: spam; filtering; email; receiver pricing; sender pricing