Detecting Spammers via Aggregated Historical Data Set
The battle between email service providers and senders of mass unsolicited
emails (Spam) continues to intensify. Vast numbers of Spam emails are sent
mainly from automatic botnets distributed over the world. One method for
mitigating Spam in a computationally efficient manner is fast and accurate
blacklisting of the senders. In this work we propose a new sender reputation
mechanism that is based on an aggregated historical data-set which encodes the
behavior of mail transfer agents over time. A historical data-set is created
from labeled logs of received emails. We use machine learning algorithms to
build a model that predicts the spammingness of mail transfer agents in
the near future. The proposed mechanism is targeted mainly at large enterprises
and email service providers and can be used for updating both the black and the
white lists. We evaluate the proposed mechanism using 9.5M anonymized log
entries obtained from the biggest Internet service provider in Europe.
Experiments show that the proposed method detects more than 94% of the Spam
emails that escaped the blacklist (the true positive rate, TPR), with a
false-alarm rate below 0.5%. Therefore, the effectiveness of the proposed method
is much higher than that of previously reported reputation mechanisms that rely on email
logs. In addition, the proposed method, when used for updating both the black
and white lists, eliminated the need for automatic content inspection of 4 out
of 5 incoming emails, which resulted in a dramatic reduction in the filtering
computational load.
Comment: This is a conference version of the HDS research. 13 pages, 10 figures
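
The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature set, window length, or learning algorithm, so the function names, the one-hour window, the use of per-window spam fractions as features, and the -1.0 marker for windows with no mail are all assumptions made for illustration. The sketch groups labeled log entries by sending mail transfer agent and time window, then pairs each window's spam fraction (the prediction target) with the fractions from the preceding windows (the features).

```python
from collections import defaultdict


def aggregate_history(log_entries, window_seconds=3600):
    """Group labeled log entries into per-sender, per-window counts.

    log_entries: iterable of (timestamp_seconds, sender_ip, is_spam) tuples
                 (a hypothetical log format, assumed for this sketch).
    Returns {sender_ip: {window_index: [spam_count, ham_count]}}.
    """
    history = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for timestamp, sender_ip, is_spam in log_entries:
        window = int(timestamp // window_seconds)
        history[sender_ip][window][0 if is_spam else 1] += 1
    return history


def spam_fraction(counts):
    """Fraction of spam among the emails counted in one window."""
    spam, ham = counts
    return spam / (spam + ham)


def build_examples(history, n_past=3):
    """Build (sender_ip, features, target) training examples.

    features: spam fraction in each of the n_past preceding windows,
              oldest first; -1.0 marks a window with no observed mail
              (an assumed encoding, not taken from the paper).
    target:   spam fraction in the current window.
    Windows with no observed history at all are skipped.
    """
    examples = []
    for sender_ip, windows in history.items():
        for w in sorted(windows):
            features = []
            for k in range(n_past, 0, -1):
                counts = windows.get(w - k)  # .get() does not create entries
                features.append(spam_fraction(counts) if counts else -1.0)
            if any(f >= 0.0 for f in features):
                examples.append((sender_ip, features, spam_fraction(windows[w])))
    return examples


if __name__ == "__main__":
    log = [
        (0, "10.0.0.1", True),
        (10, "10.0.0.1", True),
        (3700, "10.0.0.1", False),
        (3800, "10.0.0.1", True),
        (7300, "10.0.0.1", True),
    ]
    for example in build_examples(aggregate_history(log), n_past=2):
        print(example)
```

Examples built this way could be passed to any standard supervised learner. Thresholding the predicted value would then decide whether a sender is added to the black list or the white list.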