A Mixture Model Based Defense for Data Poisoning Attacks Against Naive Bayes Spam Filters
Naive Bayes spam filters are highly susceptible to data poisoning attacks.
Here, known spam sources/blacklisted IPs exploit the fact that emails received
from them will be treated as (ground truth) labeled spam examples and used for
classifier training (or re-training). The attacking source thus generates
emails that will skew the spam model, potentially resulting in great
degradation in classifier accuracy. Such attacks are successful mainly because
of the poor representation power of the naive Bayes (NB) model, with only a
single (component) density to represent spam (plus a possible attack). We
propose a defense based on the use of a mixture of NB models. We demonstrate
that the learned mixture almost completely isolates the attack in a second NB
component, with the original spam component essentially unchanged by the
attack. Our approach addresses both the scenario where the classifier is being
re-trained in light of new data and, significantly, the more challenging
scenario where the attack is embedded in the original spam training set. Even
for weak attack strengths, BIC-based model order selection chooses a
two-component solution, which invokes the mixture-based defense. Promising
results are presented on the TREC 2005 spam corpus.
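
The idea of BIC-based model order selection between a single NB spam component and a two-component NB mixture can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary bag-of-words features, Bernoulli NB components fitted by EM, and the standard BIC form (-2 log-likelihood plus parameter count times log of sample size). All function names (`fit_mixture`, `bic`, `select_order`) and the smoothing constant `eps` are hypothetical.

```python
import numpy as np

def log_likelihood(X, weights, theta):
    """Total log-likelihood of binary rows X under a Bernoulli NB mixture."""
    log_joint = (X @ np.log(theta).T
                 + (1.0 - X) @ np.log(1.0 - theta).T
                 + np.log(weights))
    return float(np.logaddexp.reduce(log_joint, axis=1).sum())

def fit_mixture(X, k, n_iter=100, seed=0, eps=1e-3):
    """Fit a k-component Bernoulli NB mixture with EM (assumed setup)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = np.full(k, 1.0 / k)
    # Random initialisation breaks the symmetry between components.
    theta = rng.uniform(0.25, 0.75, size=(k, d))
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each email.
        log_joint = (X @ np.log(theta).T
                     + (1.0 - X) @ np.log(1.0 - theta).T
                     + np.log(weights))
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)
        # M-step: smoothed updates keep every probability strictly in (0, 1).
        nk = resp.sum(axis=0)
        weights = (nk + eps) / (n + k * eps)
        theta = (resp.T @ X + eps) / (nk[:, None] + 2.0 * eps)
    return weights, theta

def bic(X, weights, theta):
    """Standard BIC: -2 log L + (number of free parameters) * log n."""
    n, d = X.shape
    k = len(weights)
    n_params = k * d + (k - 1)  # word probabilities plus mixing weights
    return -2.0 * log_likelihood(X, weights, theta) + n_params * np.log(n)

def select_order(X, orders=(1, 2)):
    """Return the candidate order with the lowest BIC, and all scores."""
    scores = {k: bic(X, *fit_mixture(X, k)) for k in orders}
    return min(scores, key=scores.get), scores
```

In this sketch, `select_order(X)` would be run on the binary feature matrix of the emails labeled as spam. If it returns 2, one component can be inspected as a candidate attack component and excluded from the spam model. How the components are then used for classification is described in the paper itself and is not reproduced here.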