Combating Good Word Attacks on Statistical Spam Filters with Multiple Instance Learning

Author: Meador Inge
Yan Zhou
Zach Jorgensen

Statistical spam filters are known to be vulnerable to adversarial attacks. One such attack, known as the Good Word Attack, thwarts spam filters by appending sets of "good" words, which are common in legitimate e-mail but rare in spam, to spam messages. We present a counterattack strategy that first attempts to separate spam from legitimate e-mail in the input space by transforming each e-mail into a bag of multiple segments, and then applies multiple instance logistic regression to the bags. We treat each segment in the bag as an instance. An e-mail is classified as spam if at least one instance in the corresponding bag is spam, and as legitimate if all of its instances are legitimate. We show that a spam filter using our multiple instance counterattack strategy is more robust to good word attacks than its single instance counterpart and than commonly used Bayesian filters.

CiteSeerX
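The bag-of-segments decision rule in the first abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' method: the word weights are hypothetical hand-set values rather than a model trained with multiple instance logistic regression, and splitting each message into fixed-size word windows is an assumed segmentation. The sketch contrasts a single-instance score over the whole message with the bag rule (spam if at least one segment is spam).

```python
import math
from typing import Dict, List

# HYPOTHETICAL per-word weights standing in for a trained logistic
# regression model (positive = spam-indicative, negative = "good" words).
# These values are invented for illustration, not learned from data.
WEIGHTS: Dict[str, float] = {
    "free": 1.5, "viagra": 3.0, "winner": 2.0,
    "meeting": -1.0, "report": -1.0, "thanks": -1.0, "agenda": -1.0,
}
BIAS = -0.5


def segment(words: List[str], size: int) -> List[List[str]]:
    """Split a message into consecutive fixed-size segments (the bag).

    Fixed-size windows are an assumption of this sketch; the paper's own
    segmentation method may differ.
    """
    return [words[i:i + size] for i in range(0, len(words), size)]


def spam_probability(words: List[str]) -> float:
    """Logistic (sigmoid) score of a word sequence under WEIGHTS."""
    z = BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-z))


def single_instance_is_spam(words: List[str], threshold: float = 0.5) -> bool:
    """Baseline: score the whole message as one instance."""
    return spam_probability(words) >= threshold


def bag_is_spam(words: List[str], size: int, threshold: float = 0.5) -> bool:
    """Multiple instance rule: spam if at least one segment is spam,
    legitimate only if every segment is legitimate."""
    return any(spam_probability(seg) >= threshold for seg in segment(words, size))


if __name__ == "__main__":
    spam = ["free", "viagra", "winner"]
    # Good word attack: pad the spam with words common in legitimate mail.
    attacked = spam + ["meeting", "report", "thanks", "agenda"] * 2
    print(single_instance_is_spam(spam))      # True
    print(single_instance_is_spam(attacked))  # False: padding dilutes the score
    print(bag_is_spam(attacked, size=3))      # True: first segment is still spam
```

With these toy weights, eight appended good words pull the whole-message score below 0.5, but the segment holding the original spam words keeps its high score, so the bag is still flagged. Fixed windows are the weakest part of this sketch: an attacker who interleaves good words inside the spam text changes which words share a segment, which this toy does not account for.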

Enhanced Topic-based Vector Space Model for semantics-aware spam filtering

Author: Borja Sanz
Carlos Laorden
Igor Santos
Pablo G. Bringas
Publication venue: Elsevier BV

Crossref