17 research outputs found

    Using online linear classifiers to filter spam Emails

    The performance of two online linear classifiers, the Perceptron and Littlestone’s Winnow, is explored on two anti-spam filtering benchmark corpora, PU1 and Ling-Spam. We study the performance for varying numbers of features, along with three different feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better when using IG or DF than when using Odds Ratio. It is further demonstrated that, when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of the training set. Winnow is shown to slightly outperform the Perceptron. It is also demonstrated that both of these online classifiers perform much better than a standard Naïve Bayes method. The theoretical and implementation computational complexity of these two classifiers is very low, and they are very easily updated adaptively. They outperform most of the published results, while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering.
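
    The two mistake-driven update rules named in this abstract can be illustrated with a minimal sketch. It assumes binary bag-of-words features and uses the textbook Perceptron and the basic multiplicative Winnow (promotion and demotion by a factor `alpha`); the abstract does not say which Winnow variant, learning rate, or threshold the authors used, so those choices, the four-word vocabulary, and the toy messages in `DATA` are all hypothetical.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[int], int]  # (binary feature vector, label: 1 = spam, 0 = legitimate)


class Perceptron:
    """Textbook Perceptron: additive update on each mistake."""

    def __init__(self, n_features: int, lr: float = 1.0) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0
        self.lr = lr  # assumed learning rate, not taken from the paper

    def predict(self, x: Sequence[int]) -> int:
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0

    def update(self, x: Sequence[int], y: int) -> None:
        if self.predict(x) == y:
            return  # online learners only change weights on a mistake
        sign = 1 if y == 1 else -1
        self.w = [wi + self.lr * sign * xi for wi, xi in zip(self.w, x)]
        self.b += self.lr * sign


class Winnow:
    """Basic Winnow: multiplicative update on each mistake."""

    def __init__(self, n_features: int, alpha: float = 2.0) -> None:
        self.w: List[float] = [1.0] * n_features
        self.theta = float(n_features)  # assumed threshold: number of features
        self.alpha = alpha              # assumed promotion/demotion factor

    def predict(self, x: Sequence[int]) -> int:
        score = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if score >= self.theta else 0

    def update(self, x: Sequence[int], y: int) -> None:
        if self.predict(x) == y:
            return
        # Missed spam: promote active features. False alarm: demote them.
        factor = self.alpha if y == 1 else 1.0 / self.alpha
        self.w = [wi * factor if xi else wi for wi, xi in zip(self.w, x)]


def train(model, data: Sequence[Example], epochs: int) -> None:
    """Present the examples one at a time, for a fixed number of passes."""
    for _ in range(epochs):
        for x, y in data:
            model.update(x, y)


# Hypothetical vocabulary: ["free", "winner", "meeting", "report"]
DATA: List[Example] = [
    ([1, 0, 0, 0], 1),
    ([0, 1, 0, 1], 1),
    ([0, 0, 1, 1], 0),
    ([0, 0, 1, 0], 0),
    ([1, 1, 0, 0], 1),
    ([0, 0, 0, 1], 0),
]

if __name__ == "__main__":
    for model in (Perceptron(4), Winnow(4)):
        train(model, DATA, epochs=10)
        print(type(model).__name__, [model.predict(x) for x, _ in DATA])
```

    Because both learners touch their weights only when they misclassify a message, each update costs time linear in the number of active features, which is consistent with the abstract's remark that the classifiers are cheap to train and easy to update adaptively.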

    Definition of Spam 2.0: New Spamming Boom

    The most widely recognized form of spam is e-mail spam; however, the term “spam” is also used to describe similar abuses in other media. Spam 2.0 (or Web 2.0 Spam) refers to spam content that is hosted on online Web 2.0 applications. In this paper, we provide a definition of Spam 2.0, identify and explain different entities within Spam 2.0, discuss new difficulties associated with Spam 2.0, outline its significance, and list possible countermeasures. The aim of this paper is to provide the reader with a complete understanding of this new form of spamming.

    Realizing the Power of Extelligence: A New Business Model for Academic Publishing

    The limitations of traditional academic knowledge exchange systems such as conferences and peer-reviewed journals result in discipline-based scholarship that is feudal in nature and can only dissipate as cross-disciplinary research expands. The next evolutionary step is democratic online knowledge exchange, run by the academic many rather than the publishing-oligarchic few. Using socio-technical tools, it is possible to implement an academic publishing business model that maximizes the power of “extelligence”, or knowledge realized through the collective gifting of information. Such a model would change the roles of journal editors and peer reviewers from knowledge gatekeepers to knowledge guides, and change the competitive yet conforming behaviors of academic researchers seeking publication to behaviors that reward collaborative activity that engages research communities in the act of knowledge exchange. We argue that socio-technical systems, that is, social systems sitting on a technical base such as the Internet, can provide effective ways to motivate people to increase the knowledge that research communities can share. By employing a hybrid of wiki, e-journal, electronic repository, micro-commenting and reputation systems for readers and writers, along with other socio-technical functions common to social computing such as social bookmarking and reader recommendation, we can move from our traditional print publishing model, in which prestige is established through publication in slowly produced, expensive and virtually unread journals, to a vibrant, online knowledge exchange community built upon the foundations of legitimacy, transparency and freedom.

    Single-Class Learning for Spam Filtering: An Ensemble Approach

    Spam, also known as Unsolicited Commercial Email (UCE), has been an increasingly annoying problem for individuals and organizations. Most prior research has formulated spam filtering as a classical text categorization task, in which training examples must include both spam emails (positive examples) and legitimate emails (negative examples). However, in many spam filtering scenarios, obtaining legitimate emails for training purposes is more difficult than collecting spam and unclassified emails. Hence, it would be more appropriate to construct a classification model for spam filtering from positive (i.e., spam emails) and unlabeled instances only, i.e., to train a spam filter without any legitimate emails as negative training examples. Several single-class learning techniques, including PNB and PEBL, have been proposed in the literature. However, they suffer from fundamental limitations when applied to spam filtering. In this study, we propose and develop an ensemble approach, referred to as E2, to address the limitations of PNB and PEBL. Specifically, we follow the two-stage framework of PEBL and extend each stage with an ensemble strategy. Our empirical evaluation results on two spam-filtering corpora suggest that the proposed E2 technique exhibits more stable and reliable performance than its benchmark techniques (i.e., PNB and PEBL).

    Evolution of stepping stone detection and emerging applications

    Stepping Stone Detection (SSD) is conventionally intended for the detection of series of host computers used by attackers to hide their tracks in a network or host environment. This paper discusses the evolution of SSD and potential applications in other emerging fields. Novel, unique SSD models will be presented for spam, backdoor and proxy detection and expressed mathematically. These preliminary models offer promising solutions for addressing current problems in these areas and may be expanded on in the future.

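    Development of a Tag-Based Spam Calculation Application for Social Bookmarking Sites Using the Spam Factor Method: A Case Study of del.icio.us (Pengembangan Aplikasi Menghitung Spam Berbasis Tag pada Situs Social Bookmarking Menggunakan Metode Spam Factor: Studi Kasus del.icio.us)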

    Social bookmarking is a type of social tagging used to categorize a web link or URL. On social bookmarking websites such as del.icio.us, users apply many tags to categorize or represent a blog site, URL, or link. In this final project, a tag is considered spam if it is applied to a bookmark but does not describe the content or website of that bookmark. When spam occurs, ambiguity can arise because the tags used in a bookmark do not represent that bookmark. This final project implements the spam factor method to compute the spam of a tag, expressed as a single value. To implement the spam factor method, a list of correct tags is first built for each bookmark, random good-user postings and random bad-user postings are generated, a trusted moderator is implemented to detect and remove postings that contain spam, and tag and bookmark usage is ranked with occurrence-based search. The spam factor method computes a tag's spam value from the number of times the tag is used and how well that use matches the bookmark in each posting. The resulting spam factor values range from 0 to 1, where a larger value indicates a stronger likelihood that the tag is spam. This study concludes that the spam factor values obtained are better when a trusted moderator is used, because the moderator removes spam from the system.