6,333 research outputs found

Active Multi-Field Learning for Spam Filtering

Authors: Liu Wuying, Wang Lin, Xie Nan, Yi Mianzhu
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 11/02/2015

Ubiquitous spam messages cause a serious waste of time and resources. This paper addresses the practical spam filtering problem and proposes a universal approach to fighting various kinds of spam messages. The proposed active multi-field learning approach rests on two observations: 1) obtaining a label for a real-world spam filter is costly, which suggests an active learning idea; and 2) different messages often share a similar multi-field text structure, which suggests a multi-field learning idea. The multi-field learning framework combines the results predicted by multiple field classifiers using a novel compound weight, and each field classifier calculates the arithmetic mean of multiple conditional probabilities predicted from feature strings according to a string-frequency index data structure. By comparing the current variance of the field classifying results with the historical variance, the active learner evaluates the classifying confidence and regards the more uncertain message as the more informative sample for which to request a label. The experimental results show that the proposed approach can achieve state-of-the-art performance at greatly reduced label requirements in both email spam filtering and short text spam filtering. Measured by the standard (1-ROCA)% metric, our active multi-field learning performance even exceeds the full-feedback performance of some advanced individual classifying algorithms.
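
The two ideas in this abstract, combining per-field predictions and requesting labels for uncertain messages, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the compound weight or the exact variance test, so the plain weighted average, the rule "request a label when the current variance exceeds the mean of past variances", and the names `combine_fields` and `VarianceActiveLearner` are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def combine_fields(field_scores, weights=None):
    """Weighted average of per-field spam scores.

    Assumption: plain weights stand in for the paper's compound weight,
    which the abstract does not define.
    """
    if weights is None:
        weights = [1.0] * len(field_scores)
    return sum(w * s for w, s in zip(weights, field_scores)) / sum(weights)


class VarianceActiveLearner:
    """Hypothetical label-request rule based on field disagreement."""

    def __init__(self):
        self.history = []  # variances of all messages seen so far

    def should_request_label(self, field_scores):
        # High variance means the field classifiers disagree, which this
        # sketch treats as low confidence.
        current = pvariance(field_scores)
        # Assumed rule: ask when there is no history yet, or when the
        # current variance is above the historical mean variance.
        ask = not self.history or current > mean(self.history)
        self.history.append(current)
        return ask
```

With this rule, a message whose field scores agree (for example 0.8, 0.8, 0.8) is not sent for labeling once some disagreement has been observed, while a message whose fields disagree strongly is.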

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Using online linear classifiers to filter spam Emails

Authors: Jones Gareth J.F., Wang Bin, Wenfeng Pan
Publication venue: Springer Verlag
Publication date: 01/11/2007

The performance of two online linear classifiers, the Perceptron and Littlestone’s Winnow, is explored for two anti-spam filtering benchmark corpora, PU1 and Ling-Spam. We study the performance for varying numbers of features, along with three different feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better when using IG or DF than when using Odds Ratio. It is further demonstrated that when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of the training set. Winnow is shown to slightly outperform the Perceptron. It is also demonstrated that both of these online classifiers perform much better than a standard Naïve Bayes method. The theoretical and implementation computational complexity of these two classifiers is very low, and they are very easy to update adaptively. They outperform most of the published results, while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering.
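
For readers unfamiliar with the two classifiers, the following Python sketch shows the textbook mistake-driven updates: additive for the Perceptron, multiplicative for Winnow. It is an illustration only. The abstract does not say which Winnow variant, learning rate, threshold or promotion factor the authors used, so the function names and the defaults `lr=1.0` and `alpha=2.0` are assumptions.

```python
def perceptron_update(w, b, x, y, lr=1.0):
    """One online Perceptron step; y is -1 (legitimate) or +1 (spam)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if y * score <= 0:  # mistake (or zero margin): move towards y
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        b = b + lr * y
    return w, b


def winnow_update(w, x, y, theta, alpha=2.0):
    """One online Winnow step on binary features; y is 0 or 1."""
    predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
    if predicted == y:
        return w  # correct prediction: weights are left unchanged
    # Promote active features on a missed spam, demote on a false alarm.
    factor = alpha if y == 1 else 1.0 / alpha
    return [wi * factor if xi else wi for wi, xi in zip(w, x)]
```

Both updates touch only the features present in the current message, which is consistent with the low computational cost and easy adaptive updating that the abstract reports.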

Irish Universities

DCU Online Research Access Service

Hikester - the event management application

Authors: Khatipov Rinat, Mazzara Manuel, Negimatzhanov Aydar, Rivera Victor, Zakirov Anvar, Zamaleev Ilgiz
Publication date: 19/01/2018

Today social networks and services are one of the most important parts of our everyday life. Most daily activities, such as communicating with friends, reading news or dating, are usually done using social networks. However, there are activities for which social networks do not yet provide adequate support. This paper focuses on event management and introduces "Hikester". The main objective of this service is to provide users with the possibility to create any event they desire and to invite other users. "Hikester" supports the creation and management of events such as attending football matches, quest rooms, shared train rides or visits to museums in foreign countries. Here we discuss the project architecture as well as the detailed implementation of the system components: the recommender system, the spam recognition service and the parameters optimizer.

arXiv.org e-Print Archive

Crossref

BlogForever D2.4: Weblog spider prototype and associated methodology

Authors: Banos V., Gulliksen M., Joy M., Manolopoulos I., Rynning M., Stepanyan K., Tselepidis I.
Publication date: 25/10/2013

The purpose of this document is to present the evaluation of different solutions for capturing blogs and the established methodology, and to describe the developed blog spider prototype.

ZENODO

Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning

Authors: Biggio Battista, Roli Fabio
Publication venue: Elsevier BV
Publication date: 01/01/2018

Learning-based pattern classifiers, including deep networks, have shown impressive performance in several application domains, ranging from computer vision to cybersecurity. However, it has also been shown that adversarial input perturbations carefully crafted either at training or at test time can easily subvert their predictions. The vulnerability of machine learning to such wild patterns (also referred to as adversarial examples), along with the design of suitable countermeasures, has been investigated in the research field of adversarial machine learning. In this work, we provide a thorough overview of the evolution of this research area over the last ten years and beyond, starting from pioneering, earlier work on the security of non-deep learning algorithms up to more recent work aimed at understanding the security properties of deep learning algorithms, in the context of computer vision and cybersecurity tasks. We report interesting connections between these apparently different lines of work, highlighting common misconceptions related to the security evaluation of machine-learning algorithms. We review the main threat models and attacks defined to this end, and discuss the main limitations of current work, along with the corresponding future challenges towards the design of more secure learning algorithms. Comment: Accepted for publication on Pattern Recognition, 201

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Cagliari

Archivio istituzionale della ricerca - Università di Genova