6,333 research outputs found

    Active Multi-Field Learning for Spam Filtering

    Get PDF
    Ubiquitous spam messages cause a serious waste of time and resources. This paper addresses the practical spam filtering problem, and proposes a universal approach to fight with various spam messages. The proposed active multi-field learning approach is based on: 1) It is cost-sensitive to obtain a label for a real-world spam filter, which suggests an active learning idea; and 2) Different messages often have a similar multi-field text structure, which suggests a multi-field learning idea. The multi-field learning framework combines multiple results predicted from field classifiers by a novel compound weight, and each field classifier calculates the arithmetical average of multiple conditional probabilities predicted from feature strings according to a data structure of string-frequency index. Comparing the current variance of field classifying results with the historical variance, the active learner evaluates the classifying confidence and regards the more uncertain message as the more informative sample for which to request a label. The experimental results show that the proposed approach can achieve the state-of-the-art performance at greatly reduced label requirements both in email spam filtering and short text spam filtering. Our active multi-field learning performance, the standard (1-ROCA) % measurement, even exceeds the full feedback performance of some advanced individual classifying algorithm

    Using online linear classifiers to filter spam Emails

    Get PDF
    The performance of two online linear classifiers - the Perceptron and Littlestone’s Winnow – is explored for two anti-spam filtering benchmark corpora - PU1 and Ling-Spam. We study the performance for varying numbers of features, along with three different feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better when using IG or DF than using Odds Ratio. It is further demonstrated that when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of training set. Winnow is shown to slightly outperform the Perceptron. It is also demonstrated that both of these online classifiers perform much better than a standard Naïve Bayes method. The theoretical and implementation computational complexity of these two classifiers are very low, and they are very easily adaptively updated. They outperform most of the published results, while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering

    Hikester - the event management application

    Full text link
    Today social networks and services are one of the most important part of our everyday life. Most of the daily activities, such as communicating with friends, reading news or dating is usually done using social networks. However, there are activities for which social networks do not yet provide adequate support. This paper focuses on event management and introduces "Hikester". The main objective of this service is to provide users with the possibility to create any event they desire and to invite other users. "Hikester" supports the creation and management of events like attendance of football matches, quest rooms, shared train rides or visit of museums in foreign countries. Here we discuss the project architecture as well as the detailed implementation of the system components: the recommender system, the spam recognition service and the parameters optimizer

    BlogForever D2.4: Weblog spider prototype and associated methodology

    Get PDF
    The purpose of this document is to present the evaluation of different solutions for capturing blogs, established methodology and to describe the developed blog spider prototype

    Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning

    Get PDF
    Learning-based pattern classifiers, including deep networks, have shown impressive performance in several application domains, ranging from computer vision to cybersecurity. However, it has also been shown that adversarial input perturbations carefully crafted either at training or at test time can easily subvert their predictions. The vulnerability of machine learning to such wild patterns (also referred to as adversarial examples), along with the design of suitable countermeasures, have been investigated in the research field of adversarial machine learning. In this work, we provide a thorough overview of the evolution of this research area over the last ten years and beyond, starting from pioneering, earlier work on the security of non-deep learning algorithms up to more recent work aimed to understand the security properties of deep learning algorithms, in the context of computer vision and cybersecurity tasks. We report interesting connections between these apparently-different lines of work, highlighting common misconceptions related to the security evaluation of machine-learning algorithms. We review the main threat models and attacks defined to this end, and discuss the main limitations of current work, along with the corresponding future challenges towards the design of more secure learning algorithms.Comment: Accepted for publication on Pattern Recognition, 201
    • 

    corecore