    Using online linear classifiers to filter spam emails

    The performance of two online linear classifiers, the Perceptron and Littlestone's Winnow, is explored on two anti-spam filtering benchmark corpora, PU1 and Ling-Spam. We study the performance for varying numbers of features, along with three different feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better when using IG or DF than when using Odds Ratio. It is further demonstrated that when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of the training set. Winnow is shown to slightly outperform the Perceptron. It is also demonstrated that both of these online classifiers perform much better than a standard Naïve Bayes method. The theoretical and implementation computational complexity of these two classifiers is very low, and they are easy to update adaptively. They outperform most of the published results, while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering.
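To make the two update rules named in the abstract above concrete, here is a minimal sketch of the textbook Perceptron (additive updates) and the basic Winnow variant (multiplicative updates) on binary word-presence features. The abstract does not say which Winnow variant, thresholds or learning rates the paper uses, so the promotion factor `alpha`, the threshold `theta` and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# (binary feature vector, label) where label 1 = spam, 0 = legitimate
Example = Tuple[Sequence[int], int]


def dot(w: Sequence[float], x: Sequence[int]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(data: Sequence[Example], n_features: int,
                     epochs: int = 5) -> Tuple[List[float], float]:
    """Textbook Perceptron: additive, mistake-driven updates."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, label in data:
            y = 1 if label == 1 else -1
            pred = 1 if dot(w, x) + b > 0 else -1
            if pred != y:  # update only on a mistake
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b


def predict_perceptron(w: Sequence[float], b: float, x: Sequence[int]) -> int:
    return 1 if dot(w, x) + b > 0 else 0


def train_winnow(data: Sequence[Example], n_features: int, epochs: int = 5,
                 alpha: float = 2.0) -> Tuple[List[float], float]:
    """Basic Winnow: multiplicative updates on the active features.

    alpha and theta = n_features are conventional choices, assumed here.
    """
    w, theta = [1.0] * n_features, float(n_features)
    for _ in range(epochs):
        for x, label in data:
            pred = 1 if dot(w, x) >= theta else 0
            if pred == label:
                continue
            # promote on a missed spam, demote on a false alarm
            factor = alpha if label == 1 else 1.0 / alpha
            w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
    return w, theta


def predict_winnow(w: Sequence[float], theta: float, x: Sequence[int]) -> int:
    return 1 if dot(w, x) >= theta else 0


# Hypothetical toy data; features: "free", "money", "meeting", "report".
TOY_DATA: List[Example] = [
    ([1, 1, 0, 0], 1), ([1, 0, 0, 0], 1), ([0, 1, 0, 0], 1),
    ([0, 0, 1, 1], 0), ([0, 0, 1, 0], 0), ([0, 0, 0, 1], 0),
]

if __name__ == "__main__":
    w_p, b_p = train_perceptron(TOY_DATA, 4)
    w_w, theta = train_winnow(TOY_DATA, 4)
    print([predict_perceptron(w_p, b_p, x) for x, _ in TOY_DATA])
    print([predict_winnow(w_w, theta, x) for x, _ in TOY_DATA])
```

Both learners touch only the weights of features present in the current message and only when they make a mistake, which is consistent with the low computational cost and easy adaptive updating the abstract describes.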

    Ground Extraction from 3D Lidar Point Clouds

    © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    Pomares, A., Martínez, J.L., Mandow, A., Martínez, M.A., Morán, M., Morales, J. Ground extraction from 3D lidar point clouds with the Classification Learner App (2018) 26th Mediterranean Conference on Control and Automation, Zadar, Croatia, June 2018, pp. 400-405. DOI: Pending
    Ground extraction from three-dimensional (3D) range data is a relevant problem for outdoor navigation of unmanned ground vehicles. Even though this problem has received attention with specific heuristics and segmentation approaches, identification of ground and non-ground points can benefit from state-of-the-art classification methods, such as those included in the Matlab Classification Learner App. This paper proposes a comparative study of the machine learning methods included in this tool in terms of training times as well as their predictive performance. For this purpose, we have combined three suitable features for ground detection and applied them to an urban dataset with several labeled 3D point clouds. Most of the analyzed techniques achieve good classification results, but only a few offer low training and prediction times.
    This work was partially supported by the Spanish project DPI 2015-65186-R. The publication has received support from Universidad de Málaga, Campus de Excelencia Andalucía Tech.

    Adaptive text mining: Inferring structure from sequences

    Text mining is about inferring structure from sequences representing natural language text, and may be defined as the process of analyzing text to extract information that is useful for particular purposes. Although hand-crafted heuristics are a common practical approach for extracting information from text, a general, and generalizable, approach requires adaptive techniques. This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining. It develops several examples: extraction of hierarchical phrase structures from text, identification of keyphrases in documents, locating proper names and quantities of interest in a piece of text, text categorization, word segmentation, acronym extraction, and structure recognition. We conclude that compression forms a sound unifying principle that allows many text mining problems to be tackled adaptively.
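As a rough illustration of how compression can drive one of the listed tasks, text categorization, the sketch below assigns a document to the category whose training text lets it be compressed most cheaply. It uses Python's general-purpose `zlib` compressor as a stand-in; the abstract does not name the compression models the paper uses, and the helper names and the two tiny corpora are hypothetical.

```python
import zlib
from typing import Dict


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, document: str) -> int:
    """Extra compressed bytes needed to encode document after corpus."""
    return compressed_size(corpus + " " + document) - compressed_size(corpus)


def categorize(document: str, corpora: Dict[str, str]) -> str:
    """Pick the category whose corpus makes the document cheapest to encode."""
    return min(corpora, key=lambda label: extra_bytes(corpora[label], document))


# Hypothetical training text, one string per category.
CORPORA: Dict[str, str] = {
    "sport": (
        "the team won the match after scoring two late goals in the second "
        "half. the striker scored again and the goalkeeper saved a penalty."
    ),
    "finance": (
        "the bank raised interest rates as inflation and bond yields climbed. "
        "investors sold shares while the market index dropped sharply."
    ),
}

if __name__ == "__main__":
    print(categorize("the goalkeeper saved a penalty", CORPORA))
```

The compressor adapts to whatever text it has already seen, so no category-specific rules are written by hand. With corpora this small the signal is only reliable when the document shares long phrases with one category; realistic use would need far more training text per category.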