2,097 research outputs found

    An ontology enhanced parallel SVM for scalable spam filter training

    Get PDF
    This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches including Support Vector Machine (SVM) techniques have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart

    Training Support Vector Machines Using Frank-Wolfe Optimization Methods

    Full text link
    Training a Support Vector Machine (SVM) requires the solution of a quadratic programming problem (QP) whose computational complexity becomes prohibitively expensive for large scale datasets. Traditional optimization methods cannot be directly applied in these cases, mainly due to memory restrictions. By adopting a slightly different objective function and under mild conditions on the kernel used within the model, efficient algorithms to train SVMs have been devised under the name of Core Vector Machines (CVMs). This framework exploits the equivalence of the resulting learning problem with the task of building a Minimal Enclosing Ball (MEB) problem in a feature space, where data is implicitly embedded by a kernel function. In this paper, we improve on the CVM approach by proposing two novel methods to build SVMs based on the Frank-Wolfe algorithm, recently revisited as a fast method to approximate the solution of a MEB problem. In contrast to CVMs, our algorithms do not require to compute the solutions of a sequence of increasingly complex QPs and are defined by using only analytic optimization steps. Experiments on a large collection of datasets show that our methods scale better than CVMs in most cases, sometimes at the price of a slightly lower accuracy. As CVMs, the proposed methods can be easily extended to machine learning problems other than binary classification. However, effective classifiers are also obtained using kernels which do not satisfy the condition required by CVMs and can thus be used for a wider set of problems

    A MATLAB SMO implementation to train a SVM classifier: Application to multi-style license plate numbers recognition

    Get PDF
    This paper implements the Support Vector Machine (SVM) training procedure proposed by John Platt denominated Sequential Minimimal Optimization (SMO). The application of this system involves a multi-style license plate characters recognition identifying numbers from “0” to “9”. In order to be robust against license plates with different character/background colors, the characters (numbers) visual information is encoded using Histograms of Oriented Gradients (HOG). A reliability measure to validate the system outputs is also proposed. Several tests are performed to evaluate the sensitivity of the algorithm to different parameters and kernel functions.Fil: Negri, Pablo Augusto. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación En Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación En Ciencias de la Computacion; Argentin

    Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

    Full text link
    This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, with imbalance in classes of interests, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for development of specialized techniques of data-preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expected maximization imputation method for missing values, which relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values as well as real data in health applications, and show that our multilevel SVM-based method produces fast, and more accurate and robust classification results.Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625

    Inhibition in multiclass classification

    Get PDF
    The role of inhibition is investigated in a multiclass support vector machine formalism inspired by the brain structure of insects. The so-called mushroom bodies have a set of output neurons, or classification functions, that compete with each other to encode a particular input. Strongly active output neurons depress or inhibit the remaining outputs without knowing which is correct or incorrect. Accordingly, we propose to use a classification function that embodies unselective inhibition and train it in the large margin classifier framework. Inhibition leads to more robust classifiers in the sense that they perform better on larger areas of appropriate hyperparameters when assessed with leave-one-out strategies. We also show that the classifier with inhibition is a tight bound to probabilistic exponential models and is Bayes consistent for 3-class problems. These properties make this approach useful for data sets with a limited number of labeled examples. For larger data sets, there is no significant comparative advantage to other multiclass SVM approaches
    corecore