14,151 research outputs found

    Cholesky-factorized sparse Kernel in support vector machines

    Get PDF
    Support Vector Machine (SVM) is one of the most powerful machine learning algorithms due to its convex optimization formulation and handling non-linear classification. However, one of its main drawbacks is the long time it takes to train large data sets. This limitation is often aroused when applying non-linear kernels (e.g. RBF Kernel) which are usually required to obtain better separation for linearly inseparable data sets. In this thesis, we study an approach that aims to speed-up the training time by combining both the better performance of RBF kernels and fast training by a linear solver, LIBLINEAR. The approach uses an RBF kernel with a sparse matrix which is factorized using Cholesky decomposition. The method is tested on large artificial and real data sets and compared to the standard RBF and linear kernels where both the accuracy and training time are reported. For most data sets, the result shows a huge training time reduction, over 90\%, whilst maintaining the accuracy

    Fast algorithms for large scale generalized distance weighted discrimination

    Full text link
    High dimension low sample size statistical analysis is important in a wide range of applications. In such situations, the highly appealing discrimination method, support vector machine, can be improved to alleviate data piling at the margin. This leads naturally to the development of distance weighted discrimination (DWD), which can be modeled as a second-order cone programming problem and solved by interior-point methods when the scale (in sample size and feature dimension) of the data is moderate. Here, we design a scalable and robust algorithm for solving large scale generalized DWD problems. Numerical experiments on real data sets from the UCI repository demonstrate that our algorithm is highly efficient in solving large scale problems, and sometimes even more efficient than the highly optimized LIBLINEAR and LIBSVM for solving the corresponding SVM problems

    An ontology enhanced parallel SVM for scalable spam filter training

    Get PDF
    This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches including Support Vector Machine (SVM) techniques have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart

    Classification of sporting activities using smartphone accelerometers

    Get PDF
    In this paper we present a framework that allows for the automatic identification of sporting activities using commonly available smartphones. We extract discriminative informational features from smartphone accelerometers using the Discrete Wavelet Transform (DWT). Despite the poor quality of their accelerometers, smartphones were used as capture devices due to their prevalence in today’s society. Successful classification on this basis potentially makes the technology accessible to both elite and non-elite athletes. Extracted features are used to train different categories of classifiers. No one classifier family has a reportable direct advantage in activity classification problems to date; thus we examine classifiers from each of the most widely used classifier families. We investigate three classification approaches; a commonly used SVM-based approach, an optimized classification model and a fusion of classifiers. We also investigate the effect of changing several of the DWT input parameters, including mother wavelets, window lengths and DWT decomposition levels. During the course of this work we created a challenging sports activity analysis dataset, comprised of soccer and field-hockey activities. The average maximum F-measure accuracy of 87% was achieved using a fusion of classifiers, which was 6% better than a single classifier model and 23% better than a standard SVM approach
    corecore