75,756 research outputs found

    An ontology enhanced parallel SVM for scalable spam filter training

    Get PDF
    This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches including Support Vector Machine (SVM) techniques have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart

    Best Practices in Convolutional Networks for Forward-Looking Sonar Image Recognition

    Full text link
    Convolutional Neural Networks (CNN) have revolutionized perception for color images, and their application to sonar images has also obtained good results. But in general CNNs are difficult to train without a large dataset, need manual tuning of a considerable number of hyperparameters, and require many careful decisions by a designer. In this work, we evaluate three common decisions that need to be made by a CNN designer, namely the performance of transfer learning, the effect of object/image size and the relation between training set size. We evaluate three CNN models, namely one based on LeNet, and two based on the Fire module from SqueezeNet. Our findings are: Transfer learning with an SVM works very well, even when the train and transfer sets have no classes in common, and high classification performance can be obtained even when the target dataset is small. The ADAM optimizer combined with Batch Normalization can make a high accuracy CNN classifier, even with small image sizes (16 pixels). At least 50 samples per class are required to obtain 90%90\% test accuracy, and using Dropout with a small dataset helps improve performance, but Batch Normalization is better when a large dataset is available.Comment: Author version; IEEE/MTS Oceans 2017 Aberdee

    Classification of time series by shapelet transformation

    Get PDF
    Time-series classification (TSC) problems present a specific challenge for classification algorithms: how to measure similarity between series. A \emph{shapelet} is a time-series subsequence that allows for TSC based on local, phase-independent similarity in shape. Shapelet-based classification uses the similarity between a shapelet and a series as a discriminatory feature. One benefit of the shapelet approach is that shapelets are comprehensible, and can offer insight into the problem domain. The original shapelet-based classifier embeds the shapelet-discovery algorithm in a decision tree, and uses information gain to assess the quality of candidates, finding a new shapelet at each node of the tree through an enumerative search. Subsequent research has focused mainly on techniques to speed up the search. We examine how best to use the shapelet primitive to construct classifiers. We propose a single-scan shapelet algorithm that finds the best kk shapelets, which are used to produce a transformed dataset, where each of the kk features represent the distance between a time series and a shapelet. The primary advantages over the embedded approach are that the transformed data can be used in conjunction with any classifier, and that there is no recursive search for shapelets. We demonstrate that the transformed data, in conjunction with more complex classifiers, gives greater accuracy than the embedded shapelet tree. We also evaluate three similarity measures that produce equivalent results to information gain in less time. Finally, we show that by conducting post-transform clustering of shapelets, we can enhance the interpretability of the transformed data. We conduct our experiments on 29 datasets: 17 from the UCR repository, and 12 we provide ourselve

    Data mining based cyber-attack detection

    Get PDF

    Accelerating Deep Learning with Shrinkage and Recall

    Full text link
    Deep Learning is a very powerful machine learning model. Deep Learning trains a large number of parameters for multiple layers and is very slow when data is in large scale and the architecture size is large. Inspired from the shrinking technique used in accelerating computation of Support Vector Machines (SVM) algorithm and screening technique used in LASSO, we propose a shrinking Deep Learning with recall (sDLr) approach to speed up deep learning computation. We experiment shrinking Deep Learning with recall (sDLr) using Deep Neural Network (DNN), Deep Belief Network (DBN) and Convolution Neural Network (CNN) on 4 data sets. Results show that the speedup using shrinking Deep Learning with recall (sDLr) can reach more than 2.0 while still giving competitive classification performance.Comment: The 22nd IEEE International Conference on Parallel and Distributed Systems (ICPADS 2016
    corecore