60,220 research outputs found

    Spike pattern recognition by supervised classification in low dimensional embedding space

    Get PDF
    © The Author(s) 2016. This article is published with open access at Springerlink.com under the terms of the Creative Commons Attribution License 4.0, (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.Epileptiform discharges in interictal electroencephalography (EEG) form the mainstay of epilepsy diagnosis and localization of seizure onset. Visual analysis is rater-dependent and time consuming, especially for long-term recordings, while computerized methods can provide efficiency in reviewing long EEG recordings. This paper presents a machine learning approach for automated detection of epileptiform discharges (spikes). The proposed method first detects spike patterns by calculating similarity to a coarse shape model of a spike waveform and then refines the results by identifying subtle differences between actual spikes and false detections. Pattern classification is performed using support vector machines in a low dimensional space on which the original waveforms are embedded by locality preserving projections. The automatic detection results are compared to experts’ manual annotations (101 spikes) on a whole-night sleep EEG recording. The high sensitivity (97 %) and the low false positive rate (0.1 min−1), calculated by intra-patient cross-validation, highlight the potential of the method for automated interictal EEG assessment.Peer reviewedFinal Published versio

    Support Vector Machine Algorithm for SMS Spam Classification in The Telecommunication Industry

    Get PDF
    In recent years, we have withnessed a dramatic increment volume in the number of mobile users grows in telecommunication industry. However, this leads to drastic increase to the number of spam SMS messages. Short Message Service (SMS) is considered one of the widely used communication in telecommunication service. In reality, most of the users ignore the spam because of the lower rate of SMS and limited amount of spam classification tools. In this paper, we propose a Support Vector Machine (SVM) algorithm for SMS Spam Classification. Support Vector Machine is considered as the one of the most effective for data mining techniques. The propose algorithm have been evaluated using public dataset from UCI machine learning repository. The performance achieved is compared with other three data mining techniques such as Naïve Bayes, Multinominal Naïve Bayes and K-Nearest Neighbor with the different number of K= 1,3 and 5. Based on the measuring factors like higher accuracy, less processing time, highest kappa statistics, low error and the lowest false positive instance, it’s been identified that Support Vector Machines (SVM) outperforms better than other classifiers and it is the most accurate classifier to detect and label the spam messages with an average an accuracy is 98.9%. Comparing both the error parameter overall, the highest error has been found on the algorithm KNN with K=3 and K=5. Whereas the model with less error is SVM followed by Multinominal Naïve Bayes. Therefore, this propose method can be used as a best baseline for further comparison based on SMS spam classification

    Detecting Selected Network Covert Channels Using Machine Learning

    Get PDF
    International audienceNetwork covert channels break a computer's security policy to establish a stealthy communication. They are a threat being increasingly used by malicious software. Most previous studies on detecting network covert channels using Machine Learning (ML) were tested with a dataset that was created using one single covert channel tool and also are ineffective at classifying covert channels into patterns. In this paper, selected ML methods are applied to detect popular network covert channels. The capacity of detecting and classifying covert channels with high precision is demonstrated. A dataset was created from nine standard covert channel tools and the covert channels are then accordingly classified into patterns and labelled. Half of the generated dataset is used to train three different ML algorithms. The remaining half is used to verify the algorithms' performance. The tested ML algorithms are Support Vector Machines (SVM), k-Nearest Neighbors (k-NN) and Deep Neural Networks (DNN). The k-NN model demonstrated the highest precision rate at 98% detection of a given covert channel and with a low false positive rate of 1%

    Fast Sequence Component Analysis for Attack Detection in Synchrophasor Networks

    Get PDF
    Modern power systems have begun integrating synchrophasor technologies into part of daily operations. Given the amount of solutions offered and the maturity rate of application development it is not a matter of "if" but a matter of "when" in regards to these technologies becoming ubiquitous in control centers around the world. While the benefits are numerous, the functionality of operator-level applications can easily be nullified by injection of deceptive data signals disguised as genuine measurements. Such deceptive action is a common precursor to nefarious, often malicious activity. A correlation coefficient characterization and machine learning methodology are proposed to detect and identify injection of spoofed data signals. The proposed method utilizes statistical relationships intrinsic to power system parameters, which are quantified and presented. Several spoofing schemes have been developed to qualitatively and quantitatively demonstrate detection capabilities.Comment: 8 pages, 4 figures, submitted to IEEE Transaction

    Automatic categorization of diverse experimental information in the bioscience literature

    Get PDF
    Background: Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify from all published literature the papers that contain results for a specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest and thus, is usually time consuming. We developed an automatic method for identifying papers containing these curation data types among a large pool of published scientific papers based on the machine learning method Support Vector Machine (SVM). This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in use in production for automatic categorization of 10 different experimental datatypes in the biocuration process at WormBase for the past two years and it is in the process of being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community and thereby greatly reducing time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure to utilize training papers of similar data types from different bodies of literature such as C. elegans and D. melanogaster to identify papers with any of these data types for a single database. This approach has great significance because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance. Results: We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genomics Informatics (MGI). It is being used in the curation work flow at WormBase for automatic association of newly published papers with ten data types including RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction. Conclusions: Our methods are applicable to a variety of data types with training set containing several hundreds to a few thousand documents. It is completely automatic and, thus can be readily incorporated to different workflow at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort

    One-Class Classification: Taxonomy of Study and Review of Techniques

    Full text link
    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure

    Spatial support vector regression to detect silent errors in the exascale era

    Get PDF
    As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems with targeted power and energy budget goals introduces significant challenges in reliability. Silent data corruptions (SDCs) or silent errors are one of the major sources that corrupt the executionresults of HPC applications without being detected. In this work, we explore a low-memory-overhead SDC detector, by leveraging epsilon-insensitive support vector machine regression, to detect SDCs that occur in HPC applications that can be characterized by an impact error bound. The key contributions are three fold. (1) Our design takes spatialfeatures (i.e., neighbouring data values for each data point in a snapshot) into training data, such that little memory overhead (less than 1%) is introduced. (2) We provide an in-depth study on the detection ability and performance with different parameters, and we optimize the detection range carefully. (3) Experiments with eight real-world HPC applications show thatour detector can achieve the detection sensitivity (i.e., recall) up to 99% yet suffer a less than 1% of false positive rate for most cases. Our detector incurs low performance overhead, 5% on average, for all benchmarks studied in the paper. Compared with other state-of-the-art techniques, our detector exhibits the best tradeoff considering the detection ability and overheads.This work was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research Program, under Contract DE-AC02-06CH11357, by FI-DGR 2013 scholarship, by HiPEAC PhD Collaboration Grant, the European Community’s Seventh Framework Programme [FP7/2007-2013] under the Mont-blanc 2 Project (www.montblanc-project.eu), grant agreement no. 610402, and TIN2015-65316-P.Peer ReviewedPostprint (author's final draft

    A CASE STUDY ON SUPPORT VECTOR MACHINES VERSUS ARTIFICIAL NEURAL NETWORKS

    Get PDF
    The capability of artificial neural networks for pattern recognition of real world problems is well known. In recent years, the support vector machine has been advocated for its structure risk minimization leading to tolerance margins of decision boundaries. Structures and performances of these pattern classifiers depend on the feature dimension and training data size. The objective of this research is to compare these pattern recognition systems based on a case study. The particular case considered is on classification of hypertensive and normotensive right ventricle (RV) shapes obtained from Magnetic Resonance Image (MRI) sequences. In this case, the feature dimension is reasonable, but the available training data set is small, however, the decision surface is highly nonlinear.For diagnosis of congenital heart defects, especially those associated with pressure and volume overload problems, a reliable pattern classifier for determining right ventricle function is needed. RV¡¦s global and regional surface to volume ratios are assessed from an individual¡¦s MRI heart images. These are used as features for pattern classifiers. We considered first two linear classification methods: the Fisher linear discriminant and the linear classifier trained by the Ho-Kayshap algorithm. When the data are not linearly separable, artificial neural networks with back-propagation training and radial basis function networks were then considered, providing nonlinear decision surfaces. Thirdly, a support vector machine was trained which gives tolerance margins on both sides of the decision surface. We have found in this case study that the back-propagation training of an artificial neural network depends heavily on the selection of initial weights, even though randomized. The support vector machine where radial basis function kernels are used is easily trained and provides decision tolerance margins, in spite of only small margins

    Prediction of protein-protein interactions using one-class classification methods and integrating diverse data

    Get PDF
    This research addresses the problem of prediction of protein-protein interactions (PPI) when integrating diverse kinds of biological information. This task has been commonly viewed as a binary classification problem (whether any two proteins do or do not interact) and several different machine learning techniques have been employed to solve this task. However the nature of the data creates two major problems which can affect results. These are firstly imbalanced class problems due to the number of positive examples (pairs of proteins which really interact) being much smaller than the number of negative ones. Secondly the selection of negative examples can be based on some unreliable assumptions which could introduce some bias in the classification results. Here we propose the use of one-class classification (OCC) methods to deal with the task of prediction of PPI. OCC methods utilise examples of just one class to generate a predictive model which consequently is independent of the kind of negative examples selected; additionally these approaches are known to cope with imbalanced class problems. We have designed and carried out a performance evaluation study of several OCC methods for this task, and have found that the Parzen density estimation approach outperforms the rest. We also undertook a comparative performance evaluation between the Parzen OCC method and several conventional learning techniques, considering different scenarios, for example varying the number of negative examples used for training purposes. We found that the Parzen OCC method in general performs competitively with traditional approaches and in many situations outperforms them. Finally we evaluated the ability of the Parzen OCC approach to predict new potential PPI targets, and validated these results by searching for biological evidence in the literature

    Signal Detection for QPSK Based Cognitive Radio Systems using Support Vector Machines

    Get PDF
    Cognitive radio based network enables opportunistic dynamic spectrum access by sensing, adopting and utilizing the unused portion of licensed spectrum bands. Cognitive radio is intelligent enough to adapt the communication parameters of the unused licensed spectrum. Spectrum sensing is one of the most important tasks of the cognitive radio cycle. In this paper, the auto-correlation function kernel based Support Vector Machine (SVM) classifier along with Welch's Periodogram detector is successfully implemented for the detection of four QPSK (Quadrature Phase Shift Keying) based signals propagating through an AWGN (Additive White Gaussian Noise) channel. It is shown that the combination of statistical signal processing and machine learning concepts improve the spectrum sensing process and spectrum sensing is possible even at low Signal to Noise Ratio (SNR) values up to -50 dB
    corecore