
    Ensemble Methods for Malware Diagnosis Based on One-class SVMs

    Malware diagnosis is one of today’s most popular applications of machine learning. Rather than applying every classical classification algorithm to the problem and reporting the highest accuracy as the result, which is the typical approach in studies of this kind, we focus on the Support Vector Machine (SVM) classifier. Based on our observations of learning principles, statistical characteristics, and SVM behavior, we employ several preprocessing and ensemble methods, including rescaling, bagging, and clustering, that may enhance the performance of the classical algorithm. We implement rescaling by iteratively magnifying the attributes used by the SVM’s support vectors and eliminating unused attributes from the training examples until a maximum accuracy is achieved. Our study of bagging and clustering focuses on the situation where only examples of malware are available and a one-class SVM is used. For both methods, a group of models is built, each using part of the training data, instead of a single model built on the whole training set. We also compare two coordination approaches for the sub-models acquired during training: voting and “one positive to be positive.” Experimental results show that, when combined with appropriate coordination methods, ensemble methods can effectively reduce both the cases where malware is labeled as clean and the cases where clean software is classified as malware, formally known in our context as false-negative and false-positive errors, respectively.
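The bagging-plus-coordination scheme described above can be sketched as follows. This is a minimal illustration using scikit-learn's OneClassSVM on synthetic data; the five-model ensemble size, the RBF kernel, and all hyperparameters are assumptions for the sketch, not the paper's actual setup. Each sub-model is trained on a bootstrap sample of "malware-only" data, and predictions are combined either by majority voting or by the "one positive to be positive" rule:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic "malware-only" training set: one Gaussian cluster (placeholder data).
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 5))

# Bagging: fit several one-class SVMs, each on a bootstrap sample.
models = []
for _ in range(5):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    models.append(OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_train[idx]))

def predict_voting(X):
    # Majority vote: +1 ("malware-like") only if most sub-models agree.
    votes = np.stack([m.predict(X) for m in models])  # entries are +1 / -1
    return np.where((votes == 1).sum(axis=0) > len(models) / 2, 1, -1)

def predict_any_positive(X):
    # "One positive to be positive": flag if any single sub-model says +1.
    votes = np.stack([m.predict(X) for m in models])
    return np.where((votes == 1).any(axis=0), 1, -1)

# In-distribution points followed by clear outliers.
X_test = np.vstack([rng.normal(0, 1, (10, 5)),
                    rng.normal(6, 1, (10, 5))])
print(predict_voting(X_test))
print(predict_any_positive(X_test))
```

The "any positive" rule can only flag at least as many examples as majority voting, which illustrates the abstract's trade-off between the two error types: it lowers false negatives at the risk of raising false positives.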

    A Primer on Kernel Methods


    Detecting Executive Function Subtypes in Individuals with Schizophrenia and Healthy Controls

    Executive functioning (EF) impairments observed in schizophrenia (SZ) occur prior to the onset of psychosis and are predictive of functional outcomes. There is significant variability in the nature and severity of EF deficits, however, and a better understanding of this heterogeneity could provide insight into the neurodevelopmental processes underlying both SZ and EFs. Using an approach similar to Fair et al., 2012, the present analysis examined heterogeneity in EFs and attempted to identify EF subtypes within healthy controls (HC) and individuals with SZ. EFs were assessed using the Trail Making Test, Verbal Fluency test, Tower of London, and Continuous Performance Test. A 4-factor model of EF (fluency, planning, shifting, attention) was tested in the sample using a Confirmatory Factor Analysis (CFA). The presence of EF subtypes was assessed separately in both groups using community detection (CD), an analytic technique based on graph theory that enables an unbiased analysis of community structure within complex networks. Results from the CFA supported a 4-factor model of EF. The CD analyses indicated greater modularity in SZ and, upon initial inspection, identified 7 EF subtypes in the SZ group that nested within 5 EF subtypes in the HC group. The impact of EF profiles on diagnostic accuracy was assessed using a machine learning approach. Results revealed improved diagnostic accuracy for a majority of the EF subtypes when EF profile was considered. Consistent with findings reported by Fair and colleagues, results support the existence of similar cognitive subtypes in the context of both normal and aberrant neurodevelopment.
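The community-detection step described above can be sketched in broad strokes: build a similarity graph between participants' EF factor profiles, then partition it into communities (candidate subtypes). This is a hedged illustration only, using synthetic scores, an arbitrary correlation threshold of 0.3, and networkx's modularity-based community detection as a stand-in for whatever specific CD algorithm the study actually used:

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(1)
# Hypothetical EF scores: 30 participants x 4 factors
# (fluency, planning, shifting, attention) -- synthetic placeholder data.
scores = rng.normal(size=(30, 4))

# Pairwise correlation between participant profiles, thresholded into a graph.
corr = np.corrcoef(scores)
G = nx.Graph()
G.add_nodes_from(range(len(scores)))
for i in range(len(scores)):
    for j in range(i + 1, len(scores)):
        if corr[i, j] > 0.3:  # arbitrary similarity threshold for the sketch
            G.add_edge(i, j, weight=corr[i, j])

# Each community is a candidate EF subtype.
communities = greedy_modularity_communities(G, weight="weight")
print(len(communities))
```

With real data, the number and composition of communities would depend heavily on the similarity measure and threshold, which is why the study's nested 7-within-5 subtype structure required careful inspection rather than a single automated run.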

    Support Vector Machines: Backgrounds and Practice


    Leave-One-Out Support Vector Machines

    We present a new learning algorithm for pattern recognition inspired by a recent upper bound on leave-one-out error [Jaakkola and Haussler, 1999] proved for Support Vector Machines (SVMs) [Vapnik, 1995; 1998]. The new approach directly minimizes the expression given by the bound in an attempt to minimize leave-one-out error. This gives a convex optimization problem which constructs a sparse linear classifier in feature space using the kernel technique. As such, the algorithm possesses many of the same properties as SVMs. The main novelty of the algorithm is that, apart from the choice of kernel, it is parameterless: the selection of the number of training errors is inherent in the algorithm rather than being chosen through an extra free parameter as in SVMs. First experiments using the method on benchmark datasets from the UCI repository show results similar to SVMs that have been tuned to the best choice of parameter.