
    Classifier selection with permutation tests

    This work presents a content-based recommender system for machine learning classification algorithms. Given a new data set, the system recommends the classifier likely to perform best, based on classifier performance over similar known data sets. Similarity is measured according to a data set characterization that includes several state-of-the-art metrics taking into account physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm is able to exploit the attributes in a data set to predict class labels; this approach is compared with the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we conducted extensive experiments involving 8 of the main machine learning classification methods with varying configurations and 65 binary data sets, leading to over 2331 experiments. Our results show that using the information from the permutation test clearly improves the quality of the recommendations. Peer Reviewed. Postprint (author's final draft).
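    The permutation-test idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D nearest-centroid learner, the leave-one-out accuracy score (used here in place of the F-score), and the names `nearest_centroid_accuracy` and `permutation_test` are all assumptions chosen for illustration. The test shuffles the class labels many times and asks how often the learner scores at least as well on shuffled labels as on the real ones; a small p-value suggests the learner really exploits the attributes.

```python
import random


def nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a 1-D nearest-centroid classifier.

    Hypothetical stand-in learner; any score function with the same
    (X, y) -> float signature could be used instead.
    """
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (xj, yj) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            sums[yj] = sums.get(yj, 0.0) + xj
            counts[yj] = counts.get(yj, 0) + 1
        centroids = {label: sums[label] / counts[label] for label in sums}
        predicted = min(centroids, key=lambda label: abs(X[i] - centroids[label]))
        correct += predicted == y[i]
    return correct / len(X)


def permutation_test(X, y, score_fn, n_permutations=200, seed=0):
    """Return (observed score, p-value) under label permutation."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    shuffled = list(y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between attributes and labels
        if score_fn(X, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


# Toy data (assumed): two well-separated clusters, one per class.
X = [0.1, 0.3, 0.2, 0.4, 0.0, 2.1, 2.3, 2.2, 2.4, 2.0]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
observed, p_value = permutation_test(X, y, nearest_centroid_accuracy)
```

    On this toy data the learner separates the classes perfectly, so very few label shuffles match its score and the p-value is small. On data whose labels are unrelated to the attributes, shuffled labels would score as well as the real ones much more often.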

    How to Explain Individual Classification Decisions

    After building a classifier with modern machine learning tools, we typically have a black box at hand that is able to predict well for unseen data. Thus, we get an answer to the question of what the most likely label of a given unseen data point is. However, most methods provide no answer as to why the model predicted a particular label for a single instance and which features were most influential for that instance. The only method currently able to provide such explanations is the decision tree. This paper proposes a procedure which (based on a set of assumptions) makes it possible to explain the decisions of any classification method. Comment: 31 pages, 14 figures

    A parallel and distributed genetic-based learning classifier system with application in human electroencephalographic signal classification

    University of Technology, Sydney. Faculty of Engineering. Genetic-based Learning Classifier Systems have been proposed as a competent technology for the classification of medical data sets. What is not known about this class of system is twofold. Firstly, how does a Learning Classifier System (LCS) perform when applied to the single-step classification of multiple-channel, noisy, artefact-inclusive human EEG signals acquired from many participants? Secondly, and more importantly, how does the learning classifier system perform when combined with migration strategies, inspired by multi-deme, coarse-grained Parallel Genetic Algorithms (PGA), to provide parallel and distributed classifier migration? This research investigates these open questions and concludes, subject to the considerations herein, that these technological approaches can provide competitive classification performance for such applications. We performed a preliminary examination and implementation of a parallel genetic algorithm and a hybrid local search PGA using experimental methods. Parallelisation and the incorporation of classical local search methods into a genetic algorithm are well-known methods for increasing performance, and we examine them. Furthermore, inspired by the significant improvements in convergence velocity and solution quality provided by the multi-deme, coarse-grained Parallel Genetic Algorithm, we incorporate the method into a learning classifier system with the aim of providing parallel and distributed classifier migration. As a result, a unique learning classifier system (pXCS) is proposed that improves classification accuracy, achieves increased learning rates and significantly reduces the classifier population during learning. It is compared to the extended Learning Classifier System (XCS) and several state-of-the-art non-evolutionary classifiers in the single-step classification of noisy, artefact-inclusive human EEG signals, derived from mental task experiments conducted using ten human participants. We also conclude that establishing an appropriate migration strategy is an important factor in pXCS learning and classification performance. However, an inappropriate migration rate, frequency or selection:replacement scheme can reduce performance, and we document the factors associated with this. Furthermore, we conclude that EEG segment size and representation both have a significant influence on classification performance. In effect, determining an appropriate representation of the raw EEG signal is as important as the classification method itself. This research allows us to further explore and incorporate pXCS-evolved classifiers derived from multi-channel human EEG signals as an interface in the control of a device such as a powered wheelchair, or in brain-computer interface (BCI) applications
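    The multi-deme, coarse-grained migration idea mentioned above can be sketched generically. This is not pXCS and not the thesis's method: the ring topology, the best-replaces-worst scheme, the OneMax toy fitness, and every function name are assumptions made for illustration. The parameters `n_migrants`, `interval`, and the best/worst choice loosely correspond to the migration rate, frequency, and selection:replacement scheme named in the abstract.

```python
import random


def migrate_ring(demes, fitness, n_migrants=1):
    """Copy each deme's best individuals into the next deme in a ring,
    where they replace that deme's worst individuals."""
    # Pick emigrants from every deme first so migration is synchronous.
    emigrants = [sorted(d, key=fitness, reverse=True)[:n_migrants] for d in demes]
    new_demes = []
    for i, deme in enumerate(demes):
        incoming = emigrants[(i - 1) % len(demes)]
        keep = len(deme) - len(incoming)
        survivors = sorted(deme, key=fitness, reverse=True)[:keep]
        new_demes.append(survivors + list(incoming))
    return new_demes


def one_max(bits):
    """Toy fitness (assumed): number of 1 bits."""
    return sum(bits)


def mutate(bits, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return tuple(b ^ (rng.random() < rate) for b in bits)


def step(deme, fitness, rng):
    """One generation: keep the best individual, then fill the deme with
    mutated winners of binary tournaments."""
    children = [max(deme, key=fitness)]
    while len(children) < len(deme):
        a, b = rng.sample(deme, 2)
        children.append(mutate(max(a, b, key=fitness), rng))
    return children


def run_islands(n_demes=4, deme_size=10, n_bits=20,
                generations=40, interval=5, seed=0):
    """Evolve several demes independently, migrating every `interval` generations."""
    rng = random.Random(seed)
    demes = [[tuple(rng.randint(0, 1) for _ in range(n_bits))
              for _ in range(deme_size)] for _ in range(n_demes)]
    for generation in range(1, generations + 1):
        demes = [step(d, one_max, rng) for d in demes]
        if generation % interval == 0:
            demes = migrate_ring(demes, one_max)
    return demes


final_demes = run_islands()
```

    Varying `n_migrants` and `interval` in this sketch is a simple way to see how migration settings change how quickly good individuals spread between demes, which is the kind of sensitivity the abstract reports for pXCS.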

    Machine Learning and Threat Intelligence

    Machine learning plays a role in a wide variety of fields. It can be used for predicting stock prices, identifying diseases, and even teaching Mario how to avoid mushrooms. This project explores the use of machine learning in the realm of threat intelligence. There are many sources that professionals can use to keep up to date on the latest threats to software (NVD, PacketStorm, Twitter, etc.). However, it can become overly cumbersome for individuals to monitor all of these sources manually. Building an automated string-matching system is a good first step towards tackling this problem, but it may return many false positives. A good way to limit this issue is to use machine learning to train a classifier that identifies which information is relevant and which is irrelevant. This paper explores 3 different algorithms for building a text classifier and conducts tests to see which is the most accurate at identifying threats
