18,074 research outputs found
A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics
The combination of multiple classifiers using ensemble methods is
increasingly important for making progress in a variety of difficult prediction
problems. We present a comparative analysis of several ensemble methods through
two case studies in genomics, namely the prediction of genetic interactions and
protein functions, to demonstrate their efficacy on real-world datasets and
draw useful conclusions about their behavior. These methods include simple
aggregation, meta-learning, cluster-based meta-learning, and ensemble selection
using heterogeneous classifiers trained on resampled data to improve the
diversity of their predictions. We present a detailed analysis of these methods
across 4 genomics datasets and find the best of these methods offer
statistically significant improvements over the state of the art in their
respective domains. In addition, we establish a novel connection between
ensemble selection and meta-learning, demonstrating how both of these disparate
methods establish a balance between ensemble diversity and performance.Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013
International Conference on Data Minin
Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms
Many different machine learning algorithms exist; taking into account each
algorithm's hyperparameters, there is a staggeringly large number of possible
alternatives overall. We consider the problem of simultaneously selecting a
learning algorithm and setting its hyperparameters, going beyond previous work
that addresses these issues in isolation. We show that this problem can be
addressed by a fully automated approach, leveraging recent innovations in
Bayesian optimization. Specifically, we consider a wide range of feature
selection techniques (combining 3 search and 8 evaluator methods) and all
classification approaches implemented in WEKA, spanning 2 ensemble methods, 10
meta-methods, 27 base classifiers, and hyperparameter settings for each
classifier. On each of 21 popular datasets from the UCI repository, the KDD Cup
09, variants of the MNIST dataset and CIFAR-10, we show classification
performance often much better than using standard selection/hyperparameter
optimization methods. We hope that our approach will help non-expert users to
more effectively identify machine learning algorithms and hyperparameter
settings appropriate to their applications, and hence to achieve improved
performance.Comment: 9 pages, 3 figure
TSE-IDS: A Two-Stage Classifier Ensemble for Intelligent Anomaly-based Intrusion Detection System
Intrusion detection systems (IDS) play a pivotal role in computer security by discovering and repealing malicious activities in computer networks. Anomaly-based IDS, in particular, rely on classification models trained using historical data to discover such malicious activities. In this paper, an improved IDS based on hybrid feature selection and two-level classifier ensembles is proposed. An hybrid feature selection technique comprising three methods, i.e. particle swarm optimization, ant colony algorithm, and genetic algorithm, is utilized to reduce the feature size of the training datasets (NSL-KDD and UNSW-NB15 are considered in this paper). Features are selected based on the classification performance of a reduced error pruning tree (REPT) classifier. Then, a two-level classifier ensembles based on two meta learners, i.e., rotation forest and bagging, is proposed. On the NSL-KDD dataset, the proposed classifier shows 85.8% accuracy, 86.8% sensitivity, and 88.0% detection rate, which remarkably outperform other classification techniques recently proposed in the literature. Results regarding the UNSW-NB15 dataset also improve the ones achieved by several state of the art techniques. Finally, to verify the results, a two-step statistical significance test is conducted. This is not usually considered by IDS research thus far and, therefore, adds value to the experimental results achieved by the proposed classifier
- …