6,535 research outputs found
Toward a General-Purpose Heterogeneous Ensemble for Pattern Classification
We perform an extensive study of the performance of different classification approaches on twenty-five datasets (fourteen image datasets and eleven UCI data mining datasets). The aim is to find General-Purpose (GP) heterogeneous ensembles (requiring little to no parameter tuning) that perform competitively across multiple datasets. The state-of-the-art classifiers examined in this study include the support vector machine, Gaussian process classifiers, random subspace of adaboost, random subspace of rotation boosting, and deep learning classifiers. We demonstrate that a heterogeneous ensemble based on the simple fusion by sum rule of different classifiers performs consistently well across all twenty-five datasets. The most important result of our investigation is demonstrating that some very recent approaches, including the heterogeneous ensemble we propose in this paper, are capable of outperforming an SVM classifier (implemented with LibSVM), even when both kernel selection and SVM parameters are carefully tuned for each dataset
LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles
Cluster ensembles have emerged as a powerful meta-learning paradigm that provides improved accuracy and robustness by aggregating several input data clusterings. In particular, link-based similarity methods have recently been introduced with superior performance to the conventional co-association approach. This paper presents a MATLAB package, LinkCluE, that implements the link-based cluster ensemble framework. A variety of functional methods for evaluating clustering results, based on both internal and external criteria, are also provided. Additionally, the underlying algorithms together with the sample uses of the package with interesting real and synthetic datasets are demonstrated herein.
Multiple Imputation Ensembles (MIE) for dealing with missing data
Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches: multiple imputation and ensemble methods and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. Firstly, we use a number of single/multiple imputation methods to recover the missing values and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach combining multiple imputation with ensemble techniques outperform others, particularly as missing data increases
The Ultimate Fate of Supercooled Liquids
In recent years it has become widely accepted that a dynamical length scale
{\xi}_{\alpha} plays an important role in supercooled liquids near the glass
transition. We examine the implications of the interplay between the growing
{\xi}_{\alpha} and the size of the crystal nucleus, {\xi}_M, which shrinks on
cooling. We argue that at low temperatures where {\xi}_{\alpha} > {\xi}_M a new
crystallization mechanism emerges enabling rapid development of a large scale
web of sparsely connected crystallinity. Though we predict this web percolates
the system at too low a temperature to be easily seen in the laboratory, there
are noticeable residual effects near the glass transition that can account for
several previously observed unexplained phenomena of deeply supercooled liquids
including Fischer clusters, and anomalous crystal growth near T_g
MEG: Multi-objective Ensemble Generation for Software Defect Prediction
Background: Defect Prediction research aims at assisting software
engineers in the early identification of software defect during the
development process. A variety of automated approaches, ranging from traditional classification models to more sophisticated
learning approaches, have been explored to this end. Among these,
recent studies have proposed the use of ensemble prediction models
(i.e., aggregation of multiple base classifiers) to build more robust
defect prediction models. /
Aims: In this paper, we introduce a novel
approach based on multi-objective evolutionary search to automatically generate defect prediction ensembles. Our proposal is not
only novel with respect to the more general area of evolutionary
generation of ensembles, but it also advances the state-of-the-art
in the use of ensemble in defect prediction. /
Method: We assess
the effectiveness of our approach, dubbed as Multi-objective
Ensemble Generation (MEG), by empirically benchmarking it
with respect to the most related proposals we found in the literature
on defect prediction ensembles and on multi-objective evolutionary
ensembles (which, to the best of our knowledge, had never been
previously applied to tackle defect prediction). /
Result: Our results
show that MEG is able to generate ensembles which produce similar
or more accurate predictions than those achieved by all the other
approaches considered in 73% of the cases (with favourable large
effect sizes in 80% of them). /
Conclusions: MEG is not only able
to generate ensembles that yield more accurate defect predictions
with respect to the benchmarks considered, but it also does it automatically, thus relieving the engineers from the burden of manual
design and experimentation
- …