Dataset analysis for classifier ensemble enhancement
We developed three methods for dataset analysis and ensemble enhancement.
They share the underlying idea that accurate preprocessing and adaptation
of the data can improve system performance without changing the
classification model. Correlation Score is a generic framework for assessing encoding
techniques by measuring the correlation between the encoded feature vectors and
the corresponding class labels; experiments show its effectiveness in identifying the
best encoding configurations among those tested, across a wide range of classification
domains. Multi-Resolution Complexity Analysis is a method for assessing the
local complexity within a given domain. It can split a domain into regions
of different classification complexity, giving insight into the inner structure of the
populations inside the domain. Finally, Forests of Local Trees is a novel training
algorithm for ensemble classifiers. It is based on the concept of local trees:
classifiers trained with a bias toward a certain region of the domain. This bias
enhances the diversity inside the ensemble, leading to improved performance.
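
The idea of locally biased ensemble members can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' algorithm: weighted decision stumps stand in for full trees, the bias is a Gaussian sample weight around a randomly chosen training point, and the names `fit_stump`, `fit_local_forest`, `predict_forest`, and `bandwidth` are hypothetical.

```python
import math
import random


def fit_stump(X, y, w):
    """Fit a weighted decision stump (a one-split stand-in for a tree)."""
    total = sum(w)
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            # Weighted error of the rule "predict 1 if x[j] > t".
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if (row[j] > t) != (yi == 1))
            # polarity 0 is the inverted rule; its error is total - err.
            for polarity, e in ((1, err), (0, total - err)):
                if best is None or e < best[0]:
                    best = (e, j, t, polarity)
    _, j, t, polarity = best
    return j, t, polarity


def predict_stump(stump, row):
    j, t, polarity = stump
    above = row[j] > t
    return int(above) if polarity == 1 else int(not above)


def fit_local_forest(X, y, n_trees=15, bandwidth=1.0, seed=0):
    """Train each member with sample weights that favor one region.

    Assumption: the local bias is a Gaussian kernel centered on a
    randomly drawn training point; the source does not specify this.
    """
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        center = rng.choice(X)
        w = [math.exp(-math.dist(row, center) ** 2 / (2 * bandwidth ** 2))
             for row in X]
        forest.append(fit_stump(X, y, w))
    return forest


def predict_forest(forest, row):
    """Majority vote over the locally biased members."""
    votes = sum(predict_stump(s, row) for s in forest)
    return int(2 * votes > len(forest))


# Toy one-dimensional problem: class 0 below 1.0, class 1 from 1.0 upward.
X = [[i / 10] for i in range(20)]
y = [0] * 10 + [1] * 10
forest = fit_local_forest(X, y, n_trees=15, bandwidth=0.5)
predictions = [predict_forest(forest, row) for row in X]
```

On a separable toy set like this one every member finds the same split, so the example only shows the mechanics; any diversity effect would appear on data where different regions favor different splits.
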
These three methods are meant as a foundation for a more complex framework that
will eventually integrate them organically.
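
As a rough illustration of the Correlation Score idea, the sketch below ranks candidate encodings by how strongly their feature columns correlate with the class labels. The exact score is not given here, so the aggregate used (mean absolute Pearson correlation per encoded column, binary labels) is an assumption, as are the names `pearson` and `correlation_score` and the two toy encodings.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def correlation_score(encoded, labels):
    """Assumed score: mean absolute correlation of each column with the labels."""
    columns = list(zip(*encoded))
    return sum(abs(pearson(col, labels)) for col in columns) / len(columns)


random.seed(0)
labels = [random.randint(0, 1) for _ in range(200)]
raw = [y + random.gauss(0, 0.5) for y in labels]  # feature informative about y
shuffled = raw[:]
random.shuffle(shuffled)  # same values, link to the labels destroyed

# Two hypothetical "encodings" of the same feature, one row per sample.
encodings = {
    "identity": [[v] for v in raw],
    "shuffled": [[v] for v in shuffled],
}
scores = {name: correlation_score(X, labels) for name, X in encodings.items()}
best = max(scores, key=scores.get)
```

The score needs only the encoded vectors and the labels, so encodings can be compared without training or changing the classification model.
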