22,992 research outputs found
Sub-Classifier Construction for Error Correcting Output Code Using Minimum Weight Perfect Matching
Multi-class classification is mandatory for real world problems and one of
promising techniques for multi-class classification is Error Correcting Output
Code. We propose a method for constructing the Error Correcting Output Code to
obtain the suitable combination of positive and negative classes encoded to
represent binary classifiers. The minimum weight perfect matching algorithm is
applied to find the optimal pairs of subset of classes by using the
generalization performance as a weighting criterion. Based on our method, each
subset of classes with positive and negative labels is appropriately combined
for learning the binary classifiers. Experimental results show that our
technique gives significantly higher performance compared to traditional
methods including the dense random code and the sparse random code both in
terms of accuracy and classification times. Moreover, our method requires
significantly smaller number of binary classifiers while maintaining accuracy
compared to the One-Versus-One.Comment: 7 pages, 3 figure
Feature and Variable Selection in Classification
The amount of information in the form of features and variables avail- able
to machine learning algorithms is ever increasing. This can lead to classifiers
that are prone to overfitting in high dimensions, high di- mensional models do
not lend themselves to interpretable results, and the CPU and memory resources
necessary to run on high-dimensional datasets severly limit the applications of
the approaches. Variable and feature selection aim to remedy this by finding a
subset of features that in some way captures the information provided best. In
this paper we present the general methodology and highlight some specific
approaches.Comment: Part of master seminar in document analysis held by Marcus
Eichenberger-Liwick
Feature Selection via Coalitional Game Theory
We present and study the contribution-selection algorithm (CSA), a novel algorithm for feature selection. The algorithm is based on the multiperturbation shapley analysis (MSA), a framework that relies on game theory to estimate usefulness. The algorithm iteratively estimates the usefulness of features and selects them accordingly, using either forward selection or backward elimination. It can optimize various performance measures over unseen data such as accuracy, balanced error rate, and area under receiver-operator-characteristic curve. Empirical comparison with several other existing feature selection methods shows that the backward elimination variant of CSA leads to the most accurate classification results on an array of data sets
- ā¦