    Simultaneous Feature Extraction and Selection using a Masking Genetic Algorithm

    Statistical pattern recognition techniques classify objects in terms of a representative set of features. The selection of features to measure and include can have a significant effect on the cost and accuracy of an automated classifier. Our previous research has shown that a hybrid between a k-nearest-neighbors (knn) classifier and a genetic algorithm (GA) can reduce the size of the feature set used by a classifier, while simultaneously weighting the remaining features to allow greater classification accuracy. Here we describe an extension to this approach which further enhances feature selection through the simultaneous optimization of feature weights and selection of key features by including a masking vector on the GA chromosome. We present the results of our masking GA/knn feature selection method on two important problems from biochemistry and medicine: identification of functional water molecules bound to protein surfaces, and diagnosis of thyroid deficiency. By allowing the GA to explore the effect of eliminating a feature from the classification without losing weight knowledge learned about the feature, the masking GA/knn can efficiently examine noisy, complex, and high-dimensionality datasets to find combinations of features which classify the data more accurately. In both biomedical applications, this technique resulted in equivalent or better classification accuracy using fewer features.
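The chromosome layout described in the abstract, a weight vector plus a masking vector, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the function names, the toy data, the choice of k = 1, and in particular the fitness form (accuracy minus a small per-feature penalty). The full GA loop (selection, crossover, mutation rates) is omitted.

```python
import math
from collections import Counter

# A chromosome is a pair (weights, mask); this layout is a hypothetical
# encoding, chosen to mirror the "masking vector" idea in the abstract.

def masked_weighted_distance(a, b, weights, mask):
    """Euclidean distance with each feature scaled by weight * mask bit."""
    return math.sqrt(sum((w * m * (x - y)) ** 2
                         for x, y, w, m in zip(a, b, weights, mask)))

def knn_predict(train_X, train_y, query, weights, mask, k=1):
    """Majority vote among the k nearest training points."""
    order = sorted(range(len(train_X)),
                   key=lambda i: masked_weighted_distance(
                       train_X[i], query, weights, mask))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def accuracy(chromosome, train_X, train_y, test_X, test_y, k=1):
    """Fraction of test points classified correctly."""
    weights, mask = chromosome
    hits = sum(knn_predict(train_X, train_y, q, weights, mask, k) == y
               for q, y in zip(test_X, test_y))
    return hits / len(test_y)

def fitness(chromosome, train_X, train_y, test_X, test_y,
            k=1, feature_penalty=0.01):
    """Assumed fitness: accuracy minus a penalty per unmasked feature."""
    _, mask = chromosome
    acc = accuracy(chromosome, train_X, train_y, test_X, test_y, k)
    return acc - feature_penalty * sum(mask)

def flip_mask_bit(chromosome, index):
    """Toggle one feature on or off; the learned weights are left intact."""
    weights, mask = chromosome
    new_mask = list(mask)
    new_mask[index] = 1 - new_mask[index]
    return (list(weights), new_mask)

# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
train_X = [[0.0, 0.0], [0.2, 10.0], [1.0, 5.0], [1.2, 15.0]]
train_y = ["a", "a", "b", "b"]
test_X = [[0.1, 5.0], [1.1, 10.0]]
test_y = ["a", "b"]

unmasked = ([1.0, 1.0], [1, 1])
masked = flip_mask_bit(unmasked, 1)  # switch off the noisy feature

print(fitness(unmasked, train_X, train_y, test_X, test_y))
print(fitness(masked, train_X, train_y, test_X, test_y))
```

Because the mask bit is stored separately from the weight, switching a feature off and later back on restores the weight the search had already found for it, which is the property the abstract attributes to the masking vector.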