Support vector machines classification on class imbalanced data: a case study with real medical data

Abstract

Support vector machines (SVMs) constitute one of the most popular and powerful classification methods. However, SVMs can be limited in their performance on highly imbalanced datasets. A classifier which has been trained on an imbalanced dataset can produce a biased model towards the majority class and result in high misclassification rate for minority class. For many applications, especially for medical diagnosis, it is of high importance to accurately distinguish false negative from false positive results. The purpose of this study is to successfully evaluate the performance of a classifier, keeping the correct balance between sensitivity and specificity, in order to enable the success of trauma outcome prediction. We compare the standard (or classic) SVM (C SVM) with resampling methods and a cost sensitive method, called Two Cost SVM (TC SVM), which constitute widely accepted strategies for imbalanced datasets and the derived results were discussed in terms of the sensitivity analysis and receiver operating characteristic (ROC) curves

    Similar works

    Full text

    thumbnail-image

    Available Versions