Support vector machines classification on class imbalanced data: a case study with real medical data

Drosou, K; Georgiou, S; Koukouvinos, C; Stylianou, S

Publisher: Columbia University, Department of Statistics (United States)

Abstract

Support vector machines (SVMs) constitute one of the most popular and powerful classification methods. However, their performance can degrade on highly imbalanced datasets: a classifier trained on such data tends to be biased towards the majority class, resulting in a high misclassification rate for the minority class. For many applications, especially medical diagnosis, it is very important to distinguish accurately between false negative and false positive results. The purpose of this study is to evaluate classifier performance while keeping a proper balance between sensitivity and specificity, in order to support successful trauma outcome prediction. We compare the standard (or classic) SVM (C SVM) with resampling methods and with a cost-sensitive method called Two Cost SVM (TC SVM), all of which are widely accepted strategies for imbalanced datasets, and discuss the results in terms of sensitivity analysis and receiver operating characteristic (ROC) curves.
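The strategies compared in the abstract can be illustrated with a small sketch. This is not the authors' implementation, data or experimental setup. It assumes that TC SVM follows the common two-cost formulation, in which slack on positive (minority) and negative (majority) examples is penalized by separate constants C+ and C-, and it trains a linear SVM on that primal objective by full-batch subgradient descent. Random oversampling stands in for the resampling methods. The helper names (`train_linear_svm`, `random_oversample`, `sens_spec`, `predict`), the synthetic data and the heuristic C+ = n-/n+ are hypothetical choices made for illustration.

```python
import random


def sens_spec(y_true, y_pred):
    """Sensitivity (true positive rate) and specificity (true negative rate).
    Labels are +1 (positive/minority class) and -1 (negative/majority class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority examples until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == -1]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def train_linear_svm(X, y, c_pos=1.0, c_neg=1.0, epochs=500, lr=0.001):
    """Subgradient descent on the (assumed) two-cost primal objective
        0.5*||w||^2 + c_pos * sum_{y=+1} hinge + c_neg * sum_{y=-1} hinge,
    where hinge = max(0, 1 - y*(w.x + b)). c_pos == c_neg gives the standard C-SVM."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = w[:], 0.0  # gradient of the 0.5*||w||^2 term
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active: add its subgradient
                ci = c_pos if yi == 1 else c_neg
                for j in range(d):
                    gw[j] -= ci * yi * xi[j]
                gb -= ci * yi
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical 1-D data: 190 majority examples around 0, 10 minority around 2.
    X = [[rng.gauss(0.0, 1.0)] for _ in range(190)] + [[rng.gauss(2.0, 1.0)] for _ in range(10)]
    y = [-1] * 190 + [1] * 10
    ratio = 190 / 10  # heuristic C+/C- = n-/n+ (an assumption, not from the study)
    models = {
        "C-SVM": train_linear_svm(X, y),
        "TC-SVM": train_linear_svm(X, y, c_pos=ratio, c_neg=1.0),
        "oversampled C-SVM": train_linear_svm(*random_oversample(X, y)),
    }
    for name, (w, b) in models.items():
        sens, spec = sens_spec(y, predict(w, b, X))
        print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

Setting `c_pos == c_neg` recovers the standard C SVM objective, while raising `c_pos` relative to `c_neg` makes errors on the minority class more expensive, which is the usual way to trade some specificity for higher sensitivity. The demo scores on the training data only for brevity; a real evaluation would use held-out data and, as in the study, ROC analysis.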

Available Versions

RMIT Research Repository

oai:researchbank.rmit.edu.au:r...

Last updated on 04/05/2016