Machine Learning Shrewd Approach For An Imbalanced Dataset Conversion Samples

Abstract

A dataset is imbalanced when at least one of its classes is heavily outnumbered by the others. A machine learning algorithm (classifier) trained on an imbalanced dataset predicts the majority (frequently occurring) class more often than the minority (rarely occurring) classes. Training on an imbalanced dataset poses challenges for classifiers; however, applying suitable techniques to reduce class imbalance can enhance the classifier's performance. We take an imbalanced dataset from an educational context. First, we examine the shortcomings of classifying an imbalanced dataset. We then apply data-level algorithms for class balancing and compare the performance of the classifiers. Classifier performance is measured using the information in their confusion matrices: accuracy, precision, recall, and F-measure. The results show that classification on an imbalanced dataset may produce high accuracy but low precision and recall for the minority class. The analysis confirms that both undersampling and oversampling are effective for balancing datasets; however, oversampling dominates.
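
The following minimal sketch illustrates the two points made above: that accuracy can be high while minority-class recall is poor, and that a data-level method can balance the classes before training. It is not the authors' pipeline. The abstract does not name the specific algorithms or the dataset, so the "pass"/"fail" labels, the 95/5 split, and the helper names `binary_metrics` and `random_oversample` are all assumptions made for illustration, and plain random oversampling stands in for whichever oversampling method the study used.

```python
import random
from collections import Counter


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F-measure for one class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority samples until all classes match
    the size of the largest class (illustrative data-level balancing)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        extra = [rng.choice(pool) for _ in range(target - n)]
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


# Hypothetical imbalanced labels: 95 majority ("pass"), 5 minority ("fail").
y_true = ["pass"] * 95 + ["fail"] * 5

# A classifier that always predicts the majority class.
y_pred = ["pass"] * 100

# Metrics computed with the minority class as the class of interest.
metrics = binary_metrics(y_true, y_pred, positive="fail")

# Balance the classes before training (features here are placeholder ids).
X = list(range(100))
X_bal, y_bal = random_oversample(X, y_true)
balanced_counts = Counter(y_bal)
```

In this toy case the always-majority classifier is right on 95 of 100 samples, yet it never identifies a single minority sample, so its minority-class precision, recall, and F-measure are all zero. After oversampling, both classes contain the same number of samples.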
