An Oversampling Mechanism for Multimajority Datasets using SMOTE and Darwinian Particle Swarm Optimisation

Abstract

Data skewness continues to be one of the leading factors which adversely impacts the machine learning algorithms performance. An approach to reduce this negative effect of the data variance is to pre-process the former dataset with data level resampling strategies. Resampling strategies have been seen in two forms, oversampling and undersampling. An oversampling strategy is proposed in this article for tackling multiclass imbalanced datasets. This proposed approach optimises the state-of-the-art oversampling technique SMOTE with the Darwinian Particle Swarm Optimization technique. This proposed method DOSMOTE generates synthetic optimised samples for balancing the datasets. This strategy will be more effective on multimajority datasets.  An experimental study is performed on peculiar multimajority datasets to measure the effectiveness of the proposed approach. As a result, the proposed method produces promising results when compared to the conventional oversampling strategies

    Similar works