    A combination of clustering-based under-sampling with ensemble methods for solving imbalanced class problem in intelligent systems

    Most real-world datasets suffer from an imbalanced distribution of samples across classes, especially when the larger (majority) class contains far more samples than the smaller (minority) class. To address this problem, various undersampling and oversampling techniques have been proposed; they create a dataset with an equal number of samples in each class by reducing the number of majority-class samples or increasing the number of minority-class samples, respectively. Ensemble classifiers use multiple learning algorithms to improve classification accuracy. Results indicate that combining undersampling or oversampling methods with ensemble classifiers can yield models with better performance. The present study proposes a novel clustering-based undersampling method to create a balanced dataset. The method uses the k-means algorithm to cluster the data, the Mahalanobis distance to measure the distance of each sample to its cluster centroid, and a selection method that preserves the pattern of data distribution in each cluster. In experiments on 44 benchmark datasets from the KEEL repository, the proposed approach performed better than seven state-of-the-art approaches.
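The pipeline described in the abstract (k-means on the majority class, Mahalanobis distance to each cluster centroid, then a distribution-preserving selection) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the selection rule, so the sketch assumes a per-cluster quota proportional to cluster size and picks evenly spaced ranks along the sorted Mahalanobis distances. The function names (`kmeans_labels`, `mahalanobis_to_centroid`, `cluster_undersample`) and the parameters `k` and `seed` are hypothetical, and the sketch assumes the majority class has at least `k` samples and at least as many samples as the rest of the data.

```python
import numpy as np


def kmeans_labels(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels


def mahalanobis_to_centroid(points, centroid):
    """Mahalanobis distance of each row of `points` to `centroid`."""
    if len(points) < 2:
        # Covariance is undefined for a single sample.
        return np.zeros(len(points))
    cov = np.atleast_2d(np.cov(points, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates singular covariance
    diff = points - centroid
    sq = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(sq, 0.0))


def cluster_undersample(X, y, majority_label, k=3, seed=0):
    """Reduce the majority class to the size of the remaining samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    n_target = len(min_idx)

    labels = kmeans_labels(X[maj_idx], k, seed=seed)
    sizes = np.bincount(labels, minlength=k)

    # ASSUMPTION: quota per cluster proportional to cluster size
    # (largest-remainder rounding so the quotas sum to n_target).
    quotas = (sizes * n_target) // len(maj_idx)
    remainders = (sizes * n_target) % len(maj_idx)
    leftover = n_target - quotas.sum()
    quotas[np.argsort(-remainders, kind="stable")[:leftover]] += 1

    keep = []
    for j in range(k):
        if quotas[j] == 0:
            continue
        members = maj_idx[labels == j]
        centroid = X[members].mean(axis=0)
        dist = mahalanobis_to_centroid(X[members], centroid)
        ranked = members[np.argsort(dist, kind="stable")]
        # ASSUMPTION: evenly spaced ranks, so samples near to and far from
        # the centroid are both represented in the reduced cluster.
        picks = (np.arange(quotas[j]) * len(ranked)) // quotas[j]
        keep.extend(ranked[picks])

    keep = np.sort(np.concatenate([np.array(keep, dtype=int), min_idx]))
    return X[keep], y[keep]
```

Calling `cluster_undersample(X, y, majority_label=0)` on a feature matrix `X` and label vector `y` returns a subset in which the majority class has as many samples as the remaining classes combined. The balanced subset could then be passed to an ensemble classifier, as the abstract suggests; that step is omitted here.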