
    CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

    Class-imbalanced classification is a challenging research problem in data mining and machine learning, as most real-life datasets are imbalanced. Existing learning algorithms maximise classification accuracy by correctly classifying the majority class while misclassifying the minority class. In real-life applications, however, minority class instances usually represent the concept of greater interest. Recently, several techniques based on sampling methods (under-sampling the majority class and over-sampling the minority class), cost-sensitive learning methods, and ensemble learning have been used in the literature for classifying imbalanced datasets. In this paper, we introduce a new clustering-based under-sampling approach combined with the boosting algorithm AdaBoost, called CUSBoost, for effective imbalanced classification. The proposed algorithm provides an alternative to the RUSBoost (random under-sampling with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost) algorithms. We compared CUSBoost with state-of-the-art ensemble learning methods (AdaBoost, RUSBoost, and SMOTEBoost) on 13 imbalanced binary and multi-class datasets with various imbalance ratios. The experimental results show that CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets. Comment: CSITSS-201
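
The abstract above does not spell out the sampling step, so the following is only a minimal sketch of one way clustering-based under-sampling could work: cluster the majority class, then keep a random fraction of each cluster so every region of the majority class stays represented. The use of k-means, the function names `kmeans` and `cluster_undersample`, and the `keep_fraction` parameter are assumptions made for illustration, not the authors' implementation. In a boosting setting, a step like this would presumably be repeated before each round.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave it alone if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def cluster_undersample(majority, k, keep_fraction, seed=0):
    """Keep roughly `keep_fraction` of each majority-class cluster (hypothetical helper)."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        n_keep = max(1, round(keep_fraction * len(members)))
        kept.extend(rng.sample(members, n_keep))
    return kept
```

Compared with purely random under-sampling, sampling per cluster avoids discarding a small but distinct group of majority instances by chance.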

    Improving specific class mapping from remotely sensed data by cost-sensitive learning

    In many remote-sensing projects, one is usually interested in a small number of land-cover classes present in a study area rather than in all the land-cover classes that make up the landscape. Previous studies in supervised classification of satellite images have tackled the specific class mapping problem by isolating the classes of interest, combining all other classes into one large class (usually called others), and developing a binary classifier to discriminate the class of interest from the others. Here, this is called the focused approach. Its strength is that it decomposes the original multi-class supervised classification problem into a binary classification problem, focusing the process on the discrimination of the class of interest. Previous studies have shown that this method discriminates the classes of interest more accurately than the standard multi-class supervised approach. However, it may be susceptible to data imbalance in the training data set, since the classes of interest are often a small part of the training set. As a result, the classification may be biased towards the largest classes and thus be sub-optimal for discriminating the classes of interest. This study presents a way to minimize the effects of data imbalance in specific class mapping using cost-sensitive learning. In this approach, errors committed in the minority class are treated as costlier than errors committed in the majority class. Cost-sensitive approaches are typically implemented by weighting training data points according to their importance to the analysis. By changing the weight of individual data points, it is possible to shift weight from the larger classes to the smaller ones, balancing the data set.
To illustrate the use of the cost-sensitive approach to map specific classes of interest, a series of experiments with a weighted support vector machine classifier and Landsat Thematic Mapper data was conducted to discriminate two types of mangrove forest (high-mangrove and low-mangrove) in the Saloum estuary, Senegal, a United Nations Educational, Scientific and Cultural Organisation World Heritage site. Results suggest an increase in overall classification accuracy with the cost-sensitive method (97.3%) over the standard multi-class (94.3%) and focused (91.0%) approaches. In particular, the cost-sensitive method yielded higher sensitivity and specificity values for the classes of interest than the standard multi-class and focused approaches.
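
The weighting idea described above can be illustrated with a short sketch. The abstract does not state which weighting scheme the study used, so this example assumes the common inverse-frequency heuristic, w_c = n / (k * n_c), where n is the number of training points, k the number of classes, and n_c the size of class c. The function names and the class labels in the usage note are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n / (k * n_c) for each class c."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def sample_weights(labels):
    """Expand class weights into one weight per training point."""
    weights = balanced_class_weights(labels)
    return [weights[y] for y in labels]
```

With 90 points labelled `others` and 10 labelled `mangrove`, each `mangrove` point receives weight 5.0 and each `others` point about 0.56, so both classes carry the same total weight (50) in the training objective. Weighted SVM implementations generally accept such values; scikit-learn's `SVC`, for instance, takes a `class_weight` argument and a `sample_weight` argument to `fit`.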

    Multi-class Cervical Cancer Classification using Transfer Learning-based Optimized SE-ResNet152 model in Pap Smear Whole Slide Images

    Cervical cancer is among the leading causes of death globally, even though it can be prevented and treated if the affected tissue is removed early. Cervical screening programs must be made accessible to everyone and run effectively, a difficult task that requires, among other things, identifying the population's most vulnerable members. We therefore present an effective deep-learning method for multi-class classification of cervical cancer from Pap smear images. A transfer learning-based optimized SE-ResNet152 model is used for multi-class Pap smear image classification. The proposed network extracts reliable, significant image features. The network's hyper-parameters are optimized using the Deer Hunting Optimization (DHO) algorithm. Five SIPaKMeD dataset categories and six CRIC dataset categories constitute the 11 classes of cervical cancer disease. A Pap smear image dataset with 8838 images and varying class distributions is used to evaluate the proposed method. A cost-sensitive loss function, applied throughout the classifier's learning process, compensates for the dataset's imbalance. Compared with prior approaches to multi-class Pap smear image classification, the proposed method achieves 99.68% accuracy, 98.82% precision, 97.86% recall, and a 98.64% F1-score on the test set. These classification results suggest that the proposed method can improve automated preliminary diagnosis of cervical cancer in hospitals and cervical cancer clinics.
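
The abstract does not give the form of its cost-sensitive loss. One common choice, shown here purely as an assumed illustration, is class-weighted cross-entropy, where the log-loss of each sample is scaled by a weight attached to its true class. The function name and the weight values in the usage note are hypothetical.

```python
import math


def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean of -w[y] * log(p[y]) over samples.

    probs:         one list of predicted class probabilities per sample
    labels:        the true class index of each sample
    class_weights: mapping from class index to misclassification cost
    """
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(p[y])
        weight_sum += w
    return total / weight_sum
```

For example, given `probs = [[0.9, 0.1], [0.4, 0.6]]` and `labels = [0, 1]`, raising the weight of class 1 from 1 to 4 increases the loss, because the less confident prediction on the rarer class now dominates the average.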

    Automatic Multi-Label ECG Classification with Category Imbalance and Cost-Sensitive Thresholding

    From MDPI via Jisc Publications Router. History: accepted 2021-11-12, pub-electronic 2021-11-14. Publication status: Published. Funder: Collaborative Innovation Center for Prevention and Treatment of Cardiovascular Disease of Sichuan Province (CICPTCDSP); Grant(s): xtcx2019-01.
    Automatic electrocardiogram (ECG) classification is a promising technology for the early screening and follow-up management of cardiovascular diseases. It is, by nature, a multi-label classification task owing to the coexistence of different kinds of diseases, and is challenging due to the large number of possible label combinations and the imbalance among categories. Furthermore, the task of multi-label ECG classification is cost-sensitive, a fact that has usually been ignored in previous studies on the development of the model. To address these problems, in this work, we propose a novel deep learning model-based learning framework and a thresholding method, namely category imbalance and cost-sensitive thresholding (CICST), to incorporate prior knowledge about classification costs and the characteristic of category imbalance in designing a multi-label ECG classifier. The learning framework combines a residual convolutional network with a class-wise attention mechanism. We evaluate our method with a cost-sensitive metric on multiple realistic datasets. The results show that CICST achieved a cost-sensitive metric score of 0.641 ± 0.009 in a 5-fold cross-validation, outperforming other commonly used thresholding methods, including rank-based thresholding, proportion-based thresholding, and fixed thresholding. This demonstrates that, by taking into account the category imbalance and predefined cost information, our approach is effective in improving the performance and practicability of multi-label ECG classification models.
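
The CICST procedure itself is not described in the abstract. As a generic sketch of what cost-sensitive thresholding can mean, the function below picks, for a single category, the decision threshold that minimises total misclassification cost on validation scores. The function name and the cost parameters `fn_cost` and `fp_cost` are assumptions; in a multi-label setting one would presumably run this once per category.

```python
def best_threshold(scores, labels, fn_cost, fp_cost):
    """Return (threshold, cost) minimising total cost; predict positive when score >= threshold."""
    # Candidate thresholds: every observed score, plus +inf for "never predict positive".
    candidates = sorted(set(scores)) + [float("inf")]
    best_t, best_cost = None, float("inf")
    for t in candidates:
        cost = 0.0
        for s, y in zip(scores, labels):
            predicted_positive = s >= t
            if y == 1 and not predicted_positive:
                cost += fn_cost  # missed a true positive
            elif y == 0 and predicted_positive:
                cost += fp_cost  # raised a false alarm
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```

When missing a condition is costlier than a false alarm, the selected threshold moves down, and when false alarms are costlier it moves up. A fixed threshold such as 0.5 cannot make this adjustment.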