2,081 research outputs found

    A modified k-nearest neighbor classifier to deal with unbalanced classes

    We present in this paper a simple yet valuable improvement to the traditional k-Nearest Neighbor (kNN) classifier. It addresses the issue of unbalanced classes by maximizing the class-wise classification accuracy. The proposed classifier also offers the option of favoring a particular class through the evaluation of a small set of fuzzy rules. When tested on a number of UCI datasets, the proposed algorithm achieved uniformly good performance.
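
    The abstract does not spell out the exact modification, so the following is only a minimal sketch, in Python, of one common way to bias kNN toward minority classes: weighting each neighbor's vote by the inverse frequency of its class. The function name and the Euclidean distance metric are assumptions for illustration, not the paper's method.

    import numpy as np
    from collections import Counter

    def class_weighted_knn_predict(X_train, y_train, X_test, k=5):
        # Weight each neighbor's vote by 1 / (frequency of its class) so that minority
        # classes are not drowned out; this approximates, not reproduces, the paper's approach.
        X_train, X_test = np.asarray(X_train, float), np.asarray(X_test, float)
        y_train = np.asarray(y_train)
        weights = {c: 1.0 / n for c, n in Counter(y_train).items()}
        preds = []
        for x in X_test:
            nn = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]  # k nearest by Euclidean distance
            votes = {}
            for i in nn:
                votes[y_train[i]] = votes.get(y_train[i], 0.0) + weights[y_train[i]]
            preds.append(max(votes, key=votes.get))                   # largest weighted vote wins
        return np.array(preds)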

    Coupling different methods for overcoming the class imbalance problem

    Many classification problems must deal with imbalanced datasets where one class – the majority class – outnumbers the other classes. Standard classification methods do not provide accurate predictions in this setting since classification is generally biased towards the majority class. The minority classes are oftentimes the ones of interest (e.g., when they are associated with pathological conditions in patients), so methods for handling imbalanced datasets are critical. Using several different datasets, this paper evaluates the performance of state-of-the-art classification methods for handling the imbalance problem in both binary and multi-class datasets. Different strategies are considered, including the one-class and dimension reduction approaches, as well as their fusions. Moreover, some ensembles of classifiers are tested, in addition to stand-alone classifiers, to assess the effectiveness of ensembles in the presence of imbalance. Finally, a novel ensemble of ensembles is designed specifically to tackle the problem of class imbalance: the proposed ensemble does not need to be tuned separately for each dataset and outperforms all the other tested approaches. To validate our classifiers we resort to the KEEL-dataset repository, whose data partitions (training/test) are publicly available and have already been used in the open literature: as a consequence, it is possible to report a fair comparison among different approaches in the literature. Our best approach (MATLAB code and datasets not easily accessible elsewhere) will be available at https://www.dei.unipd.it/node/2357
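
    The specific ensemble of ensembles is not described in the abstract; as a rough illustration of the general ensemble strategy for imbalance, the sketch below uses imbalanced-learn's BalancedBaggingClassifier (bagging over balanced bootstrap samples) on a synthetic imbalanced dataset standing in for a KEEL partition. Both the classifier choice and the data are assumptions, not the paper's method.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from imblearn.ensemble import BalancedBaggingClassifier

    # Synthetic 95%/5% two-class problem as a stand-in for a KEEL imbalanced partition.
    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

    # Each base learner is trained on a bootstrap sample rebalanced by undersampling the majority class.
    clf = BalancedBaggingClassifier(n_estimators=10, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
    print("balanced accuracy per fold:", scores)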

    The Effect of Class Imbalance Handling on Datasets Toward Classification Algorithm Performance

    Class imbalance is a condition where the amount of data in the minority class is smaller than that in the majority class. Its impact on a dataset is misclassification of the minority class, which degrades classification performance. Various approaches have been taken to deal with class imbalance, such as the data-level approach, the algorithm-level approach, and cost-sensitive learning. At the data level, one of the methods used is to apply a sampling method. In this study, the ADASYN, SMOTE, and SMOTE-ENN sampling methods were used to deal with class imbalance, combined with the AdaBoost, K-Nearest Neighbor, and Random Forest classification algorithms. The purpose of this study was to determine the effect of handling class imbalance in the dataset on classification performance. Tests were carried out on five datasets, and the classification results show that integrating the ADASYN and Random Forest methods gave better results than the other model schemes. The evaluation criteria include accuracy, precision, true positive rate, true negative rate, and g-mean score. The classification results of the combined ADASYN and Random Forest methods were 5% to 10% better than those of the other models.
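
    As a rough, non-authoritative sketch of that best-performing combination, the snippet below chains ADASYN oversampling with a random forest in an imbalanced-learn pipeline and reports the g-mean; the synthetic data and hyperparameters are placeholders, not the study's five datasets or settings.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from imblearn.over_sampling import ADASYN
    from imblearn.pipeline import Pipeline
    from imblearn.metrics import geometric_mean_score

    X, y = make_classification(n_samples=1500, weights=[0.9, 0.1], random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

    # Oversampling happens only on the training split inside the pipeline, never on the test split.
    model = Pipeline([("adasyn", ADASYN(random_state=42)),
                      ("rf", RandomForestClassifier(n_estimators=100, random_state=42))])
    model.fit(X_tr, y_tr)
    print("g-mean:", geometric_mean_score(y_te, model.predict(X_te)))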

    Oversampling technique in student performance classification from engineering course

    The first year is important for an engineering student's academic planning, since all first-year subjects form the basis of an engineering education. Student performance prediction helps academics improve performance, and students can check their performance themselves: if they are aware that their performance is low, they can make improvements. This research focused on combining oversampling of minority-class data with various classifier models. The oversampling techniques were SMOTE, Borderline-SMOTE, SVMSMOTE, and ADASYN, and four classifiers were applied: MLP, gradient boosting, AdaBoost, and random forest. The results show that Borderline-SMOTE gave the best result for minority-class prediction with several classifiers.
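
    A minimal sketch of that comparison, assuming a synthetic placeholder for the course data and using random forest as the representative classifier: each oversampler named in the abstract is applied to the training split only and scored on the minority class.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import f1_score
    from imblearn.over_sampling import SMOTE, BorderlineSMOTE, SVMSMOTE, ADASYN

    X, y = make_classification(n_samples=1000, weights=[0.85, 0.15], random_state=1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

    samplers = {"SMOTE": SMOTE(random_state=1),
                "Borderline-SMOTE": BorderlineSMOTE(random_state=1),
                "SVMSMOTE": SVMSMOTE(random_state=1),
                "ADASYN": ADASYN(random_state=1)}

    for name, sampler in samplers.items():
        X_res, y_res = sampler.fit_resample(X_tr, y_tr)   # oversample the training data only
        clf = RandomForestClassifier(random_state=1).fit(X_res, y_res)
        print(name, "minority-class F1:", f1_score(y_te, clf.predict(X_te)))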

    Credit-Scoring Methods (in English)

    The paper reviews the best-developed and most frequently applied methods of credit scoring employed by commercial banks when evaluating loan applications. The authors concentrate on retail loans – applied research in this segment is limited, though there has been a sharp increase in the volume of loans to retail clients in recent years. Logit analysis is identified as the credit-scoring method most frequently used by banks; however, other nonparametric pattern-recognition methods are also widespread. The methods reviewed have potential for application in post-transition countries.
    Keywords: banking sector, credit scoring, discrimination analysis, pattern recognition, retail loans
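
    Since logit analysis is the method the review singles out, here is a minimal sketch of logit-based credit scoring; the applicant features (income, debt ratio, years employed) and the synthetic data are hypothetical, chosen only to illustrate the mechanics.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical applicant records: income (in thousands), debt-to-income ratio, years employed.
    rng = np.random.default_rng(0)
    X = np.column_stack([rng.normal(50, 15, 500),         # income, thousands
                         rng.uniform(0.0, 0.8, 500),      # debt ratio
                         rng.integers(0, 30, 500)])       # years employed
    y = (rng.random(500) < 0.1 + 0.5 * X[:, 1]).astype(int)   # synthetic default flag

    # Logit model: the estimated probability of default drives the accept/reject decision.
    logit = LogisticRegression(max_iter=1000).fit(X, y)
    print("P(default) for a new applicant:", logit.predict_proba([[45, 0.35, 4]])[0, 1])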

    Good appearance and shape descriptors for object category recognition

    In the problem of object category recognition, we have studied different families of descriptors exploiting RGB and 3D information. Furthermore, we have shown in practice that 3D shape-based descriptors are more suitable for this type of recognition than image texture-based ones, owing to the low intra-class variance of shape. In addition, we have also shown how an efficient Naive Bayes Nearest Neighbor (NBNN) classifier can scale to a large hierarchical RGB-D Object Dataset [2] and achieve, with a single descriptor type, accuracy close to state-of-the-art learning-based approaches that use combined descriptors.
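
    A minimal sketch of the NBNN decision rule itself (descriptor extraction is assumed to happen elsewhere): a query image, represented as a set of local descriptors, is assigned to the class whose training-descriptor pool minimizes the summed squared nearest-neighbor distances.

    import numpy as np
    from scipy.spatial import cKDTree

    def build_class_indices(descriptors_by_class):
        # descriptors_by_class: dict mapping class label -> (n_c, d) array of training descriptors.
        return {c: cKDTree(desc) for c, desc in descriptors_by_class.items()}

    def nbnn_classify(query_descriptors, class_indices):
        # Image-to-class distance: sum over query descriptors of the squared distance
        # to the nearest descriptor of each class; predict the class with the smallest sum.
        costs = {}
        for c, tree in class_indices.items():
            dists, _ = tree.query(query_descriptors, k=1)
            costs[c] = float(np.sum(dists ** 2))
        return min(costs, key=costs.get)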

    Machine Learning for Classification of Imbalanced Big Data

    The problem of classification of imbalanced datasets is a critical one. With an increase in the number of application domains that rely on classification, extensive research has been carried out in this field, with the focus directed towards the problem of poor classification accuracy. Of late, the rise in significance of Big Data has forced industries to search for better techniques to handle massive and unstructured datasets; this has led to a need for robust classification algorithms that deal with imbalanced Big Data. This paper surveys the current machine-learning algorithms for imbalanced dataset classification and considers their possible use for larger or unstructured datasets.