6 research outputs found

    A Fuzzy k-Nearest Neighbors Classifier to Deal with Imperfect Data

    © 2018. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/. This document is the Accepted version of a Published Work that appeared in final form in Soft Computing. To access the final edited and published work see https://doi.org/10.1007/s00500-017-2567-x

    The k-nearest neighbors method (kNN) is a nonparametric, instance-based method used for regression and classification. To classify a new instance, the kNN method computes its k nearest neighbors and generates a class value from them. Usually, this method requires that the information available in the datasets be precise and accurate, except for the existence of missing values. However, data imperfection is inevitable when dealing with real-world scenarios. In this paper, we present the kNNimp classifier, a k-nearest neighbors method that performs classification on datasets with imperfect values. The importance of each neighbor in the output decision is based on its relative distance and its degree of imperfection. Furthermore, through external parameters, the classifier lets us define the maximum allowed imperfection and decide whether the final output is derived solely from the class with the greatest weight (the best class) or from a weighted combination of the best class and the classes closest to it. To test the proposed method, we performed several experiments with both synthetic and real-world datasets containing imperfect data. The results, validated through statistical tests, show that the kNNimp classifier is robust when working with imperfect data and maintains good performance compared with other methods in the literature, applied to datasets with or without imperfection.
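    The weighting scheme described above can be sketched compactly. The Python fragment below is a minimal illustrative reconstruction, not the paper's exact kNNimp algorithm: the per-instance imperfection degree in [0, 1], the inverse-distance weights, the max_imp cutoff, and the name knn_imp_predict are all assumptions made for the example.

    import numpy as np

    def knn_imp_predict(X_train, y_train, imperfection, x, k=5, max_imp=0.5):
        """Weighted kNN vote in which each neighbor counts according to its
        relative distance and its degree of imperfection (0 = perfect data,
        1 = fully imperfect). Illustrative sketch only."""
        # Distances from the query point to every training instance
        dists = np.linalg.norm(X_train - x, axis=1)
        idx = np.argsort(dists)[:k]  # indices of the k nearest neighbors

        # Closer neighbors get larger weights (inverse distance)
        w_dist = 1.0 / (dists[idx] + 1e-12)

        # Neighbors above the maximum allowed imperfection are discarded;
        # the rest are linearly down-weighted by their imperfection degree
        imp = imperfection[idx]
        w_imp = np.where(imp <= max_imp, 1.0 - imp, 0.0)
        weights = w_dist * w_imp

        # Accumulate weighted votes per class; returning only the arg-max
        # corresponds to deciding from the best class alone
        classes = np.unique(y_train)
        scores = {c: weights[y_train[idx] == c].sum() for c in classes}
        return max(scores, key=scores.get)

    A variant that also blends in the classes closest to the best one, as the abstract mentions, would combine the top-scoring entries of scores rather than returning only the arg-max.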

    Statistical Theory for Imbalanced Binary Classification

    Within the vast body of statistical theory developed for binary classification, few meaningful results exist for imbalanced classification, in which data are dominated by samples from one of the two classes. Existing theory faces at least two main challenges. First, meaningful results must consider more complex performance measures than classification accuracy. To address this, we characterize a novel generalization of the Bayes-optimal classifier to any performance metric computed from the confusion matrix, and we use this to show how relative performance guarantees can be obtained in terms of the error of estimating the class probability function under uniform ($\mathcal{L}_\infty$) loss. Second, as we show, optimal classification performance depends on certain properties of class imbalance that have not previously been formalized. Specifically, we propose a novel sub-type of class imbalance, which we call Uniform Class Imbalance. We analyze how Uniform Class Imbalance influences optimal classifier performance and show that it necessitates different classifier behavior than other types of class imbalance. We further illustrate these two contributions in the case of k-nearest neighbor classification, for which we develop novel guarantees. Together, these results provide some of the first meaningful finite-sample statistical theory for imbalanced binary classification.

    Comment: Parts of this paper have been revised from arXiv:2004.04715v2 [math.ST]
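    To make the plug-in strategy concrete: estimate the class probability function η(x) = P(Y = 1 | x), then threshold the estimate at a level chosen for the confusion-matrix metric of interest. The Python sketch below is an assumption-laden illustration, not the paper's construction: the kNN probability estimator, the F1 grid search, and both function names are invented for the example.

    import numpy as np

    def knn_class_prob(X_train, y_train, x, k=25):
        """Standard plug-in estimate of eta(x) = P(Y = 1 | x): the fraction
        of positives among the k nearest neighbors of x."""
        dists = np.linalg.norm(X_train - x, axis=1)
        nn = np.argsort(dists)[:k]
        return y_train[nn].mean()

    def best_threshold_for_f1(probs, y_true):
        """Choose the threshold on estimated probabilities that maximizes F1
        on held-out data, mimicking a metric-dependent thresholding of the
        class probability function. Grid search is an illustrative shortcut."""
        best_t, best_f1 = 0.5, -1.0
        for t in np.unique(probs):
            pred = (probs >= t).astype(int)
            tp = np.sum((pred == 1) & (y_true == 1))
            fp = np.sum((pred == 1) & (y_true == 0))
            fn = np.sum((pred == 0) & (y_true == 1))
            f1 = 2 * tp / max(2 * tp + fp + fn, 1)
            if f1 > best_f1:
                best_t, best_f1 = t, f1
        return best_t

    Under heavy imbalance the metric-optimal threshold typically sits far from 0.5, which illustrates why accuracy-oriented theory transfers poorly to imbalanced problems.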

    Weighted Data Gravitation Classification for Standard and Imbalanced Data
