Direct Nonparametric Predictive Inference Classification Trees

Abstract

Classification is the task of assigning a new instance to one of a set of predefined categories based on the attributes of the instance. A classification tree is one of the most commonly used techniques in the area of classification. In recent years, many statistical methodologies have been developed to make inferences using imprecise probability theory, one of which is nonparametric predictive inference (NPI). NPI has been developed for different types of data and has been successfully applied in several fields, including classification. Due to its predictive nature, NPI is well suited for classification, as the nature of classification is explicitly predictive as well. In this thesis, we introduce a novel classification tree algorithm which we call the Direct Nonparametric Predictive Inference (D-NPI) classification algorithm. The D-NPI algorithm is based entirely on the NPI approach and does not rely on any further assumptions. As a first step in developing the D-NPI classification method, we restrict our focus to binary and multinomial data types.

The D-NPI algorithm uses a new split criterion called Correct Indication (CI), which is based entirely on NPI and does not use additional concepts such as entropy. The CI reflects how informative attribute variables are: a very informative attribute variable yields high NPI lower and upper probabilities for CI. In addition, the CI reports the strength of the evidence that the attribute variables provide regarding the possible class state of future instances, based on the data. The performance of the D-NPI classification algorithm is compared against several classification algorithms from the literature, including some imprecise probability algorithms, using different evaluation measures. The experimental results indicate that the D-NPI classification algorithm performs well and tends to slightly outperform the other classification algorithms.

Finally, a study of the D-NPI classification tree algorithm with noisy data is presented. Noisy data are data that contain incorrect values for the attribute variables or the class variable. The performance of the D-NPI classification tree algorithm with noisy data is studied and compared to other classification tree algorithms when different levels of random noise are added to the class variable or to the attribute variables. The results indicate that the D-NPI classification algorithm performs well with class noise and slightly outperforms the other classification algorithms, while no single classification algorithm is the best performing algorithm under attribute noise.
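
To make the role of a CI-style split criterion concrete, the following is a minimal, illustrative sketch rather than the thesis's exact method. It assumes the standard NPI lower and upper probabilities for binary data, s/(n+1) and (s+1)/(n+1) for s observations of the event of interest out of n, treats "the next instance falls in the majority class for its attribute value" as the correct-indication event, and weights attribute values by their empirical frequencies; the function names and the weighting scheme are assumptions introduced only for this sketch.

```python
# Illustrative sketch of a Correct-Indication-style split criterion.
# Assumptions (not taken from the thesis): per-value majority class as the
# "indicated" class, NPI Bernoulli bounds s/(n+1) and (s+1)/(n+1), and
# empirical-frequency weighting of attribute values.

from collections import Counter


def npi_binary_bounds(s, n):
    """NPI lower and upper probabilities that the next observation is a
    'success', given s successes among n binary observations."""
    return s / (n + 1), (s + 1) / (n + 1)


def ci_bounds(attribute_values, classes):
    """Lower and upper probabilities that the attribute correctly indicates
    the class of a future instance; each attribute value indicates the
    majority class among training instances with that value."""
    n_total = len(classes)
    by_value = {}
    for a, c in zip(attribute_values, classes):
        by_value.setdefault(a, []).append(c)
    lower = upper = 0.0
    for cls in by_value.values():
        s = Counter(cls).most_common(1)[0][1]   # majority-class count for this value
        lo, up = npi_binary_bounds(s, len(cls))
        w = len(cls) / n_total                  # empirical weight of this value (assumption)
        lower += w * lo
        upper += w * up
    return lower, upper


def best_split(data, class_column):
    """Pick the attribute with the highest lower CI probability (ties broken
    by the upper probability)."""
    scores = {
        attr: ci_bounds(values, data[class_column])
        for attr, values in data.items() if attr != class_column
    }
    return max(scores, key=lambda a: scores[a])


# Example: 'colour' perfectly separates the classes, 'size' does not,
# so the criterion selects 'colour' as the split attribute.
data = {
    "colour": ["red", "red", "blue", "blue", "red", "blue"],
    "size":   ["s",   "l",   "s",    "l",    "s",   "l"],
    "class":  ["yes", "yes", "no",   "no",   "yes", "no"],
}
print(best_split(data, "class"))   # -> colour
```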
