21 research outputs found

    Learning k-Nearest Neighbor Naive Bayes For Ranking

    Abstract. Accurate probability-based ranking of instances is crucial in many real-world data mining applications. KNN (k-nearest neighbor) [1] has been intensively studied as an effective classification model for decades, but its ranking performance is largely unknown. In this paper, we conduct a systematic study of the ranking performance of KNN. First, we compare KNN and KNNDW (distance-weighted KNN) to decision trees and naive Bayes in ranking, measured by AUC (the area under the Receiver Operating Characteristics curve). Then, we propose to improve the ranking performance of KNN by combining it with naive Bayes (NB): a naive Bayes model is learned using the k nearest neighbors of the test instance as the training data and is then used to classify that instance. A critical problem in combining KNN with naive Bayes is the lack of training data when k is small. We address this by using cloning to expand the training data: each of the k nearest neighbors is "cloned" and the clones are added to the training data. We call our new model instance cloning local naive Bayes (ICLNB). We conduct an extensive empirical comparison of the related algorithms in two groups in terms of AUC, using the 36 UCI datasets recommended by Weka [2]. In the first group, we compare ICLNB with KNN, NB, NBTree [3], and C4.4 [4]. In the second group, we compare ICLNB with KNN, KNNDW, and LWNB [5]. Our experimental results show that ICLNB outperforms all of those algorithms significantly. From our study, we draw two conclusions. First, KNN-related algorithms perform well in ranking. Second, our new algorithm ICLNB performs best among the algorithms compared in this paper and could be used in applications in which an accurate ranking is desired.
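
    As a rough illustration of the local-learning idea in this abstract, the Python sketch below trains a naive Bayes model only on the k nearest neighbors of a test instance and replicates ("clones") each neighbor to enlarge the local training set. It is a minimal sketch using scikit-learn; the similarity-based clone counts and the Gaussian naive Bayes model are assumptions chosen for illustration, not the paper's exact scheme.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import NearestNeighbors

    def iclnb_predict_proba(X_train, y_train, x_test, k=10):
        # Find the k nearest neighbors of the test instance.
        nn = NearestNeighbors(n_neighbors=k).fit(X_train)
        dist, idx = nn.kneighbors(x_test.reshape(1, -1))
        dist, idx = dist[0], idx[0]
        # "Clone" each neighbor: closer neighbors get more copies.
        # (Assumed scheme; the paper defines its own cloning rule.)
        sim = 1.0 / (1.0 + dist)
        clones = np.maximum(1, np.round(k * sim / sim.sum()).astype(int))
        X_local = np.repeat(X_train[idx], clones, axis=0)
        y_local = np.repeat(y_train[idx], clones)
        # Train a local naive Bayes on the cloned neighborhood only.
        nb = GaussianNB().fit(X_local, y_local)
        return nb.predict_proba(x_test.reshape(1, -1))[0]

    Cloning matters most for discrete naive Bayes with Laplace smoothing, where duplicated instances shift the smoothed counts; with exact duplicates and a Gaussian model it mainly reweights neighbors relative to one another.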

    Augmenting naive bayes for ranking

    Naive Bayes is an effective and efficient learning algorithm for classification. In many applications, however, an accurate ranking of instances based on the class probability is more desirable. Unfortunately, naive Bayes has been found to produce poor probability estimates. Numerous techniques have been proposed to extend naive Bayes for better classification accuracy, of which selective Bayesian classifiers (SBC) (Langley & Sage
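
    The ranking-by-class-probability setup these abstracts evaluate can be made concrete in a few lines of Python. The snippet below is a generic illustration (synthetic data and Gaussian naive Bayes from scikit-learn, both assumptions; it is not the SBC method whose description is cut off above): instances are ranked by the predicted probability of the positive class, and the ranking is scored with AUC.

    from sklearn.datasets import make_classification
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    # Synthetic two-class data as a stand-in for a UCI dataset.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Rank test instances by P(y = 1 | x) and score the ranking with AUC.
    proba = GaussianNB().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    print("AUC of the probability-based ranking:", roc_auc_score(y_te, proba))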

    One Dependence Augmented Naive Bayes

    Abstract. In real-world data mining applications, an accurate ranking is as important as an accurate classification. Naive Bayes (NB) has been widely used in data mining as a simple and effective classification and ranking algorithm. Since its conditional independence assumption is rarely true, numerous algorithms have been proposed to improve naive Bayes, for example SBC [1] and TAN [2]. Indeed, experimental results show that SBC and TAN achieve a significant improvement in terms of classification accuracy. Unfortunately, however, our experiments also show that SBC and TAN perform even worse than naive Bayes in ranking measured by AUC [3, 4] (the area under the Receiver Operating Characteristics curve). This fact raises the question of whether we can improve naive Bayes so that it is accurate in both classification and ranking. Responding to this question, we present a new learning algorithm called One Dependence Augmented Naive Bayes (ODANB). Our motivation is to develop a new algorithm that improves naive Bayes' performance not only in classification, measured by accuracy, but also in ranking, measured by AUC. We experimentally tested our algorithm on all 36 UCI datasets recommended by Weka [5] and compared it to NB, SBC [1], and TAN [2]. The experimental results show that our algorithm significantly outperforms all the other algorithms in yielding accurate rankings, while at the same time slightly outperforming them in terms of classification accuracy.
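
    In a one-dependence model of the kind this abstract describes, each attribute is allowed one attribute parent in addition to the class, so the posterior factorizes as P(c | a_1, ..., a_n) proportional to P(c) * prod_i P(a_i | c, pa(a_i)). The Python sketch below scores classes under a fixed parent assignment with Laplace smoothing; how ODANB chooses the parents is the paper's contribution and is not reproduced here (the structure is supplied by the caller, which is an assumption of this sketch).

    import numpy as np

    def one_dependence_proba(X, y, parents, x_test, alpha=1.0):
        # parents[i] is the index of attribute i's parent attribute, or
        # None if attribute i depends on the class alone (as in plain NB).
        # X and x_test hold small non-negative integer value codes.
        classes = np.unique(y)
        log_scores = {}
        for c in classes:
            Xc = X[y == c]
            # Smoothed class prior P(c).
            log_p = np.log((len(Xc) + alpha) / (len(X) + alpha * len(classes)))
            for i, p in enumerate(parents):
                if p is None:
                    hits = np.sum(Xc[:, i] == x_test[i])
                    denom = len(Xc)
                else:
                    rows = Xc[Xc[:, p] == x_test[p]]
                    hits = np.sum(rows[:, i] == x_test[i])
                    denom = len(rows)
                n_vals = len(np.unique(X[:, i]))
                # Smoothed conditional P(a_i | c, parent value).
                log_p += np.log((hits + alpha) / (denom + alpha * n_vals))
            log_scores[c] = log_p
        # Normalize the joint scores into class probabilities.
        m = max(log_scores.values())
        exp = {c: np.exp(s - m) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}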

    Learning Tree Augmented Naive Bayes for Ranking

    Abstract. Naive Bayes has been widely used in data mining as a simple and effective classification algorithm. Since its conditional independence assumption is rarely true, numerous algorithms have been proposed to improve naive Bayes, among which tree augmented naive Bayes (TAN) [3] achieves a significant improvement in terms of classification accuracy while maintaining efficiency and model simplicity. In many real-world data mining applications, however, an accurate ranking is more desirable than a classification. It is therefore interesting to ask whether TAN also achieves a significant improvement in terms of ranking, measured by AUC (the area under the Receiver Operating Characteristics curve) [8, 1]. Unfortunately, our experiments show that TAN performs even worse than naive Bayes in ranking. Responding to this fact, we present a novel learning algorithm, called forest augmented naive Bayes (FAN), obtained by modifying the traditional TAN learning algorithm. We experimentally test our algorithm on all 36 data sets recommended by Weka [12] and compare it to naive Bayes, SBC [6], TAN [3], and C4.4 [10] in terms of AUC. The experimental results show that our algorithm significantly outperforms all the other algorithms in yielding accurate rankings. Our work provides an effective and efficient data mining algorithm for applications in which an accurate ranking is required.
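
    TAN builds a maximum spanning tree over the attributes using conditional mutual information I(A_i; A_j | C) as edge weights; a forest augmented model can be obtained by keeping only sufficiently informative edges, so the structure may split into several trees. The Python sketch below does that with a Kruskal-style union-find; the fixed edge-weight threshold is an assumption for illustration, not necessarily the paper's criterion for dropping edges.

    import numpy as np
    from itertools import combinations

    def cond_mutual_info(X, y, i, j):
        # Empirical I(A_i; A_j | C) for integer-coded attributes.
        cmi = 0.0
        for c in np.unique(y):
            mask = y == c
            pc = mask.mean()
            xi, xj = X[mask, i], X[mask, j]
            for a in np.unique(xi):
                for b in np.unique(xj):
                    p_ab = np.mean((xi == a) & (xj == b))
                    if p_ab > 0:
                        p_a, p_b = np.mean(xi == a), np.mean(xj == b)
                        cmi += pc * p_ab * np.log(p_ab / (p_a * p_b))
        return cmi

    def fan_forest(X, y, threshold=0.01):
        # Kruskal-style maximum-weight forest over attributes; edges at or
        # below `threshold` are dropped, which is what turns TAN's spanning
        # tree into a forest.
        d = X.shape[1]
        edges = sorted(((cond_mutual_info(X, y, i, j), i, j)
                        for i, j in combinations(range(d), 2)), reverse=True)
        parent = list(range(d))
        def find(u):
            while parent[u] != u:
                parent[u] = parent[parent[u]]
                u = parent[u]
            return u
        forest = []
        for w, i, j in edges:
            if w <= threshold:
                break
            ri, rj = find(i), find(j)
            if ri != rj:  # adding the edge keeps the graph acyclic
                parent[ri] = rj
                forest.append((i, j, w))
        return forest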

    Attribute Value Weighted Average of One-Dependence Estimators

    Of the numerous proposals to improve the accuracy of naive Bayes by weakening its attribute independence assumption, semi-naive Bayesian classifiers that utilize one-dependence estimators (ODEs) have been shown to approximate the ground-truth attribute dependencies well; at the same time, probability estimation in ODEs is effective, leading to excellent performance. In previous studies, ODEs were exploited directly in a simple way. For example, averaged one-dependence estimators (AODE) weaken the attribute independence assumption by directly averaging a constrained class of classifiers; however, all one-dependence estimators in AODE receive the same weight and are treated equally. In this study, we propose a new paradigm based on a simple, efficient, and effective attribute value weighting approach, called attribute value weighted average of one-dependence estimators (AVWAODE). AVWAODE assigns discriminative weights to different ODEs by computing the correlation between each root attribute value and the class. Our approach uses two different attribute value weighting measures, the Kullback-Leibler (KL) measure and the information gain (IG) measure, yielding two versions, denoted AVWAODE-KL and AVWAODE-IG, respectively. We experimentally tested them on a collection of 36 University of California at Irvine (UCI) datasets and found that both achieved better performance than the other state-of-the-art Bayesian classifiers used for comparison.
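
    A weighted average of one-dependence estimators scores a class as P(c | x) proportional to sum_i w(i, x_i) * P(c, x_i) * prod_{j != i} P(x_j | c, x_i), where attribute i is the root of the i-th estimator. The Python sketch below implements this with a KL-style value weight, the divergence between the class distribution given the root value and the class prior; this particular weight formula is one plausible reading of the abstract, not a quotation of the paper's definitions.

    import numpy as np

    def kl_value_weight(X, y, i, v):
        # KL divergence between P(C | A_i = v) and the prior P(C): one
        # plausible value-weighting measure (the paper defines its own).
        mask = X[:, i] == v
        w = 0.0
        for c in np.unique(y):
            p_c = np.mean(y == c)
            p_c_v = np.mean(y[mask] == c) if mask.any() else p_c
            if p_c_v > 0:
                w += p_c_v * np.log(p_c_v / p_c)
        return w

    def avwaode_proba(X, y, x_test, alpha=1.0):
        # Weighted average of one-dependence estimators, one per root
        # attribute, with Laplace-smoothed probability estimates.
        n, d = X.shape
        classes = np.unique(y)
        scores = np.zeros(len(classes))
        for i in range(d):
            w = kl_value_weight(X, y, i, x_test[i])
            n_vals_i = len(np.unique(X[:, i]))
            for ci, c in enumerate(classes):
                sel = (y == c) & (X[:, i] == x_test[i])
                # Smoothed joint P(c, a_i) for the root attribute value.
                prod = (sel.sum() + alpha) / (n + alpha * len(classes) * n_vals_i)
                for j in range(d):
                    if j == i:
                        continue
                    n_vals_j = len(np.unique(X[:, j]))
                    hits = np.sum(sel & (X[:, j] == x_test[j]))
                    prod *= (hits + alpha) / (sel.sum() + alpha * n_vals_j)
                scores[ci] += w * prod
        return scores / scores.sum()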

    Evaluation and comparison of in vitro antioxidant activities of unsaponifiable fraction of 11 kinds of edible vegetable oils

    The radical scavenging capabilities of the extracts from eleven edible vegetable oils were investigated using 2,2-diphenyl-1-picrylhydrazyl (DPPH), 2,2′-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS), and ferric reducing ability of plasma (FRAP) assays. The results indicated that rapeseed oil and sesame oil showed higher radical scavenging abilities than the other vegetable oils. When the radical scavenging capabilities of the extracts from virgin camellia oils and commercially available refined camellia oils were evaluated by the FRAP assay, the antioxidant capabilities of the former were higher than those of the latter. It is therefore recommended that moderate refining processes be used to minimize the loss of antioxidant components, and that consumers choose virgin or less processed edible vegetable oils for higher antioxidant activity.