248 research outputs found

    Improving the Prediction Accuracy of Text Data and Attribute Data Mining with Data Preprocessing

    Data mining is the extraction of valuable information from patterns in data and its transformation into useful knowledge. Data preprocessing is an important step in the data mining process: the quality of the input data affects the accuracy of the mining results, which makes preprocessing one of the critical steps in any data mining workflow. Within text mining, document classification is a growing field. Although many classification approaches exist, the Naïve Bayes classifier performs well because of its simplicity and effectiveness, and it is suggested here as a suitable method for identifying spam emails. The aim of this paper is to identify the impact of preprocessing the dataset on the performance of a Naïve Bayes classifier; this impact is analysed by comparing the results obtained on the preprocessed dataset with those obtained on the non-preprocessed dataset. The test results show that combining Naïve Bayes classification with proper data preprocessing can improve prediction accuracy. In attribute data mining, the decision tree is an important classification technique; decision trees have proved to be valuable tools for the classification, description, and generalization of data. J48, an open-source Java implementation of the C4.5 algorithm in the Weka data mining tool, is used to build the classification model. In this paper, we present a method for improving the accuracy of decision tree mining with data preprocessing: we apply supervised filter discretization before constructing the decision tree with the J48 algorithm and compare the results with those of J48 without discretization. The results obtained from the experiments show that the accuracy of J48 after discretization is better than that of J48 before discretization.
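    As a rough, self-contained illustration of the text-mining half of this comparison, the sketch below trains scikit-learn's MultinomialNB on a spam corpus with and without a simple preprocessing step and reports the accuracy of each run. The file name emails.csv and its text/label columns are placeholders, and the cleaning function is only one plausible choice of preprocessing; this is not the authors' pipeline.

```python
# Minimal sketch: effect of text preprocessing on Naïve Bayes spam
# classification. Assumes a hypothetical CSV "emails.csv" with columns
# "text" and "label"; both names are placeholders, not from the paper.
import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB


def preprocess(text: str) -> str:
    """Lower-case, strip non-letters, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def evaluate(texts, labels, clean: bool) -> float:
    """Train MultinomialNB on raw or preprocessed text and return test accuracy."""
    if clean:
        texts = [preprocess(t) for t in texts]
        vectorizer = CountVectorizer(stop_words="english")  # cleaned tokens, stop words removed
    else:
        vectorizer = CountVectorizer()                       # raw tokens, no cleaning
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.3, random_state=42, stratify=labels
    )
    model = MultinomialNB().fit(vectorizer.fit_transform(X_train), y_train)
    return accuracy_score(y_test, model.predict(vectorizer.transform(X_test)))


if __name__ == "__main__":
    df = pd.read_csv("emails.csv")                           # placeholder dataset
    for clean in (False, True):
        acc = evaluate(df["text"].tolist(), df["label"].tolist(), clean)
        print(f"preprocessed={clean}: accuracy={acc:.3f}")
```

    The same before/after comparison could in principle be repeated for the decision-tree half of the study by swapping the classifier for a decision tree and the text preprocessing for a discretization step.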

    Improvement of the Accuracy of Prediction Using Unsupervised Discretization Method: Educational Data Set Case Study

    This paper presents a comparison of the efficacy of unsupervised and supervised discretization methods for educational data from a blended learning environment. A Naïve Bayes classifier was trained on each discretized data set and a comparative analysis of the resulting prediction models was conducted. The research goal was to transform the numeric features into maximally independent discrete values with minimal loss of information and a reduction of the classification error. The proposed unsupervised discretization method was based on the histogram distribution of the features and the application of an oversampling technique. The main contribution of this research is the improvement of prediction accuracy achieved by the unsupervised discretization method, which reduces the effect of ignoring the class feature for this educational data set.
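    A minimal sketch of this discretize-then-classify idea, assuming synthetic numeric data in place of the educational data set, is given below: scikit-learn's KBinsDiscretizer with equal-width bins plays the role of an unsupervised, histogram-style discretizer (the class label is ignored), and the binned features are fed to a discrete Naïve Bayes model and compared against Gaussian Naïve Bayes on the raw features. The oversampling step described in the abstract is omitted.

```python
# Minimal sketch: unsupervised (equal-width, histogram-style) discretization
# before Naïve Bayes, compared with a Gaussian NB baseline on raw features.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic numeric features stand in for the paper's educational data set.
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           random_state=0)

# Baseline: Gaussian NB directly on the continuous features.
baseline = GaussianNB()

# Unsupervised discretization: equal-width bins derived from the feature
# histograms alone (class labels are ignored), one-hot encoded so that a
# discrete NB model can consume them.
discretized = make_pipeline(
    KBinsDiscretizer(n_bins=5, encode="onehot", strategy="uniform"),
    BernoulliNB(),
)

for name, model in [("raw features + GaussianNB", baseline),
                    ("binned features + BernoulliNB", discretized)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```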

    Diabetes Disease Prediction System using HNB classifier based on Discretization Method

    Diagnosing diabetes early is critical, as it helps patients live with the disease in a healthy way: through healthy eating, taking appropriate medical doses, and being more vigilant in their movements and activities to avoid wounds that are difficult to heal for diabetic patients. Data mining techniques are typically used to detect diabetes with high confidence and to avoid misdiagnosis as other chronic diseases whose symptoms are similar to those of diabetes. Hidden Naïve Bayes (HNB) is a classification algorithm that extends the traditional Naïve Bayes model by relaxing its conditional independence assumption. The results of this study, conducted on the Pima Indian Diabetes (PID) dataset, show that the prediction accuracy of the HNB classifier reached 82%, and that the discretization method increases the performance and accuracy of the HNB classifier.
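    Common Python libraries do not ship a Hidden Naïve Bayes implementation, so the sketch below only illustrates the discretize-then-classify pipeline the abstract relies on, with an ordinary Naïve Bayes model standing in for HNB; the file name pima-indians-diabetes.csv and the Outcome column are placeholders for the PID data.

```python
# Illustrative sketch only: plain Naïve Bayes stands in for HNB to show the
# effect of discretizing the clinical features before classification.
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

# Placeholder path and column name; the real PID data set has eight clinical
# features and a binary diabetes outcome.
df = pd.read_csv("pima-indians-diabetes.csv")
X, y = df.drop(columns=["Outcome"]), df["Outcome"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Without discretization: Gaussian NB on the raw measurements.
plain = GaussianNB().fit(X_train, y_train)

# With discretization: equal-frequency binning first, then a discrete NB model
# (standing in for the HNB classifier used in the paper).
binned = make_pipeline(
    KBinsDiscretizer(n_bins=10, encode="onehot", strategy="quantile"),
    BernoulliNB(),
).fit(X_train, y_train)

for name, model in [("without discretization", plain),
                    ("with discretization", binned)]:
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {acc:.3f}")
```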

    Transductive Learning for Spatial Data Classification

    Learning classifiers for spatial data presents several issues, such as the heterogeneity of spatial objects, the implicit definition of spatial relationships among objects, spatial autocorrelation, and the abundance of unlabelled data, which potentially conveys a large amount of information. The first three issues stem from the inherent structure of the spatial units of analysis and can easily be accommodated if a (multi-)relational data mining approach is adopted. The fourth issue demands the adoption of a transductive setting, which aims to make predictions for a given set of unlabelled data. Transduction is also motivated by the closeness of the concept of positive autocorrelation, which typically affects spatial phenomena, to the smoothness assumption that characterizes the transductive setting. In this work, we investigate a relational approach to spatial classification in a transductive setting and present computational solutions to the main difficulties met in this approach. In particular, a relational upgrade of the naïve Bayes classifier is proposed as the discriminative model, an iterative algorithm is designed for the transductive classification of unlabelled data, and a distance measure between relational descriptions of spatial objects is defined in order to determine the k nearest neighbours of each example in the dataset. These computational solutions have been tested on two real-world spatial datasets. The transformation of the spatial data into a multi-relational representation and the experimental results are reported and discussed.
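    A loose, propositional sketch of such an iterative transductive loop is given below, under the assumption that plain feature vectors and Euclidean distance stand in for the paper's relational descriptions and relational distance measure: a Naïve Bayes model is bootstrapped on the labelled examples, its class posteriors on the unlabelled examples are smoothed over each example's k nearest neighbours as a crude proxy for positive spatial autocorrelation, and the pseudo-labels are re-estimated until they stabilise.

```python
# Sketch of an iterative transductive loop: NB pseudo-labelling plus k-NN
# smoothing of the class posteriors. Not the paper's relational method.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors


def transductive_nb(X_lab, y_lab, X_unl, k=5, max_iter=20):
    """Return pseudo-labels for X_unl after iterative NB + k-NN smoothing."""
    X_all = np.vstack([X_lab, X_unl])
    # Neighbour indices over the whole (labelled + unlabelled) sample; the
    # first neighbour of each unlabelled point is the point itself, so skip it.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_all)
    neigh = nn.kneighbors(X_unl, return_distance=False)[:, 1:]

    model = GaussianNB().fit(X_lab, y_lab)
    y_unl = model.predict(X_unl)                      # initial pseudo-labels

    for _ in range(max_iter):
        # Transductive step: retrain on labelled data plus current pseudo-labels.
        model.fit(X_all, np.concatenate([y_lab, y_unl]))
        proba = model.predict_proba(X_all)
        # Smooth each unlabelled example's posterior over its neighbours.
        smoothed = proba[len(X_lab):] + proba[neigh].mean(axis=1)
        y_new = model.classes_[smoothed.argmax(axis=1)]
        if np.array_equal(y_new, y_unl):              # pseudo-labels stabilised
            break
        y_unl = y_new
    return y_unl
```

    The smoothing step is what distinguishes this loop from plain self-training: each point's predicted class is pulled towards the classes of its neighbours, mirroring the smoothness assumption mentioned above.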