461 research outputs found

    Improving the Prediction Accuracy of Text Data and Attribute Data Mining with Data Preprocessing

    Data mining is the extraction of valuable information from patterns in data and its conversion into useful knowledge. Because the quality of the data affects the accuracy of the mining results, data preprocessing is one of the critical steps in the data mining process. In text mining research, document classification is a growing field. Although many classification approaches exist, the Naïve Bayes classifier performs well because of its simplicity and effectiveness, and it has been suggested as the best method for identifying spam emails. The aim of this paper is to identify the impact of preprocessing the dataset on the performance of a Naïve Bayes classifier. The impact of the preprocessing phase is analyzed by comparing the classifier's results on the preprocessed dataset with its results on the non-preprocessed dataset. The test results show that combining Naïve Bayes classification with proper data preprocessing can improve prediction accuracy. In attribute data mining research, the decision tree is an important classification technique. Decision trees have proved to be valuable tools for the classification, description, and generalization of data. J48, an open source Java implementation of the C4.5 algorithm in the Weka data mining tool, is a decision tree algorithm used to create classification models. In this paper, we present a method for improving the accuracy of decision tree mining with data preprocessing. We applied the supervised discretization filter before running the J48 algorithm to construct a decision tree, and compared the results with those of J48 without discretization. The experimental results show that the accuracy of J48 after discretization is better than the accuracy of J48 before discretization.
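The comparison described in this abstract (Naïve Bayes with and without preprocessing) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the toy messages, the Laplace smoothing, and the function names `tokenize`, `train`, `predict`, and `accuracy` are all assumptions made for the example, and the paper's actual preprocessing steps and dataset are not given in the abstract.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "to", "is", "you", "at", "for", "your"}


def tokenize(text, preprocess):
    """Split text into tokens.

    With preprocess=True: lowercase, keep alphabetic runs only, drop stop words.
    With preprocess=False: split on whitespace and keep tokens as they are.
    """
    if not preprocess:
        return text.split()
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def train(docs, preprocess):
    """Collect multinomial Naive Bayes counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for tok in tokenize(text, preprocess):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text, preprocess):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text, preprocess):
            if tok not in vocab:
                continue  # tokens never seen in training are ignored
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(train_docs, test_docs, preprocess):
    """Train and evaluate with the same preprocessing setting."""
    model = train(train_docs, preprocess)
    hits = sum(predict(model, text, preprocess) == label
               for text, label in test_docs)
    return hits / len(test_docs)


# Toy data, invented for this sketch.
TRAIN_DOCS = [
    ("Win a FREE prize now!", "spam"),
    ("Free money, claim your prize", "spam"),
    ("Meeting at noon tomorrow", "ham"),
    ("Lunch with the project team", "ham"),
]
TEST_DOCS = [
    ("FREE prize!", "spam"),
    ("Meeting tomorrow at noon", "ham"),
]

if __name__ == "__main__":
    for flag in (False, True):
        print("preprocess =", flag, "accuracy =",
              accuracy(TRAIN_DOCS, TEST_DOCS, flag))
```

Without preprocessing, `FREE`, `Free`, and `prize!` are treated as unrelated tokens, so evidence for the spam class is split across them; normalizing case and punctuation merges those counts. On a dataset this small the accuracy figures are not meaningful, so the sketch shows only the mechanics of the comparison.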

    Machine Learning Techniques as Applied to Discrete and Combinatorial Structures

    Machine learning techniques have been used on a wide array of input types: images, sound waves, text, and so forth. In articulating these input types to the almighty machine, all sorts of amazing problems have been solved for many practical purposes. Nevertheless, there are some input types which don’t lend themselves nicely to the standard set of machine learning tools we have. Moreover, there are some provably difficult problems which are abysmally hard to solve within a reasonable time frame. This thesis addresses several of these difficult problems. It frames these problems such that we can then attempt to marry the allegedly powerful utility of existing machine learning techniques to the practical solvability of said problems.