thesis

Data mining using Matlab

Abstract

Data mining is a relatively new field emerging in many disciplines. It is becoming more popular as technology advances and the need for efficient data analysis grows. The aim of data mining is not to derive strict rules by analysing the full data set; rather, it is used to make predictions with some certainty while analysing only a small portion of the data. This project seeks to compare the efficiency of a decision tree induction method with that of the neural network method. MATLAB has built-in data mining toolboxes. However, the decision tree induction method is not as yet implemented. Decision tree induction has been implemented in several forms in the past. The greatest contribution to this method has been made by Dr John Ross Quinlan, who brought it forward in the form of the ID3, C4.5 and C5 algorithms. The methodologies used within ID3 and C4.5 are well documented and therefore provide a strong platform for implementing this method in a higher-level language. The objectives of this study are to fully comprehend two methods of data mining, namely decision tree induction and neural networks. The decision tree induction method is to be implemented in the mathematical computing language MATLAB. The results found when analysing some suitable data will be compared with the results from the neural network toolbox already implemented in MATLAB. The data used to compare and contrast the two methods included voting records from the US House of Representatives, which consist of yes, no and undecided votes on sixteen separate issues. The voters are grouped according to their political party, either Republican or Democratic. The objective of using this data set is to predict which party a congressman is affiliated with by analysing their voting trends. The findings of this study reveal that the decision tree method can accurately predict outcomes if an ideal data set is used for building the tree. The neural network method is less accurate in some situations; however, it is more robust to unexpected data.
