Influence Distribution Training Data on Performance Supervised Machine Learning Algorithms

Abstract

Banknotes are needed in almost every area of life, and some sectors, such as banks, transportation companies, and casinos, require them in large quantities. Banknotes are therefore an essential part of everyday activities, especially those related to finance. Technological advances such as scanners and copy machines give anyone the opportunity to commit crimes such as counterfeiting banknotes. Many people still find it difficult to distinguish a genuine banknote from a counterfeit one, because counterfeit banknotes closely resemble genuine ones. Against this background, the authors perform a classification to distinguish genuine banknotes from counterfeit ones. The classification uses supervised learning methods and compares their accuracy across different distributions of training data. The supervised learning methods used are Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), and Naïve Bayes. Of the three methods, K-NN achieves the highest specificity, sensitivity, and accuracy at training data shares of 30%, 50%, and 80%. With 30% and 50% training data, K-NN reaches a specificity of 0.99, a sensitivity of 1.00, and an accuracy of 0.99; with 80% training data, it reaches a specificity of 1.00, a sensitivity of 1.00, and an accuracy of 1.00. This indicates that the distribution of training data influences the performance of supervised machine learning algorithms. For the K-NN method, the larger the share of training data, the better the accuracy.
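The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only K-NN, uses a hypothetical synthetic data set (two Gaussian clusters in four dimensions) in place of real banknote measurements, and assumes k = 5 and Euclidean distance, none of which the abstract specifies. All function names are illustrative. It shows how specificity, sensitivity, and accuracy can be computed for training shares of 30%, 50%, and 80%.

```python
import math
import random
from collections import Counter


def make_synthetic(n_per_class=200, seed=0):
    """Hypothetical stand-in data: class 0 (genuine) and class 1 (counterfeit)."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(center, 1.0) for _ in range(4)]
            data.append((features, label))
    return data


def split(data, train_fraction, seed=0):
    """Shuffle a copy of the data and cut it into training and test parts."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, x, k=5):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def scores(train, test, k=5):
    """Return (specificity, sensitivity, accuracy) with class 1 as positive."""
    tp = tn = fp = fn = 0
    for x, y in test:
        pred = knn_predict(train, x, k)
        if y == 1 and pred == 1:
            tp += 1
        elif y == 0 and pred == 0:
            tn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    specificity = tn / (tn + fp)   # share of genuine notes recognised as genuine
    sensitivity = tp / (tp + fn)   # share of counterfeit notes detected
    accuracy = (tp + tn) / len(test)
    return specificity, sensitivity, accuracy


data = make_synthetic()
results = {frac: scores(*split(data, frac)) for frac in (0.3, 0.5, 0.8)}

for frac, (spec, sens, acc) in results.items():
    print(f"train={frac:.0%}  specificity={spec:.2f}  "
          f"sensitivity={sens:.2f}  accuracy={acc:.2f}")
```

The numbers printed by this sketch depend on the synthetic data and the random seed, so they are not comparable to the figures reported in the abstract. Repeating the loop with other classifiers, such as an SVM or Naïve Bayes, would extend it toward the three-way comparison the authors describe.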
