4,408 research outputs found

    Automated data pre-processing via meta-learning

    The final publication is available at link.springer.com. A data mining algorithm may perform differently on datasets with different characteristics; for example, it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. In practice, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there is a staggeringly large number of alternatives, and inexperienced users become overwhelmed. We show that this problem can be addressed by an automated approach that leverages ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on the respective dataset. Our approach will help non-expert users to identify more effectively the transformations appropriate to their applications, and hence to achieve improved results. Peer reviewed. Postprint (published version).

    Application of Data Mining Algorithm to Recipient of Motorcycle Installment

    The study was conducted at subsidiaries that provide financing for motorcycle purchases on credit. When applying, consumers enter their personal data, and on the basis of that data the credit application is approved or rejected. Of the 224 consumer records obtained, approved applications are reported as 87%, or about 217 consumers, and rejected applications as 16%, or 6 consumers. Acceptance of motorcycle financing on credit is modeled by following CRISP-DM, the industry-standard process for data mining. The algorithm used for decision making is C4.5. Accuracy is measured with a confusion matrix and the Receiver Operating Characteristic (ROC). The confusion matrix is used to obtain the accuracy, precision, and recall values, while the ROC is used to produce data tables and to compare the Area Under the Curve (AUC).
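
    As a minimal sketch of the confusion-matrix evaluation mentioned above, the following computes accuracy, precision, and recall from the four cells of a binary confusion matrix. The function name `confusion_metrics` and the example counts are illustrative assumptions and are not taken from the study.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, and recall for a binary confusion matrix.

    tp/fp/fn/tn are the counts of true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts chosen only to exercise the formulas.
metrics = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
print(metrics)
```

    Precision and recall are reported alongside accuracy because accuracy alone can look high on a dataset where one class (here, approved applications) dominates.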

    USING DATA MINING TO DETECT ANOMALOUS PRODUCER BEHAVIOR: AN ANALYSIS OF SOYBEAN PRODUCTION AND THE FEDERAL CROP INSURANCE PROGRAM

    The analysis was conducted on the USDA's Risk Management Agency (RMA) insurance data and NRCS Land Resource Regions (LRRs) from 1994 to 2001 to assist RMA in improving program integrity. The objective is to develop a data-mining algorithm that identifies anomalous producers and counties within LRRs based upon the percentage of acres harvested.
    Keywords: Risk and Uncertainty

    A Data Mining Algorithm for Monitoring PCB Assembly Quality


    Research of Data Mining Algorithm Based on Cloud Database

    A cloud database holds an immense amount of data, in which much potentially valuable knowledge is implicit. The key point is to discover and extract the useful knowledge, and to do so automatically. In this paper, the data model of the cloud database is analyzed. Through analysis and classification, the common features of the data are extracted to form a feature data set. The relationships among different areas of the data are then analyzed, from which new knowledge can be found. The paper defines the basic data mining model based on the cloud database and presents the discovery algorithm.

    Classification of Al-Hadith Al-Shareef using data mining algorithm

    In this paper we compare the effectiveness of four automatic learning algorithms for classifying Al-Hadith Al-Shareef into 8 selected books based on Sahih Bukhari. The algorithms are Rocchio, k-NN (k-Nearest Neighbor), Naïve Bayes, and SVM (Support Vector Machines). We used the TF-IDF technique to compute the relative frequency of each word in a particular document. We split the Hadith documents so that 75% (1350 Hadiths) are used as training data to build the classifier, and the remaining 25% (150 Hadiths) are used to test the accuracy of the resulting models in reproducing the manual category assignments. Each document contains about 5 to 10 words on average.
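
    The TF-IDF weighting mentioned above can be sketched as follows. This is one common variant (term frequency normalized by document length, multiplied by the natural log of the inverse document frequency); the abstract does not say which variant the authors used, and the function name `tf_idf` and the toy token lists are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    Returns one dict per document, mapping each term to its weight.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical short documents, similar in length to the 5 to 10 word texts described.
docs = [
    ["data", "mining", "data"],
    ["mining", "text"],
    ["text", "class", "text"],
]
weights = tf_idf(docs)
print(weights[0])
```

    A term that occurs in every document gets weight zero under this variant, so the classifier's input emphasizes words that distinguish one category from another.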