4,408 research outputs found

Automated data pre-processing via meta-learning

Author: A Guazzelli
A Kalousis
D Pyle
F Serban
J Vanschoren
J-U Kietz
M Hall
MA Munson
SF Crone
T Dasu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

The final publication is available at link.springer.com. A data mining algorithm may perform differently on datasets with different characteristics; for example, it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. In practice, a dataset usually needs to be pre-processed. Given all the possible pre-processing operators, the number of alternatives is staggeringly large, and inexperienced users become overwhelmed. We show that this problem can be addressed by an automated approach that leverages ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on the respective dataset. Our approach will help non-expert users identify the transformations appropriate to their applications more effectively, and hence achieve improved results. Peer Reviewed. Postprint (published version).

Crossref

UPCommons. Portal del coneixement obert de la UPC

Application of Data Mining Algorithm to Recipient of Motorcycle Installment

Author: Howard Fleit (3858766)
Janet Fischel (3858778)
Latha Chandran (3858769)
Richard Iuli (3858772)
Wei-Hsin Lu (3858775)
Publication venue: Bina Nusantara University
Publication date: 01/01/2015

The study was conducted at subsidiaries that provide financing services for motorcycle purchases on credit. When applying, consumers enter their personal data, and this data determines whether the credit application is approved or rejected. Of the 224 consumer records obtained, applications were approved for 87%, or about 217 consumers, and rejected for 16%, or 6 consumers. Motorcycle credit acceptance is modeled by applying the algorithm within CRISP-DM, the industry-standard process for data mining. The algorithm used for decision making is C4.5. The accuracy of the results is measured with the confusion matrix and the Receiver Operating Characteristic (ROC). The confusion matrix evaluation yields the accuracy, precision, and recall values, while the ROC is used to obtain the data tables and compare the Area Under the Curve (AUC).
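The accuracy, precision, and recall mentioned in this abstract can be derived from the four cells of a binary confusion matrix. The following is a minimal sketch only, not the authors' code: the function names, the "approved"/"rejected" labels, and the choice of "approved" as the positive class are assumptions made for illustration.

```python
def confusion_counts(actual, predicted, positive="approved"):
    """Count true/false positives and negatives for a binary problem.

    The label strings and the positive class are illustrative assumptions.
    """
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted approved, actually approved
        elif p == positive:
            fp += 1  # predicted approved, actually rejected
        elif a == positive:
            fn += 1  # predicted rejected, actually approved
        else:
            tn += 1  # predicted rejected, actually rejected
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall), using 0.0 when a ratio is undefined."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

Calling `metrics(*confusion_counts(actual, predicted))` on two equal-length label lists returns the three values as fractions between 0 and 1.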

Neliti

FigShare

USING DATA MINING TO DETECT ANOMALOUS PRODUCER BEHAVIOR: AN ANALYSIS OF SOYBEAN PRODUCTION AND THE FEDERAL CROP INSURANCE PROGRAM

Author: Little Bertis B.
Lovell Ashley C.
Olson Stacey

The analysis was conducted on insurance data from the USDA's Risk Management Agency (RMA) and on NRCS Land Resource Regions (LRRs) for 1994-2001 to assist RMA in improving program integrity. The objective is to develop a data-mining algorithm that identifies anomalous producers and counties within LRRs based on the percentage of acres harvested. Risk and Uncertainty

Research Papers in Economics

A Data Mining Algorithm for Monitoring PCB Assembly Quality

Author: Feng Zhang
Publication venue: 'IntechOpen'
Publication date: 01/01/2009

IntechOpen

Research of Data Mining Algorithm Based on Cloud Database

Author: Xia Zhang
Publication venue: Global Journals Inc. (US)
Publication date: 14/05/2014

Cloud databases hold an immense amount of data, and much potentially valuable knowledge is implicit in it. The key is to discover and extract the useful knowledge automatically. In this paper, the data model of the cloud database is analyzed. Through analysis and classification, the common features of the data are extracted to form a feature data set. The relationships among different areas in the data are then analyzed, from which new knowledge can be found. The paper defines the basic data mining model based on the cloud database and presents the discovery algorithm.

Global Journal of Computer Science and Technology (GJCST)

A customizable multi-agent system for distributed data mining

Author: Di Fatta Giuseppe
Fortino Giancarlo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2007

We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design separates the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation was carried out on a parallel frequent subgraph mining algorithm, which showed good scalability.

Central Archive at the University of Reading

CiteSeerX

Crossref

Application of Data Mining Algorithm to Recipient of Motorcycle Installment

Author: Destiawati F. (Fitriana)
Dhika H. (Harry)
Publication venue: Bina Nusantara University
Publication date: 01/01/2015

Neliti

Directory of Open Access Journals

Classification of Al-Hadith Al-Shareef using data mining algorithm

Author: Alkhatib Manar
Publication venue: ZU Scholars
Publication date: 01/12/2010

In this paper we compared the effectiveness of four automatic learning algorithms for classifying Al-Hadith Al-Shareef into 8 selected books based on Sahih Bukhari. The algorithms are the Rocchio algorithm, the K-NN (K-Nearest Neighbor) algorithm, the Naïve Bayes algorithm, and the SVM (Support Vector Machine) algorithm. We used the TF-IDF technique to compute the relative frequency of each word in a particular document. We split the Al-Hadith documents so that 75% (1350 Hadiths) are used as training data to build the classifier, and the remaining 25% (150 Hadiths) are used to test the accuracy of the resulting models in reproducing the manual category assignments. Each document contains about 5 to 10 words on average.
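The TF-IDF weighting named in this abstract can be illustrated with a short sketch. The abstract does not say which TF-IDF variant was used, so the formula here is an assumption: term frequency is the relative frequency of a term within its document, and inverse document frequency is ln(N / df). The function name `tf_idf` and the whitespace tokenization are also illustrative; Arabic text would need a proper tokenizer and normalization.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    Tokenization is a plain whitespace split, for illustration only.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Under this variant, a term that occurs in every document receives weight 0, so only terms that distinguish one document from the others contribute to the classifier's feature vector.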

ZU Scholars (Zayed University)