16,587 research outputs found
PRESISTANT: Learning based assistant for data pre-processing
Data pre-processing is one of the most time consuming and relevant steps in a
data analysis process (e.g., classification task). A given data pre-processing
operator (e.g., transformation) can have positive, negative or zero impact on
the final result of the analysis. Expert users have the required knowledge to
find the right pre-processing operators. However, when it comes to non-experts,
they are overwhelmed by the amount of pre-processing operators and it is
challenging for them to find operators that would positively impact their
analysis (e.g., increase the predictive accuracy of a classifier). Existing
solutions either assume that users have expert knowledge, or they recommend
pre-processing operators that are only "syntactically" applicable to a dataset,
without taking into account their impact on the final analysis. In this work,
we aim at providing assistance to non-expert users by recommending data
pre-processing operators that are ranked according to their impact on the final
analysis. We developed a tool PRESISTANT, that uses Random Forests to learn the
impact of pre-processing operators on the performance (e.g., predictive
accuracy) of 5 different classification algorithms, such as J48, Naive Bayes,
PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations on the
recommendations provided by our tool, show that PRESISTANT can effectively help
non-experts in order to achieve improved results in their analytical tasks
An Overview of the Use of Neural Networks for Data Mining Tasks
In the recent years the area of data mining has experienced a considerable demand for technologies that extract knowledge from large and complex data sources. There is a substantial commercial interest as well as research investigations in the area that aim to develop new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies, whose classification, prediction and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights from a data mining perspective the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks
- …