81,975 research outputs found
Algorithm Comparison and Feature Selection for Classification of Broiler Chicken Harvest
Broiler chickens are superior breeds raised to produce large quantities of meat. In practice, however, many breeders experience crop failure, which has a serious economic impact and can also lower a farmer's quality rating, resulting in sanctions. The performance index produced at harvest indicates the success rate of a broiler harvest, so harvest yield data can be used to classify harvest outcomes with a suitable classification approach. This study's data mining process followed the CRISP-DM (Cross Industry Standard Process for Data Mining) method. The study compares three classification algorithms to determine the best one, and three feature selection methods to determine which best improves algorithm performance. According to the findings, Random Forest is the best algorithm for classifying the harvest data, with an accuracy of 89.14 percent, and Backward Elimination is the best method for improving the algorithm's performance, increasing accuracy by 7.53 percent. As a result, Random Forest combined with Backward Elimination yields an accuracy of 96.67 percent. The study also finds that the factors influencing crop yield are FCR (feed conversion ratio), number of harvests, and body weight.
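The abstract does not give the paper's exact pipeline, but the Random Forest plus backward elimination combination it describes can be sketched with scikit-learn. The dataset below is a synthetic stand-in; the real harvest features (FCR, number of harvests, body weight) are not available here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the broiler harvest dataset.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=0)
baseline = cross_val_score(rf, X, y, cv=5).mean()

# Backward elimination: start from all features and greedily drop
# the ones whose removal hurts cross-validated accuracy the least.
sfs = SequentialFeatureSelector(rf, n_features_to_select=4,
                                direction="backward", cv=5)
X_sel = sfs.fit_transform(X, y)
selected = cross_val_score(rf, X_sel, y, cv=5).mean()

print(f"baseline: {baseline:.3f}, after backward elimination: {selected:.3f}")
```

Whether elimination helps depends on how many irrelevant features the data contains, which is the point the abstract's 7.53 percent improvement illustrates.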
On Identifying Critical Nuggets Of Information During Classification Task
In large databases there may exist critical nuggets: small collections of records or instances that contain domain-specific, important information. This information can be used for future decision making, such as labeling critical, unlabeled data records and improving classification results by reducing false positive and false negative errors. In recent years, data mining efforts have focused on pattern and outlier detection methods, but little effort has been dedicated to finding critical nuggets within a data set. This work introduces the idea of critical nuggets, proposes a domain-independent method to measure criticality, suggests a heuristic to reduce the search space for finding critical nuggets, and isolates and validates critical nuggets from several real-world data sets. Only a few subsets typically qualify as critical nuggets, underscoring the importance of finding them, and the proposed methodology can detect them. This work also identifies certain properties of critical nuggets and validates those properties experimentally. Critical nuggets were then applied to two important classification performance metrics: classification accuracy and misclassification costs. Experimental results validated that critical nuggets can help improve classification accuracy on real-world data sets when compared with standalone classification algorithms, and the improvements were statistically significant. Extensive studies on real-world data sets also showed that the critical-nuggets-based approach yielded statistically significantly lower misclassification costs than standalone classification methods.
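The paper's criticality measure is not reproduced in this abstract, but the core intuition, that a small group of records can carry disproportionate weight for a classifier, can be illustrated with a simple ablation sketch: partition the training data into small candidate subsets and score each by the accuracy drop its removal causes. This is an illustrative proxy, not the paper's actual method.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_acc = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)

# Partition the training set into small candidate subsets.
labels = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(X_tr)

scores = {}
for c in range(20):
    keep = labels != c  # remove one candidate subset at a time
    acc = DecisionTreeClassifier(random_state=0).fit(
        X_tr[keep], y_tr[keep]).score(X_te, y_te)
    scores[c] = base_acc - acc  # large drop => subset carries critical information

top = max(scores, key=scores.get)
print(f"most critical cluster: {top}, accuracy drop: {scores[top]:.3f}")
```

A subset whose removal degrades accuracy the most plays the role of a "critical nugget" in this toy setting; the paper's contribution is a principled, domain-independent score and a search heuristic rather than this exhaustive ablation.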
Feature selection from colon cancer dataset for cancer classification using Artificial Neural Network
In the fast-growing and research-intensive field of medicine, studies that demonstrate significant improvements to healthcare are imperative, especially in cancer research. This research contributes such findings by including feature selection as one of its major components. Feature selection has become a vital step in applying data mining algorithms effectively to real-world classification problems, and it has long been a focus of interest with a large body of completed work. Despite the extensive research in the field, studies demonstrating near-perfect accuracy remain limited, so more scientifically driven results are needed. Drawing on prior feature selection research, the method in this study was the product of careful selection and planning. Specifically, this study used feature selection to improve classification accuracy on a cancer dataset: it proposed an Artificial Neural Network (ANN) for cancer classification with feature selection on a colon cancer dataset, using the best-first search method in the Weka toolkit for feature selection. The experiment achieved 98.4% accuracy for cancer classification after feature selection with the proposed algorithm, showing that feature selection improved classification accuracy on the colon cancer dataset. This result is comparable with other studies on colon cancer, represents another significant improvement, and is promising for future applications.
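The study uses Weka's best-first wrapper search with an ANN; a rough scikit-learn analogue can be sketched by comparing a neural network on all features against one trained on a filtered subset. The data here is a synthetic stand-in for the small, high-dimensional colon cancer dataset (62 samples, ~2000 genes), and the univariate filter is a simple substitute for the best-first search, not the paper's exact procedure.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: few samples, many features, as in gene-expression data.
X, y = make_classification(n_samples=62, n_features=200, n_informative=10,
                           random_state=0)

ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)

# ANN on all features vs. ANN on a selected subset.
acc_all = cross_val_score(make_pipeline(StandardScaler(), ann), X, y, cv=5).mean()
acc_sel = cross_val_score(
    make_pipeline(StandardScaler(), SelectKBest(f_classif, k=20), ann),
    X, y, cv=5).mean()

print(f"all 200 features: {acc_all:.3f}, best 20 features: {acc_sel:.3f}")
```

With so few samples, discarding uninformative features typically stabilizes the network, which is the effect the 98.4% figure in the study reflects.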
Comparison of Decision Tree, Naïve Bayes and k-Nearest Neighbors for Predicting Thesis Graduation
The thesis is one of the evaluations of student learning. At Universitas Budi Luhur (UBL), especially in the Informatics Department, the thesis is one of the requirements for students to graduate with a Bachelor of Computer degree. Each semester, around 200-300 Informatics Department students take the thesis. The problem still faced is that student graduation rates on the thesis are not optimal; student failures are thought to be related to several technical and non-technical factors. In this study, data mining algorithms were used to determine the factors that influence student graduation on the thesis. The dataset was obtained from Informatics Department students who took the thesis in the 2016/2017 and 2017/2018 academic years. To find the most suitable classification method, three methods were tested: Decision Tree, Naïve Bayes, and k-Nearest Neighbors (kNN). A comparison of accuracy, precision, and recall showed that the kNN algorithm performed best, so this method was chosen to predict graduation. The study also developed an application for predicting students' thesis graduation that applies the kNN classification method. The test results showed an accuracy of 78.20%, precision of 80.32%, and recall of 96.49%. This research is expected to be useful for improving the quality of student thesis services.
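The three-way comparison the study performs is straightforward to reproduce as a pattern. The sketch below uses a public dataset in place of the (unavailable) UBL student records, and reports the same three metrics the study compares.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Public dataset as a stand-in for the student graduation records.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "kNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"prec={precision_score(y_te, pred):.3f} "
          f"rec={recall_score(y_te, pred):.3f}")
```

Which algorithm wins depends on the dataset; on the study's data, kNN came out ahead on all three metrics.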
An evolutionary approach for balancing effectiveness and representation level in gene selection
As data mining develops and expands into new application areas, feature selection also reveals various aspects to be considered. This paper highlights two aspects that categorize the large body of available feature selection algorithms: effectiveness and representation level. Effectiveness deals with selecting the minimum set of variables that maximizes the accuracy of a classifier, while representation level concerns discovering how relevant the variables are to the domain of interest. To balance these two aspects, the paper proposes an evolutionary framework for feature selection that realizes a hybrid method organized in layers, each exploiting a specific model of search strategy. Extensive experiments on gene selection from DNA-microarray datasets are presented and discussed. The results indicate that the framework compares well with different hybrid methods proposed in the literature, as it is capable of finding well-suited subsets of informative features while improving classification accuracy.
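The layered hybrid search the paper describes is not detailed in this abstract, but the evolutionary core, a genetic algorithm whose fitness trades classifier accuracy against subset size, can be sketched as follows. All parameters (population size, mutation rate, size penalty) are illustrative choices, not the paper's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Synthetic stand-in for a DNA-microarray gene-expression dataset.
X, y = make_classification(n_samples=100, n_features=50, n_informative=8,
                           random_state=0)

def fitness(mask):
    """Effectiveness (CV accuracy) penalized by subset size."""
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()
    return acc - 0.002 * mask.sum()

pop = rng.random((20, X.shape[1])) < 0.2  # initial population of gene subsets
for gen in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]  # truncation selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, X.shape[1])          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.02       # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print(f"selected {best.sum()} features")
```

The size penalty in the fitness is what pushes the search toward small, interpretable gene subsets, the "representation level" concern, rather than accuracy alone.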
PRESISTANT: Learning based assistant for data pre-processing
Data pre-processing is one of the most time-consuming and relevant steps in a data analysis process (e.g., a classification task). A given pre-processing operator (e.g., a transformation) can have a positive, negative, or zero impact on the final result of the analysis. Expert users have the knowledge required to find the right pre-processing operators. Non-experts, however, are overwhelmed by the number of pre-processing operators, and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only "syntactically" applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim to assist non-expert users by recommending data pre-processing operators ranked according to their impact on the final analysis. We developed a tool, PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of five different classification algorithms: J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations of the recommendations provided by our tool show that PRESISTANT can effectively help non-experts achieve improved results in their analytical tasks
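The meta-learning idea behind the tool, predicting an operator's impact from dataset characteristics and ranking candidates by predicted gain, can be illustrated with a toy sketch. The meta-features, operator ids, and training targets below are synthetic placeholders, not PRESISTANT's real meta-dataset or feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical meta-dataset: each row pairs dataset meta-features
# (e.g. n_rows, n_features, class entropy, ...) with a pre-processing
# operator id; the target is the observed change in predictive accuracy.
meta_features = rng.random((200, 4))
operator_ids = rng.integers(0, 5, size=(200, 1))  # e.g. 0=discretize, 1=normalize, ...
X_meta = np.hstack([meta_features, operator_ids])
accuracy_gain = rng.normal(0, 0.05, 200)  # synthetic targets, for illustration only

# A Random Forest learns operator impact from past runs, as in the paper.
ranker = RandomForestRegressor(random_state=0).fit(X_meta, accuracy_gain)

# For a new dataset, score every candidate operator and rank by predicted gain.
new_meta = rng.random(4)
candidates = np.array([np.append(new_meta, op) for op in range(5)])
ranking = np.argsort(ranker.predict(candidates))[::-1]
print("operators ranked by predicted accuracy gain:", ranking)
```

In the real tool, the regression targets come from actually executing operators and classifiers on many datasets, so the ranking reflects learned experience rather than random noise.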
- …