42 research outputs found

GENESIM : genetic extraction of a single, interpretable model

Author: De Turck Filip
Janssens Olivier
Ongenae Femke
Van Hoecke Sofie
Vandewiele Gilles
Publication date: 01/01/2016

Models obtained by decision tree induction techniques excel in being interpretable. However, they can be prone to overfitting, which results in low predictive performance. Ensemble techniques are able to achieve higher accuracy, but this comes at the cost of losing the interpretability of the resulting model. This makes ensemble techniques impractical in applications where decision support, instead of decision making, is crucial. To bridge this gap, we present the GENESIM algorithm, which uses a genetic algorithm to transform an ensemble of decision trees into a single decision tree with enhanced predictive performance. We compared GENESIM to prevalent decision tree induction and ensemble techniques using twelve publicly available data sets. The results show that GENESIM achieves better predictive performance on most of these data sets than decision tree induction techniques, and a predictive performance in the same order of magnitude as the ensemble techniques. Moreover, the resulting model of GENESIM has a very low complexity, making it very interpretable, in contrast to ensemble techniques.

Comment: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Archivsystem Ask23

Optimized Anomaly based Risk Reduction using PCA based Genetic Classifier

Author: C. Kavitha
Dr. K. Iyakutti
Publication venue: Global Journals Inc. (US)
Publication date: 15/05/2014

Security risk analysis is the thrust area for the information-based world. Researchers in this field have deployed numerous techniques to overcome information security problems. This paper takes the approach of using anomaly detection for risk reduction. The motivating idea is that anomalies are deviations that could increase the percentage of risk. Anomaly detection is guided by PCA, and a genetic multi-class classifier is used. Classification is induced by the decision tree approach, where the genetic algorithm optimizes the process of finding the nodes of the tree. The proposed approach is evaluated against a benchmark PCA-based ANN classifier, and it outperforms the existing one. The results are demonstrate

Global Journal of Computer Science and Technology (GJCST)

A genetic algorithm for interpretable model extraction from decision tree ensembles

Author: A Assche Van
DK Slonim
H Kargupta
JH Holland
JR Quinlan
L Breiman
RC Barros
TG Dietterich
W-Y Loh
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Models obtained by decision tree induction techniques excel in being interpretable. However, they can be prone to overfitting, which results in low predictive performance. Ensemble techniques provide a solution to this problem and are hence able to achieve higher accuracies. However, this comes at the cost of losing the excellent interpretability of the resulting model, making ensemble techniques impractical in applications where decision support, instead of decision making, is crucial. To bridge this gap, we present the GENESIM algorithm, which uses a genetic algorithm to transform an ensemble of decision trees into a single decision tree with enhanced predictive performance while maintaining interpretability. We compared GENESIM to prevalent decision tree induction algorithms, ensemble techniques and a similar technique, called ISM, using twelve publicly available data sets. The results show that GENESIM achieves better predictive performance on most of these data sets than decision tree induction techniques and ISM. The results also show that GENESIM's predictive performance is in the same order of magnitude as that of the ensemble techniques. However, the resulting model of GENESIM outperforms the ensemble techniques regarding interpretability, as it has a very low complexity.

Crossref

Ghent University Academic Bibliography

Survey on Classification Algorithms for Data Mining: (Comparison and Evaluation)

Author: Ahmed Shereen Shukri
AL-Nabi Delveen Luqman Abd
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 31/07/2013

Data mining is growing fast in popularity. It is a technology involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems. The main goal of the data mining process is to extract information from large data sets into a form that is understandable for further use. Some data mining algorithms are used to solve classification problems in databases. In this paper, three classification algorithms are compared: the k-nearest neighbour classifier, the decision tree and the Bayesian network. The paper demonstrates the strength and accuracy of each algorithm for classification in terms of performance efficiency and required time complexity. For model validation purposes, a twenty-four-month data analysis is conducted on a mock-up basis.

Keywords: Decision tree, Bayesian network, k-nearest neighbour classifier

International Institute for Science, Technology and Education (IISTE): E-Journals

Comparing Data Mining Classification for Online Fraud Victim Profile in Indonesia

Author: Fadlil Abdul
Kusuma Nur Makkie Perdana
Sunardi Sunardi
Publication venue: Universitas Nusantara PGRI Kediri
Publication date: 10/02/2023

Classification is one of the most often employed data mining techniques. It focuses on developing a classification model or function, also known as a classifier, and predicting the class of objects whose class label is unknown. Classification applications include pattern recognition, medical diagnosis, identifying weaknesses in organizational systems, and classifying changes in the financial markets. The objectives of this study are to develop a profile of a victim of online fraud and to contrast the approaches frequently used in data mining for classification based on Accuracy, Classification Error, Precision, and Recall. The survey was conducted using Google Forms, an online platform. Naive Bayes, Decision Tree, and Random Forest algorithms are popular models for classification in data mining. Based on the sociodemographics of Indonesia's online crime victims, these models are used to classify and predict. The results show that Naive Bayes and Decision Tree are slightly superior to the Random Forest model: Naive Bayes and Decision Tree have an accuracy of 77.3%, while Random Forest scores 76.8%.
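
The four evaluation measures named in this abstract can be illustrated with a minimal Python sketch. It assumes a binary labelling (1 = victim, 0 = non-victim); the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, classification error, precision and recall
    for a binary classification problem from two label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    return {
        "accuracy": accuracy,
        "classification_error": 1.0 - accuracy,
        # Guard against division by zero when no positives are predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy and classification error always sum to one, so reporting both is redundant; precision and recall add information about how the errors are distributed between false positives and false negatives.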

INTENSIF: Jurnal Ilmiah Penelitian dan Penerapan Teknologi Sistem Informasi

Machine Learning Approaches for Spectrum Management in Cognitive Radio Networks

Author: Mikaeil Ahmed Mohammed
Publication venue: IntechOpen
Publication date: 28/03/2018

Cognitive radio (CR) provides a better way to utilize spectrum resources by introducing opportunistic usage of frequency bands that are not heavily occupied by a licensed spectrum user, or primary user (PU). In cognitive radio, the detection and estimation of PU channel availability (unoccupied spectrum) are the key challenges that need to be overcome in order to prevent interference with the licensed spectrum user and to improve the efficiency of spectrum resource utilization. This chapter focuses on developing new ways of detecting and estimating primary user channel availability based on machine-learning (ML) techniques.

IntechOpen

Crossref

Selección de tutores académicos en la educación superior usando árboles de decisión

Author: de la Calleja Jorge
Urbina Nájera Argelia B.
Publication venue: UNED - Universidad Nacional de Educacion a Distancia
Publication date: 27/12/2018

In this paper, we present a method for improving the academic tutoring process in higher education. The method includes identifying the main skills of tutors in an automated manner using decision trees, one of the most used algorithms in the machine learning community for solving real-world problems with high accuracy. In our study, the decision tree algorithm was able to identify those skills and the personal affinities between students and tutors. Experiments were carried out using a data set of 277 students and 19 tutors, who were selected by simple random sampling and voluntary participation, respectively. Preliminary results show that the most important attributes for tutors are communication, self-direction and digital skills. At the same time, we introduce a tutoring process in which the tutor assignment is based on these attributes, assuming that it can help to strengthen the student skills demanded by today's society. In the same way, the decision tree obtained can be used to create clusters of tutors and clusters of students based on their personal abilities and affinities using other machine learning algorithms. The application of the suggested tutoring process could set the tone for viewing the tutoring process individually, without linking it to processes of academic performance or school dropout.

Crossref

REVISTAS CIENTÍFICAS UNED. Servicio de Publicación y Difusión Digital. Biblioteca UNED

Time-Series Link Prediction Using Support Vector Machines

Author: Co Jan Miles
Fernandez Proceso L, Jr
Publication venue: Archīum Ateneo
Publication date: 01/06/2017

The prominence of social networks motivates developments in network analysis, such as link prediction, which deals with predicting the existence or emergence of links on a given network. The Vector Auto Regression (VAR) technique has been shown to be one of the best for time-series based link prediction. One VAR implementation uses an unweighted adjacency matrix and five additional matrices based on the similarity metrics of Common Neighbor, Adamic-Adar, Jaccard's Coefficient, Preferential Attachment and Resource Allocation Index. In our previous work, we proposed the use of Support Vector Machines (SVM) for this prediction task and, using the same set of matrices, obtained better results. A dataset from DBLP was used to test the performance of the VAR and SVM link prediction models for two lags. In this study, we extended the VAR and SVM models by using three, four, and five lags, and these showed that both VAR and SVM improved with more data from the lags. The VAR and SVM models achieved their highest ROC-AUC values of 84.96% and 86.32% respectively using five lags, compared to lower AUC values of 84.26% and 84.98% using two lags. Moreover, we identified that improving the predictive abilities of both models is constrained by the difficulty of predicting new links, which we define as links that do not exist in any of the corresponding lags. Hence, we created separate VAR and SVM models for the prediction of new links. The highest ROC-AUC was still achieved by using SVM with five lags, although at a lower value of 73.85%. The significant drop in the performance of the VAR and SVM predictors for the prediction of new links indicates the need for more research in this problem space. Moreover, the results showed that SVM can be used as an alternative method for time-series based link prediction.
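
The five neighbourhood-based similarity metrics listed in this abstract have standard textbook definitions, which can be sketched in Python as follows. This is only an illustration under stated assumptions: the graph is undirected and stored as a dictionary of neighbour sets, the fifth metric is taken to be the index usually known as Resource Allocation, and the function name `similarity_scores` and the toy graph are hypothetical. It does not reproduce the authors' VAR or SVM pipeline.

```python
import math


def similarity_scores(adj, u, v):
    """Return five neighbourhood-based similarity scores for nodes u and v.

    `adj` maps each node to the set of its neighbours (undirected graph).
    """
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    return {
        # Number of shared neighbours.
        "common_neighbors": len(common),
        # Shared neighbours relative to all neighbours of either node.
        "jaccard": len(common) / len(union) if union else 0.0,
        # Shared neighbours weighted by 1 / log(degree); degree-1 nodes skipped.
        "adamic_adar": sum(
            1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1
        ),
        # Product of the two node degrees.
        "preferential_attachment": len(nu) * len(nv),
        # Shared neighbours weighted by 1 / degree.
        "resource_allocation": sum(1.0 / len(adj[z]) for z in common),
    }


# Hypothetical toy co-authorship graph, for illustration only.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
scores = similarity_scores(adj, "a", "d")
print(scores)
```

Computing one such score for every node pair yields a matrix per metric; matrices of this kind, taken at several time lags, are what the abstract describes as inputs to the prediction models.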

archīum.ATENEO (Ateneo de Manila Univ.)