42 research outputs found

    GENESIM : genetic extraction of a single, interpretable model

    Models obtained by decision tree induction techniques excel in being interpretable. However, they can be prone to overfitting, which results in a low predictive performance. Ensemble techniques are able to achieve a higher accuracy. However, this comes at a cost of losing interpretability of the resulting model. This makes ensemble techniques impractical in applications where decision support, instead of decision making, is crucial. To bridge this gap, we present the GENESIM algorithm that transforms an ensemble of decision trees to a single decision tree with an enhanced predictive performance by using a genetic algorithm. We compared GENESIM to prevalent decision tree induction and ensemble techniques using twelve publicly available data sets. The results show that GENESIM achieves a better predictive performance on most of these data sets than decision tree induction techniques and a predictive performance in the same order of magnitude as the ensemble techniques. Moreover, the resulting model of GENESIM has a very low complexity, making it very interpretable, in contrast to ensemble techniques. Comment: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

    Optimized Anomaly based Risk Reduction using PCA based Genetic Classifier

    Security risk analysis is a thrust area for the information-based world. Researchers in this field have deployed numerous techniques to overcome information security problems. In this paper, the researcher uses anomaly detection for risk reduction. The central idea of this work is that anomalies are deviations that could increase the percentage of risk. The anomaly detection is guided by PCA, and a genetic-based multi-class classifier is used. The classification is induced by the decision tree approach, where the genetic algorithm is used for optimization in the process of finding the nodes of the tree. The proposed approach is evaluated against a benchmark PCA-based ANN classifier. The proposed approach outperforms the existing one. The results are demonstrate

    A genetic algorithm for interpretable model extraction from decision tree ensembles

    Models obtained by decision tree induction techniques excel in being interpretable. However, they can be prone to overfitting, which results in a low predictive performance. Ensemble techniques provide a solution to this problem, and are hence able to achieve higher accuracies. However, this comes at a cost of losing the excellent interpretability of the resulting model, making ensemble techniques impractical in applications where decision support, instead of decision making, is crucial. To bridge this gap, we present the genesim algorithm, which uses a genetic algorithm to transform an ensemble of decision trees into a single decision tree with an enhanced predictive performance while maintaining interpretability. We compared genesim to prevalent decision tree induction algorithms, ensemble techniques and a similar technique, called ism, using twelve publicly available data sets. The results show that genesim achieves better predictive performance on most of these data sets compared to decision tree induction techniques and ism. The results also show that genesim's predictive performance is in the same order of magnitude as the ensemble techniques. However, the resulting model of genesim outperforms the ensemble techniques regarding interpretability as it has a very low complexity

    Survey on Classification Algorithms for Data Mining: (Comparison and Evaluation)

    Data mining is growing fast in popularity. It is a technology involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems. The main goal of the data mining process is to extract information from large data sets into a form that is understandable for further use. Some data mining algorithms are used to solve classification problems in databases. In this paper, three classification algorithms are compared: the k-nearest neighbor classifier, decision tree and Bayesian network. The paper demonstrates the strength and accuracy of each algorithm for classification in terms of performance efficiency and required time complexity. For model validation purposes, a twenty-four-month data analysis is conducted on a mock-up basis. Keywords: Decision tree, Bayesian network, k-nearest neighbour classifier

    Comparing Data Mining Classification for Online Fraud Victim Profile in Indonesia

    Classification is one of the most often employed data mining techniques. It focuses on developing a classification model or function, also known as a classifier, and predicting the class of objects whose class label is unknown. Classification applications include pattern recognition, medical diagnosis, identifying weaknesses in organizational systems, and classifying changes in the financial markets. The objectives of this study are to develop a profile of a victim of online fraud and to contrast the approaches frequently used in data mining for classification based on Accuracy, Classification Error, Precision, and Recall. The survey was conducted using Google Forms, which is an online platform. Naïve Bayes, Decision Tree, and Random Forest algorithms are popular models for classification in data mining. Based on the sociodemographics of Indonesia's online crime victims, these models are used to classify and predict. The result shows that Naïve Bayes and Decision Tree are slightly superior to the Random Forest model. Naïve Bayes and Decision Tree have an accuracy value of 77.3%, while Random Forest reaches 76.8%
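
    The four evaluation measures named in the abstract above (Accuracy, Classification Error, Precision, Recall) can be computed from a confusion matrix. The following is a minimal sketch for the binary case, not the study's actual evaluation code; the function name `classification_metrics`, the `positive` parameter, and the sample labels are illustrative assumptions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, classification error, precision and recall
    for a binary problem. `positive` marks the class of interest
    (a hypothetical parameter chosen for this sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    return {
        "accuracy": accuracy,
        # Classification error is the complement of accuracy.
        "classification_error": 1.0 - accuracy,
        # Guard against division by zero when a class is never predicted
        # or never present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Made-up labels for illustration only (1 = positive class).
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    preds = [1, 0, 1, 0, 1, 0, 1, 0]
    print(classification_metrics(truth, preds))
```

    Returning 0.0 for an undefined precision or recall is one common convention; other tools report the value as undefined instead.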

    Machine Learning Approaches for Spectrum Management in Cognitive Radio Networks

    Cognitive radio (CR) provides a better way to utilize spectrum resources by introducing opportunistic usage of frequency bands that are not heavily occupied by a licensed spectrum user, or primary user (PU). In cognitive radio, the detection and estimation of PU channel availability (unoccupied spectrum) are the key challenges that must be overcome to prevent interference with licensed spectrum users and to improve spectrum resource utilization efficiency. This chapter focuses on developing new ways of detecting and estimating primary user channel availability based on machine-learning (ML) techniques

    Selección de tutores académicos en la educación superior usando árboles de decisión

    In this paper, we present a method for the tutoring process in order to improve academic tutoring in higher education. The method includes identifying the main skills of tutors in an automated manner using decision trees, one of the most used algorithms in the machine learning community for solving several real-world problems with high accuracy. In our study, the decision tree algorithm was able to identify those skills and personal affinities between students and tutors. Experiments were carried out using a data set of 277 students and 19 tutors, which were selected by random sampling and voluntary participation, respectively. Preliminary results show that the most important attributes for tutors are communication, self-direction and digital skills. At the same time, we introduce a tutoring process where the tutor assignment is based on these attributes, assuming that it can help to strengthen the student skills demanded by today's society. In the same way, the decision tree obtained can be used to create clusters of tutors and clusters of students based on their personal abilities and affinities using other machine learning algorithms. The application of the suggested tutoring process could set the tone for viewing the tutoring process individually, without linking it to processes of academic performance or school dropout

    Time-Series Link Prediction Using Support Vector Machines

    The prominence of social networks motivates developments in network analysis, such as link prediction, which deals with predicting the existence or emergence of links on a given network. The Vector Auto Regression (VAR) technique has been shown to be one of the best for time-series based link prediction. One VAR technique implementation uses an unweighted adjacency matrix and five additional matrices based on the similarity metrics of Common Neighbor, Adamic-Adar, Jaccard's Coefficient, Preferential Attachment and Resource Allocation Index. In our previous work, we proposed the use of Support Vector Machines (SVM) for this prediction task and, using the same set of matrices, obtained better results. A dataset from DBLP was used to test the performance of the VAR and SVM link prediction models for two lags. In this study, we extended the VAR and SVM models by using three, four, and five lags, and these showed that both VAR and SVM improved with more data from the lags. The VAR and SVM models achieved their highest ROC-AUC values of 84.96% and 86.32% respectively using five lags, compared to lower AUC values of 84.26% and 84.98% using two lags. Moreover, we identified that improving the predictive abilities of both models is constrained by the difficulty in the prediction of new links, which we define as links that do not exist in any of the corresponding lags. Hence, we created separate VAR and SVM models for the prediction of new links. The highest ROC-AUC was still achieved by using SVM with five lags, although at a lower value of 73.85%. The significant drop in the performance of VAR and SVM predictors for the prediction of new links indicates the need for more research in this problem space. Moreover, results showed that SVM can be used as an alternative method for time-series based link prediction
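
    The five neighborhood-based similarity metrics named in the abstract above have standard textbook definitions, which can be sketched for a single node pair as follows. This is an illustration on an undirected, unweighted graph, not the authors' implementation; the helper names `build_adjacency` and `similarity_scores` and the toy edge list are assumptions made for this example.

```python
import math


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def similarity_scores(adj, u, v):
    """Standard link-prediction scores for the distinct node pair (u, v)."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        # Every common neighbor z of two distinct nodes has degree >= 2,
        # so log(degree) is strictly positive here.
        "adamic_adar": sum(1.0 / math.log(len(adj[z])) for z in common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "preferential_attachment": len(nu) * len(nv),
        "resource_allocation": sum(1.0 / len(adj[z]) for z in common),
    }


if __name__ == "__main__":
    # Toy co-authorship-style graph, made up for illustration.
    toy_edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
    graph = build_adjacency(toy_edges)
    print(similarity_scores(graph, "a", "b"))
```

    Evaluating such a function for every node pair at each time step would give one score matrix per metric, which is the kind of input the abstract describes feeding to the VAR and SVM models alongside the adjacency matrix.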