13 research outputs found

    An ensemble-based decision tree approach for educational data mining

    Nowadays, data mining and machine learning techniques are applied to a variety of topics (e.g., healthcare and disease, security, decision support, sentiment analysis, and education). Educational data mining investigates the performance of students and offers solutions to enhance the quality of education. The aim of this study is to apply different data mining and machine learning algorithms to real data sets related to students. To this end, we apply two decision tree methods, which can create several simple and understandable rules. Moreover, the performance of a decision tree is optimized by using an ensemble technique called the Rotation Forest algorithm. Our findings indicate that the Rotation Forest algorithm can enhance the performance of decision trees in terms of different metrics. In addition, we found that the trees generated by the decision tree ensemble were larger than those generated by single decision trees. This means that the proposed methodology can reveal more information concerning simple rules.

    A new nested ensemble technique for automated diagnosis of breast cancer

    Nowadays, breast cancer is reported as one of the most common cancers among women. Early detection of this cancer is essential to inform subsequent treatment. This study investigates automated breast cancer prediction using machine learning and data mining techniques. We propose a nested ensemble approach that uses Stacking and Vote (Voting) as the classifier combination techniques in our ensemble methods for distinguishing benign breast tumors from malignant cancers. Each nested ensemble classifier contains 'Classifiers' and 'MetaClassifiers'. MetaClassifiers can have more than two different classification algorithms. In this research, we developed two-layer nested ensemble classifiers, in which the MetaClassifiers have two or three different classification algorithms. We conducted the experiments on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, and K-fold cross-validation was used for model evaluation. We compared the proposed two-layer nested ensemble classifiers with single classifiers (i.e., BayesNet and Naive Bayes) in terms of classification accuracy, precision, recall, F1 measure, ROC, and the computational time needed to train the single and nested ensemble classifiers. We also compared our best model with previous works reported in the literature in terms of accuracy. The results demonstrate that the proposed two-layer nested ensemble models outperform the single classifiers and most of the previous works. Both SV-BayesNet-3-MetaClassifier and SV-Naive Bayes-3-MetaClassifier achieved an accuracy of 98.07% (K = 10). However, SV-Naive Bayes-3-MetaClassifier is more efficient, as it needs less time to build the model.
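The idea of nesting one combiner inside another can be illustrated with a minimal sketch. It is not the authors' SV-BayesNet or SV-Naive Bayes model: it uses only majority voting (no Stacking), and the base classifiers are hypothetical threshold rules with made-up feature indices and cutoffs, chosen purely for illustration.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier here is simply a function from a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(members: Sequence[Classifier]) -> Classifier:
    """Combine classifiers by majority vote (ties go to the label seen first)."""
    def predict(x: Sequence[float]) -> str:
        votes = [member(x) for member in members]
        return Counter(votes).most_common(1)[0][0]
    return predict


def threshold_rule(feature_index: int, cutoff: float) -> Classifier:
    """Hypothetical base classifier: one feature compared against one cutoff."""
    def predict(x: Sequence[float]) -> str:
        return "malignant" if x[feature_index] > cutoff else "benign"
    return predict


# Inner layer: two small voting ensembles built from assumed rules.
inner_a = majority_vote(
    [threshold_rule(0, 15.0), threshold_rule(1, 20.0), threshold_rule(2, 100.0)]
)
inner_b = majority_vote(
    [threshold_rule(0, 14.0), threshold_rule(2, 90.0), threshold_rule(1, 22.0)]
)

# Outer layer: a vote whose members are themselves ensembles (the nesting).
nested = majority_vote([inner_a, inner_b, threshold_rule(0, 16.0)])

if __name__ == "__main__":
    print(nested([20.0, 25.0, 130.0]))
    print(nested([10.0, 15.0, 70.0]))
```

Because every ensemble exposes the same call signature as a single classifier, an ensemble can be passed as a member of another ensemble without special handling.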

    Association between work-related features and coronary artery disease: a heterogeneous hybrid feature selection integrated with balancing approach

    Coronary artery disease (CAD) is a leading cause of death worldwide and is associated with high healthcare expenditure. Researchers are motivated to apply machine learning (ML) for quick and accurate detection of CAD. The performance of automated systems depends on the quality of the features used. Clinical CAD datasets contain different features with varying degrees of association with CAD. To extract such features, we developed a novel hybrid feature selection algorithm called heterogeneous hybrid feature selection (2HFS). In this work, we used the Nasarian CAD dataset, in which workplace and environmental features are considered in addition to other clinical features. The synthetic minority over-sampling technique (SMOTE) and adaptive synthetic sampling (ADASYN) are used to handle the imbalance in the dataset. Decision tree (DT), Gaussian Naive Bayes (GNB), Random Forest (RF), and XGBoost classifiers are used, and the 2HFS-selected features are input into these classifiers. Our results show that the proposed feature selection method yields a classification accuracy of 81.23% with SMOTE and the XGBoost classifier. We also tested our approach on other well-known CAD datasets: the Hungarian dataset, the Long-beach-va dataset, and the Z-Alizadeh Sani dataset, obtaining 83.94%, 81.58%, and 92.58%, respectively. Hence, our experimental results confirm the effectiveness of our proposed feature selection algorithm as compared to the existing state-of-the-art techniques, which have yielded outstanding results for the development of automated CAD systems.
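The core step of SMOTE-style over-sampling, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the study: the function name `smote_like`, its parameters, and the toy data points are all hypothetical, and details of the full technique (such as how many synthetic samples to draw per point) are left out.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def smote_like(minority: Sequence[Point], n_new: int, k: int = 2, seed: int = 0) -> List[Point]:
    """Create n_new synthetic minority points by linear interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


if __name__ == "__main__":
    # Toy minority class (made-up values) to be enlarged by six synthetic points.
    minority_points = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
    for point in smote_like(minority_points, n_new=6):
        print(point)
```

Since every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already occupied by the minority class rather than duplicating existing rows.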