3 research outputs found

    Feature Weighting Using a Clustering Approach

    In recent decades, the volume and size of data have increased significantly with the growth of technology. Extracting knowledge and useful patterns from high-dimensional data is challenging: irrelevant features and dimensions reduce the efficiency and increase the complexity of machine learning algorithms. Feature selection and feature weighting methods are common solutions to these problems. In this study, a feature weighting approach based on density-based clustering is presented. The method is implemented in two steps. In the first step, the features are divided into clusters using density-based clustering. In the second step, the features with a higher degree of importance with respect to the target class are selected from each cluster. To evaluate the method's efficiency, various standard datasets were classified using the selected features and their degrees of importance. The results indicate that the main advantages of the proposed method are its simplicity and its suitability for high-dimensional datasets.
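    The two steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance between features, the clustering parameters, or the importance measure, so the sketch assumes a distance of 1 - |Pearson correlation| between feature columns, a small hand-written DBSCAN, and |correlation with the target| as the importance score. The names feature_weights, eps, and min_pts are hypothetical, and the toy data are made up for illustration.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def dbscan(dist, eps, min_pts):
    """Minimal DBSCAN over a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # not a core point (may later become a border point)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = [k for k in range(n) if dist[j][k] <= eps]
            if len(reach) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reach)
    return labels

def feature_weights(columns, target, eps=0.3, min_pts=2):
    """Step 1: cluster features. Step 2: keep the most target-relevant feature per cluster."""
    n = len(columns)
    # Assumption: strongly correlated features are "close" to each other.
    dist = [[1 - abs(pearson(columns[i], columns[j])) for j in range(n)] for i in range(n)]
    labels = dbscan(dist, eps, min_pts)
    # Assumption: importance is the absolute correlation with the target class.
    relevance = [abs(pearson(col, target)) for col in columns]
    groups = {}
    for i, lab in enumerate(labels):
        key = ("noise", i) if lab == -1 else lab  # each noise feature forms its own group
        groups.setdefault(key, []).append(i)
    weights = [0.0] * n
    for members in groups.values():
        best = max(members, key=lambda i: relevance[i])
        weights[best] = relevance[best]  # redundant cluster members keep weight 0
    return labels, weights

# Toy data: f1 is nearly a rescaled copy of f0; f2 is unrelated to both.
f0 = [1, 2, 3, 4, 5, 6]
f1 = [2, 4, 6, 8, 10, 13]
f2 = [5, 1, 4, 2, 6, 3]
target = [0, 0, 0, 1, 1, 1]
labels, weights = feature_weights([f0, f1, f2], target)
print(labels, [round(w, 3) for w in weights])
```

    In this toy run the two redundant features fall into one cluster and only the more target-relevant of them keeps a non-zero weight, while the unrelated feature is treated as noise and weighted on its own.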

    Cervical Cancer Prediction using NGBFA Feature Selection Algorithm and Hybrid Ensemble Classifier

    Cervical Cancer (CC) is a leading cause of death among middle-aged women throughout the world, particularly in developing countries, which account for approximately 85% of deaths. CC patients can be cured if the disease is detected in its early stages. Because no symptoms appear in the initial stages, predicting the disease early remains a challenge for investigators. Several machine learning algorithms have been used to predict CC over the last decade. Compared with a single classifier, ensemble methods, which create and combine multiple models, can give more accurate results. In this study, we built a hybrid ensemble classifier, 'A Robust Model Stacking: A Hybrid Ensemble,' in which a homogeneous ensemble is performed on a pool of classifiers at the base level, followed by a heterogeneous ensemble that uses the (soft) majority voting algorithm to produce the final prediction for new data. The dataset used in this study contains 858 instances with 32 features built from the risk factors and four targets derived from the CC diagnosis tests. We addressed the data imbalance problem using an oversampling technique called SMOTE. The model's efficiency was evaluated using accuracy, recall, f1-score, precision, and AUC-ROC for all four target variables in the dataset. The proposed method's accuracy is 98% for Biopsy, 97% for Hinselmann, 96.09% for Schiller, and 93% for Citology. We implement ensemble learning in this study to increase prediction accuracy and decrease bias and variance. We carried out the experiments using the Python language in Google Colab and Jupyter notebooks. The experimental results show that our proposed hybrid ensemble achieves high accuracy for all four target variables.
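    Two of the techniques named in this abstract, SMOTE oversampling and soft voting, can be sketched in a few lines. This is an illustrative sketch only, not the authors' pipeline: smote_like and soft_vote are hypothetical helper names, the probabilities and sample points are made up, and the homogeneous base-level ensemble is not reproduced here.

```python
import random
from collections import Counter

def smote_like(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling: interpolate between a minority sample
    and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen sample (squared Euclidean distance)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

def soft_vote(prob_rows):
    """Soft voting: average the class probabilities of several classifiers
    and return (predicted class index, averaged probabilities)."""
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    avg = [sum(p[c] for p in prob_rows) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg

# Hypothetical minority-class samples (two features each).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
synth = smote_like(minority, n_new=4)

# Hypothetical [negative, positive] probabilities from three classifiers for one patient.
probs = [[0.60, 0.40], [0.55, 0.45], [0.10, 0.90]]
label, avg = soft_vote(probs)

# Hard (majority) voting on the same outputs, for comparison.
hard = Counter(max(range(2), key=lambda c: p[c]) for p in probs).most_common(1)[0][0]
print(label, hard, [round(a, 3) for a in avg])
```

    With these made-up probabilities, hard voting picks the negative class because two of the three classifiers lean that way, whereas soft voting picks the positive class because the third classifier is far more confident. That sensitivity to confidence is the usual reason for preferring soft voting when the base classifiers produce well-calibrated probabilities.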