    Increasing Accuracy of C4.5 Algorithm by Applying Discretization and Correlation-based Feature Selection for Chronic Kidney Disease Diagnosis

    Data mining is the process of searching a database for hidden information in order to find interesting patterns. In the health sector, data mining can be used to diagnose a disease from a patient's medical records. This research used the Chronic Kidney Disease (CKD) dataset obtained from the UCI Machine Learning Repository. Almost half of the attributes in this dataset are continuous numeric types. Continuous attributes can lower accuracy because they can take an unlimited number of values, so they need to be transformed into discrete ones. In certain cases, using all attributes can also produce low accuracy, because some attributes are irrelevant and have no correlation with the target class. Such attributes need to be filtered out in advance to get more accurate results. Classification is one data mining technique, and C4.5 is one classification algorithm. The purpose of this study is to increase the accuracy of the C4.5 algorithm by applying discretization and Correlation-based Feature Selection (CFS) for chronic kidney disease diagnosis. Discretization is used to handle continuous values, while CFS is used for attribute selection. The experiment was conducted with WEKA (Waikato Environment for Knowledge Analysis). Applying discretization and CFS to C4.5 increased accuracy by 0.5 percentage points: C4.5 alone reached 97%, C4.5 with discretization reached 97.25%, and C4.5 with discretization and CFS reached 97.5%.
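
    The two preprocessing steps named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' WEKA workflow: the abstract does not say which discretization method was used, so equal-width binning stands in as an assumption, and the CFS merit follows the commonly cited form k * r_cf / sqrt(k + k(k-1) * r_ff) with symmetrical uncertainty as the correlation measure. All function names and the toy values (`marker_a`, `marker_b`, `labels`) are hypothetical and are not taken from the CKD dataset.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins=3):
    """Map continuous values to bin indices 0..n_bins-1 (equal-width binning)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    total = len(symbols)
    counts = Counter(symbols).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 0 means unrelated, 1 means fully dependent."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def cfs_merit(features, target):
    """Merit of a feature subset: rewards feature-class correlation (r_cf)
    and penalizes feature-feature redundancy (r_ff)."""
    k = len(features)
    r_cf = sum(symmetrical_uncertainty(f, target) for f in features) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = 0.0
    if pairs:
        r_ff = sum(symmetrical_uncertainty(features[i], features[j])
                   for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Hypothetical toy data: marker_a separates the classes, marker_b does not.
marker_a = [0.4, 0.5, 0.6, 5.1, 5.3, 5.6]
marker_b = [1.0, 9.0, 5.0, 1.2, 8.8, 5.1]
labels = ["neg", "neg", "neg", "pos", "pos", "pos"]

a = equal_width_bins(marker_a, n_bins=2)
b = equal_width_bins(marker_b, n_bins=3)

merit_a = cfs_merit([a], labels)
merit_ab = cfs_merit([a, b], labels)
print("merit with marker_a only:", merit_a)
print("merit with marker_a and marker_b:", merit_ab)
```

    On this toy data the subset containing only the informative attribute scores a higher merit than the subset that also includes the irrelevant one, which is the intuition behind dropping uncorrelated attributes before training the classifier.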

    Performance evaluation of classification algorithms by excluding the most relevant attributes for dipper/non-dipper pattern estimation in Type-2 DM patients

    Diabetes Mellitus (DM) is a high-prevalence disease that causes cardiovascular morbidity and mortality. In addition, the absence of the physiologic night-time decrease in blood pressure can lead to further morbidity problems, such as target organ damage, in both diabetic and non-diabetic patients. However, the non-dipping pattern can only be measured with a 24-hour ambulatory blood pressure monitoring (ABPM) device. ABPM has certain challenges, such as insufficient devices to distribute to patients, lack of trained staff, and high costs. Therefore, the aim of this study is to develop a classifier model that can achieve a sufficiently high accuracy for the Dipper/non-Dipper blood pressure pattern in patients while excluding ABPM data. The study was conducted with 56 Turkish patients in Marmara University Hypertension and Atherosclerosis Center and School of Medicine Department of Internal Medicine, Division of Endocrinology between the years 2010 and 2012. Our purpose was to find out whether the proposed method could detect the non-dipping/dipping pattern using various data mining algorithms on the WEKA platform, such as J48, NaiveBayes, MLP, and RBF. All algorithms were run to estimate the Dipper/non-Dipper pattern with the ABPM attributes excluded. The results show that the Neural Network algorithms (MLP and RBF) mostly produced reasonably high classification accuracy, sensitivity, and specificity, reaching up to 90.63% when the attributes were reduced. In medical sciences, however, sensitivity is taken as a valid and reliable indicator for diagnosis. On that measure, MLP had a higher sensitivity (83.3%) than the others. The ROC values closest to 1 were achieved by RBF for each selection mode: 0.872 for 10-fold CV mode and 0.856 for percentage split mode. Finally, when the ANN algorithms MLP and RBF were used, the RBF algorithm was observed to have the highest success rate in terms of sensitivity, at 83.3%.
In medical diagnosis, higher sensitivity is regarded as a more valid performance indicator than higher specificity. The proposed model could represent an innovative approach that might simplify and speed up the diagnosis process by skipping some steps in Dipper/non-Dipper diagnosis/prognosis. © 2015 IEEE
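
    The accuracy, sensitivity, and specificity figures quoted in this abstract are all derived from a binary confusion matrix. The short Python sketch below shows how they relate. The counts are invented for illustration and are not taken from the study, and treating non-dipper as the positive class is an assumption, since the abstract does not say which class was treated as positive.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts.

    tp: positives correctly flagged      fn: positives missed
    tn: negatives correctly cleared      fp: negatives wrongly flagged
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts, with non-dipper assumed to be the positive class.
metrics = confusion_metrics(tp=8, fn=2, tn=9, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

    Because sensitivity depends only on the positive cases, a classifier can report high overall accuracy and still miss many true positives, which is why the abstract singles out sensitivity when comparing algorithms.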