11 research outputs found
Extracting Rules for Diagnosis of Diabetes Using Genetic Programming
Background: Diabetes is a global health challenge that causes a high incidence of major social and economic consequences. Early prevention, or identification of people at risk, is therefore crucial for reducing the problems it causes. The aim of this study was to extract rules for diagnosing diabetes using genetic programming.
Methods: This study used the PIMA dataset from the University of California, Irvine. The dataset contains records for 768 women of Pima heritage: 500 healthy persons and 268 persons with diabetes. Missing values and outliers in the dataset were handled with the K-nearest neighbor and k-means methods, respectively. A genetic programming (GP) model was then developed to diagnose diabetes and to determine the most important factors affecting it. The accuracy, sensitivity and specificity of the proposed model on the PIMA dataset were 79.32%, 58.96% and 90.74%, respectively.
Results: The experimental results of our model on PIMA revealed that age, PG concentration, BMI, Tri Fold Thick and Serum Ins affected diabetes mellitus and increased the risk of diabetes. The results also show the good performance of the model, coupled with the simplicity and comprehensiveness of the extracted rules.
Conclusions: GP models can effectively implement rules for diagnosing diabetes. BMI and PG concentration are the most important factors increasing the risk of diabetes.
Keywords: Diabetes, PIMA, Genetic programming, KNNi, K-means, Missing value, Outlier detection, Rule extraction
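The three figures reported in the Methods can be read as ratios over a confusion matrix. The sketch below is only an illustration of those definitions, not the authors' evaluation code; the function name and the toy labels are hypothetical, and the label 1 is assumed to mean "diabetic".

```python
def diagnostic_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels.

    Assumption: 1 marks a person with diabetes, 0 a healthy person.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diabetic, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diabetic, missed
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # share of diabetic cases detected
    specificity = tn / (tn + fp)  # share of healthy cases cleared
    return accuracy, sensitivity, specificity


# Hypothetical toy labels, not taken from the PIMA dataset.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
accuracy, sensitivity, specificity = diagnostic_metrics(y_true, y_pred)
```

A sensitivity well below the specificity, as in the reported 58.96% versus 90.74%, means the model clears healthy cases more reliably than it detects diabetic ones.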
Predicting the status of COVID-19 active cases using a neural network time series
The design of intelligent systems for analyzing information and predicting the epidemiological trends of the disease is rapidly expanding because of the coronavirus disease (COVID-19) pandemic. The COVID-19 datasets provided by Johns Hopkins University were included in the analysis. These datasets contain some missing data, which were imputed using the multi-objective particle swarm optimization method. A time series model based on a nonlinear autoregressive exogenous (NARX) neural network is proposed to predict recovered cases and deaths from COVID-19. The model is trained and evaluated in two modes: predicting the situation of the affected areas for the next day and for the next month. After training on data from January 22 to February 27, 2020, the model was evaluated on predicting the situation of the areas over the following two weeks; the error rate was less than 5%. The model's prediction for April 9, 2020, was compared with the actual data for that day. The absolute percentage error (AE) worldwide was 12%. The lowest mean absolute error (MAE) of the model was for South America and Australia, with 3 and 3.3, respectively. In this paper, we have shown that geographical areas with mortality and recovery of COVID-19 cases can be predicted using a neural network-based model.
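A NARX model predicts the next value of a series from lagged values of that series and of an exogenous series. The sketch below shows only how such lagged training rows could be assembled; the abstract does not state which series served as the exogenous input or which delay was used, so the function name, the `delay` parameter and the toy numbers are all assumptions.

```python
def narx_rows(y, x, delay):
    """Build (features, target) pairs for one-step-ahead NARX-style training.

    y: target series (for example, daily deaths); x: exogenous series
    (for example, daily confirmed cases). Both choices are hypothetical.
    Each feature vector is [y[t-delay..t-1], x[t-delay..t-1]]; the target is y[t].
    """
    if len(y) != len(x):
        raise ValueError("series must have equal length")
    rows = []
    for t in range(delay, len(y)):
        features = list(y[t - delay:t]) + list(x[t - delay:t])
        rows.append((features, y[t]))
    return rows


# Hypothetical toy series, not taken from the Johns Hopkins data.
rows = narx_rows(y=[1, 2, 3, 4, 5], x=[10, 20, 30, 40, 50], delay=2)
```

Each row can then be passed to any regressor; a neural network would take the place of the function that maps `features` to `target`.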
Spatial-time analysis of cardiovascular emergency medical requests: enlightening policy and practice
Background: Response time to cardiovascular emergency medical requests is an important indicator in reducing cardiovascular disease (CVD)-related mortality. This study aimed to visualize the spatial-time distribution of response time, scene time, and call-to-hospital time of these emergency requests. We also identified patterns of clusters of CVD-related calls.
Methods: This cross-sectional study was conducted in Mashhad, north-eastern Iran, between August 2017 and December 2019. The response time to every CVD-related emergency medical request call was computed using spatial and classical statistical analyses. The Anselin Local Moran's I was used to identify potential clusters in the patterns of CVD-related calls, response time, call-to-hospital arrival time, and scene-to-hospital arrival time at the small-area (neighborhood) level in Mashhad.
Results: There were 84,239 CVD-related emergency request calls, 61.64% of which resulted in the transport of patients to clinical centers by EMS, while 2.62% of callers (a total of 2218 persons) died before EMS arrival. The number of CVD-related emergency calls increased by almost 7% between 2017 and 2018, and by 19% between 2017 and 2019. The peak time for calls was between 9 p.m. and 1 a.m., and the lowest number of calls was recorded between 3 a.m. and 9 a.m. Saturday was the busiest day of the week in terms of call volume. There were statistically significant clusters in the pattern of CVD-related calls in the south-eastern region of Mashhad. Further, we found a large spatial variation in scene-to-hospital arrival time and call-to-hospital arrival time in the area under study.
Conclusion: The use of geographical information systems and spatial analyses in modelling and quantifying EMS response time provides a new vein of knowledge for decision makers in emergency services management. Spatial as well as temporal clustering of EMS calls was present in the study area. The reasons for clustering of unfavorable time indices for EMS response require further exploration. This approach enables policymakers to design tailored interventions to improve response time and reduce CVD-related mortality.
This study was financially sponsored by Mashhad University of Medical Sciences (Project grant: 980861).
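The Local Moran's I named in the Methods scores each area by how closely its value matches those of its neighbors. The sketch below uses one common form, I_i = (z_i / m2) * (weighted mean of neighbor deviations), with row-standardized binary weights. The weighting scheme, function name and toy data are assumptions, and the permutation test usually used to judge significance is omitted.

```python
def local_morans_i(values, adjacency):
    """Local Moran's I per area, using row-standardized neighbor weights.

    values: one observation per area (for example, mean response time).
    adjacency: square matrix, adjacency[i][j] > 0 if j neighbors i.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]        # deviations from the mean
    m2 = sum(d * d for d in z) / n        # second moment of the deviations
    if m2 == 0:
        raise ValueError("values have zero variance")
    result = []
    for i in range(n):
        weights = [adjacency[i][j] if j != i else 0 for j in range(n)]
        total = sum(weights)
        # Spatial lag: weighted mean deviation of the neighbors of area i.
        lag = sum(w * zj for w, zj in zip(weights, z)) / total if total else 0.0
        result.append(z[i] / m2 * lag)
    return result


# Hypothetical chain of four neighborhoods: 0-1, 1-2, 2-3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = local_morans_i([10, 10, 0, 0], adjacency)
```

Positive scores mark an area that resembles its neighbors (a high-high or low-low cluster), while negative scores mark a spatial outlier.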
Predicting the incidence of COVID-19 using data mining
Background: The high prevalence of COVID-19 has made it a new pandemic. Predicting both its prevalence and incidence throughout the world is crucial to help health professionals make key decisions. In this study, we aim to predict the incidence of COVID-19 within a two-week period to better manage the disease.
Methods: The COVID-19 datasets provided by Johns Hopkins University contain information on COVID-19 cases in different geographic regions since January 22, 2020 and are updated daily. Data from 252 such regions were analyzed as of March 29, 2020, with 17,136 records and 4 variables, namely latitude, longitude, date, and records. To design the incidence pattern for each geographic region, information on the region and its neighboring areas gathered over the preceding 2 weeks was used. Then, a model was developed to predict the incidence rate for the coming 2 weeks via a Least-Square Boosting Classification algorithm.
Results: The model was presented for three groups based on the incidence rate: less than 200, between 200 and 1000, and above 1000. The mean absolute errors of model evaluation were 4.71%, 8.54%, and 6.13%, respectively. Also, comparing the forecast results with the actual values in the period in question showed that the proposed model predicted the number of globally confirmed cases of COVID-19 with a very high accuracy of 98.45%.
Conclusion: Using data from different geographical regions within a country and discovering the pattern of prevalence in a region and its neighboring areas, our boosting-based model was able to accurately predict the incidence of COVID-19 within a two-week period.
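Least-squares boosting builds an ensemble in which each new weak learner is fitted to the residuals of the current prediction. The sketch below illustrates that idea with one-feature regression stumps; it is not the authors' model, and the function names, the single hypothetical feature, the number of rounds and the learning rate are all assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split that minimizes squared error."""
    candidates = sorted(set(xs))[:-1]
    if not candidates:
        raise ValueError("need at least two distinct feature values")
    best = None
    for threshold in candidates:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def ls_boost(xs, ys, rounds=50, learning_rate=0.1):
    """Fit stumps to successive residuals, starting from the mean of ys."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    """Sum the base value and every shrunken stump contribution."""
    total = model["base"]
    for threshold, left_mean, right_mean in model["stumps"]:
        total += model["learning_rate"] * (left_mean if x <= threshold else right_mean)
    return total


# Hypothetical toy data: a feature value and an incidence-like target.
model = ls_boost(xs=[1, 2, 3, 4, 5, 6], ys=[1, 1, 1, 5, 5, 5])
```

The learning rate shrinks each stump's contribution, so the fit approaches the targets gradually rather than in one step.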
Providing an imputation algorithm for missing values of longitudinal data using Cuckoo search algorithm: A case study on cervical dystonia
Background: Missing values in data are found in a large number of studies in the field of medical sciences,
especially longitudinal ones, in which repeated measurements are taken from each person during the study. In this
regard, several statistical endeavors have been performed on the concepts, issues, and theoretical methods during
the past few decades.
Methods: Herein, we focused on the missing data related to patients excluded from longitudinal studies. To this
end, two statistical parameters of similarity and correlation coefficient were employed. In addition, metaheuristic
algorithms were applied to achieve an optimal solution. The selected metaheuristic algorithm, which has a great
search functionality, was the Cuckoo search algorithm.
Results: Profiles of subjects with cervical dystonia (CD) were used to evaluate the proposed model after applying
missingness. It was concluded that the algorithm used in this study had a higher accuracy (98.48%), compared with
similar approaches.
Conclusion: Concomitant use of the similarity parameter and the correlation coefficient led to a significant increase in
accuracy of missing data imputation.
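The abstract does not describe how the Cuckoo search was configured, so the following is a simplified, generic sketch of the metaheuristic: Lévy-flight proposals, replacement of a random nest when the proposal is better, and abandonment of the worst nests. The objective is a hypothetical stand-in for an imputation error, and every name and parameter value here is an assumption.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length using Mantegna's algorithm."""
    sigma = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    while v == 0:  # guard against division by zero
        v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)


def cuckoo_search(objective, dim, low, high, n_nests=15, pa=0.25,
                  iterations=300, seed=0):
    """Minimize objective over [low, high]^dim; returns (best, best_fitness)."""
    rng = random.Random(seed)
    step_scale = 0.01 * (high - low)

    def random_nest():
        return [rng.uniform(low, high) for _ in range(dim)]

    nests = [random_nest() for _ in range(n_nests)]
    fitness = [objective(nest) for nest in nests]
    n_abandon = int(pa * n_nests)  # must stay below n_nests to keep the best

    for _ in range(iterations):
        # A cuckoo takes a Levy flight from a randomly chosen nest.
        source = nests[rng.randrange(n_nests)]
        candidate = [min(high, max(low, x + step_scale * levy_step(rng)))
                     for x in source]
        candidate_fitness = objective(candidate)
        # It replaces a randomly chosen nest only if it is better.
        target = rng.randrange(n_nests)
        if candidate_fitness < fitness[target]:
            nests[target], fitness[target] = candidate, candidate_fitness
        # The worst nests are abandoned and rebuilt at random positions.
        ranked = sorted(range(n_nests), key=lambda k: fitness[k])
        for k in ranked[n_nests - n_abandon:]:
            nests[k] = random_nest()
            fitness[k] = objective(nests[k])

    best = min(range(n_nests), key=lambda k: fitness[k])
    return nests[best], fitness[best]


def imputation_error(candidate):
    """Hypothetical objective: squared distance to made-up true values."""
    return (candidate[0] - 1.0) ** 2 + (candidate[1] + 2.0) ** 2


best, best_fitness = cuckoo_search(imputation_error, dim=2, low=-5.0, high=5.0)
```

In an imputation setting, the candidate vector would hold the values proposed for the missing entries, and the objective would score them, for instance by similarity and correlation with complete profiles.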
Predictive model for survival in patients with gastric cancer
Background and aim: Gastric cancer is one of the most prevalent cancers in the world. Characterized by poor
prognosis, it is a frequent cause of cancer in Iran. The aim of the study was to design a predictive model of survival
time for patients suffering from gastric cancer.
Methods: This was a historical cohort study conducted between 2011 and 2016. The study population comprised 277 patients
suffering from gastric cancer. Data were gathered from the Iranian Cancer Registry and the laboratory of Emam
Reza Hospital in Mashhad, Iran. Patients or their relatives were interviewed where needed. Missing
values were imputed by data mining techniques. Fifteen factors were analyzed. Survival was treated as the
dependent variable. The predictive model was then designed by combining a genetic algorithm with logistic
regression. Matlab 2014 software was used to combine them.
Results: Of the 277 patients, survival data were available for only 80, whose data were used to design the
predictive model. The mean ± SD number of missing values per patient was 4.43 ± 1.41. The combined predictive model
achieved 72.57% accuracy. Sex, birth year, age at diagnosis time, age at diagnosis time of patients’ family, family
history of gastric cancer, and family history of other gastrointestinal cancers were six parameters associated with
patient survival.
Conclusion: The study revealed that imputing missing values by data mining techniques has good accuracy.
It also revealed that six parameters extracted by the genetic algorithm affect the survival of patients with gastric
cancer. Our combined predictive model, with good accuracy, is appropriate for forecasting the survival of patients
suffering from gastric cancer. We therefore suggest that policy makers and specialists apply it to predict patients’
survival.