11 research outputs found

    Extracting Rules for Diagnosis of Diabetes Using Genetic Programming

    Background: Diabetes is a global health challenge that causes a high incidence of major social and economic consequences. As such, early prevention or identification of people at risk is crucial for reducing the problems it causes. The aim of this study was to extract rules for diagnosing diabetes using genetic programming. Methods: This study utilized the PIMA dataset of the University of California, Irvine. This dataset consists of information on 768 women of Pima heritage, including 500 healthy persons and 268 persons with diabetes. Missing values and outliers in this dataset were handled with the K-nearest neighbor and k-means methods, respectively. Moreover, a genetic programming (GP) model was built to diagnose diabetes and to determine the most important factors affecting it. The accuracy, sensitivity and specificity of the proposed model on the PIMA dataset were 79.32%, 58.96% and 90.74%, respectively. Results: The experimental results of our model on PIMA revealed that age, PG concentration, BMI, Tri Fold Thick and Serum Ins were influential in diabetes mellitus and in increased risk of diabetes. In addition, the experimental results show the good performance of the model, coupled with the simplicity and comprehensiveness of the extracted rules. Conclusions: GP can effectively produce rules for diagnosing diabetes. BMI and PG concentration are the most important factors increasing the risk of diabetes. Keywords: Diabetes, PIMA, Genetic programming, KNNi, K-means, Missing value, Outlier detection, Rule extraction
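
The three figures reported in this abstract (accuracy, sensitivity, specificity) all follow from a binary confusion matrix. The sketch below shows the standard definitions only; the counts passed in are hypothetical and are not taken from the paper, and the function name `diagnostic_metrics` is an illustrative choice.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as percentages.

    tp/fn: diabetic cases classified correctly / missed
    tn/fp: healthy cases classified correctly / flagged wrongly
    """
    total = tp + fn + tn + fp
    accuracy = 100.0 * (tp + tn) / total      # share of all cases classified correctly
    sensitivity = 100.0 * tp / (tp + fn)      # share of diabetic cases detected
    specificity = 100.0 * tn / (tn + fp)      # share of healthy cases cleared
    return accuracy, sensitivity, specificity


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
acc, sens, spec = diagnostic_metrics(tp=30, fn=20, tn=90, fp=10)
print(f"accuracy={acc:.2f}% sensitivity={sens:.2f}% specificity={spec:.2f}%")
```

A model with high specificity but modest sensitivity, as in the abstract, clears most healthy cases while missing a larger share of diabetic ones.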

    Predicting the status of COVID-19 active cases using a neural network time series

    The design of intelligent systems for analyzing information and predicting the epidemiological trends of the disease is rapidly expanding because of the coronavirus disease (COVID-19) pandemic. The COVID-19 datasets provided by Johns Hopkins University were included in the analysis. These datasets contain some missing data, which were imputed using the multi-objective particle swarm optimization method. A time series model based on a nonlinear autoregressive exogenous (NARX) neural network is proposed to predict recovered and death COVID-19 cases. The model is trained and evaluated in two modes: predicting the situation of the affected areas for the next day and for the next month. After the model was trained on data from January 22 to February 27, 2020, its performance was evaluated in predicting the situation of the areas over the following two weeks. The error rate was less than 5%. The prediction of the proposed model for April 9, 2020, was compared with the actual data for that day. The absolute percentage error (AE) worldwide was 12%. The lowest mean absolute error (MAE) of the model was for South America and Australia, with 3 and 3.3, respectively. In this paper, we have shown that mortality and recovery of COVID-19 cases across geographical areas can be predicted using a neural network-based model.
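
The two error measures named in this abstract can be written out as follows. This is a minimal sketch of the usual textbook definitions, assuming the paper uses them in the standard way; the function names and all numbers are hypothetical and do not come from the study.

```python
def absolute_percentage_error(actual, predicted):
    """Absolute error of one prediction, as a percentage of the actual value."""
    return 100.0 * abs(actual - predicted) / abs(actual)


def mean_absolute_error(actuals, predictions):
    """Average absolute difference over paired actual and predicted values."""
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)


# Hypothetical single-day comparison: actual count vs. model forecast.
print(absolute_percentage_error(actual=1000, predicted=880))

# Hypothetical regional series over three days.
print(mean_absolute_error([10, 20, 30], [12, 18, 33]))
```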

    Spatial-time analysis of cardiovascular emergency medical requests: enlightening policy and practice

    Background: Response time to cardiovascular emergency medical requests is an important indicator in reducing cardiovascular disease (CVD)-related mortality. This study aimed to visualize the spatial-time distribution of response time, scene time, and call-to-hospital time of these emergency requests. We also identified patterns of clusters of CVD-related calls. Methods: This cross-sectional study was conducted in Mashhad, north-eastern Iran, between August 2017 and December 2019. The response time to every CVD-related emergency medical request call was computed using spatial and classical statistical analyses. The Anselin Local Moran's I was used to identify potential clusters in the patterns of CVD-related calls, response time, call-to-hospital arrival time, and scene-to-hospital arrival time at the small-area (neighborhood) level in Mashhad, Iran. Results: There were 84,239 CVD-related emergency request calls, 61.64% of which resulted in the transport of patients to clinical centers by EMS, while 2.62% of callers (a total of 2218 persons) died before EMS arrival. The number of CVD-related emergency calls increased by almost 7% between 2017 and 2018, and by 19% between 2017 and 2019. The peak time for calls was between 9 p.m. and 1 a.m., and the lowest number of calls was recorded between 3 a.m. and 9 a.m. Saturday was the busiest day of the week in terms of call volume. There were statistically significant clusters in the pattern of CVD-related calls in the south-eastern region of Mashhad. Further, we found a large spatial variation in scene-to-hospital arrival time and call-to-hospital arrival time in the area under study. Conclusion: The use of geographical information systems and spatial analyses in modelling and quantifying EMS response time provides a new vein of knowledge for decision makers in emergency services management. Spatial as well as temporal clustering of EMS calls was present in the study area. The reasons for clustering of unfavorable time indices for EMS response require further exploration. This approach enables policymakers to design tailored interventions to improve response time and reduce CVD-related mortality. This study was financially sponsored by Mashhad University of Medical Sciences (Project grant: 980861).
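
For readers unfamiliar with the Anselin Local Moran's I named in the Methods, the statistic can be sketched as below. This is an illustrative version under stated assumptions, not the study's implementation: it uses row-standardized binary neighbor weights and the variance normalization m2 = sum(z^2)/n, it omits the permutation-based significance test, and the four-neighborhood data are hypothetical.

```python
def local_morans_i(values, weights):
    """Local Moran's I for each area.

    values:  one observation per area (e.g. a call count per neighborhood)
    weights: square matrix, weights[i][j] > 0 when area j neighbors area i;
             rows are standardized here and the diagonal is ignored
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]            # deviations from the mean
    m2 = sum(d * d for d in z) / n            # second moment used for scaling
    if m2 == 0:
        raise ValueError("all values are identical; the statistic is undefined")
    result = []
    for i in range(n):
        neighbors = [j for j in range(n) if j != i and weights[i][j] > 0]
        row_sum = sum(weights[i][j] for j in neighbors)
        # Spatial lag: weighted average deviation of the neighbors.
        lag = (sum(weights[i][j] * z[j] for j in neighbors) / row_sum) if row_sum else 0.0
        result.append(z[i] * lag / m2)
    return result


# Hypothetical chain of four neighborhoods: 0-1, 1-2, 2-3 are adjacent.
calls = [10, 10, 1, 1]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(local_morans_i(calls, adjacency))
```

Positive values indicate an area whose value resembles that of its neighbors (high-high or low-low clusters); negative values indicate a spatial outlier.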

    Predicting the incidence of COVID-19 using data mining

    Background: The high prevalence of COVID-19 has made it a new pandemic. Predicting both its prevalence and incidence throughout the world is crucial to help health professionals make key decisions. In this study, we aim to predict the incidence of COVID-19 within a two-week period to better manage the disease. Methods: The COVID-19 datasets provided by Johns Hopkins University contain information on COVID-19 cases in different geographic regions since January 22, 2020 and are updated daily. Data from 252 such regions were analyzed as of March 29, 2020, with 17,136 records and 4 variables, namely latitude, longitude, date, and records. To design the incidence pattern for each geographic region, information on the region and its neighboring areas gathered over the preceding 2 weeks was used. Then, a model was developed to predict the incidence rate for the coming 2 weeks via a Least-Square Boosting Classification algorithm. Results: The model was presented for three groups based on the incidence rate: less than 200, between 200 and 1000, and above 1000. The mean absolute errors of model evaluation were 4.71%, 8.54%, and 6.13%, respectively. Also, comparing the forecast results with the actual values in the period in question showed that the proposed model predicted the number of globally confirmed cases of COVID-19 with a very high accuracy of 98.45%. Conclusion: Using data from different geographical regions within a country and discovering the pattern of prevalence in a region and its neighboring areas, our boosting-based model was able to accurately predict the incidence of COVID-19 within a two-week period.

    Providing an imputation algorithm for missing values of longitudinal data using Cuckoo search algorithm: A case study on cervical dystonia

    Background: Missing values are found in a large number of studies in the medical sciences, especially longitudinal ones, in which repeated measurements are taken from each person during the study. Over the past few decades, considerable statistical work has addressed the concepts, issues, and theoretical methods involved. Methods: Herein, we focused on the missing data related to patients excluded from longitudinal studies. To this end, two statistical parameters, similarity and the correlation coefficient, were employed. In addition, metaheuristic algorithms were applied to achieve an optimal solution. The selected metaheuristic algorithm, which has strong search capability, was the Cuckoo search algorithm. Results: Profiles of subjects with cervical dystonia (CD) were used to evaluate the proposed model after missingness was introduced. It was concluded that the algorithm used in this study had a higher accuracy (98.48%) than similar approaches. Conclusion: Concomitant use of similarity parameters and correlation coefficients led to a significant increase in the accuracy of missing data imputation.

    Predictive model for survival in patients with gastric cancer

    Background and aim: Gastric cancer is one of the most prevalent cancers in the world. Characterized by poor prognosis, it is a frequent cause of cancer in Iran. The aim of the study was to design a predictive model of survival time for patients suffering from gastric cancer. Methods: This was a historical cohort study conducted between 2011 and 2016. The study population comprised 277 patients suffering from gastric cancer. Data were gathered from the Iranian Cancer Registry and the laboratory of Emam Reza Hospital in Mashhad, Iran. Patients or their relatives were interviewed where needed. Missing values were imputed by data mining techniques. Fifteen factors were analyzed. Survival was treated as the dependent variable. Then, the predictive model was designed by combining a genetic algorithm with logistic regression. Matlab 2014 software was used to combine them. Results: Of the 277 patients, survival data were available for only 80, whose data were used for designing the predictive model. The mean ± SD of missing values per patient was 4.43 ± 1.41. The combined predictive model achieved 72.57% accuracy. Sex, birth year, age at diagnosis time, age at diagnosis time of patients’ family, family history of gastric cancer, and family history of other gastrointestinal cancers were six parameters associated with patient survival. Conclusion: The study revealed that imputing missing values by data mining techniques has good accuracy. It also revealed six parameters, extracted by the genetic algorithm, that affect the survival of patients with gastric cancer. Our combined predictive model, with good accuracy, is appropriate for forecasting the survival of patients suffering from gastric cancer. We therefore suggest that policy makers and specialists apply it to predict patients’ survival.