14 research outputs found

    Predicting preeclampsia and related risk factors using data mining approaches: A cross-sectional study

    Background: Preeclampsia is a hypertensive disorder of pregnancy that has adverse effects on both the mother and the fetus. Despite recent advances in understanding the etiology of preeclampsia, no adequate clinical screening test has been identified to diagnose the disorder. Objective: We aimed to develop a model based on data mining approaches that can serve as a screening tool to identify patients with this syndrome and to identify the risk factors associated with it. Materials and Methods: The data for this cross-sectional study were extracted from the clinical records of 726 mothers with preeclampsia and 726 mothers without preeclampsia who were referred to Fatemieh Hospital in Hamadan City between April 2005 and March 2015. Six data mining methods were applied (logistic regression, k-nearest neighbors, C5.0 decision tree, discriminant analysis, random forest, and support vector machine), and their performance was compared in terms of accuracy, sensitivity, and specificity. Results: Underlying condition, age, season of pregnancy, and number of pregnancies were the most important risk factors for diagnosing preeclampsia. The accuracies of the models were as follows: logistic regression (0.713), k-nearest neighbors (0.742), C5.0 decision tree (0.788), discriminant analysis (0.687), random forest (0.758), and support vector machine (0.791). Conclusion: Among the data mining methods employed in this study, the support vector machine was the most accurate in predicting preeclampsia, so this model can be considered as a screening tool for the disorder. Key words: Preeclampsia, Random forest, C5.0 decision tree, Support vector machine, Logistic regression
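
    The accuracy, sensitivity, and specificity used to compare the six classifiers are simple ratios taken from a 2x2 confusion matrix. The minimal sketch below assumes preeclampsia is coded as the positive class (1). The labels in the usage line are made up for illustration and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # preeclampsia cases predicted as preeclampsia
    fn: int  # preeclampsia cases predicted as controls
    tn: int  # controls predicted as controls
    fp: int  # controls predicted as preeclampsia

    def accuracy(self) -> float:
        # Share of all mothers that were classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def sensitivity(self) -> float:
        # True positive rate: correctly flagged cases among all cases.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negative rate: correctly cleared controls among all controls.
        return self.tn / (self.tn + self.fp)


def confusion_from_labels(y_true, y_pred):
    """Build the matrix from 0/1 labels (1 = preeclampsia, an assumed coding)."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionMatrix(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
    )


# Hypothetical labels and predictions, for illustration only.
cm = confusion_from_labels([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1])
print(cm.accuracy(), cm.sensitivity(), cm.specificity())
```

    Any of the six classifiers can be scored this way, as long as its predictions are turned into hard 0/1 labels first.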

    Forecasting New Cases of Bipolar Disorder Using Poisson Hidden Markov Model

    Background: Bipolar disorder (BD) is a major public health problem. Time series of count data may show overdispersion and serial dependence, so models that account for this dependence are needed. The purpose of this research was to use a Poisson hidden Markov model to forecast the monthly number of new BD cases. Methods: The dataset comprised the monthly frequency of new BD cases in Hamadan Province, western Iran, from October 2008 to March 2015. Poisson hidden Markov models with different numbers of states were fitted, and the best model was selected according to the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The final model was then used to forecast the next 24 months. Results: A two-state Poisson hidden Markov model was chosen as the final model, with each component of the dependent mixture model representing one of the states. The number of new BD cases increased over time, and according to the forecasts, the number of patients over the next 24 months is expected to remain in state two, with a mean of 85.15. The forecast interval was approximately (56, 100). Conclusion: Because Poisson hidden Markov models have not been used to forecast future states in prior research, the findings of this study put forward a forecasting strategy that can serve as an alternative to common methods, taking their deficiencies into account.
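
    To compare Poisson hidden Markov models by AIC and BIC, you need the log-likelihood, which the scaled forward algorithm computes. Forecasting then amounts to pushing the last filtered state probabilities through the transition matrix. The sketch below is minimal and makes several assumptions. Parameter estimation (by numerical maximization or EM) is left out. The initial distribution is taken to be the stationary one, so it adds no free parameters. All helper names and parameter values are hypothetical.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    # Poisson probability computed on the log scale for numerical stability.
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def stationary_2state(gamma):
    # Closed-form stationary distribution of a two-state transition matrix.
    g12, g21 = gamma[0][1], gamma[1][0]
    return [g21 / (g12 + g21), g12 / (g12 + g21)]


def forward_loglik(x, lambdas, gamma, delta):
    """Scaled forward algorithm for a Poisson HMM.

    lambdas: state-specific Poisson means
    gamma:   transition matrix, gamma[i][j] = P(state j at t+1 | state i at t)
    delta:   initial state distribution
    Returns the log-likelihood and the filtered state probabilities at the last time point.
    """
    m = len(lambdas)
    phi = list(delta)
    loglik = 0.0
    for t, xt in enumerate(x):
        if t > 0:
            # Propagate the state probabilities one month forward.
            phi = [sum(phi[i] * gamma[i][j] for i in range(m)) for j in range(m)]
        v = [phi[j] * poisson_pmf(xt, lambdas[j]) for j in range(m)]
        scale = sum(v)
        loglik += math.log(scale)
        phi = [vj / scale for vj in v]  # rescale to avoid numerical underflow
    return loglik, phi


def information_criteria(loglik, m, n):
    # Free parameters: m Poisson means plus m*(m-1) transition probabilities
    # (the stationary initial distribution is assumed, so it is not counted).
    k = m + m * (m - 1)
    return -2 * loglik + 2 * k, -2 * loglik + k * math.log(n)


def forecast_mean(phi, gamma, lambdas, h):
    # Expected count h steps ahead: phi_T * Gamma^h * lambda.
    m = len(lambdas)
    probs = list(phi)
    for _ in range(h):
        probs = [sum(probs[i] * gamma[i][j] for i in range(m)) for j in range(m)]
    return sum(p * lam for p, lam in zip(probs, lambdas))


gamma = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical transition matrix
lambdas = [65.0, 85.0]            # hypothetical state means
ll, phi = forward_loglik([60, 72, 81, 88, 79, 90], lambdas, gamma, stationary_2state(gamma))
print(information_criteria(ll, m=2, n=6), forecast_mean(phi, gamma, lambdas, h=24))
```

    With the parameters fitted for each candidate number of states, the model with the smallest AIC or BIC would be kept, as the abstract describes.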

    Multistate recursively imputed survival trees for time-to-event data analysis: an application to AIDS and mortality post-HIV infection data

    Background: This study aimed to introduce recursively imputed survival trees into multistate survival models (MSRIST) to analyze multistate time-to-event data and to identify the prognostic factors influencing disease progression in patients with intermediate events. The proposed method is fully nonparametric and can be used to estimate transition probabilities. Methods: A general algorithm was provided for analyzing multistate data, with a focus on the illness-death and progressive multistate models. The model considered both Markov and non-Markov settings. We also proposed a multistate random survival forest (MSRSF) and compared the performance of both methods with that of the classical multistate Cox model. We applied the proposed methods to data from a retrospective cohort study of 2473 HIV-infected patients in Tehran from April 2004 to March 2014. Results: MSRIST outperformed both the classical multistate Cox model and MSRSF in terms of the integrated Brier score and the concordance index over 500 repetitions. We also identified a set of important risk factors, as well as their interactions, for the different states of HIV and AIDS progression. Conclusions: There are different strategies for modelling the intermediate event. We adapted two recently developed data mining techniques (RSF and RIST) to multistate models (MSRSF and MSRIST) to identify important risk factors in different stages of the disease. These methods can capture complex relationships between variables and can be a useful tool for identifying important risk factors in different states of this disease.
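
    One of the metrics used to compare MSRIST, MSRSF, and the multistate Cox model is the concordance index. It measures how often a higher predicted risk goes with an earlier event. The sketch below shows Harrell's version for a single transition, for example HIV to AIDS, using that transition's observed times and event indicators. Applying it transition by transition, and the choice of Harrell's variant rather than a time-dependent one, are assumptions here and not details from the study.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair is usable when the subject with the shorter observed time had the event.
    It is concordant when that subject also has the higher predicted risk;
    ties in predicted risk count as one half.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the earlier subject was censored
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no comparable pairs")
    return concordant / usable


# Hypothetical transition times (months), event flags, and predicted risks.
print(harrell_c_index([5, 8, 12, 20], [1, 0, 1, 1], [0.9, 0.4, 0.6, 0.1]))
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of the observed transition times.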

    Predicting the incidence of brucellosis in Western Iran using Markov switching model

    Objective: Brucellosis is a zoonotic disease that is often chronic. Brucella bacteria can persist in the environment for a long time, so climatic irregularities could favor the survival of the bacterium. Brucellosis is more common in men aged 25 to 29 years, in the western provinces, and in the spring months. The aim of this study was to investigate the effect of climatic factors and to predict the incidence of brucellosis in Qazvin Province using the Markov switching model (MSM). This is a secondary analysis of data collected from 2010 to 2019 in Qazvin Province, including brucellosis cases and climatic parameters. Two-state MSMs with time lags of 0, 1, and 2 months were fitted to the data, and the Bayesian information criterion (BIC) was used to evaluate the models. Results: According to the BIC, the two-state MSM with a 1-month lag was the most suitable model. Month, average wind speed, and minimum temperature had a positive effect on the number of brucellosis cases, whereas age and rainfall had a negative effect. The results show that the probability of an outbreak in the third month of 2019 is 0.30%.
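
    In a Markov switching model, a next-month outbreak probability can be obtained from the filtered regime probabilities (the Hamilton filter) pushed one step through the transition matrix. The sketch below is minimal. It assumes Gaussian monthly counts with a regime-specific mean and a common standard deviation, and it leaves out the climatic covariates and the 1-month lag used in the study. The "outbreak" label for the high-mean regime and all parameter values are hypothetical, not estimates from the study.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def hamilton_filter(y, means, sd, P, init):
    """Filtered regime probabilities for a Gaussian Markov switching mean model.

    means: regime-specific means (e.g. "normal" and "outbreak" months)
    sd:    common standard deviation
    P:     transition matrix, P[i][j] = P(S_t = j | S_{t-1} = i)
    init:  regime probabilities for the first month
    Returns the filtered probabilities for every month and the one-step-ahead
    predicted regime probabilities for the month after the last observation.
    """
    k = len(means)
    pred = list(init)  # P(S_t = j | y_1, ..., y_{t-1})
    filtered = []
    for obs in y:
        # Bayes update: weight each regime's prediction by the likelihood of obs.
        joint = [pred[j] * normal_pdf(obs, means[j], sd) for j in range(k)]
        total = sum(joint)
        filt = [v / total for v in joint]
        filtered.append(filt)
        # Prediction step: push the filtered probabilities through P.
        pred = [sum(filt[i] * P[i][j] for i in range(k)) for j in range(k)]
    return filtered, pred


# Hypothetical parameters for illustration only (not estimates from the study).
means = [5.0, 20.0]
P = [[0.9, 0.1], [0.3, 0.7]]
filtered, next_probs = hamilton_filter([4, 6, 5, 18, 22], means, 3.0, P, [0.5, 0.5])
print(next_probs[1])  # predicted probability of the high-incidence regime next month
```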

    Predicting the Survival Time for Bladder Cancer Using an Additive Hazards Model in Microarray Data

    Background: An important aim of many microarray studies is to predict patients' survival from their gene expression profiles. Variable selection techniques are powerful tools for handling the high dimensionality of microarray data; however, they have not been investigated in the competing risks setting. This study aimed to investigate the performance of four sparse variable selection methods in estimating survival time. Methods: The data included 1381 gene expression measurements and clinical information from 301 patients with bladder cancer who underwent surgery between 1987 and 2000 in hospitals in Denmark, Sweden, Spain, France, and England. Four methods, namely the least absolute shrinkage and selection operator, smoothly clipped absolute deviation, the smooth integration of counting and absolute deviation, and the elastic net, were used for simultaneous variable selection and estimation under an additive hazards model. The area under the ROC curve, the Brier score, and the C-index were used to compare the methods. Results: The median follow-up time for all patients was 47 months. The elastic net outperformed the other methods, with the lowest integrated Brier score (0.137±0.07) and the highest median over-time AUC and C-index (0.803±0.06 and 0.779±0.13, respectively). Five of the 19 genes selected by the elastic net were significant (P<0.05) under the additive hazards model. The expression of RTN4, SON, IGF1R, and CDC20 decreased survival time, whereas the expression of SMARCAD1 increased it. Conclusion: Under the additive hazards model, the elastic net predicted survival time in patients with bladder cancer in the presence of competing risks better than the other methods.
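
    Under the Lin-Ying additive hazards model h(t | Z) = h0(t) + beta'Z, the estimating equation is linear in beta. Penalized estimation can therefore minimize a quadratic loss 0.5*beta'V beta - b'beta plus a penalty. The sketch below rests on several assumptions. Covariates are time-fixed. There is a single event type, so the paper's competing risks handling is not reproduced (treating other causes as censoring would be one option). The penalty is the glmnet-style elastic net lam*(alpha*|beta|_1 + 0.5*(1-alpha)*|beta|_2^2), solved by cyclic coordinate descent. The function names are hypothetical.

```python
def lin_ying_terms(times, events, Z):
    """V and b of the Lin-Ying loss 0.5*beta'V beta - b'beta (time-fixed covariates)."""
    n, p = len(times), len(Z[0])
    V = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    prev = 0.0
    for t in sorted(set(times)):
        # Subjects at risk on (prev, t] are those with observed time >= t.
        risk = [i for i in range(n) if times[i] >= t]
        zbar = [sum(Z[i][k] for i in risk) / len(risk) for k in range(p)]
        dt = t - prev
        for i in risk:
            d = [Z[i][k] - zbar[k] for k in range(p)]
            for r in range(p):
                for c in range(p):
                    V[r][c] += dt * d[r] * d[c]
        # Each event at time t adds its covariate deviation from the risk-set mean.
        for i in risk:
            if events[i] and times[i] == t:
                for k in range(p):
                    b[k] += Z[i][k] - zbar[k]
        prev = t
    return V, b


def soft_threshold(z, g):
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def elastic_net_quadratic(V, b, lam, alpha, max_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for
    0.5*beta'V beta - b'beta + lam*(alpha*|beta|_1 + 0.5*(1-alpha)*|beta|_2^2).
    Assumes every V[j][j] > 0 (no constant covariates)."""
    p = len(b)
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Coordinate-wise gradient term excluding beta_j itself.
            partial = b[j] - sum(V[j][k] * beta[k] for k in range(p) if k != j)
            new = soft_threshold(partial, lam * alpha) / (V[j][j] + lam * (1 - alpha))
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


# Toy data: 3 subjects, one covariate (hypothetical values).
V, b = lin_ying_terms([1.0, 2.0, 3.0], [1, 1, 0], [[1.0], [0.0], [1.0]])
print(elastic_net_quadratic(V, b, lam=0.05, alpha=0.5))
```

    In practice, lam (and often alpha) would be chosen by cross-validation over a grid. Genes whose coefficients are shrunk exactly to zero are the ones dropped by the selection.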

    Forecasting Schizophrenia Incidence Frequencies Using Time Series Approach

    Introduction: Understanding the prevalence of schizophrenia has important implications for both health service planning and risk factor epidemiology. The aims of this study are to systematically identify and collate studies describing the prevalence of schizophrenia, to summarize the findings of these studies, and to explore selected factors that may influence prevalence estimates. Methods: This historical cohort study was conducted on patients with schizophrenia at Farshchian Psychiatric Hospital from April 2008 to April 2016. The data were analyzed with the Holt-Winters exponential smoothing (HWES) method. All analyses were performed in R 3.2.3 using the packages "forecast" and "tseries", with the significance level set at 0.05. Results: Our investigation shows a constant monthly frequency of schizophrenia incidence from August 2008 to February 2015, followed by a considerable increase in March 2015. The high frequency of schizophrenia incidence remained constant until the end of 2015, and a decrease was observed in 2016. The data were also used to forecast schizophrenia incidence for the next 24 months with 95% confidence intervals. Conclusion: Our study showed a significant increase in the frequency of schizophrenia from 2016. Although the increase is not constant or the same for all months, it is considerably high compared with the period before 2016.
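
    Additive Holt-Winters smoothing tracks a level, a trend, and a seasonal component, updating each with its own smoothing constant. The sketch below follows the additive recursions documented for R's stats::HoltWinters. The start-up values are deliberately simple (the mean of the first season, the average change between the first two seasons, and first-season deviations), whereas R uses a moving-average decomposition. The smoothing constants are passed in rather than estimated (R chooses them by minimizing the squared one-step prediction error). Prediction intervals are not computed. The monthly series in the usage line is hypothetical.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing with point forecasts.

    y: observed series (at least two full seasons); m: season length (12 for months).
    Recursions: level = alpha*(y - s_old) + (1-alpha)*(level + trend)
                trend = beta*(level - prev_level) + (1-beta)*trend
                s_new = gamma*(y - level) + (1-gamma)*s_old
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons")
    first = sum(y[:m]) / m
    second = sum(y[m:2 * m]) / m
    level, trend = first, (second - first) / m   # simple start-up values
    season = [y[i] - first for i in range(m)]    # season[t % m] holds s_{t-m}
    for t in range(m, len(y)):
        s_old = season[t % m]
        prev_level = level
        level = alpha * (y[t] - s_old) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s_old
    n = len(y)
    # h-step forecast: level + h*trend + most recent seasonal term for that month.
    return [level + h * trend + season[(n - 1 + h) % m] for h in range(1, horizon + 1)]


# Hypothetical monthly counts with a yearly pattern, forecast 24 months ahead.
monthly = [30 + (i % 12) for i in range(36)]
print(holt_winters_additive(monthly, 12, alpha=0.3, beta=0.1, gamma=0.2, horizon=24))
```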

    High-Dimensional Additive Hazards Regression for Oral Squamous Cell Carcinoma Using Microarray Data: A Comparative Study

    Microarray technology produces high-dimensional, low-sample-size data sets. Fitting sparse models is therefore essential, because only a small number of influential genes can be reliably identified. A number of variable selection approaches have been proposed for censored high-dimensional time-to-event data based on the Cox proportional hazards model. The present study applied three sparse variable selection techniques, the Lasso, smoothly clipped absolute deviation, and the smooth integration of counting and absolute deviation, to gene expression survival data using the additive risk model, which is adopted when the absolute effects of multiple predictors on the hazard function are of interest. The performance of these techniques was evaluated with time-dependent ROC curves and bootstrap .632+ prediction error curves. The genes selected by all methods were highly significant (P<0.001). The Lasso showed the highest median area under the ROC curve over time (0.95), and smoothly clipped absolute deviation showed the lowest prediction error (0.105). The genes selected by all methods improved the prediction of a purely clinical model, indicating that the microarray features contain valuable information. It was therefore concluded that these approaches can satisfactorily predict survival based on selected gene expression measurements.
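
    A bootstrap .632+ prediction error curve applies the Efron-Tibshirani .632+ combination at each time point. The error being combined is usually a Brier-score-type prediction error. The minimal sketch below covers a single time point and assumes that three quantities are already computed: the apparent error, the leave-one-out bootstrap error, and the no-information error. For censored data, computing these requires inverse-probability-of-censoring weighting, which is not shown. The function name is hypothetical.

```python
def err_632_plus(train_err, loo_boot_err, no_info_err):
    """Efron-Tibshirani .632+ estimate of prediction error.

    train_err:    apparent (resubstitution) error
    loo_boot_err: leave-one-out bootstrap error, Err^(1)
    no_info_err:  no-information error rate, gamma
    """
    err1 = min(loo_boot_err, no_info_err)  # cap Err^(1) at the no-information rate
    if err1 > train_err and no_info_err > train_err:
        # Relative overfitting rate; it lies in [0, 1] because err1 <= no_info_err.
        r = (err1 - train_err) / (no_info_err - train_err)
    else:
        r = 0.0
    w = 0.632 / (1 - 0.368 * r)  # weight grows from 0.632 towards 1 as overfitting grows
    return (1 - w) * train_err + w * err1


# Hypothetical pointwise errors at three evaluation times.
apparent, loo, noinfo = [0.05, 0.10, 0.12], [0.09, 0.16, 0.20], [0.20, 0.25, 0.25]
print([err_632_plus(a, b, g) for a, b, g in zip(apparent, loo, noinfo)])
```

    With no overfitting (r = 0) the estimate reduces to the ordinary .632 estimator. With maximal overfitting (r = 1) it equals the capped leave-one-out bootstrap error.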
