Prediction of low birth weight using Random Forest: A comparison with Logistic Regression

Abstract

Low birth weight (a neonate weighing less than 2500 g) is associated with several interrelated maternal and fetal factors [1]. This study aimed to examine maternal risk factors associated with low birth weight neonates using a data mining technique (Random Forest) that accounts for interactions between them. We also compared Random Forest with traditional Logistic Regression. The dataset consisted of 600 volunteer pregnant women. This cross-sectional study was carried out in Milad Hospital, Tehran, during 2005-2009. Ten potential risk factors that are commonly associated with low birth weight were selected using the Random Forest technique. Several criteria, such as the area under the ROC curve, were considered in comparing Random Forest with Logistic Regression. According to both criteria, the four top-ranked variables identified by Random Forest were, in order, pregnancy age, body mass index during the third trimester of pregnancy, mother's age, and body mass index during the first trimester of pregnancy. In addition, the Random Forest technique outperformed Logistic Regression on the different criteria (area under the ROC curve: 93%; total accuracy: 95%; kappa coefficient: 66%). The results of the present study showed that using Random Forest improved the prediction of low birth weight compared with Logistic Regression. This is because Random Forest accounts for interactions between covariates. Therefore, this approach is a promising classifier for predicting low birth weight.
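The comparison criteria named in the abstract (total accuracy, kappa coefficient, and area under the ROC curve) can be illustrated with a minimal sketch in plain Python. This is not the study's code: the function names `accuracy_and_kappa` and `roc_auc`, the confusion-matrix counts, and the scores are all hypothetical, chosen only to show how the measures are computed. The kappa used here is Cohen's kappa, which is assumed to be the coefficient the abstract refers to.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (total accuracy, Cohen's kappa) for a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # observed agreement = total accuracy
    # Agreement expected by chance, computed from the row and column totals
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts for an imbalanced outcome (not the study's data):
# 12 low-birth-weight cases out of 100, 8 of them detected.
acc, kappa = accuracy_and_kappa(tp=8, fp=2, fn=4, tn=86)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")

# Hypothetical predicted probabilities for four neonates.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Because low birth weight is a minority outcome, accuracy can be high even when agreement beyond chance is moderate, which is one reason to report kappa and the area under the ROC curve alongside it.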
