BACKGROUND:
Fetal hypoxia during labour is an insufficient supply of oxygen to the fetus in the womb during active uterine contractions. Although transient hypoxia is usually met by a normal physiological compensatory response, some infants are unable to adapt, resulting in severe consequences such as cerebral palsy, developmental disorders, and neonatal mortality. Cardiotocography (CTG) records fetal heart rate (FHR) and uterine contractions (UC) and displays them as a graphical trace. Clinicians use this non-invasive technique to monitor changes in FHR in response to UC, thereby identifying fetuses at risk of hypoxia during labour. However, human factors may compromise the quality and consistency of CTG interpretation. Previous research has indicated an increase in the caesarean section rate without a corresponding reduction in the incidence of cerebral palsy. Machine learning (ML) has demonstrated the potential for detecting hypoxic fetuses using CTG data. Nonetheless, the majority of studies have relied on the same open-access dataset, and the absence of external validation and the use of inconsistent hypoxia surrogate measures impede clinical application. Moreover, although pregnancy risk factors can influence fetal hypoxia during labour, few studies have applied ML to them. Consequently, the objective of this thesis was to develop ML prediction models for fetal hypoxia using CTG and pregnancy risk factors.
METHODS:
A scoping review was conducted to examine how existing CTG prediction models were studied and developed, and its findings informed the development and validation of the CTG-ML models. CTG data from the UK and the Czech Republic were compared using exploratory data analysis (EDA) to find differences in CTG patterns between healthy and hypoxic fetuses. This thesis is the first to use Apgar scores as the gold standard for hypoxia. Statistical tests were performed to characterise differences in CTG signal characteristics between hypoxic and normal fetuses. For pregnancy risk factor modelling, EDA was performed on US pregnancy health records, and ML models were used to predict fetal hypoxia during labour. Logistic regression was used to estimate the odds ratios of individual pregnancy risk factors. All ML models were evaluated using several metrics: misclassification error, AUROC, area under the precision-recall curve, Brier score, and calibration plots.
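The evaluation metrics named above can be sketched as follows. This is a minimal, dependency-free illustration and not the thesis's own code: the function names, the 0.5 decision threshold, and the six-sample toy data are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall (sensitivity) and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUROC as the probability that a random positive scores above a random negative.

    Ties count as half a win. Quadratic in sample size, so suitable for small examples only.
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = low Apgar score, 0 = normal Apgar score.
y_true = [0, 0, 0, 0, 1, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed threshold of 0.5

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"brier={brier_score(y_true, y_prob):.3f} auroc={auroc(y_true, y_prob):.3f}")
```

Reporting threshold-free metrics (AUROC, Brier score) alongside threshold-dependent ones (precision, recall, F1) matters when the positive class is rare, because a model can rank cases reasonably well and still produce mostly false alarms at a given threshold.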
RESULTS:
The scoping review revealed that none of the previous studies incorporated UC when modelling FHR, despite clinical recommendations. Additional gaps identified included the use of varying benchmarks for hypoxia surrogate markers, inconsistent analysis of CTG characteristics, and a lack of population generalisability owing to reliance on the same open-access CTG database. For CTG modelling, data were available for 4,909 women. The proportion of low Apgar scores was 2.1% in the UK dataset and 3.4% in the open-access dataset. In the external dataset, extreme gradient boosting with under-sampling and feature selection achieved the highest recall (sensitivity) of 0.95 and an area under the receiver operating characteristic curve (AUROC) of 0.59. However, the other metrics were suboptimal, with precision and F1 scores below 0.10. The modelling of pregnancy risk factors included 13,823,214 women, with low Apgar scores comprising 1.3% of the entire dataset. The odds ratios indicated a significant association between smoking before pregnancy and low Apgar scores, a finding that has not been previously reported. There were no differences in prediction accuracy among the various data enrichment methods. The best results were obtained using a multilayer perceptron (a type of neural network) and extreme gradient boosting, with a recall (sensitivity) of 0.66 and an AUROC of 0.64. However, the precision and F1 score were below 0.05.
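The combination of moderate recall and very low precision follows largely from the low prevalence of the outcome. The sketch below works through the arithmetic using the reported recall (0.66) and prevalence (1.3%) for the pregnancy risk factor dataset. The specificity of 0.55 is a purely hypothetical value chosen for illustration, since the abstract does not report one, and the function names are likewise assumptions.

```python
def precision_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Precision (positive predictive value) implied by sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence                    # fraction of all cases that are true positives
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # fraction of all cases that are false positives
    if true_pos + false_pos == 0:
        return 0.0
    return true_pos / (true_pos + false_pos)


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


SENSITIVITY = 0.66          # recall reported for the pregnancy risk factor models
PREVALENCE = 0.013          # low Apgar scores made up 1.3% of the dataset
ASSUMED_SPECIFICITY = 0.55  # hypothetical: not reported in the abstract

ppv = precision_from_rates(SENSITIVITY, ASSUMED_SPECIFICITY, PREVALENCE)
f1_value = f1_from(ppv, SENSITIVITY)
print(f"precision={ppv:.4f} f1={f1_value:.4f}")
```

Under this assumed specificity, both precision and F1 fall below 0.05, which is consistent with the reported figures: at a prevalence near 1%, false positives from the large negative class outnumber true positives unless specificity is very high.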
CONCLUSION:
The overall performance of the CTG and pregnancy risk models was suboptimal, as the ML models were unable to effectively differentiate between low and normal Apgar scores. Although statistically significant differences in variables were observed between groups, such differences do not necessarily translate into separability in ML modelling. This limitation may be attributed to overlapping features between groups and/or small effect sizes. Furthermore, the subjective nature of Apgar score evaluation renders it unsuitable as a benchmark for assessing fetal hypoxia using ML.