
    Comparing penalization methods for linear models on large observational health data

    Objective: This study evaluates regularization variants in logistic regression (L1, L2, ElasticNet, adaptive L1, adaptive ElasticNet, broken adaptive ridge [BAR], and iterative hard thresholding [IHT]) for discrimination and calibration performance, focusing on both internal and external validation. Materials and Methods: We use data from 5 US claims and electronic health record databases and develop models for various outcomes in a major depressive disorder patient population. We externally validate all models in the other databases. We use a 75%/25% train-test split and evaluate performance with discrimination and calibration. Statistical analysis of performance differences uses Friedman's test and critical difference diagrams. Results: Of the 840 models we develop, L1 and ElasticNet emerge as superior in both internal and external discrimination, with a notable AUC difference. BAR and IHT show the best internal calibration, without a clear external calibration leader. ElasticNet typically has larger model sizes than L1. Methods like IHT and BAR, while slightly less discriminative, significantly reduce model complexity. Conclusion: L1 and ElasticNet offer the best discriminative performance in logistic regression for healthcare predictions, maintaining robustness across validations. For simpler, more interpretable models, L0-based methods (IHT and BAR) are advantageous, providing greater parsimony and calibration with fewer features. This study aids in selecting suitable regularization techniques for healthcare prediction models, balancing performance, complexity, and interpretability.
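    As a rough illustration of the kind of comparison described above, the sketch below fits L1, L2, and ElasticNet logistic regression with a 75%/25% train-test split and reports held-out AUC and model size (number of non-zero coefficients). It uses scikit-learn on synthetic data rather than the study's actual pipeline and databases; the penalty strength and l1_ratio are placeholder assumptions.

```python
# Minimal sketch: compare L1, L2, and ElasticNet penalized logistic regression
# by held-out AUC and parsimony. Synthetic data and hyperparameters are
# placeholders, not the study's settings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=200, n_informative=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)  # 75%/25% split

penalties = {
    "L1": dict(penalty="l1", solver="saga"),
    "L2": dict(penalty="l2", solver="saga"),
    "ElasticNet": dict(penalty="elasticnet", solver="saga", l1_ratio=0.5),
}
for name, kwargs in penalties.items():
    model = LogisticRegression(C=0.1, max_iter=5000, **kwargs).fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    n_nonzero = int(np.sum(model.coef_ != 0))  # model size / parsimony
    print(f"{name}: AUC={auc:.3f}, non-zero coefficients={n_nonzero}")
```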

    Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data

    Background: There is currently no consensus on the impact of class imbalance methods on the performance of clinical prediction models. We aimed to empirically investigate the impact of random oversampling and random undersampling, two commonly used class imbalance methods, on the internal and external validation performance of prediction models developed using observational health data. Methods: We developed and externally validated prediction models for various outcomes of interest within a target population of people with pharmaceutically treated depression across four large observational health databases. We used three different classifiers (lasso logistic regression, random forest, XGBoost) and varied the target imbalance ratio. We evaluated the impact on model performance in terms of discrimination and calibration. Discrimination was assessed using the area under the receiver operating characteristic curve (AUROC) and calibration was assessed using calibration plots. Results: We developed and externally validated a total of 1,566 prediction models. On internal and external validation, random oversampling and random undersampling generally did not result in higher AUROCs. Moreover, we found overestimated risks, although this miscalibration could largely be corrected by recalibrating the models towards the imbalance ratios in the original dataset. Conclusions: Overall, we found that random oversampling or random undersampling generally does not improve the internal and external validation performance of prediction models developed in large observational health databases. Based on our findings, we do not recommend applying random oversampling or random undersampling when developing prediction models in large observational health databases.
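    A minimal sketch of the two ideas in this abstract: random undersampling to a target imbalance ratio, followed by recalibrating the resulting risks back towards the original outcome rate. This is not the study's code; the synthetic data, the 1:1 target ratio, and the simple odds correction are placeholder assumptions.

```python
# Minimal sketch: random undersampling of the majority class, then correcting
# the overestimated risks back towards the original event rate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=50, weights=[0.97, 0.03],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y,
                                           random_state=0)

# Random undersampling of the majority class to a 1:1 target imbalance ratio.
rng = np.random.default_rng(0)
pos = np.where(y_tr == 1)[0]
neg = rng.choice(np.where(y_tr == 0)[0], size=len(pos), replace=False)
idx = np.concatenate([pos, neg])
model = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])

# Risks are overestimated after undersampling; shift the predicted odds from
# the resampled event rate (s) back towards the original event rate (tau).
p = model.predict_proba(X_te)[:, 1]
tau, s = y_tr.mean(), y_tr[idx].mean()
odds = p / (1 - p) * (tau / (1 - tau)) / (s / (1 - s))
p_recal = odds / (1 + odds)
print(f"mean predicted risk raw={p.mean():.3f}, recalibrated={p_recal.mean():.3f}, "
      f"observed={y_te.mean():.3f}")
```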

    Using clinical text to refine unspecific condition codes in Dutch general practitioner EHR data

    Objective: Observational studies using electronic health record (EHR) databases often face challenges due to unspecific clinical codes that can obscure detailed medical information, hindering precise data analysis. In this study, we aimed to assess the feasibility of refining these unspecific condition codes into more specific codes in a Dutch general practitioner (GP) EHR database by leveraging the available clinical free text. Methods: We utilized three approaches for text classification—search queries, semi-supervised learning, and supervised learning—to improve the specificity of ten unspecific International Classification of Primary Care (ICPC-1) codes. Two text representations and three machine learning algorithms were evaluated for the (semi-)supervised models. Additionally, we measured the improvement achieved by the refinement process on all code occurrences in the database. Results: The classification models performed well for most codes. In general, no single classification approach consistently outperformed the others. However, there were variations in the relative performance of the classification approaches within each code and in the use of different text representations and machine learning algorithms. Class imbalance and limited training data affected the performance of the (semi-)supervised models, yet the simple search queries remained particularly effective. Ultimately, the developed models improved the specificity of over half of all the unspecific code occurrences in the database. Conclusions: Our findings show the feasibility of using information from clinical text to improve the specificity of unspecific condition codes in observational healthcare databases, even with a limited range of machine-learning techniques and modest annotated training sets. Future work could investigate transfer learning, integration of structured data, alternative semi-supervised methods, and validation of models across healthcare settings. The improved level of detail enriches the interpretation of medical information and can benefit observational research and patient care.
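    A toy sketch of two of the three approaches mentioned (a search query and a supervised TF-IDF classifier) assigning a more specific label to free text recorded under an unspecific code. The example notes, keywords, and sub-code labels are hypothetical and in English; they merely stand in for the Dutch GP text and ICPC-1 codes used in the study.

```python
# Toy illustration: keyword search query vs. supervised TF-IDF classification
# for refining an unspecific condition code. All notes and labels are made up.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = ["patient reports pain in left knee after fall",
         "shoulder pain for two weeks, no trauma",
         "knee swollen and painful, difficulty walking",
         "pain in right shoulder when lifting arm"]
labels = ["knee", "shoulder", "knee", "shoulder"]  # hypothetical specific sub-codes

# Approach 1: search query (rule-based keyword matching).
def search_query(note):
    if re.search(r"\bknee\b", note):
        return "knee"
    if re.search(r"\bshoulder\b", note):
        return "shoulder"
    return "unspecified"

# Approach 2: supervised text classification on TF-IDF features.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(notes, labels)

new_note = "swelling of the knee joint, painful on palpation"
print(search_query(new_note), clf.predict([new_note])[0])
```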

    An Empirical Comparison of Meta- and Mega-Analysis With Data From the ENIGMA Obsessive-Compulsive Disorder Working Group

    Objective: Brain imaging communities focusing on different diseases have increasingly started to collaborate and to pool data to perform well-powered meta- and mega-analyses. Some methodologists claim that a one-stage individual-participant data (IPD) mega-analysis can be superior to a two-stage aggregated data meta-analysis, since more detailed computations can be performed in a mega-analysis. Before definitive conclusions regarding the performance of either method can be drawn, it is necessary to critically evaluate the methodology of, and results obtained by, meta- and mega-analyses. Methods: Here, we compare the inverse variance weighted random-effect meta-analysis model with a multiple linear regression mega-analysis model, as well as with a linear mixed-effects random-intercept mega-analysis model, using data from 38 cohorts including 3,665 participants of the ENIGMA-OCD consortium. We assessed the effect sizes and standard errors, and the fit of the models, to evaluate the performance of the different methods. Results: The mega-analytical models showed lower standard errors and narrower confidence intervals than the meta-analysis. Similar standard errors and confidence intervals were found for the linear regression and linear mixed-effects random-intercept models. Moreover, the linear mixed-effects random-intercept models showed better fit indices compared to linear regression mega-analytical models. Conclusions: Our findings indicate that results obtained by meta- and mega-analysis differ, in favor of the latter. In multi-center studies with a moderate amount of variation between cohorts, a linear mixed-effects random-intercept mega-analytical framework appears to be the better approach to investigate structural neuroimaging data.
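    A minimal sketch, on simulated cohorts, of the two frameworks being contrasted: a two-stage inverse-variance random-effects meta-analysis (DerSimonian-Laird) versus a one-stage linear mixed-effects mega-analysis with a random intercept per cohort. The simulated data and effect size are placeholders for the ENIGMA-OCD measures, not a reproduction of the study's analysis.

```python
# Minimal sketch: two-stage random-effects meta-analysis vs. one-stage
# mixed-effects mega-analysis on simulated multi-cohort data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
cohorts = []
for k in range(10):                            # 10 simulated cohorts
    n = int(rng.integers(50, 200))
    x = rng.integers(0, 2, n)                  # e.g. patient vs. control
    y = 0.3 * x + rng.normal(0, 1, n) + rng.normal(0, 0.2)  # cohort offset
    cohorts.append(pd.DataFrame({"y": y, "x": x, "cohort": k}))
data = pd.concat(cohorts, ignore_index=True)

# Stage 1: per-cohort effect estimates; Stage 2: DerSimonian-Laird pooling.
est, var = [], []
for _, d in data.groupby("cohort"):
    fit = smf.ols("y ~ x", data=d).fit()
    est.append(fit.params["x"]); var.append(fit.bse["x"] ** 2)
est, var = np.array(est), np.array(var)
w = 1 / var
fixed = np.sum(w * est) / np.sum(w)
q = np.sum(w * (est - fixed) ** 2)
tau2 = max(0.0, (q - (len(est) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
w_re = 1 / (var + tau2)
meta_effect = np.sum(w_re * est) / np.sum(w_re)
meta_se = np.sqrt(1 / np.sum(w_re))

# One-stage mega-analysis: linear mixed model with a random intercept per cohort.
mega = smf.mixedlm("y ~ x", data, groups=data["cohort"]).fit()
print(f"meta: {meta_effect:.3f} (SE {meta_se:.3f}); "
      f"mega: {mega.params['x']:.3f} (SE {mega.bse['x']:.3f})")
```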

    Subcortical brain volume, regional cortical thickness, and cortical surface area across disorders: findings from the ENIGMA ADHD, ASD, and OCD Working Groups

    Objective: Attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorder (ASD), and obsessive-compulsive disorder (OCD) are common neurodevelopmental disorders that frequently co-occur. We aimed to directly compare all three disorders. The ENIGMA consortium is ideally positioned to investigate structural brain alterations across these disorders. Methods: Structural T1-weighted whole-brain MRI of controls (n=5,827) and patients with ADHD (n=2,271), ASD (n=1,777), and OCD (n=2,323) from 151 cohorts worldwide were analyzed using standardized processing protocols. We examined differences in subcortical volume, cortical thickness, and surface area within a mega-analytical framework, pooling measures extracted from each cohort. Analyses were performed separately for children, adolescents, and adults using linear mixed-effects models adjusting for age, sex, and site (and intracranial volume [ICV] for subcortical and surface area measures). Results: We found no alterations shared among all three disorders, and alterations shared between any two disorders did not survive correction for multiple comparisons. Children with ADHD had smaller hippocampal volumes than those with OCD, possibly influenced by IQ. Children and adolescents with ADHD also had smaller ICV than controls and those with OCD or ASD. Adults with ASD showed thicker frontal cortices compared to adult controls and the other clinical groups. No OCD-specific alterations were observed in any age group, and no surface area alterations were observed across the disorders in childhood or adulthood. Conclusion: Our findings suggest robust but subtle alterations across different age groups among ADHD, ASD, and OCD. ADHD-specific ICV and hippocampal alterations in children and adolescents, and ASD-specific cortical thickness alterations in the frontal cortex in adults, support previous work emphasizing neurodevelopmental alterations in these disorders.

    Attention-based neural networks for clinical prediction modelling on electronic health records

    Background: Deep learning models have been highly successful in many fields but have struggled on structured data. Here we apply four state-of-the-art supervised deep learning models that use the attention mechanism and compare them against logistic regression and XGBoost in terms of discrimination, calibration, and clinical utility. Methods: We develop the models using a general practitioner database. We implement a recurrent neural network, a transformer with and without reverse distillation, and a graph neural network. We measure discrimination using the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPRC). We assess smooth calibration using restricted cubic splines and clinical utility with decision curve analysis. Results: Our results show that deep learning approaches can improve discrimination by up to 2.5 percentage points in AUC and 7.4 percentage points in AUPRC. However, on average the baselines are competitive. Most models are calibrated similarly to the baselines, except for the graph neural network. The transformer using reverse distillation shows the best clinical utility on two out of three prediction problems over most of the prediction thresholds. Conclusion: In this study, we evaluated various supervised learning approaches using neural networks and attention, in a rigorous comparison that considers not only discrimination but also calibration and clinical utility. There is value in using deep learning models on electronic health record data, since they can improve discrimination and clinical utility while providing good calibration. However, good baseline methods remain competitive.
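    A toy sketch of the core attention idea on EHR-style input: embed a patient's sequence of medical codes and pool the embeddings with learned attention weights before a logistic output layer. This is a simplified stand-in, not the recurrent, transformer, or graph architectures evaluated in the study; the vocabulary size and code sequences below are made up.

```python
# Toy attention-pooling classifier over a padded sequence of medical code ids.
import torch
import torch.nn as nn

class AttentionPoolClassifier(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.attn = nn.Linear(dim, 1)   # one attention score per code
        self.out = nn.Linear(dim, 1)    # binary outcome logit

    def forward(self, codes):           # codes: (batch, seq_len) of code ids
        x = self.embed(codes)           # (batch, seq_len, dim)
        mask = (codes == 0)             # ignore padding positions
        scores = self.attn(x).squeeze(-1).masked_fill(mask, float("-inf"))
        weights = torch.softmax(scores, dim=-1)       # attention over codes
        pooled = (weights.unsqueeze(-1) * x).sum(dim=1)
        return self.out(pooled).squeeze(-1)           # raw logits

# Hypothetical usage: 2 patients with padded sequences of 5 code ids each.
model = AttentionPoolClassifier(vocab_size=1000)
codes = torch.tensor([[5, 42, 17, 0, 0], [8, 99, 3, 512, 7]])
probs = torch.sigmoid(model(codes))
print(probs)
```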

    Changes in scalp potentials and spatial smoothing effects of inclusion of dura layer in human head models for EEG simulations

    The dura layer, which covers the brain, is less conductive than the cerebrospinal fluid (CSF) and more conductive than the skull bone. This can significantly influence the flow of volume currents from the cortex to the scalp surface, which in turn changes the magnitude and spatial profiles of scalp potentials. This was examined with a 3-D finite element method (FEM) model of an adult subject constructed from 192 segmented axial magnetic resonance (MR) slices with 256×256 pixel resolution. The voxel resolution was 1×1×1 mm. The model included the dura layer, and other major tissues were also identified. The electrical conductivities of the various tissues were obtained from the literature; the conductivities of dura and CSF were 0.001 S/m and 0.06 S/m, respectively. The electrical activity of the cortex was represented by 144,000 distributed dipolar sources oriented normal to the local cortical surface. The dipolar intensity was in the range of 0.0-0.4 mA·m with a uniform random distribution. Scalp potentials were simulated for two head models with an adaptive finite element solver: one model included the dura layer, and in the other the dura layer was replaced with CSF. Spatial contour plots of potentials were made on the cortical, dural, and scalp surfaces. With the inclusion of the dura layer, scalp potentials decrease by about 20%. The contours of gyri and sulci structures were visible in the spatial profiles of the cortical potentials, were smoothed out on the dural surface, and were not visible on the scalp surface. These results suggest that the dura layer should be included for accurate modeling of scalp and cortical potentials.

    Challenges of Estimating Global Feature Importance in Real-World Health Care Data

    Feature importance is often used to explain clinical prediction models. In this work, we examine three challenges using experiments with electronic health record data: computational feasibility, choosing between methods, and interpretation of the resulting explanations. This work aims to create awareness of the disagreement between feature importance methods and underscores the need for guidance for practitioners on how to deal with these discrepancies.
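    A small sketch of how two global feature importance methods can disagree on the same fitted model, here impurity-based importance versus permutation importance for a gradient boosting classifier. The synthetic data and model are placeholders for the study's EHR experiments, not its actual setup.

```python
# Minimal sketch: two global feature importance methods can rank features
# differently on the same model, complicating interpretation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

impurity_rank = np.argsort(model.feature_importances_)[::-1]
perm = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
perm_rank = np.argsort(perm.importances_mean)[::-1]

print("impurity top 5:   ", impurity_rank[:5])
print("permutation top 5:", perm_rank[:5])
```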