33 research outputs found

    The added value of text from Dutch general practitioner notes in predictive modeling

    Objective: This work aims to explore the value of Dutch unstructured data, in combination with structured data, for the development of prognostic prediction models in a general practitioner (GP) setting. Materials and methods: We trained and validated prediction models for 4 common clinical prediction problems using various sparse text representations, common prediction algorithms, and observational GP electronic health record (EHR) data. We trained and validated 84 models internally and externally on data from different EHR systems. Results: On average, across all text representations and prediction algorithms, models using text data alone performed better than or similar to models using structured data alone in 2 prediction tasks. Additionally, in these 2 tasks, the combination of structured and text data outperformed models using structured or text data alone. No large performance differences were found between the different text representations and prediction algorithms. Discussion: Our findings indicate that unstructured data alone can yield well-performing prediction models for some clinical prediction problems, and the performance improvement achieved by combining structured and text data highlights the added value of free-text notes. Additionally, we demonstrate the significance of clinical natural language processing research in languages other than English and the possibility of validating text-based prediction models across various EHR systems. Conclusion: Our study highlights the potential benefits of incorporating unstructured data in clinical prediction models in a GP setting. Although the added value of unstructured data may vary with the specific prediction task, our findings suggest that it has the potential to enhance patient care.
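    The core idea of combining a sparse text representation with structured EHR features can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the vocabulary, the toy Dutch note fragments, and the structured features (age, sex) are all hypothetical.

```python
from collections import Counter

def bag_of_words(notes, vocabulary):
    """Map each free-text note to a count vector over a fixed vocabulary
    (a simple sparse text representation)."""
    vectors = []
    for note in notes:
        counts = Counter(note.lower().split())
        vectors.append([counts.get(term, 0) for term in vocabulary])
    return vectors

def combine_features(structured, text_vectors):
    """Concatenate structured EHR features with the text representation, per patient."""
    return [s + t for s, t in zip(structured, text_vectors)]

# Hypothetical Dutch GP note fragments and structured features (age, sex as 0/1)
vocabulary = ["hoest", "koorts", "moe"]
notes = ["patient heeft hoest en koorts", "al weken erg moe"]
structured = [[63, 1], [47, 0]]
features = combine_features(structured, bag_of_words(notes, vocabulary))
```

    The combined feature matrix can then be fed to any of the prediction algorithms the abstract mentions; in practice one would use TF-IDF weighting and a proper sparse matrix rather than dense lists.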

    Electrocardiographic Criteria for Left Ventricular Hypertrophy in Children

    Previous studies to determine the sensitivity of the electrocardiogram (ECG) for left ventricular hypertrophy (LVH) in children had their imperfections: they were not done on an unselected hospital population, several criteria used in adults were not applied to children, and obsolete limits of normal for the ECG parameters were used. Furthermore, left ventricular mass (LVM) was taken as the reference standard for LVH, with no regard for other clinical evidence. The study population consisted of 832 children from whom a 12-lead ECG and an M-mode echocardiogram were taken on the same day. The validity of the ECG criteria was judged on the basis of an abnormal LVM index, either alone or in combination with other clinical evidence. The ECG criteria were based on recently established age-dependent normal limits. At 95% specificity, the ECG criteria had low sensitivities (<25%) when an elevated LVM index was taken as the reference for LVH. When clinical evidence was also taken into account, sensitivities improved considerably (<43%). Sensitivities could be further improved when ECG parameters were combined. The sensitivity of the pediatric ECG in detecting LVH is low but depends strongly on the definition of the reference used for validation.
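    The evaluation above fixes specificity at 95% and reports the resulting sensitivity. That procedure can be sketched with toy data; the cutoff logic and the scores below are illustrative, not the study's actual ECG criteria.

```python
import math

def sensitivity_at_specificity(pos_scores, neg_scores, target_spec=0.95):
    """Pick the smallest cutoff such that at least target_spec of the negatives
    score at or below it, then report the sensitivity of 'score > cutoff'."""
    ranked = sorted(neg_scores)
    k = math.ceil(target_spec * len(ranked))   # negatives that must test negative
    cutoff = ranked[k - 1]
    sensitivity = sum(v > cutoff for v in pos_scores) / len(pos_scores)
    specificity = sum(v <= cutoff for v in neg_scores) / len(neg_scores)
    return cutoff, sensitivity, specificity
```

    Combining several ECG parameters (for example, flagging LVH when any of several criteria fires, with each cutoff re-tuned to preserve overall specificity) can raise sensitivity at the same specificity, which is the effect the abstract reports.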

    Baseline values from the electrocardiograms of children and adolescents with ADHD

    Background: An important issue in pediatric pharmacology is the determination of whether medications affect cardiac rhythm parameters, in particular the QT interval, which is a surrogate marker for the risk of adverse cardiac events and sudden death. To evaluate changes while on medication, it is useful to have a comparison of age-appropriate values while off medication. The present meta-analysis provides baseline ECG values (i.e., off medication) from approximately 6000 children and adolescents with attention-deficit/hyperactivity disorder (ADHD). Methods: Subjects were aged 6–18 years and participated in global trials within the atomoxetine registration program. Patients were administered a 12-lead ECG at study screening and cardiac rhythm parameters were recorded. Baseline QT intervals were corrected for heart rate using 3 different methods: Bazett's, Fridericia's, and a population data-derived formula. Results: ECG data were obtained from 5289 North American and 641 non-North American children and adolescents. Means and percentiles are presented for each ECG measure and QTc interval based on pubertal status as defined by age and sex. Prior treatment history with stimulants and racial origin (Caucasian) were each associated with significantly longer mean QTc values. Conclusion: Baseline ECG and QTc data from almost 6000 children and adolescents presenting with ADHD are provided to contribute to the knowledge base regarding mean values for pediatric cardiac parameters. Consistent with other studies of the QT interval in children and adolescents, the Bazett correction formula appears to overestimate the prevalence of prolonged QTc in the pediatric population.
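    The two named heart-rate corrections are standard published formulas and can be computed directly (the third, population data-derived formula is trial-specific and not reproduced here). The example heart rate and QT value are illustrative.

```python
import math

def qtc_bazett(qt_ms, rr_s):
    """Bazett: QTc = QT / sqrt(RR); known to over-correct at faster heart rates."""
    return qt_ms / math.sqrt(rr_s)

def qtc_fridericia(qt_ms, rr_s):
    """Fridericia: QTc = QT / RR^(1/3); milder correction than Bazett."""
    return qt_ms / rr_s ** (1.0 / 3.0)

# A QT of 360 ms at 100 bpm (RR = 0.6 s): Bazett corrects more aggressively,
# which is why it flags more children as having a prolonged QTc
rr = 60.0 / 100.0
bazett, fridericia = qtc_bazett(360.0, rr), qtc_fridericia(360.0, rr)
```

    Because pediatric heart rates are high (RR < 1 s), Bazett's square-root denominator inflates QTc more than Fridericia's cube root, consistent with the abstract's conclusion that Bazett overestimates prolonged-QTc prevalence in children.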

    Dependency of magnetocardiographically determined fetal cardiac time intervals on gestational age, gender and postnatal biometrics in healthy pregnancies

    BACKGROUND: Magnetocardiography enables the precise determination of fetal cardiac time intervals (CTI) as early as the second trimester of pregnancy. It has been shown that fetal CTI change over the course of gestation. The aim of this work was to investigate the dependency of fetal CTI on gestational age, gender and postnatal biometric data in a substantial sample of subjects during normal pregnancy. METHODS: A total of 230 fetal magnetocardiograms were obtained in 47 healthy fetuses between the 15th and 42nd week of gestation. In each recording, after subtraction of the maternal cardiac artifact and the identification of fetal beats, fetal PQRST courses were signal averaged. On the basis of the wave onsets and ends detected therein, the following CTI were determined: P wave, PR interval, PQ interval, QRS complex, ST segment, T wave, QT and QTc interval. Using regression analysis, the dependency of the CTI was examined with respect to gestational age, gender and postnatal biometric data. RESULTS: Atrioventricular conduction and ventricular depolarization times could be determined dependably, whereas the T wave was often difficult to detect. Linear and nonlinear regression analysis established a strong dependency on age for the P wave and QRS complex (r² = 0.67, p < 0.001 and r² = 0.66, p < 0.001) as well as an identifiable trend for the PR and PQ intervals (r² = 0.21, p < 0.001 and r² = 0.13, p < 0.001). Gender differences were found only for the QRS complex from the 31st week onward (p < 0.05). The influence of biometric data, collected in a subgroup in whom recordings were available within 1 week of birth, on the P wave or QRS complex did not display statistical significance. CONCLUSION: We conclude that 1) from approximately the 18th week to term, fetal CTI which quantify depolarization times can be reliably determined using magnetocardiography, 2) the P wave and QRS complex duration show a high dependency on age which to a large part reflects fetal growth, and 3) fetal gender plays a role in QRS complex duration in the third trimester. Fetal development is thus in part reflected in the CTI and may be useful in the identification of intrauterine growth retardation.
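    The reported r² values come from regression fits of each interval against gestational age. For the linear case, the coefficient of determination can be sketched as follows; the numbers in the example are toy data, not the study's measurements.

```python
def linear_fit_r2(x, y):
    """Least-squares slope/intercept and r^2 for the model y ~ intercept + slope*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)   # fraction of variance in y explained by x
    return slope, intercept, r2
```

    An r² of 0.67 for the P wave therefore means that gestational age alone accounts for about two thirds of the observed variance in P-wave duration.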

    Contextualising adverse events of special interest to characterise the baseline incidence rates in 24 million patients with COVID-19 across 26 databases: a multinational retrospective cohort study

    BACKGROUND: Adverse events of special interest (AESIs) were pre-specified to be monitored for the COVID-19 vaccines. Some AESIs are associated not only with the vaccines but also with COVID-19 itself. Our aim was to characterise the incidence rates of AESIs following SARS-CoV-2 infection in patients and compare these to historical rates in the general population. METHODS: A multinational cohort study with data from primary care, electronic health records, and insurance claims mapped to a common data model. Data were collected between Jan 1, 2017 and the end date of each database (ranging from Jul 2020 to May 2022). The 16 pre-specified prevalent AESIs were: acute myocardial infarction, anaphylaxis, appendicitis, Bell's palsy, deep vein thrombosis, disseminated intravascular coagulation, encephalomyelitis, Guillain-Barré syndrome, haemorrhagic stroke, non-haemorrhagic stroke, immune thrombocytopenia, myocarditis/pericarditis, narcolepsy, pulmonary embolism, transverse myelitis, and thrombosis with thrombocytopenia. Age-sex standardised incidence rate ratios (SIRs) were estimated to compare post-COVID-19 to pre-pandemic rates in each of the databases. FINDINGS: Substantial heterogeneity by age was seen in AESI rates, with some clearly increasing with age and others following the opposite trend. Differences were also observed across databases for the same health outcome and age-sex stratum. All studied AESIs appeared consistently more common in the post-COVID-19 cohorts than in the historical cohorts, with meta-analytic SIRs ranging from 1.32 (1.05 to 1.66) for narcolepsy to 11.70 (10.10 to 13.70) for pulmonary embolism. INTERPRETATION: Our findings suggest all AESIs are more common after COVID-19 than in the general population. Thromboembolic events were particularly common, over 10-fold more so. More research is needed to contextualise post-COVID-19 complications in the longer term. FUNDING: None.
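    The age-sex standardised incidence rate ratio follows the usual indirect-standardisation recipe: expected events are obtained by applying the historical stratum-specific rates to the post-COVID-19 person-time, and the SIR is observed over expected. A minimal sketch with hypothetical strata and rates (not figures from the study):

```python
def standardized_incidence_ratio(observed_events, historical_rates, person_time):
    """SIR = observed / expected, with expected events summed over age-sex strata:
    expected = sum over strata of historical_rate[s] * post-COVID person_time[s]."""
    expected = sum(historical_rates[s] * person_time[s] for s in person_time)
    return observed_events / expected

# Hypothetical strata: historical rates per person-year, post-COVID person-years
rates = {"F_18_40": 0.0005, "M_18_40": 0.0010}
time_at_risk = {"F_18_40": 10_000, "M_18_40": 10_000}
sir = standardized_incidence_ratio(30, rates, time_at_risk)  # expected = 15 events
```

    Standardising this way prevents differences in the age-sex composition of the post-COVID-19 cohort from masquerading as a change in risk, which matters here because many AESI rates vary strongly with age.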

    European Health Data & Evidence Network: learnings from building out a standardized international health data network

    Objective: Health data standardized to a common data model (CDM) simplifies and facilitates research. This study examines the factors that make standardizing observational health data to the Observational Medical Outcomes Partnership (OMOP) CDM successful. Materials and methods: Twenty-five data partners (DPs) from 11 countries received funding from the European Health Data Evidence Network (EHDEN) to standardize their data. Three surveys, DataQualityDashboard results, and statistics from the conversion process were analyzed qualitatively and quantitatively. Our measures of success were the total number of days to transform source data into the OMOP CDM and participation in network research. Results: The health data converted to the CDM represented more than 133 million patients. 100%, 88%, and 84% of DPs completed Surveys 1, 2, and 3, respectively. The median duration of the 6 key extract, transform, and load (ETL) processes ranged from 4 to 115 days. Of the 25 DPs, 21 were considered applicable for analysis, of which 52% standardized their data on time and 48% participated in an international collaborative study. Discussion: This study shows that the consistent workflow used by EHDEN is appropriate to support the successful standardization of observational data across Europe. Over the 25 successful transformations, we confirmed that getting the right people for the ETL is critical and that vocabulary mapping requires specific expertise and tool support. Additionally, we learned that teams that proactively prepared for data governance issues were able to avoid considerable delays, improving their ability to finish on time. Conclusion: This study provides guidance for future DPs standardizing to the OMOP CDM and participating in distributed networks. We demonstrate that the Observational Health Data Sciences and Informatics community must continue to evaluate and provide guidance and support for what ultimately forms the backbone of how community members generate evidence.

    Applying machine learning in distributed data networks for pharmacoepidemiologic and pharmacovigilance studies: opportunities, challenges, and considerations

    Increasing availability of electronic health databases capturing real-world experiences with medical products has garnered much interest in their use for pharmacoepidemiologic and pharmacovigilance studies. The traditional practice of having numerous groups use single databases to accomplish similar tasks and address common questions about medical products can be made more efficient through well-coordinated multi-database studies, greatly facilitated through distributed data network (DDN) architectures. Access to larger amounts of electronic health data within DDNs has created a growing interest in using data-adaptive machine learning (ML) techniques that can automatically model complex associations in high-dimensional data with minimal human guidance. However, the siloed storage and diverse nature of the databases in DDNs create unique challenges for using ML. In this paper, we discuss opportunities, challenges, and considerations for applying ML in DDNs for pharmacoepidemiologic and pharmacovigilance studies. We first discuss the major types of activities performed by DDNs and how ML may be used. Next, we discuss practical data-related factors influencing how DDNs work in practice. We then combine these discussions and jointly consider how opportunities for ML are affected by practical data-related factors for DDNs, leading to several challenges. We present different approaches for addressing these challenges and highlight efforts that real-world DDNs have taken or are currently taking to help mitigate them. Despite these challenges, the time is ripe for this emerging use of ML in DDNs, and the utility of these data-adaptive modeling techniques in pharmacoepidemiologic and pharmacovigilance studies will likely continue to increase in the coming years.

    A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data

    Background and objective: In response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commonly assessed as having a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code). Methods: We show step-by-step how to implement the analytics pipeline for the question: ‘In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?’. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA. Results: Our open-source software tools enabled us to efficiently go end-to-end from problem design to reliable model development and evaluation. When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination. L1-regularized logistic regression models were well calibrated. Conclusion: Our results show that following the OHDSI analytics pipeline for patient-level prediction modeling can enable the rapid development of reliable prediction models. The OHDSI software tools and pipeline are open source and available to researchers around the world.
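    L1-regularized logistic regression, the reference method in the comparison above, can be sketched from scratch with proximal gradient descent. This is a toy implementation on toy data, not the study's method: the actual work used the OHDSI open-source tooling on large claims databases.

```python
import math

def train_l1_logreg(X, y, lam=0.01, lr=0.1, epochs=500):
    """L1-regularized logistic regression fit by proximal gradient descent:
    a gradient step on the average log-loss, followed by soft-thresholding
    of the weights (the proximal operator of the L1 penalty)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad_w[j] += (p - yi) * xi[j]
            grad_b += p - yi
        b -= lr * grad_b / n
        for j in range(d):
            wj = w[j] - lr * grad_w[j] / n
            # Soft-threshold: shrink toward zero; uninformative features land at exactly 0
            w[j] = math.copysign(max(abs(wj) - lr * lam, 0.0), wj)
    return w, b

def predict_proba(w, b, x):
    """Predicted probability of the outcome (e.g. 30-day death) for one patient."""
    return 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))
```

    The soft-thresholding step is what produces sparse, interpretable coefficient vectors in high-dimensional EHR data, one reason this model family is a strong baseline in patient-level prediction.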