    TASKA: A modular task management system to support health research studies


    TreatmentPatterns: An R package to facilitate the standardized development and analysis of treatment patterns across disease domains

    Background and objectives: There is increasing interest in using real-world data to illustrate how patients with specific medical conditions are treated in real life. Insight into current treatment practices helps to improve and tailor patient care, but is often held back by a lack of data interoperability and the high level of resources required. We aimed to provide an easy-to-use tool that overcomes these barriers to support the standardized development and analysis of treatment patterns for a wide variety of medical conditions. Methods: We formally defined the process of constructing treatment pathways and implemented it in an open-source R package, TreatmentPatterns (https://github.com/mi-erasmusmc/TreatmentPatterns), to enable reproducible and timely analysis of treatment patterns. Results: The package supports the analysis of treatment patterns in a study population of interest. We demonstrate its functionality by analyzing the treatment patterns of three common chronic diseases (type II diabetes mellitus, hypertension, and depression) in the Dutch Integrated Primary Care Information (IPCI) database. Conclusion: TreatmentPatterns makes the analysis of treatment patterns more accessible, more standardized, and easier to interpret. We hope it thereby contributes to the accumulation of knowledge on real-world treatment patterns across disease domains. We encourage researchers to adjust the package and add custom analyses based on their research needs.
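    The core pathway-construction step the package formalizes, ordering each patient's treatment events chronologically and collapsing consecutive repeats into a sequence of treatment changes, can be sketched in a few lines. The following is a minimal Python illustration of that idea, not the package's actual R implementation; the record layout and drug names are assumptions for the example.

```python
from collections import Counter
from itertools import groupby

# Toy drug-exposure records: (patient_id, start_date, treatment).
# The field layout is illustrative, not the package's actual schema.
exposures = [
    (1, "2020-01-05", "metformin"),
    (1, "2020-06-10", "metformin"),
    (1, "2021-02-01", "sulfonylurea"),
    (2, "2019-03-12", "metformin"),
    (2, "2019-09-30", "insulin"),
]

def treatment_pathways(records):
    """Order each patient's exposures by start date and collapse
    consecutive repeats of the same treatment into one pathway step."""
    pathways = {}
    for patient, rows in groupby(sorted(records), key=lambda r: r[0]):
        sequence = [drug for _, _, drug in sorted(rows, key=lambda r: r[1])]
        # metformin -> metformin -> sulfonylurea becomes
        # metformin -> sulfonylurea.
        pathways[patient] = tuple(d for d, _ in groupby(sequence))
    return pathways

# Frequency of each distinct pathway across the study population.
print(Counter(treatment_pathways(exposures).values()))
```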

    Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data

    Background: There is currently no consensus on the impact of class imbalance methods on the performance of clinical prediction models. We aimed to empirically investigate the impact of random oversampling and random undersampling, two commonly used class imbalance methods, on the internal and external validation performance of prediction models developed using observational health data. Methods: We developed and externally validated prediction models for various outcomes of interest within a target population of people with pharmaceutically treated depression across four large observational health databases. We used three different classifiers (lasso logistic regression, random forest, XGBoost) and varied the target imbalance ratio. We evaluated the impact on model performance in terms of discrimination and calibration. Discrimination was assessed using the area under the receiver operating characteristic curve (AUROC), and calibration was assessed using calibration plots. Results: We developed and externally validated a total of 1,566 prediction models. On internal and external validation, random oversampling and random undersampling generally did not result in higher AUROCs. Moreover, the resulting models overestimated risks, although this miscalibration could largely be corrected by recalibrating the models towards the imbalance ratios in the original dataset. Conclusions: Overall, we found that random oversampling and random undersampling generally do not improve the internal and external validation performance of prediction models developed in large observational health databases. Based on our findings, we do not recommend applying random oversampling or random undersampling when developing prediction models in large observational health databases.
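    The recalibration step described in the results, shifting predicted risks back towards the outcome prevalence of the original dataset, can be done with a standard prior-correction of the model's log-odds. Below is a minimal Python sketch under that assumption; the paper's exact recalibration procedure may differ.

```python
import numpy as np

def correct_for_resampling(p_resampled, prev_train, prev_orig):
    """Map risks from a model trained at prevalence `prev_train`
    (after over-/undersampling) back to the original prevalence
    `prev_orig` by shifting the intercept on the log-odds scale."""
    logit = np.log(p_resampled / (1 - p_resampled))
    shift = (np.log(prev_orig / (1 - prev_orig))
             - np.log(prev_train / (1 - prev_train)))
    return 1 / (1 + np.exp(-(logit + shift)))

# Example: model trained on a 50/50 undersampled set; true prevalence 5%.
p = np.array([0.10, 0.50, 0.90])
print(correct_for_resampling(p, prev_train=0.5, prev_orig=0.05))
# A predicted risk of 0.50 maps back to 0.05, the original prevalence.
```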

    90-Day all-cause mortality can be predicted following a total knee replacement: an international network study to develop and validate a prediction model

    Purpose: The purpose of this study was to develop and validate a prediction model for 90-day mortality following a total knee replacement (TKR). TKR is a safe and cost-effective surgical procedure for treating severe knee osteoarthritis (OA). Although complications following surgery are rare, prediction tools could help identify high-risk patients who could be targeted with preventative interventions. The aim was to develop and validate a simple model to help inform treatment choices. Methods: A mortality prediction model for knee OA patients following TKR was developed and externally validated using a US claims database and a UK general practice database. The target population consisted of patients undergoing a primary TKR for knee OA, aged ≥ 40 years and registered for ≥ 1 year before surgery. LASSO logistic regression models were developed for post-operative (90-day) mortality. A second mortality model was developed with a reduced feature set to increase interpretability and usability. Results: A total of 193,615 patients were included, with 40,950 in The Health Improvement Network (THIN) database and 152,665 in Optum. The full model predicting 90-day mortality yielded an AUROC of 0.78 when trained on Optum and 0.70 when externally validated on THIN. The reduced 12-variable model achieved an internal AUROC of 0.77 and an external AUROC of 0.71 on THIN. Conclusions: A simple prediction model based on sex, age, and 10 comorbidities that can identify patients at high risk of short-term mortality following TKR was developed and demonstrated good, robust performance. The 12-feature mortality model is easily implemented, and its performance suggests it could be used to inform evidence-based shared decision-making prior to surgery and to target prophylaxis for those at high risk. Level of evidence: III.
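    For reference, a LASSO-style logistic model with AUROC evaluation of the kind described here can be fit with standard tooling. The sketch below uses synthetic data and scikit-learn; the features and settings are assumptions for illustration, not the study's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 12))   # stand-ins for sex, age, 10 comorbidities
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000) > 2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The L1 penalty performs LASSO-style shrinkage and feature selection.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X_tr, y_tr)

print("AUROC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
print("features retained:", int(np.sum(model.coef_ != 0)))
```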

    Advancing the use of real-world evidence in health technology assessment: insights from a multi-stakeholder workshop

    Introduction: Real-world evidence (RWE) in health technology assessment (HTA) holds significant potential for informing healthcare decision-making. A multi-stakeholder workshop was organized by the European Health Data and Evidence Network (EHDEN) and the GetReal Institute to explore the status, challenges, and opportunities in incorporating RWE into HTA, with a focus on learning from regulatory initiatives such as the European Medicines Agency (EMA) Data Analysis and Real World Interrogation Network (DARWIN EU®). Methods: The workshop gathered key stakeholders from regulatory agencies, HTA organizations, academia, and industry for three panel discussions on RWE and HTA integration. Insights and recommendations were collected through the panel discussions and audience polls. The workshop outcomes were reviewed by the authors to identify key themes, challenges, and recommendations. Results: The workshop discussions revealed several important findings relating to the use of RWE in HTA. Compared with regulatory processes, its adoption in HTA to date has been slow. Barriers include limited trust in RWE, data quality concerns, and uncertainty about best practices. Facilitators include multidisciplinary training, educational initiatives, and stakeholder collaboration, which could be supported by initiatives like EHDEN and the GetReal Institute. Demonstrating the impact of “driver projects” could promote RWE adoption in HTA. Conclusion: To enhance the integration of RWE in HTA, it is crucial to address known barriers through comprehensive training, stakeholder collaboration, and impactful exemplar research projects. Upskilling those who use, benefit from, and generate RWE, promoting collaboration, and conducting “driver projects” can strengthen the HTA evidence base for more informed healthcare decisions.

    Using the Data Quality Dashboard to improve the EHDEN network

    Federated networks of observational health databases have the potential to be a rich resource to inform clinical practice and regulatory decision making. However, the lack of standard data quality processes makes it difficult to know whether these data are research-ready. The EHDEN COVID-19 Rapid Collaboration Call presented the opportunity to assess how the newly developed open-source tool, the Data Quality Dashboard (DQD), informs the quality of data in a federated network. Fifteen Data Partners (DPs) from 10 different countries worked with the EHDEN taskforce to map their data to the OMOP CDM. Throughout the process, at least two DQD results were collected and compared for each DP. All DPs showed an improvement in their data quality between the first and last run of the DQD. The DQD excelled at helping DPs identify and fix conformance issues but showed less of an impact on completeness and plausibility checks. This is the first study to apply the DQD to multiple, disparate databases across a network. While study-specific checks should still be run, we recommend that all data holders converting their data to the OMOP CDM use the DQD, as it ensures conformance to the model specifications and that a database meets a baseline level of completeness and plausibility for use in research.
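    The three check categories named here (conformance, completeness, plausibility) can be illustrated with simple predicates over a table. The Python sketch below is a toy analogue under assumed thresholds and column names; the real DQD runs a large battery of SQL-based checks against the OMOP CDM tables.

```python
import pandas as pd

# Toy OMOP-style person table; column names follow the CDM convention.
person = pd.DataFrame({
    "person_id": [1, 2, 3],
    "year_of_birth": [1980, None, 2150],
    "gender_concept_id": [8507, 8532, 8507],
})

checks = {
    # Conformance: a required field must not contain missing values.
    "conformance_person_id_not_null": person["person_id"].notna().all(),
    # Completeness: the share of populated birth years meets a threshold.
    "completeness_year_of_birth": person["year_of_birth"].notna().mean() >= 0.95,
    # Plausibility: recorded birth years fall within a believable range.
    "plausibility_year_of_birth": person["year_of_birth"].dropna().between(1850, 2025).all(),
}

for name, passed in checks.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```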

    Using clinical text to refine unspecific condition codes in Dutch general practitioner EHR data

    Objective: Observational studies using electronic health record (EHR) databases often face challenges due to unspecific clinical codes that can obscure detailed medical information, hindering precise data analysis. In this study, we aimed to assess the feasibility of refining these unspecific condition codes into more specific codes in a Dutch general practitioner (GP) EHR database by leveraging the available clinical free text. Methods: We utilized three approaches for text classification (search queries, semi-supervised learning, and supervised learning) to improve the specificity of ten unspecific International Classification of Primary Care (ICPC-1) codes. Two text representations and three machine learning algorithms were evaluated for the (semi-)supervised models. Additionally, we measured the improvement achieved by the refinement process on all code occurrences in the database. Results: The classification models performed well for most codes. In general, no single classification approach consistently outperformed the others, although the relative performance of the approaches, text representations, and machine learning algorithms varied by code. Class imbalance and limited training data affected the performance of the (semi-)supervised models, yet the simple search queries remained particularly effective. Ultimately, the developed models improved the specificity of over half of all the unspecific code occurrences in the database. Conclusions: Our findings show the feasibility of using information from clinical text to improve the specificity of unspecific condition codes in observational healthcare databases, even with a limited range of machine learning techniques and modest annotated training sets. Future work could investigate transfer learning, integration of structured data, alternative semi-supervised methods, and validation of models across healthcare settings. The improved level of detail enriches the interpretation of medical information and can benefit observational research and patient care.
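    Two of the approaches compared here, a hand-written search query and a supervised classifier over a bag-of-words-style text representation, can be illustrated compactly. Below is a minimal Python sketch on toy English notes; the study used Dutch GP text, and the labels and terms here are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "patient reports pain in left knee after walking",
    "tension headache, stressed at work",
    "knee swollen and painful, suspect gonarthrosis",
    "recurring headache with nausea",
]
labels = ["knee complaint", "headache", "knee complaint", "headache"]

# Approach 1: a hand-written search query acting as a rule-based classifier.
def search_query(note):
    return "knee complaint" if "knee" in note.lower() else "headache"

# Approach 2: supervised learning on a TF-IDF text representation.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(notes, labels)

new_note = "painful knee when climbing stairs"
print(search_query(new_note))        # rule-based prediction
print(clf.predict([new_note])[0])    # model prediction
```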

    Blood pressure measurements for diagnosing hypertension in primary care: room for improvement

    Background: About 50% of the adult population has hypertension, a risk factor for cardiovascular disease and subsequent premature death. Little is known about the quality of the methods used to diagnose hypertension in primary care. Objectives: The objective was to assess how often recognized methods were used to establish a diagnosis of hypertension and, specifically for office blood pressure measurement (OBPM), whether three distinct measurements were taken and whether the blood pressure levels were interpreted correctly. Methods: A retrospective population-based cohort study using electronic medical records of patients aged between 40 and 70 years who visited their general practitioner (GP) with new-onset hypertension in the years 2012, 2016, 2019, and 2020. A visual chart review of the electronic medical records was used to assess the methods employed to diagnose hypertension in a random sample of 500 patients. The measurement method was considered complete if three or more valid OBPMs were performed, or if home-based blood pressure measurement (HBPM), the office-based 30-minute method (OBP30), or 24-hour ambulatory blood pressure measurement (24H-ABPM) was used. Results: In all study years, OBPM was the method most frequently used to diagnose new-onset hypertension. The OBP30 method was used in 0.4% (2012), 4.2% (2016), 10.6% (2019), and 9.8% (2020) of patients; 24H-ABPM in 16.0%, 22.2%, 17.2%, and 19.0% of patients; and HBPM in 5.4%, 8.4%, 7.6%, and 7.8% of patients, respectively. A diagnosis of hypertension based on only one or two office measurements occurred in 85.2% (2012), 87.9% (2016), 94.4% (2019), and 96.8% (2020) of all patients with OBPM. In cases of incomplete measurement and incorrect interpretation, medication was still started in 64% (2012), 56% (2016), 60% (2019), and 73% (2020) of cases. Conclusion: OBPM is still the method most often used to diagnose hypertension in primary care. The diagnosis was often based on incomplete measurements or misinterpreted using incorrect cut-off levels. A small improvement occurred between 2012 and 2016, but no further progress was seen in 2019 or 2020. If hypertension is diagnosed inappropriately, it may result in undertreatment or in prolonged, unnecessary treatment. There is room for improvement in the general practice setting.
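    The completeness rule applied in the chart review (three or more valid OBPMs, or any use of HBPM, OBP30, or 24H-ABPM) is a simple decision rule. A Python sketch follows; the method labels are assumptions for illustration, not codes from the study's records.

```python
def measurement_complete(methods):
    """Apply the study's completeness rule to the list of measurement
    methods recorded for one patient: >= 3 valid office measurements
    (OBPM), or any use of HBPM, OBP30, or 24H-ABPM."""
    obpm_count = sum(1 for m in methods if m == "OBPM")
    return obpm_count >= 3 or any(m in {"HBPM", "OBP30", "24H-ABPM"}
                                  for m in methods)

print(measurement_complete(["OBPM", "OBPM"]))           # False: only 2 OBPMs
print(measurement_complete(["OBPM", "OBPM", "OBPM"]))   # True: 3 OBPMs
print(measurement_complete(["OBPM", "24H-ABPM"]))       # True: ambulatory used
```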

    European Health Data and Evidence Network—learnings from building out a standardized international health data network

    Objective: Health data standardized to a common data model (CDM) simplifies and facilitates research. This study examines the factors that make standardizing observational health data to the Observational Medical Outcomes Partnership (OMOP) CDM successful. Materials and methods: Twenty-five data partners (DPs) from 11 countries received funding from the European Health Data and Evidence Network (EHDEN) to standardize their data. Three surveys, DataQualityDashboard results, and statistics from the conversion process were analyzed qualitatively and quantitatively. Our measures of success were the total number of days to transform source data into the OMOP CDM and participation in network research. Results: The health data converted to the CDM represented more than 133 million patients. 100%, 88%, and 84% of DPs completed Surveys 1, 2, and 3, respectively. The median duration of the six key extract, transform, and load (ETL) processes ranged from 4 to 115 days. Of the 25 DPs, 21 were considered applicable for analysis, of which 52% standardized their data on time and 48% participated in an international collaborative study. Discussion: This study shows that the consistent workflow used by EHDEN is appropriate to support the successful standardization of observational data across Europe. Across the 25 successful transformations, we confirmed that having the right people for the ETL is critical and that vocabulary mapping requires specific expertise and tool support. Additionally, we learned that teams that proactively prepared for data governance issues were able to avoid considerable delays, improving their ability to finish on time. Conclusion: This study provides guidance for future DPs seeking to standardize to the OMOP CDM and participate in distributed networks. We demonstrate that the Observational Health Data Sciences and Informatics community must continue to evaluate and provide guidance and support for what ultimately forms the backbone of how community members generate evidence.
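    At its core, the ETL work each DP performs maps source records onto OMOP CDM tables, translating source codes to standard concepts via the OMOP vocabularies. The Python sketch below is a heavily simplified illustration; the inline concept map is a hypothetical stand-in for real vocabulary lookups, and the source field names are assumptions.

```python
# Hypothetical source-code-to-standard-concept map. Real ETLs resolve
# codes (e.g., ICD-10 "I10") against the OMOP vocabulary tables.
CONCEPT_MAP = {"I10": 320128}  # assumed mapping to a SNOMED concept

def to_condition_occurrence(source_row, occurrence_id):
    """Map one source diagnosis row to an OMOP condition_occurrence row,
    keeping the original code as the source value for traceability."""
    return {
        "condition_occurrence_id": occurrence_id,
        "person_id": source_row["patient_id"],
        "condition_concept_id": CONCEPT_MAP.get(source_row["dx_code"], 0),
        "condition_start_date": source_row["dx_date"],
        "condition_source_value": source_row["dx_code"],
    }

src = {"patient_id": 42, "dx_code": "I10", "dx_date": "2021-07-01"}
print(to_condition_occurrence(src, occurrence_id=1))
```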