10 research outputs found

Annotation-preserving machine translation of English corpora to validate Dutch clinical concept extraction tools

Author: Kors Jan A.
Rijnbeek Peter R.
Seinen Tom M.
van Mulligen Erik M.
Publication venue
Publication date: 27/06/2024
Field of study

Objective: To explore the feasibility of validating Dutch concept extraction tools using annotated corpora translated from English, focusing on preserving annotations during translation and addressing the scarcity of non-English annotated clinical corpora. Materials and Methods: Three annotated corpora were standardized and translated from English to Dutch using 2 machine translation services, Google Translate and OpenAI GPT-4, with annotations preserved through a proposed method of embedding annotations in the text before translation. The performance of 2 concept extraction tools, MedSpaCy and MedCAT, was assessed across the corpora in both Dutch and English. Results: The translation process effectively generated Dutch annotated corpora and the concept extraction tools performed similarly in both English and Dutch. Although there were some differences in how annotations were preserved across translations, these did not affect extraction accuracy. Supervised MedCAT models consistently outperformed unsupervised models, whereas MedSpaCy demonstrated high recall but lower precision. Discussion: Our validation of Dutch concept extraction tools on corpora translated from English was successful, highlighting the efficacy of our annotation preservation method and the potential for efficiently creating multilingual corpora. Further improvements and comparisons of annotation preservation techniques and strategies for corpus synthesis could lead to more efficient development of multilingual corpora and accurate non-English concept extraction tools. Conclusion: This study has demonstrated that translated English corpora can be used to validate non-English concept extraction tools. The annotation preservation method used during translation proved effective, and future research can apply this corpus translation method to additional languages and clinical settings.

EUR Research Repository

Using clinical text to refine unspecific condition codes in Dutch general practitioner EHR data

Author: Fridgeirsson Egill A.
Kors Jan A.
Rijnbeek Peter R.
Seinen Tom M.
van Mulligen Erik M.
Verhamme Katia MC
Publication venue
Publication date: 01/09/2024
Field of study

Objective: Observational studies using electronic health record (EHR) databases often face challenges due to unspecific clinical codes that can obscure detailed medical information, hindering precise data analysis. In this study, we aimed to assess the feasibility of refining these unspecific condition codes into more specific codes in a Dutch general practitioner (GP) EHR database by leveraging the available clinical free text. Methods: We utilized three approaches for text classification—search queries, semi-supervised learning, and supervised learning—to improve the specificity of ten unspecific International Classification of Primary Care (ICPC-1) codes. Two text representations and three machine learning algorithms were evaluated for the (semi-)supervised models. Additionally, we measured the improvement achieved by the refinement process on all code occurrences in the database. Results: The classification models performed well for most codes. In general, no single classification approach consistently outperformed the others. However, there were variations in the relative performance of the classification approaches within each code and in the use of different text representations and machine learning algorithms. Class imbalance and limited training data affected the performance of the (semi-)supervised models, yet the simple search queries remained particularly effective. Ultimately, the developed models improved the specificity of over half of all the unspecific code occurrences in the database. Conclusions: Our findings show the feasibility of using information from clinical text to improve the specificity of unspecific condition codes in observational healthcare databases, even with a limited range of machine-learning techniques and modest annotated training sets. Future work could investigate transfer learning, integration of structured data, alternative semi-supervised methods, and validation of models across healthcare settings. 
The improved level of detail enriches the interpretation of medical information and can benefit observational research and patient care.

EUR Research Repository

Trends in the conduct and reporting of clinical prediction model development and validation: a systematic review

Author: de Ridder Maria A J
Ioannou Solomon
John Luis H
Kors Jan A
Markus Aniek F
Rekkas Alexandros
Rijnbeek Peter R
Seinen Tom M
Williams Ross D
Yang Cynthia
Publication venue
Publication date: 19/01/2022
Field of study

OBJECTIVES: This systematic review aims to provide further insights into the conduct and reporting of clinical prediction model development and validation over time. We focus on assessing the reporting of information necessary to enable external validation by other investigators. MATERIALS AND METHODS: We searched Embase, Medline, Web-of-Science, Cochrane Library, and Google Scholar to identify studies that developed 1 or more multivariable prognostic prediction models using electronic health record (EHR) data published in the period 2009-2019. RESULTS: We identified 422 studies that developed a total of 579 clinical prediction models using EHR data. We observed a steep increase over the years in the number of developed models. The percentage of models externally validated in the same paper remained at around 10%. Throughout 2009-2019, for both the target population and the outcome definitions, code lists were provided for less than 20% of the models. For about half of the models that were developed using regression analysis, the final model was not completely presented. DISCUSSION: Overall, we observed limited improvement over time in the conduct and reporting of clinical prediction model development and validation. In particular, the prediction problem definition was often not clearly reported, and the final model was often not completely presented. CONCLUSION: Improvement in the reporting of information necessary to enable external validation by other investigators is still urgently needed to increase clinical adoption of developed models.

ZENODO

EUR Research Repository

PubMed Central

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Increased Bone Marrow Uptake and Accumulation of Very-Late Antigen-4 Targeted Lipid Nanoparticles

Author: Fens Marcel H A M
Heidenreich Olaf
Hendriksen Martijn
Kooijmans Sander
Krippner-Heidenreich Anja
Mata Casimiro L Daniel
Nelson Ryan
O'Toole Tom
Schiffelers Raymond M
Schweighart Elizabeth
Seinen Cor
Swart Laura E
Tuk David
van den Brink Luca
van Oort Anita
Waranecki Piotr
Publication venue
Publication date: 01/06/2023
Field of study

Lipid nanoparticles (LNPs) have evolved rapidly as promising delivery systems for oligonucleotides, including siRNAs. However, current clinical LNP formulations show high liver accumulation after systemic administration, which is unfavorable for the treatment of extrahepatic diseases, such as hematological disorders. Here we describe the specific targeting of LNPs to hematopoietic progenitor cells in the bone marrow. Functionalization of the LNPs with a modified Leu-Asp-Val tripeptide, a specific ligand for the very-late antigen 4 resulted in an improved uptake and functional siRNA delivery in patient-derived leukemia cells when compared to their non-targeted counterparts. Moreover, surface-modified LNPs displayed significantly improved bone-marrow accumulation and retention. These were associated with increased LNP uptake by immature hematopoietic progenitor cells, also suggesting similarly improved uptake by leukemic stem cells. In summary, we describe an LNP formulation that successfully targets the bone marrow including leukemic stem cells. Our results thereby support the further development of LNPs for targeted therapeutic interventions for leukemia and other hematological disorders

Utrecht University Repository

Use of unstructured text in prognostic clinical prediction models: a systematic review

Author: Fridgeirsson Egill A.
Ioannou Solomon
Jeannetot Daniel
John Luis H.
Kors Jan A.
Markus Aniek F.
Pera Victor
Rekkas Alexandros
Rijnbeek Peter R.
Seinen Tom M.
Van Mulligen Erik M.
Williams Ross D.
Yang Cynthia
Publication venue: 'Oxford University Press (OUP)'
Publication date: 27/04/2022
Field of study

OBJECTIVE: This systematic review aims to assess how information from unstructured text is used to develop and validate clinical prognostic prediction models. We summarize the prediction problems and methodological landscape and determine whether using text data in addition to more commonly used structured data improves the prediction performance. MATERIALS AND METHODS: We searched Embase, MEDLINE, Web of Science, and Google Scholar to identify studies that developed prognostic prediction models using information extracted from unstructured text in a data-driven manner, published in the period from January 2005 to March 2021. Data items were extracted, analyzed, and a meta-analysis of the model performance was carried out to assess the added value of text to structured-data models. RESULTS: We identified 126 studies that described 145 clinical prediction problems. Combining text and structured data improved model performance, compared with using only text or only structured data. In these studies, a wide variety of dense and sparse numeric text representations were combined with both deep learning and more traditional machine learning methods. External validation, public availability, and attention to the explainability of the developed models were limited. CONCLUSION: The use of unstructured text in the development of prognostic prediction models has been found beneficial in addition to structured data in most studies. The text data are a source of valuable information for prediction model development and should not be neglected. We suggest a future focus on explainability and external validation of the developed models, promoting robust and trustworthy prediction models in clinical practice.

ZENODO

PubMed Central

EUR Research Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Seek COVER: using a disease proxy to rapidly develop and validate a personalized risk calculator for COVID-19 outcomes in an international network

Author: Abrahão Maria T. F.
An Min H.
Aragón María
Areia Carlos
Burn Edward
Choi Young H.
Drakos Iannis
Duarte-Salles Talita
DuVall Scott L.
Falconer Thomas
Fernández-Bertolín Sergio
Hripcsak George
Jonnagaddala Jitendra
Kaas-Hansen Benjamin S.
Kandukuri Prasanna L.
Kim Chungsoo
Kors Jan A.
Kostka Kristin
Liaw Siaw-Teng
Lynch Kristine E.
Machado Amanda A.
Machnicki Gerardo
Markus Aniek F.
Matheny Michael E.
Morales Daniel
Nyberg Fredrik
Park Rae W.
Prats-Uribe Albert
Pratt Nicole
Prieto-Alhambra Daniel
Rao Gowtham
Reich Christian G.
Reps Jenna M.
Rho Yeunsook
Rijnbeek Peter R.
Rivera Marcela
Ryan Patrick B.
Seinen Tom
Shoaibi Azza
Spotnitz Matthew E.
Steyerberg Ewout W.
Suchard Marc A.
Williams Andrew E.
Williams Ross D.
Yang Cynthia
You Seng C.
Zhang Lin
Zhou Lili
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2022
Field of study

Background: We investigated whether we could use influenza data to develop prediction models for COVID-19 to increase the speed at which prediction models can reliably be developed and validated early in a pandemic. We developed COVID-19 Estimated Risk (COVER) scores that quantify a patient’s risk of hospital admission with pneumonia (COVER-H), hospitalization with pneumonia requiring intensive services or death (COVER-I), or fatality (COVER-F) in the 30 days following COVID-19 diagnosis using historical data from patients with influenza or flu-like symptoms and tested this in COVID-19 patients. Methods: We analyzed a federated network of electronic medical records and administrative claims data from 14 data sources and 6 countries containing data collected on or before 4/27/2020. We used a 2-step process to develop 3 scores using historical data from patients with influenza or flu-like symptoms any time prior to 2020. The first step was to create a data-driven model using LASSO regularized logistic regression, the covariates of which were used to develop aggregate covariates for the second step where the COVER scores were developed using a smaller set of features. These 3 COVER scores were then externally validated on patients with 1) influenza or flu-like symptoms and 2) confirmed or suspected COVID-19 diagnosis across 5 databases from South Korea, Spain, and the United States. Outcomes included i) hospitalization with pneumonia, ii) hospitalization with pneumonia requiring intensive services or death, and iii) death in the 30 days after index date. Results: Overall, 44,507 COVID-19 patients were included for model validation. We identified 7 predictors (history of cancer, chronic obstructive pulmonary disease, diabetes, heart disease, hypertension, hyperlipidemia, kidney disease) which combined with age and sex discriminated which patients would experience any of our three outcomes. The models achieved good performance in influenza and COVID-19 cohorts. 
For COVID-19, the AUC ranges were COVER-H: 0.69–0.81, COVER-I: 0.73–0.91, and COVER-F: 0.72–0.90. Calibration varied across the validations with some of the COVID-19 validations being less well calibrated than the influenza validations. Conclusions: This research demonstrated the utility of using a proxy disease to develop a prediction model. The 3 COVER models with 9 predictors that were developed using influenza data perform well for COVID-19 patients for predicting hospitalization, intensive services, and fatality. The scores showed good discriminatory performance which transferred well to the COVID-19 population. There was some miscalibration in the COVID-19 validations, which is potentially due to the difference in symptom severity between the two diseases. A possible solution for this is to recalibrate the models in each location before use.

Columbia University Academic Commons

Seek COVER: using a disease proxy to rapidly develop and validate a personalized risk calculator for COVID-19 outcomes in an international network

Author: Abrahao Maria Tereza Fernandes
An Min Ho
Aragon Maria
Areia Carlos
Burn Edward
Choi Young Hwa
Drakos Iannis
Duarte-Salles Talita
DuVall Scott L.
Falconer Thomas
Fernandez-Bertolin Sergio
Hripcsak George
Jonnagaddala Jitendra
Kaas-Hansen Benjamin Skov
Kandukuri Prasanna L.
Kim Chungsoo
Kors Jan A.
Kostka Kristin
Liaw Siaw-Teng
Lynch Kristine E.
Machado Amanda Alberga
Machnicki Gerardo
Markus Aniek F.
Matheny Michael E.
Morales Daniel
Nyberg Fredrik
Park Rae Woong
Prats-Uribe Albert
Pratt Nicole
Prieto-Alhambra Daniel
Rao Gowtham
Reich Christian G.
Reps Jenna M.
Rho Yeunsook
Rijnbeek Peter R.
Rivera Marcela
Ryan Patrick B.
Seinen Tom
Shoaibi Azza
Spotnitz Matthew E.
Steyerberg Ewout W.
Suchard Marc A.
Williams Andrew E.
Williams Ross D.
Yang Cynthia
You Seng Chan
Zhang Lin
Zhou Lili
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/01/2022
Field of study

Background: We investigated whether we could use influenza data to develop prediction models for COVID-19 to increase the speed at which prediction models can reliably be developed and validated early in a pandemic. We developed COVID-19 Estimated Risk (COVER) scores that quantify a patient’s risk of hospital admission with pneumonia (COVER-H), hospitalization with pneumonia requiring intensive services or death (COVER-I), or fatality (COVER-F) in the 30 days following COVID-19 diagnosis using historical data from patients with influenza or flu-like symptoms and tested this in COVID-19 patients. Methods: We analyzed a federated network of electronic medical records and administrative claims data from 14 data sources and 6 countries containing data collected on or before 4/27/2020. We used a 2-step process to develop 3 scores using historical data from patients with influenza or flu-like symptoms any time prior to 2020. The first step was to create a data-driven model using LASSO regularized logistic regression, the covariates of which were used to develop aggregate covariates for the second step where the COVER scores were developed using a smaller set of features. These 3 COVER scores were then externally validated on patients with 1) influenza or flu-like symptoms and 2) confirmed or suspected COVID-19 diagnosis across 5 databases from South Korea, Spain, and the United States. Outcomes included i) hospitalization with pneumonia, ii) hospitalization with pneumonia requiring intensive services or death, and iii) death in the 30 days after index date. Results: Overall, 44,507 COVID-19 patients were included for model validation. We identified 7 predictors (history of cancer, chronic obstructive pulmonary disease, diabetes, heart disease, hypertension, hyperlipidemia, kidney disease) which combined with age and sex discriminated which patients would experience any of our three outcomes. The models achieved good performance in influenza and COVID-19 cohorts. 
For COVID-19, the AUC ranges were COVER-H: 0.69–0.81, COVER-I: 0.73–0.91, and COVER-F: 0.72–0.90. Calibration varied across the validations with some of the COVID-19 validations being less well calibrated than the influenza validations. Conclusions: This research demonstrated the utility of using a proxy disease to develop a prediction model. The 3 COVER models with 9 predictors that were developed using influenza data perform well for COVID-19 patients for predicting hospitalization, intensive services, and fatality. The scores showed good discriminatory performance which transferred well to the COVID-19 population. There was some miscalibration in the COVID-19 validations, which is potentially due to the difference in symptom severity between the two diseases. A possible solution for this is to recalibrate the models in each location before use.

Columbia University Academic Commons

ZENODO

PubMed Central

Copenhagen University Research Information System

EUR Research Repository

eScholarship - University of California

Oxford University Research Archive

Leiden University Scholarly Publications

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

University of Dundee Online Publications