    Abstract 1122‐000089: Characterization of Critical Sequelae in Ischemic Stroke Using Natural Language Processing

    Introduction: Automated processing of electronic health data to classify complications of ischemic stroke serves numerous purposes, including improved electronic phenotyping for clinical research. Here, we present a natural language processing (NLP) approach to identify critical findings in acute ischemic stroke from unstructured radiology reports of computed tomography (CT) and magnetic resonance imaging (MRI).
    Methods: Text reports of CT and MRI scans taken from 2292 patients admitted for large (>1/2 middle cerebral artery territory), acute anterior circulation ischemic stroke were gathered from a single‐institution retrospective cohort. Reports were reviewed and labelled for the presence of hemorrhagic conversion, intracerebral edema, midline shift, intraventricular hemorrhage, and parenchymal hematoma as defined by European Cooperative Acute Stroke Study PH1 and PH2 categories. For binary classifications, we quantified co‐occurrence of individual words within reports using two separate NLP methods: Bag‐of‐Words (BOW) and Term Frequency‐Inverse Document Frequency (TF‐IDF). We then trained Lasso regression, random forest, and neural network classifiers to predict all complications based on word co‐occurrence. Classifier performance was measured by area under the receiver operating characteristic curve (AUC) using five separate folds of an internal test dataset. To predict midline shift as a continuous outcome, we developed a semantic rule‐based system (RBS) based on regular radiographic report expressions. This system was tested using an external validation dataset of 1472 acute large anterior circulation stroke reports from a separate hospital.
    Results: 2292 reports were fully labelled for the presence of all stroke complications. Lasso regression consistently displayed the best discrimination among all models. For BOW and TF‐IDF, Lasso yielded respective AUCs of 0.894 and 0.919 (hemorrhagic conversion), 0.935 and 0.950 (intracerebral edema), 0.968 and 0.963 (midline shift), 0.933 and 0.904 (intraventricular hemorrhage), and 0.873 and 0.879 (parenchymal hematoma). All models were well‐calibrated to underlying complication rates. The RBS also performed strongly in quantifying midline shift, with a mean absolute error (MAE) of 0.103 mm, sensitivity of 99.1%, and specificity of 97.5% in the original cohort. In the external validation set of 1472 additional stroke reports, the same system achieved an MAE of 0.126 mm, sensitivity of 99.5%, and specificity of 97.5% for midline shift. Wilcoxon rank sum testing on bootstrapped samples showed no statistically significant differences in RBS performance between institutions when comparing MAE (p = 0.918), sensitivity (p = 0.152), and specificity (p = 0.929).
    Conclusions: A machine learning pipeline based on Lasso regression successfully identified critical complications of large anterior circulation ischemic stroke from unstructured radiology reports, while our RBS quantified midline shift with accuracy that generalized across institutions. We propose that these systems may warrant prospective validation in care settings and in data mining for stroke research.
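
The rule-based idea described in this abstract (pulling a midline shift measurement out of free-text report sentences, then scoring it by MAE, sensitivity, and specificity) can be illustrated with a minimal Python sketch. The abstract does not publish the authors' rules, so the regular expressions, the function names `extract_midline_shift_mm` and `evaluate`, and the choice to count any shift above 0 mm as "present" are all assumptions made for illustration only.

```python
import re

# Hypothetical patterns: NOT the authors' rules, which the abstract does not list.
_NEGATED = re.compile(r"\b(?:no|without|negative for)\b[^.]*?\bmidline shift\b", re.IGNORECASE)
_AFTER = re.compile(r"\bmidline shift\b[^.\d]*?(\d+(?:\.\d+)?)\s*(mm|cm)\b", re.IGNORECASE)
_BEFORE = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)\b[^.\d]*?\bmidline shift\b", re.IGNORECASE)


def extract_midline_shift_mm(report: str) -> float:
    """Return the first stated midline shift in millimetres, or 0.0 if none is found."""
    # Split on sentence-ending punctuation followed by whitespace,
    # so decimals such as "5.5 mm" stay intact.
    for sentence in re.split(r"(?<=[.;])\s+", report):
        if _NEGATED.search(sentence):
            continue  # e.g. "No midline shift."
        match = _AFTER.search(sentence) or _BEFORE.search(sentence)
        if match:
            value = float(match.group(1))
            return value * 10.0 if match.group(2).lower() == "cm" else value
    return 0.0


def evaluate(predicted, actual):
    """Return (MAE in mm, sensitivity, specificity).

    Assumption: a shift is 'present' when the value is greater than 0 mm.
    """
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
    tp = sum(1 for p, a in zip(predicted, actual) if p > 0 and a > 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p <= 0 and a > 0)
    tn = sum(1 for p, a in zip(predicted, actual) if p <= 0 and a <= 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p > 0 and a <= 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return mae, sensitivity, specificity
```

A real system would need many more phrasings (ranges, laterality, "subfalcine herniation", negation scope across clauses); the sketch only shows the overall shape of extract-then-score.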

    Natural language processing of radiology reports to detect complications of ischemic stroke

    Background: Abstraction of critical data from unstructured radiologic reports using natural language processing (NLP) is a powerful tool to automate the detection of important clinical features and enhance research efforts. We present a set of NLP approaches to identify critical findings in patients with acute ischemic stroke from radiology reports of computed tomography (CT) and magnetic resonance imaging (MRI).
    Methods: We trained machine learning classifiers to identify categorical outcomes of edema, midline shift (MLS), hemorrhagic transformation, and parenchymal hematoma, as well as rule-based systems (RBS) to identify intraventricular hemorrhage (IVH) and continuous MLS measurements within CT/MRI reports. We derived the models from a cohort of 2289 reports from 550 individuals with acute middle cerebral artery territory ischemic strokes and externally validated them on reports from a separate institution as well as from patients with ischemic strokes in any vascular territory.
    Results: In all data sets, a deep neural network with pretrained biomedical word embeddings (BioClinicalBERT) achieved the highest discrimination performance for binary prediction of edema (area under precision recall curve [AUPRC] > 0.94), MLS (AUPRC > 0.98), hemorrhagic conversion (AUPRC > 0.89), and parenchymal hematoma (AUPRC > 0.76). BioClinicalBERT outperformed lasso regression (p
    Conclusions: Our study demonstrates robust performance and external validity of a core NLP tool kit for identifying both categorical and continuous outcomes of ischemic stroke from unstructured radiographic text data. Medically tailored NLP methods have multiple important big data applications, including scalable electronic phenotyping, augmentation of clinical risk prediction models, and facilitation of automatic alert systems in the hospital setting.
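
The AUPRC figures quoted in this abstract summarize how well a classifier ranks positive reports above negative ones. As a rough illustration, the Python sketch below computes average precision, one common estimator of the area under the precision-recall curve. The abstract does not say which estimator the authors used, so this choice and the function name `average_precision` are assumptions.

```python
def average_precision(y_true, scores):
    """Estimate the area under the precision-recall curve as average precision.

    y_true: iterable of 0/1 labels; scores: predicted scores (higher = more positive).
    Computes the sum over thresholds of (recall gain) * (precision at that threshold).
    """
    y_true = list(y_true)
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    ap = 0.0
    tp = 0
    i = 0
    while i < len(ranked):
        # Treat all reports that share one score as a single threshold step.
        j = i
        step_tp = 0
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            step_tp += ranked[j][1]
            j += 1
        tp += step_tp
        precision = tp / j  # j reports have been predicted positive so far
        ap += precision * (step_tp / total_pos)
        i = j
    return ap
```

Unlike AUC, this metric has a baseline equal to the prevalence of the positive class, which is one reason it is often preferred for rarer findings such as parenchymal hematoma.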