3 research outputs found

    Building a best-in-class automated de-identification tool for electronic health records through ensemble learning

    Summary: The presence of personally identifiable information (PII) in natural language portions of electronic health records (EHRs) constrains their broad reuse. Despite continuous improvements in automated detection of PII, residual identifiers require manual validation and correction. Here, we describe an automated de-identification system that employs an ensemble architecture, incorporating attention-based deep-learning models and rule-based methods, supported by heuristics for detecting PII in EHR data. Detected identifiers are then transformed into plausible, though fictional, surrogates to further obfuscate any leaked identifier. Our approach outperforms existing tools, with a recall of 0.992 and precision of 0.979 on the i2b2 2014 dataset and a recall of 0.994 and precision of 0.967 on a dataset of 10,000 notes from the Mayo Clinic. The de-identification system presented here enables the generation of de-identified patient data at the scale required for modern machine-learning applications to help accelerate medical discoveries.

    The bigger picture: Clinical notes in electronic health records convey rich historical information regarding disease and treatment progression. However, this unstructured text often contains personally identifiable information such as names, phone numbers, or residential addresses of patients, thereby limiting its dissemination for research purposes. The removal of patient identifiers, through the process of de-identification, enables sharing of clinical data while preserving patient privacy. Here, we present a best-in-class approach to de-identification, which automatically detects identifiers and substitutes them with fabricated ones. Our approach enables de-identification of patient data at the scale required to harness the unstructured, context-rich information in electronic health records to aid in medical research and advancement.
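
The detect-then-substitute idea described in this summary can be illustrated with a toy sketch. This is not the authors' system: the regular expressions, the dictionary lookup standing in for a learned model, the union-of-spans combination rule, and the fixed surrogate values are all assumptions made for illustration only.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label)

# Hypothetical rule-based patterns (assumed formats, not from the paper).
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

# Hypothetical name list standing in for a learned name detector.
KNOWN_NAMES = ["John Smith"]

# Hypothetical fixed surrogates; a real system would generate varied ones.
SURROGATES = {"PHONE": "555-010-0000", "DATE": "01/01/2000", "NAME": "Jane Doe"}


def detect_phones(text: str) -> List[Span]:
    return [(m.start(), m.end(), "PHONE") for m in PHONE_RE.finditer(text)]


def detect_dates(text: str) -> List[Span]:
    return [(m.start(), m.end(), "DATE") for m in DATE_RE.finditer(text)]


def detect_names(text: str) -> List[Span]:
    spans = []
    for name in KNOWN_NAMES:
        for m in re.finditer(re.escape(name), text):
            spans.append((m.start(), m.end(), "NAME"))
    return spans


def ensemble_detect(text: str, detectors: List[Callable[[str], List[Span]]]) -> List[Span]:
    """Take the union of all detectors' spans, merging any that overlap."""
    spans = sorted(s for detect in detectors for s in detect(text))
    merged: List[Span] = []
    for start, end, label in spans:
        if merged and start < merged[-1][1]:
            # Overlapping spans: extend the earlier span and keep its label.
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


def deidentify(text: str, detectors: List[Callable[[str], List[Span]]]) -> str:
    """Replace every detected span with a surrogate for its label."""
    pieces = []
    cursor = 0
    for start, end, label in ensemble_detect(text, detectors):
        pieces.append(text[cursor:start])
        pieces.append(SURROGATES[label])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


note = "John Smith called 507-555-1234 on 3/14/2020."
clean = deidentify(note, [detect_names, detect_phones, detect_dates])
```

Taking the union of detector outputs favors recall over precision, since an identifier is replaced if any single detector flags it. Whether the described system combines its components this way is not stated in the summary.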

    Mapping each pre-existing condition’s association to short-term and long-term COVID-19 complications

    Abstract: Understanding the relationships between pre-existing conditions and complications of COVID-19 infection is critical to identifying which patients will develop severe disease. Here, we leverage ~1.1 million clinical notes from 1803 hospitalized COVID-19 patients and deep neural network models to characterize associations between 21 pre-existing conditions and the development of 20 complications (e.g. respiratory, cardiovascular, renal, and hematologic) of COVID-19 infection throughout the course of infection (i.e. 0–30 days, 31–60 days, and 61–90 days). Pleural effusion was the most frequent complication of early COVID-19 infection (89/1803 patients, 4.9%) followed by cardiac arrhythmia (45/1803 patients, 2.5%). Notably, hypertension was the most significant risk factor associated with 10 different complications including acute respiratory distress syndrome, cardiac arrhythmia, and anemia. The onset of new complications after 30 days is rare and most commonly involves pleural effusion (31–60 days: 11 patients, 61–90 days: 9 patients). Lastly, comparing the rates of complications with a propensity-matched COVID-negative hospitalized population confirmed the importance of hypertension as a risk factor for early-onset complications. Overall, the associations between pre-COVID conditions and COVID-associated complications presented here may form the basis for the development of risk assessment scores to guide clinical care pathways.

    Cerebral venous sinus thrombosis (CVST) is not significantly linked to COVID-19 vaccines or non-COVID vaccines in a large multi-state US health system

    Cerebral venous sinus thrombosis (CVST) has been reported in a small number of individuals who have received the mRNA vaccines [1] or the adenoviral vector vaccines for COVID-19 in the US [2] and Europe [3]. Continued pharmacovigilance is integral to mitigating the risk of rare adverse events that clinical trials are underpowered to detect; however, these anecdotal reports have led to the pause or withdrawal of some vaccines in many jurisdictions and exacerbated vaccine hesitancy at a critical moment in the fight against the COVID-19 pandemic. We investigated the frequencies of CVST seen among individuals who received FDA-authorized COVID-19 vaccines from Pfizer-BioNTech (n = 94,818 doses), Moderna (n = 36,350 doses) and Johnson & Johnson (J&J; n = 1,745 doses), and among individuals receiving one of 10 FDA-approved non-COVID-19 vaccines (n = 771,805 doses). Comparing the incidence rates of CVST in 30-day time windows before and after vaccination, we found no statistically significant differences for the COVID-19 vaccines or any other vaccines studied in this population. In total, we observed 3 cases of CVST within the 30 days following Pfizer-BioNTech vaccination (2 females, 1 male; ages 79, 80, and 84 years), including one individual with a prior history of thrombosis and another individual with trauma in the preceding 30 days. We did not observe any cases of CVST among the patients receiving Moderna or J&J vaccines in this study population. We further found the baseline CVST incidence in the study population between 2017 and 2021 to be 45 to 98 per million patient-years. Overall, this real-world evidence-based study highlights that CVST is rare and is not significantly associated with COVID-19 vaccination. In addition, there is a need for a concerted international effort to monitor EHR data across diverse patient populations and to investigate the underlying biological mechanisms leading to these rare clotting events.