11 research outputs found
The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment.
OBJECTIVE: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers.
MATERIALS AND METHODS: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics.
RESULTS: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access.
CONCLUSIONS: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19
Increased Incidence of Vestibular Disorders in Patients With SARS-CoV-2
OBJECTIVE: Determine the incidence of vestibular disorders in patients with SARS-CoV-2 compared to the control population.
STUDY DESIGN: Retrospective.
SETTING: Clinical data in the National COVID Cohort Collaborative database (N3C).
METHODS: Deidentified patient data from the National COVID Cohort Collaborative database (N3C) were queried based on variant peak prevalence (untyped, alpha, delta, omicron 21K, and omicron 23A) from covariants.org to retrospectively analyze the incidence of vestibular disorders in patients with SARS-CoV-2 compared to a control population consisting of patients without documented evidence of COVID infection during the same period.
RESULTS: Patients testing positive for COVID-19 were significantly more likely to have a vestibular disorder compared to the control population. Compared to control patients, the odds ratio of vestibular disorders was significantly elevated in patients with untyped (odds ratio [OR], 2.39; confidence intervals [CI], 2.29-2.50;
CONCLUSIONS: The incidence of vestibular disorders differed between COVID-19 variants and was significantly elevated in COVID-19-positive patients compared to the control population. These findings have implications for patient counseling, and further research is needed to discern the long-term effects of these findings.
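The odds ratios and confidence intervals reported above follow a standard pattern that can be illustrated with a small sketch. The code below is not the authors' analysis: it computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table, and the function name and all counts are hypothetical, chosen only for illustration.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Hypothetical counts, NOT taken from the study:
# 200 of 10,000 COVID-positive patients and 100 of 10,000 controls
# with a vestibular disorder.
or_est, ci_low, ci_high = odds_ratio_wald_ci(200, 9800, 100, 9900)
print(f"OR {or_est:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

An interval that excludes 1 corresponds to the "significantly elevated" wording used in the abstract; adjusted estimates would instead come from a regression model.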
The National COVID Cohort Collaborative: Clinical Characterization and Early Severity Prediction
The majority of U.S. reports of COVID-19 clinical characteristics, disease course, and treatments are from single health systems or focused on one domain. Here we report the creation of the National COVID Cohort Collaborative (N3C), a centralized, harmonized, high-granularity electronic health record repository that is the largest, most representative U.S. cohort of COVID-19 cases and controls to date. This multi-center dataset supports robust evidence-based development of predictive and diagnostic tools and informs critical care and policy.
In a retrospective cohort study of 1,926,526 patients from 34 medical centers nationwide, we stratified patients using a World Health Organization COVID-19 severity scale and demographics; we then evaluated differences between groups over time using multivariable logistic regression. We established vital signs and laboratory values among COVID-19 patients with different severities, providing the foundation for predictive analytics. The cohort included 174,568 adults with severe acute respiratory syndrome associated with SARS-CoV-2 (PCR >99% or antigen <1%) as well as 1,133,848 adult patients that served as lab-negative controls. Among 32,472 hospitalized patients, mortality was 11.6% overall and decreased from 16.4% in March/April 2020 to 8.6% in September/October 2020 (p = 0.002 monthly trend). In a multivariable logistic regression model, age, male sex, liver disease, dementia, African-American and Asian race, and obesity were independently associated with higher clinical severity. To demonstrate the utility of the N3C cohort for analytics, we used machine learning (ML) to predict clinical severity and risk factors over time. Using 64 inputs available on the first hospital day, we predicted a severe clinical course (death, discharge to hospice, invasive ventilation, or extracorporeal membrane oxygenation) using random forest and XGBoost models (AUROC 0.86 and 0.87 respectively) that were stable over time. The most powerful predictors in these models are patient age and widely available vital sign and laboratory values. The established expected trajectories for many vital signs and laboratory values among patients with different clinical severities validate observations from smaller studies and provide comprehensive insight into COVID-19 characterization in U.S. patients.
This is the first description of an ongoing longitudinal observational study of patients seen in diverse clinical settings and geographical regions and is the largest COVID-19 cohort in the United States. Such data are the foundation for ML models that can be the basis for generalizable clinical decision support tools. The N3C Data Enclave is unique in providing transparent, reproducible, easily shared, versioned, and fully auditable data and analytic provenance for national-scale patient-level EHR data. The N3C is built for intensive ML analyses by academic, industry, and citizen scientists internationally. Many observational correlations can inform trial designs and care guidelines for this new disease
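The AUROC values quoted above summarize how well a model ranks patients with a severe course above those without one. The sketch below is only an illustration of that definition, not the authors' pipeline: it computes AUROC as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half. The function name and the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = severe clinical course)
    scores: iterable of predicted risks, aligned with labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data, NOT from the study: four patients with predicted risks.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of patients, so it is suited to explanation rather than cohort-scale data, where a rank-based implementation would be the usual choice.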
Synergies between centralized and federated approaches to data quality: a report from the national COVID cohort collaborative
Objective
In response to COVID-19, the informatics community united to aggregate as much clinical data as possible to characterize this new disease and reduce its impact through collaborative analytics. The National COVID Cohort Collaborative (N3C) is now the largest publicly available HIPAA limited dataset in US history with over 6.4 million patients and is a testament to a partnership of over 100 organizations.
Materials and Methods
We developed a pipeline for ingesting, harmonizing, and centralizing data from 56 contributing data partners using 4 federated Common Data Models. N3C data quality (DQ) review involves both automated and manual procedures. In the process, several DQ heuristics were discovered in our centralized context, both within the pipeline and during downstream project-based analysis. Feedback to the sites led to many local and centralized DQ improvements.
Results
Beyond well-recognized DQ findings, we discovered 15 heuristics relating to source Common Data Model conformance, demographics, COVID tests, conditions, encounters, measurements, observations, coding completeness, and fitness for use. Of 56 sites, 37 sites (66%) demonstrated issues through these heuristics. These 37 sites demonstrated improvement after receiving feedback.
Discussion
We encountered site-to-site differences in DQ which would have been challenging to discover using federated checks alone. We have demonstrated that centralized DQ benchmarking reveals unique opportunities for DQ improvement that will support improved research analytics locally and in aggregate.
Conclusion
By combining rapid, continual assessment of DQ with a large volume of multisite data, it is possible to support more nuanced scientific questions with the scale and rigor that they require.
Chronic Lung Disease as a Risk Factor for Long COVID in Patients Diagnosed With Coronavirus Disease 2019: A Retrospective Cohort Study
Background: Patients with coronavirus disease 2019 (COVID-19) often experience persistent symptoms, known as postacute sequelae of COVID-19 or long COVID, after severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Chronic lung disease (CLD) has been identified in small-scale studies as a potential risk factor for long COVID. Methods: This large-scale retrospective cohort study using the National COVID Cohort Collaborative data evaluated the link between CLD and long COVID over 6 months after acute SARS-CoV-2 infection. We included adults (aged ≥18 years) who tested positive for SARS-CoV-2 during any of 3 SARS-CoV-2 variant periods and used logistic regression to determine the association, considering a comprehensive list of potential confounding factors, including demographics, comorbidities, socioeconomic conditions, geographical influences, and medication. Results: Of 1 206 021 patients, 1.2% were diagnosed with long COVID. A significant association was found between preexisting CLD and long COVID (adjusted odds ratio [aOR], 1.36). Preexisting obesity and depression were also associated with increased long COVID risk (aOR, 1.32 for obesity and 1.29 for depression) as well as demographic factors including female sex (aOR, 1.09) and older age (aOR, 1.79 for age group 40–65 [vs 18–39] years and 1.56 for >65 [vs 18–39] years). Conclusions: CLD is associated with higher odds of developing long COVID within 6 months after acute SARS-CoV-2 infection. These data have implications for identifying high-risk patients and developing interventions for long COVID in patients with CLD.
Nonelective coronary artery bypass graft outcomes are adversely impacted by Coronavirus disease 2019 infection, but not altered processes of care: A National COVID Cohort Collaborative and National Surgery Quality Improvement Program analysis
Objective: The effects of Coronavirus disease 2019 (COVID-19) infection and altered processes of care on nonelective coronary artery bypass grafting (CABG) outcomes remain unknown. We hypothesized that patients with COVID-19 infection would have longer hospital lengths of stay and greater mortality compared with COVID-negative patients, but that these outcomes would not differ between COVID-negative and pre-COVID controls. Methods: The National COVID Cohort Collaborative 2020-2022 was queried for adult patients undergoing CABG. Patients were divided into COVID-negative, COVID-active, and COVID-convalescent groups. Pre-COVID control patients were drawn from the National Surgical Quality Improvement Program database. Adjusted analysis of the 3 COVID groups was performed via generalized linear models. Results: A total of 17,293 patients underwent nonelective CABG, including 16,252 COVID-negative, 127 COVID-active, 367 COVID-convalescent, and 2254 pre-COVID patients. Compared to pre-COVID patients, COVID-negative patients had no difference in mortality, whereas COVID-active patients experienced increased mortality. Mortality and pneumonia were higher in COVID-active patients compared to COVID-negative and COVID-convalescent patients. Adjusted analysis demonstrated that COVID-active patients had higher in-hospital mortality, 30- and 90-day mortality, and pneumonia compared to COVID-negative patients. COVID-convalescent patients had a shorter length of stay but a higher rate of renal impairment. Conclusions: Traditional care processes were altered during the COVID-19 pandemic. Our data show that nonelective CABG in patients with active COVID-19 is associated with significantly increased rates of mortality and pneumonia. The equivalent mortality in COVID-negative and pre-COVID patients suggests that pandemic-associated changes in processes of care did not impact CABG outcomes. Additional research into optimal timing of CABG after COVID infection is warranted
Evaluating COVID-19 vaccine effectiveness during pre-Delta, Delta and Omicron dominant periods among pregnant people in the U.S.: Retrospective cohort analysis from a nationally sampled cohort in National COVID Collaborative Cohort (N3C)
Objectives: To evaluate the effectiveness of COVID-19 vaccinations (initial and booster) during pre-Delta, Delta and Omicron dominant periods among pregnant people via (1) COVID-19 incident and severe infections among pregnant people who were vaccinated versus unvaccinated and (2) post-COVID-19 vaccination breakthrough infections and severe infections among vaccinated females who were pregnant versus non-pregnant. Design: Retrospective cohort study using nationally sampled electronic health records data from the National COVID Cohort Collaborative, 10 December 2020 to 7 June 2022. Participants: Cohort 1 included pregnant people (15–55 years) and cohort 2 included vaccinated females of reproductive age (15–55 years). Exposures: (1) COVID-19 vaccination and (2) pregnancy. Main outcome measures: Adjusted HRs (aHRs) for COVID-19 incident or breakthrough infections and severe infections (ie, COVID-19 infections with related hospitalisations). Results: In cohort 1, 301 107 pregnant people were included. Compared with unvaccinated pregnant people, the aHRs for pregnant people with initial vaccinations during pregnancy of incident COVID-19 were 0.77 (95% CI 0.62 to 0.96) and 0.88 (95% CI 0.73 to 1.07) and aHRs of severe COVID-19 infections were 0.65 (95% CI 0.47 to 0.90) and 0.79 (95% CI 0.51 to 1.21) during the Delta and Omicron periods, respectively. Compared with pregnant people with full initial vaccinations, the aHR of incident COVID-19 for pregnant people with booster vaccinations was 0.64 (95% CI 0.58 to 0.71) during the Omicron period. In cohort 2, 934 337 vaccinated people were included. Compared with vaccinated non-pregnant females, the aHR of severe COVID-19 infections for people with initial vaccinations during pregnancy was 2.71 (95% CI 1.31 to 5.60) during the Omicron period. Conclusions: Pregnant people with initial and booster vaccinations during pregnancy had a lower risk of incident and severe COVID-19 infections compared with unvaccinated pregnant people across the pandemic stages. However, vaccinated pregnant people still had a higher risk of severe infections compared with non-pregnant females.