
    De-black-boxing health AI: demonstrating reproducible machine learning computable phenotypes using the N3C-RECOVER Long COVID model in the All of Us data repository

    Machine learning (ML)-driven computable phenotypes are among the most challenging to share and reproduce. Despite this difficulty, the urgent public health considerations around Long COVID make it especially important to ensure the rigor and reproducibility of Long COVID phenotyping algorithms so that they can be made available to a broad audience of researchers. As part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, researchers with the National COVID Cohort Collaborative (N3C) devised and trained an ML-based phenotype to identify patients highly likely to have Long COVID. Supported by RECOVER, N3C and NIH’s All of Us study partnered to reproduce the output of N3C’s trained model in the All of Us data enclave, demonstrating model extensibility in multiple environments. This case study in ML-based phenotype reuse illustrates how open-source software best practices and cross-site collaboration can de-black-box phenotyping algorithms, prevent unnecessary rework, and promote open science in informatics.

    Long COVID risk and pre-COVID vaccination in an EHR-based cohort study from the RECOVER program

    Long COVID, or complications arising from COVID-19 weeks after infection, has become a central concern for public health experts. The United States National Institutes of Health founded the RECOVER initiative to better understand long COVID. We used electronic health records available through the National COVID Cohort Collaborative to characterize the association between SARS-CoV-2 vaccination and long COVID diagnosis. Among patients with a COVID-19 infection between August 1, 2021 and January 31, 2022, we defined two cohorts using distinct definitions of long COVID—a clinical diagnosis (n = 47,404) or a previously described computational phenotype (n = 198,514)—to compare unvaccinated individuals to those with a complete vaccine series prior to infection. Evidence of long COVID was monitored through June or July of 2022, depending on patients’ data availability. We found that vaccination was consistently associated with lower odds and rates of long COVID clinical diagnosis and high-confidence computationally derived diagnosis after adjusting for sex, demographics, and medical history.

    The extent to which COVID-19 vaccination protects against long COVID is not well understood. Here, the authors use electronic health record data from the United States and find that, for people who received their vaccination prior to infection, vaccination was associated with lower incidence of long COVID.

    The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment.

    OBJECTIVE: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. MATERIALS AND METHODS: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. RESULTS: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. CONCLUSIONS: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.

    Who has long-COVID? A big data approach [preprint]

    Background Post-acute sequelae of SARS-CoV-2 infection (PASC), otherwise known as long-COVID, have severely impacted recovery from the pandemic for patients and society alike. This new disease is characterized by evolving, heterogeneous symptoms, making it challenging to derive an unambiguous long-COVID definition. Electronic health record (EHR) studies are a critical element of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, which is addressing the urgent need to understand PASC, accurately identify who has PASC, and identify treatments. Methods Using the National COVID Cohort Collaborative’s (N3C) EHR repository, we developed XGBoost machine learning (ML) models to identify potential long-COVID patients. We examined demographics, healthcare utilization, diagnoses, and medications for 97,995 adult COVID-19 patients. We used these features and 597 long-COVID clinic patients to train three ML models to identify potential long-COVID patients among (1) all COVID-19 patients, (2) patients hospitalized with COVID-19, and (3) patients who had COVID-19 but were not hospitalized. Findings Our models identified potential long-COVID patients with high accuracy, achieving areas under the receiver operating characteristic curve of 0.91 (all patients), 0.90 (hospitalized), and 0.85 (non-hospitalized). Important features include rate of healthcare utilization, patient age, dyspnea, and other diagnosis and medication information available within the EHR. Applying the “all patients” model to the larger N3C cohort identified 100,263 potential long-COVID patients. Interpretation Patients flagged by our models can be interpreted as “patients likely to be referred to or seek care at a long-COVID specialty clinic,” an essential proxy for long-COVID diagnosis in the current absence of a definition. We also achieve the urgent goal of identifying potential long-COVID patients for clinical trials. As more data sources are identified, the models can be retrained and tuned based on study needs. Funding This study was funded by NCATS and NIH through the RECOVER Initiative.
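The workflow the Methods section describes — training a gradient-boosted classifier on EHR-derived features and scoring it by AUROC — can be sketched as follows. This is a minimal illustration on synthetic data, not the N3C pipeline: the feature names, label mechanism, and use of scikit-learn's gradient boosting (as a stand-in for XGBoost) are all assumptions for demonstration.

```python
# Sketch: gradient-boosted phenotype classifier evaluated by AUROC.
# Synthetic stand-ins for EHR features named in the abstract (utilization
# rate, age, dyspnea); not real N3C data or the authors' actual model.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.poisson(3, n),           # healthcare visits per year
    rng.integers(18, 90, n),     # patient age
    rng.integers(0, 2, n),       # dyspnea diagnosis flag
]).astype(float)

# Invented label mechanism: risk rises with utilization and dyspnea
logits = 0.4 * X[:, 0] + 1.5 * X[:, 2] - 2.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auroc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"AUROC: {auroc:.2f}")
```

In practice the trained model's predicted probabilities would be thresholded to flag "potential long-COVID" patients, as the abstract describes for the larger N3C cohort.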

    A Methodological Framework for the Comparative Evaluation of Multiple Imputation Methods: Multiple Imputation of Race, Ethnicity and Body Mass Index in the U.S. National COVID Cohort Collaborative

    While electronic health records are a rich data source for biomedical research, these systems are not implemented uniformly across healthcare settings, and significant data may be missing due to healthcare fragmentation and lack of interoperability between siloed electronic health records. Because deleting cases with missing data may introduce severe bias into subsequent analyses, many authors prefer a multiple imputation (MI) strategy to recover the missing information. Unfortunately, although several published studies have reported promising results using the various MI algorithms now freely available for research, there is no consensus on which MI algorithm works best. Besides the choice of MI strategy, the choice of the imputation algorithm and its application settings are both crucial and challenging. In this paper, inspired by the seminal works of Rubin and van Buuren, we propose a methodological framework for evaluating and comparing several multiple imputation techniques, with the aim of choosing the most valid approach for computing inferences in a clinical research study. We applied our framework to validate, and extend to a larger cohort, the results we presented in a previous study, in which we evaluated the influence of crucial patient descriptors and COVID-19 severity in patients with type 2 diabetes mellitus whose data are provided by the National COVID Cohort Collaborative Enclave.
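One common way to compare imputation methods, in the spirit of the evaluation framework described above, is to mask values whose ground truth is known, impute them back, and score the recovery error. The sketch below is an illustration of that general idea only — the synthetic variables, the masking scheme (missing completely at random), and the two scikit-learn imputers are assumptions, not the paper's actual framework or algorithms.

```python
# Sketch: compare two imputation methods by masking known values and
# measuring RMSE on the recovered entries. All data here are synthetic.
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(27, 5, n)                 # e.g. a BMI-like variable
x2 = 0.8 * x1 + rng.normal(0, 2, n)       # correlated companion variable
X_true = np.column_stack([x1, x2])

# Mask 20% of x2 completely at random, remembering the true values
mask = rng.random(n) < 0.2
X_obs = X_true.copy()
X_obs[mask, 1] = np.nan

def rmse(imputer):
    """Impute the masked entries and score them against ground truth."""
    X_imp = imputer.fit_transform(X_obs)
    return float(np.sqrt(np.mean((X_imp[mask, 1] - X_true[mask, 1]) ** 2)))

scores = {"mean": rmse(SimpleImputer(strategy="mean")),
          "knn": rmse(KNNImputer(n_neighbors=5))}
print(scores)  # lower RMSE = better recovery of the masked values
```

Because x2 is strongly correlated with the fully observed x1, a method that exploits that correlation (here, k-nearest-neighbors imputation) should recover the masked values better than unconditional mean imputation; a full MI evaluation would additionally assess inference validity across multiple imputed datasets, as Rubin's framework requires.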

    Synergies between centralized and federated approaches to data quality: a report from the national COVID cohort collaborative

    Objective In response to COVID-19, the informatics community united to aggregate as much clinical data as possible to characterize this new disease and reduce its impact through collaborative analytics. The National COVID Cohort Collaborative (N3C) is now the largest publicly available HIPAA limited dataset in US history, with over 6.4 million patients, and is a testament to a partnership of over 100 organizations. Materials and Methods We developed a pipeline for ingesting, harmonizing, and centralizing data from 56 contributing data partners using 4 federated Common Data Models. N3C data quality (DQ) review involves both automated and manual procedures. In the process, several DQ heuristics were discovered in our centralized context, both within the pipeline and during downstream project-based analysis. Feedback to the sites led to many local and centralized DQ improvements. Results Beyond well-recognized DQ findings, we discovered 15 heuristics relating to source Common Data Model conformance, demographics, COVID tests, conditions, encounters, measurements, observations, coding completeness, and fitness for use. Of the 56 sites, 37 (66%) demonstrated issues through these heuristics, and these 37 sites demonstrated improvement after receiving feedback. Discussion We encountered site-to-site differences in DQ that would have been challenging to discover using federated checks alone. We have demonstrated that centralized DQ benchmarking reveals unique opportunities for DQ improvement that will support improved research analytics both locally and in aggregate. Conclusion By combining rapid, continual assessment of DQ with a large volume of multisite data, it is possible to support more nuanced scientific questions with the scale and rigor that they require.
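The centralized review-and-feedback loop described above can be illustrated with a toy data-quality heuristic: run the same check over every site's records in one place, then report which sites have findings. The heuristic thresholds, value sets, and site data below are invented for illustration and are not among the paper's 15 heuristics.

```python
# Sketch: a centralized DQ heuristic applied across sites, with flagged
# sites collected for feedback. Thresholds and data are hypothetical.
VALID_SEX = {"M", "F", "OTHER", "UNKNOWN"}

def check_site(records):
    """Return a list of (record index, description) DQ findings."""
    findings = []
    for i, rec in enumerate(records):
        if not (0 <= rec["age"] <= 120):
            findings.append((i, f"implausible age {rec['age']}"))
        if rec["sex"] not in VALID_SEX:
            findings.append((i, f"unmapped sex code {rec['sex']!r}"))
    return findings

sites = {
    "site_a": [{"age": 45, "sex": "F"}, {"age": 200, "sex": "M"}],
    "site_b": [{"age": 30, "sex": "X"}],
    "site_c": [{"age": 67, "sex": "M"}],
}
# Centralized review: the same heuristic runs over every site's data,
# and sites with findings receive feedback, as in the N3C loop
report = {site: check_site(recs) for site, recs in sites.items()}
flagged = [site for site, findings in report.items() if findings]
print(flagged)  # → ['site_a', 'site_b']
```

The point the paper makes is that such a check run centrally surfaces site-to-site differences (here, site_a's out-of-range age versus site_b's unmapped code) that purely federated, per-site checks could miss.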