
    The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment

    OBJECTIVE: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. MATERIALS AND METHODS: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. RESULTS: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. CONCLUSIONS: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.
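
    The harmonization step described above (mapping heterogeneous site data into a single shared dataset) can be illustrated with a minimal sketch. Everything here is hypothetical: the site formats, field names, and result vocabulary are invented for illustration, and the abstract does not describe N3C's actual ingestion scripts or the common data models it targets.

```python
# Illustrative sketch only: maps records from two hypothetical site-specific
# data models into one shared schema, the general pattern the abstract describes.
from dataclasses import dataclass

@dataclass
class HarmonizedRecord:
    patient_id: str
    test_date: str          # ISO-8601 date string
    covid_result: str       # "positive" | "negative" | "possible"

def from_site_a(row: dict) -> HarmonizedRecord:
    # Site A stores results as lab codes; map them to the shared vocabulary.
    result_map = {"POS": "positive", "NEG": "negative", "IND": "possible"}
    return HarmonizedRecord(row["mrn"], row["lab_date"], result_map[row["lab_code"]])

def from_site_b(row: dict) -> HarmonizedRecord:
    # Site B already uses plain-text results but different field names.
    return HarmonizedRecord(row["person_id"], row["collected_on"], row["result"].lower())

sources = [(from_site_a, [{"mrn": "a1", "lab_date": "2020-05-01", "lab_code": "POS"}]),
           (from_site_b, [{"person_id": "b7", "collected_on": "2020-05-02", "result": "Negative"}])]

# Every record, regardless of origin, ends up in one analyzable shape.
harmonized = [mapper(row) for mapper, rows in sources for row in rows]
print(harmonized)
```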

    HiMAL: Multimodal Hierarchical Multi-task Auxiliary Learning framework for predicting Alzheimer's disease progression

    OBJECTIVE: We aimed to develop and validate a novel multimodal Hierarchical Multi-task Auxiliary Learning (HiMAL) framework for predicting progression from mild cognitive impairment (MCI) to Alzheimer's disease (AD). MATERIALS AND METHODS: HiMAL utilized multimodal longitudinal visit data, including imaging features, cognitive assessment scores, and clinical variables from MCI patients in the Alzheimer's Disease Neuroimaging Initiative dataset, to predict at each visit whether an MCI patient would progress to AD within the next 6 months. Performance of HiMAL was compared with state-of-the-art single-task and multitask baselines using area under the receiver operating characteristic curve (AUROC) and precision-recall curve (AUPRC) metrics. An ablation study was performed to assess the impact of each input modality on model performance. Additionally, longitudinal explanations regarding risk of disease progression were provided to interpret the predicted cognitive decline. RESULTS: Out of 634 MCI patients (mean [IQR] age: 72.8 [67-78], 60% male), 209 (32%) progressed to AD. HiMAL showed better prediction performance than all state-of-the-art longitudinal single-modality, single-task baselines (AUROC = 0.923 [0.915-0.937]; AUPRC = 0.623 [0.605-0.644]). DISCUSSION: Clinically informative model explanations anticipate cognitive decline 6 months in advance, aiding clinicians in future disease progression assessment. HiMAL relies on routinely collected electronic health record (EHR) variables for proximal (6-month) prediction of AD onset, indicating its translational potential for point-of-care monitoring and management of high-risk patients.
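
    A minimal sketch of the general pattern the abstract describes: one encoder per modality feeding a shared longitudinal representation, with an auxiliary head trained jointly alongside the main 6-month progression head. All layer choices, dimensions, and the auxiliary target below are assumptions for illustration, not HiMAL's published architecture.

```python
# Toy multimodal multi-task auxiliary-learning model (PyTorch).
import torch
import torch.nn as nn

class MultiTaskAuxNet(nn.Module):
    def __init__(self, img_dim=32, cog_dim=8, clin_dim=16, hidden=64):
        super().__init__()
        # One encoder per modality, fused into a shared representation.
        self.encoders = nn.ModuleDict({
            "imaging": nn.Linear(img_dim, hidden),
            "cognitive": nn.Linear(cog_dim, hidden),
            "clinical": nn.Linear(clin_dim, hidden),
        })
        self.fuse = nn.GRU(hidden * 3, hidden, batch_first=True)  # longitudinal visits
        self.main_head = nn.Linear(hidden, 1)  # progression to AD within 6 months
        self.aux_head = nn.Linear(hidden, 1)   # assumed auxiliary target, e.g. next cognitive score

    def forward(self, imaging, cognitive, clinical):
        z = torch.cat([self.encoders["imaging"](imaging),
                       self.encoders["cognitive"](cognitive),
                       self.encoders["clinical"](clinical)], dim=-1)
        h, _ = self.fuse(z)                     # (batch, visits, hidden)
        return self.main_head(h), self.aux_head(h)

# Joint loss: the auxiliary task regularizes the shared representation.
model = MultiTaskAuxNet()
imaging, cognitive, clinical = torch.randn(4, 5, 32), torch.randn(4, 5, 8), torch.randn(4, 5, 16)
main_logit, aux_pred = model(imaging, cognitive, clinical)
y_main, y_aux = torch.randint(0, 2, (4, 5, 1)).float(), torch.randn(4, 5, 1)
loss = nn.BCEWithLogitsLoss()(main_logit, y_main) + 0.5 * nn.MSELoss()(aux_pred, y_aux)
loss.backward()
```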

    Better together: Integrating biomedical informatics and healthcare IT operations to create a learning health system during the COVID-19 pandemic

    The growing availability of multi-scale biomedical data sources that can be used to enable research and improve healthcare delivery has brought about what can be described as a healthcare data age. This new era is defined by the explosive growth in bio-molecular, clinical, and population-level data that can be readily accessed by researchers, clinicians, and decision-makers, and utilized for systems-level approaches to hypothesis generation and testing as well as operational decision-making. However, taking full advantage of these unprecedented opportunities requires revisiting the alignment between traditionally academic biomedical informatics (BMI) and operational healthcare information technology (HIT) personnel and activities in academic health systems. While the history of the academic field of BMI includes active engagement in the delivery of operational HIT platforms, in many contemporary settings these efforts have grown distinct. Recent experiences during the COVID-19 pandemic have demonstrated that greater coordination of BMI and HIT activities has allowed organizations to respond to pandemic-related changes more effectively, with demonstrable and positive impact as a result. In this position paper, we discuss the challenges and opportunities associated with driving alignment between BMI and HIT, as viewed from the perspective of a learning healthcare system. In doing so, we hope to illustrate the benefits of coordination between BMI and HIT for the quality, safety, and outcomes of care provided to patients and populations, demonstrating that these two groups can be better together.

    A protocol to evaluate RNA sequencing normalization methods

    Background: RNA sequencing technologies have allowed researchers to gain a better understanding of how the transcriptome affects disease. However, sequencing technologies often unintentionally introduce experimental error into RNA sequencing data. To counteract this, normalization methods are standardly applied with the intent of reducing the non-biologically derived variability inherent in transcriptomic measurements. However, the comparative efficacy of the various normalization techniques has not been tested in a standardized manner. Here we propose tests that evaluate numerous normalization techniques and apply them to a large-scale standard data set. These tests comprise a protocol that allows researchers to measure the amount of non-biological variability present in any data set after normalization has been performed, a crucial step in assessing the biological validity of data following normalization. Results: In this study we present two tests to assess the validity of normalization methods applied to a large-scale data set collected for systematic evaluation purposes. We tested various RNA-Seq normalization procedures and concluded that transcripts per million (TPM) was the best-performing normalization method, based on its preservation of biological signal compared with the other methods tested. Conclusion: Normalization is of vital importance to accurately interpreting the results of genomic and transcriptomic experiments. More work, however, needs to be performed to optimize normalization methods for RNA-Seq data. The present effort helps pave the way for more systematic evaluations of normalization methods across different platforms. With our proposed schema, researchers can evaluate their own or future normalization methods to further improve the field of RNA-Seq normalization.
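
    The winning method, transcripts per million, is straightforward to compute: scale counts by gene length, then rescale so each sample sums to one million, which makes samples directly comparable. The sketch below uses the standard TPM definition with toy counts and gene lengths.

```python
# Standard TPM computation; the input values are illustrative.
import numpy as np

def tpm(counts: np.ndarray, lengths_bp: np.ndarray) -> np.ndarray:
    """counts: raw read counts per gene; lengths_bp: gene lengths in base pairs."""
    rpk = counts / (lengths_bp / 1_000)   # reads per kilobase, corrects for gene length
    return rpk / rpk.sum() * 1_000_000    # rescale so each sample sums to 1e6

counts = np.array([500.0, 1200.0, 80.0])
lengths = np.array([2_000.0, 4_000.0, 800.0])
print(tpm(counts, lengths))               # per-gene TPM values, summing to 1,000,000
```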

    Machine learning for modeling the progression of Alzheimer disease dementia using clinical data: A systematic literature review

    OBJECTIVE: Alzheimer disease (AD) is the most common cause of dementia, a syndrome characterized by cognitive impairment severe enough to interfere with activities of daily life. We aimed to conduct a systematic literature review (SLR) of studies that applied machine learning (ML) methods to clinical data derived from electronic health records in order to model risk for progression of AD dementia. MATERIALS AND METHODS: We searched for articles published between January 1, 2010, and May 31, 2020, in PubMed, Scopus, ScienceDirect, IEEE Xplore Digital Library, Association for Computing Machinery Digital Library, and arXiv. We used predefined criteria to select relevant articles and summarized them according to key components of ML analysis such as data characteristics, computational algorithms, and research focus. RESULTS: There has been a considerable rise over the past 5 years in the number of research papers using ML-based analysis for AD dementia modeling. We reviewed 64 relevant articles in our SLR. The results suggest that the majority of existing research has focused on predicting progression of AD dementia using publicly available datasets containing both neuroimaging and clinical data (neurobehavioral status exam scores, patient demographics, neuroimaging data, and laboratory test values). DISCUSSION: Identifying individuals at risk for progression of AD dementia could potentially help to personalize disease management and plan future care. Clinical data consisting of both structured data tables and clinical notes can be effectively used in ML-based approaches to model risk for AD dementia progression. Data sharing and reproducibility of results can enhance the impact, adaptation, and generalizability of this research.

    Extraction of clinical phenotypes for Alzheimer's disease dementia from clinical notes using natural language processing

    OBJECTIVES: There is much interest in utilizing clinical data for developing prediction models for Alzheimer's disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured electronic health record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR. MATERIALS AND METHODS: We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by 2 clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings. RESULTS: Documentation rates for each phenotype varied in the structured versus unstructured EHR. Interannotator agreement was high (Cohen's kappa = 0.72-1) and positively correlated with the NLP-based phenotype extraction pipeline's performance (average F1-score = 0.65-0.99) for each phenotype. DISCUSSION: We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success. CONCLUSION: Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and focus on a specific clinical domain instead of maximizing generalizability.
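
    For readers who want to reproduce the kind of metrics reported above, the sketch below computes Cohen's kappa between two annotators and an F1-score for a pipeline against gold-standard labels. The binary labels (1 = phenotype documented in the note) are toy data, not the study's annotations.

```python
# Interannotator agreement and pipeline performance on toy phenotype labels.
from sklearn.metrics import cohen_kappa_score, f1_score

# One label per annotated note: 1 = phenotype present, 0 = absent.
annotator_1 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
annotator_2 = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]
pipeline    = [1, 0, 1, 1, 0, 0, 0, 0, 1, 1]

# Agreement between human annotators, corrected for chance.
print("kappa:", cohen_kappa_score(annotator_1, annotator_2))
# Pipeline performance against one annotator treated as the gold standard.
print("F1 vs gold:", f1_score(annotator_1, pipeline))
```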

    OpenSep: A generalizable open source pipeline for SOFA score calculation and Sepsis-3 classification

    EHR-based sepsis research often uses heterogeneous definitions of sepsis, leading to poor generalizability and difficulty in comparing studies to each other. We have developed OpenSep, an open-source pipeline for sepsis phenotyping according to the Sepsis-3 definition, as well as determination of time of sepsis onset and SOFA scores. The Minimal Sepsis Data Model was developed alongside the pipeline to enable its execution against diverse sources of electronic health record data. The pipeline's accuracy was validated by applying it to the MIMIC-IV version 1.0 data and comparing sepsis onset and SOFA scores to those produced by the pipeline developed by the curators of MIMIC. We demonstrated high reliability for both the sepsis onsets and SOFA scores; moreover, the Minimal Sepsis Data Model developed for this work allows our pipeline to be applied more broadly to data sources beyond MIMIC.
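
    As a sketch of the Sepsis-3 logic such a pipeline implements: compute SOFA organ subscores from labs and vitals, then flag sepsis when a suspected infection coincides with an acute rise of at least 2 SOFA points. Only two of the six subscores are shown, with cutoffs from the published SOFA table; the simplified functions below are illustrative, not OpenSep's actual code.

```python
# Two illustrative SOFA subscores plus the Sepsis-3 decision rule.
def renal_sofa(creatinine_mg_dl: float) -> int:
    # Published creatinine cutoffs: >=5.0 -> 4, >=3.5 -> 3, >=2.0 -> 2, >=1.2 -> 1.
    for score, cutoff in ((4, 5.0), (3, 3.5), (2, 2.0), (1, 1.2)):
        if creatinine_mg_dl >= cutoff:
            return score
    return 0

def coagulation_sofa(platelets_k_per_ul: float) -> int:
    # Published platelet cutoffs: <20 -> 4, <50 -> 3, <100 -> 2, <150 -> 1.
    for score, cutoff in ((4, 20), (3, 50), (2, 100), (1, 150)):
        if platelets_k_per_ul < cutoff:
            return score
    return 0

def sepsis3(baseline_sofa: int, current_sofa: int, suspected_infection: bool) -> bool:
    # Sepsis-3: suspected infection plus an acute SOFA increase of >= 2 points.
    return suspected_infection and (current_sofa - baseline_sofa) >= 2

print(sepsis3(baseline_sofa=1,
              current_sofa=renal_sofa(3.8) + coagulation_sofa(90),  # 3 + 2 = 5
              suspected_infection=True))  # True
```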

    Impact of Risk-based Sexually Transmitted Infection Screening in the Emergency Department

    OBJECTIVES: Sexually transmitted infections (STIs), including chlamydia, gonorrhea, and human immunodeficiency virus (HIV), pose a significant health burden in adolescents. Many adolescents receiving care in the emergency department (ED) are in need of testing, regardless of their chief complaint. Our objective was to determine whether an electronic, risk-based STI screening program in our ED was associated with an increase in STI testing among at-risk adolescents. METHODS: We conducted a retrospective cohort analysis of patient outcomes in our pediatric ED after integrating an Audio-enhanced Computer-Assisted Self-Interview (ACASI) as standard of care. It obtained a focused sexual history and generated STI testing recommendations. Patient answers and testing recommendations were integrated in real time into the electronic health record. Patients who tested positive received treatment according to our standard-of-care practices. All patients 15-21 years of age were asked to complete the interview on an opt-out basis, regardless of the reason for their ED visit. Patients were excluded if they were unable to use a tablet independently, severely ill, presenting after sexual assault, or non-English speaking. Our primary outcome was to describe STI-testing recommendations and test results among ACASI participants. We also compared STI testing between ACASI participants and those who were eligible but did not use it. RESULTS: In the first 13 months, 28.9% (1788/6194) of eligible adolescents completed the ACASI and 44.2% (321/790) accepted recommended STI testing. The mean age of participants was 16.6 ± 1.3 years, and 65.4% (1169) were female. Gonorrhea/chlamydia testing was significantly higher among participants than non-participants (20.1% [359/1788] vs 4.8% [212/4406]; p < 0.0001). The proportion of positive STI tests was similar between the two groups: 24.8% (89/359) vs. 24.5% (52/212; p = 0.94) were positive for chlamydia and/or gonorrhea, while 0.6% (2/354) of participants vs. 0% of non-participants (p > 0.99) were positive for HIV. Among participants whose chief complaints were unlikely to be related to STIs but who accepted recommended testing, 20.9% (37/177) were positive for gonorrhea or chlamydia. CONCLUSIONS: Our program facilitated STI testing in the ED and identified many adolescents with STIs, even when their ED complaint was unrelated. More rigorous implementation is needed to determine the impact of deploying ACASI to all eligible adolescents and of addressing barriers to accepting STI testing recommendations.
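
    As an illustration, the headline comparison of testing rates can be recomputed directly from the counts in the abstract with a chi-square test of independence; the abstract does not state which test the authors used, so the choice of test here is an assumption.

```python
# Recompute the participants-vs-non-participants testing comparison from the
# abstract's counts: 359/1788 tested among participants, 212/4406 among others.
from scipy.stats import chi2_contingency

# Rows: tested vs. not tested; columns: ACASI participants vs. non-participants.
table = [[359, 212],
         [1788 - 359, 4406 - 212]]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p:.2e}")  # p far below 0.0001, consistent with the abstract
```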