11 research outputs found

    A data science roadmap for open science organizations engaged in early-stage drug discovery

    The Structural Genomics Consortium is an international open science research organization with a focus on accelerating early-stage drug discovery, namely hit discovery and optimization. We, like many others, believe that artificial intelligence (AI) is poised to be a main accelerator in the field. The question is then how best to benefit from recent advances in AI and how to generate, format and disseminate data to enable future breakthroughs in AI-guided drug discovery. We present here the recommendations of a working group composed of experts from both the public and private sectors. Robust data management requires precise ontologies and standardized vocabulary, while a centralized database architecture across laboratories facilitates data integration into high-value datasets. Lab automation and opening electronic lab notebooks to data mining push the boundaries of data sharing and data modeling. Important considerations for building robust machine-learning models include transparent and reproducible data processing, choosing the most relevant data representation, defining the right training and test sets, and estimating prediction uncertainty. Beyond data sharing, cloud-based computing can be harnessed to build and disseminate machine-learning models. Important vectors of acceleration for hit and chemical probe discovery will be (1) the real-time integration of experimental data generation and modeling workflows within design-make-test-analyze (DMTA) cycles, openly and at scale, and (2) the adoption of a mindset where data scientists and experimentalists work as a unified team, and where data science is incorporated into the experimental design

    A framework for future national pediatric pandemic respiratory disease severity triage: The HHS pediatric COVID-19 data challenge

    Introduction: With persistent incidence, incomplete vaccination rates, confounding respiratory illnesses, and few therapeutic interventions available, COVID-19 continues to be a burden on the pediatric population. During a surge, it is difficult for hospitals to direct limited healthcare resources effectively. While the overwhelming majority of pediatric infections are mild, there have been life-threatening exceptions that illuminated the need to proactively identify pediatric patients at risk of severe COVID-19 and other respiratory infectious diseases. However, a nationwide capability for developing validated computational tools to identify pediatric patients at risk using real-world data does not exist. Methods: HHS ASPR BARDA sought, through the power of competition in a challenge, to create computational models to address two clinically important questions using the National COVID Cohort Collaborative: (1) Of pediatric patients who test positive for COVID-19 in an outpatient setting, who are at risk for hospitalization? (2) Of pediatric patients who test positive for COVID-19 and are hospitalized, who are at risk for needing mechanical ventilation or cardiovascular interventions? Results: This challenge was the first multi-agency, coordinated computational challenge carried out by the federal government as a response to a public health emergency. Fifty-five computational models were evaluated across both tasks, and two winners and three honorable mentions were selected. Conclusion: This challenge serves as a framework for how the government, research communities, and large data repositories can be brought together to source solutions when resources are strapped during a pandemic

    Increased Incidence of Vestibular Disorders in Patients With SARS-CoV-2

    OBJECTIVE: Determine the incidence of vestibular disorders in patients with SARS-CoV-2 compared to a control population. STUDY DESIGN: Retrospective. SETTING: Clinical data in the National COVID Cohort Collaborative database (N3C). METHODS: Deidentified patient data from the N3C database were queried based on variant peak prevalence (untyped, alpha, delta, omicron 21K, and omicron 23A) from covariants.org to retrospectively analyze the incidence of vestibular disorders in patients with SARS-CoV-2 compared to a control population consisting of patients without documented evidence of COVID infection during the same period. RESULTS: Patients testing positive for COVID-19 were significantly more likely to have a vestibular disorder compared to the control population. Compared to control patients, the odds ratio of vestibular disorders was significantly elevated in patients with untyped (odds ratio [OR], 2.39; confidence interval [CI], 2.29-2.50; CONCLUSIONS: The incidence of vestibular disorders differed between COVID-19 variants and was significantly elevated in COVID-19-positive patients compared to the control population. These findings have implications for patient counseling, and further research is needed to discern the long-term effects of these findings

    Search for Drosophila Genes based on Patterned Expression of Mini-White Reporter Gene of a P lacW Vector in Adult Eyes

    Developmental expression of the transduced mini-white (w) gene of Drosophila is sensitive to its flanking genomic enhancers. Taking advantage of this phenomenon, we mobilized a P lacW transposon and screened for new transposant lines that showed patterned expression of the mini-w gene in adult eyes. From a screen of about 1,000 independent P lacW transposant lines on the second chromosome, we identified 7 lines that showed patterned w expression in adult eyes. These P insertions were assigned to the engrailed, wingless and teashirt genes based on their chromosomal locations, developmental expression of the lacZ reporter gene, lethal embryonic mutant phenotypes and, finally, their failure to complement the lethal alleles of the respective genetic loci. Our results show that although only a small fraction of the total transposant lines displayed patterned w expression, the genetic loci thus identified are those that play essential roles in pattern formation. The scope of screens for genetic loci based on w reporter gene expression in adult eyes is discussed

    Privacy‐preserving record linkage across disparate institutions and datasets to enable a learning health system: The national COVID cohort collaborative (N3C) experience

    Introduction: Research driven by real‐world clinical data is increasingly vital to enabling learning health systems, but integrating such data from across disparate health systems is challenging. As part of the NCATS National COVID Cohort Collaborative (N3C), the N3C Data Enclave was established as a centralized repository of deidentified and harmonized COVID‐19 patient data from institutions across the US. However, making this data most useful for research requires linking it with information such as mortality data, images, and viral variants. The objective of this project was to establish privacy‐preserving record linkage (PPRL) methods to ensure that patient‐level EHR data remains secure and private when governance‐approved linkages with other datasets occur. Methods: Separate agreements and approval processes govern N3C data contribution and data access. The Linkage Honest Broker (LHB), an independent neutral party (the Regenstrief Institute), ensures data linkages are robust and secure by adding an extra layer of separation between protected health information and clinical data. The LHB's PPRL methods (including algorithms, processes, and governance) match patient records using “deidentified tokens,” which are hashed combinations of identifier fields that define a match across data repositories without using patients' clear‐text identifiers. Results: These methods enable three linkage functions: Deduplication, Linking Multiple Datasets, and Cohort Discovery. To date, two external repositories have been cross‐linked. As of March 1, 2023, 43 sites have signed the LHB Agreement; 35 sites have sent tokens generated for 9 528 998 patients. In this initial cohort, the LHB identified 135 037 matches and 68 596 duplicates. Conclusion: This large‐scale linkage study using deidentified datasets of varying characteristics established secure methods for protecting the privacy of N3C patient data when linked for research purposes. This technology has potential for use with registries for other diseases and conditions
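
The idea of a "deidentified token" described in the abstract above can be illustrated with a minimal Python sketch. This is not the LHB's actual algorithm, which the abstract does not specify. The choice of identifier fields (first name, last name, date of birth), the normalization rule, the shared secret, and the use of HMAC-SHA256 are all assumptions made for illustration only.

```python
import hashlib
import hmac
from typing import Sequence


def normalize(value: str) -> str:
    """Lowercase the value and keep only letters and digits (assumed rule)."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def make_token(fields: Sequence[str], secret: bytes) -> str:
    """Return a keyed SHA-256 hash of the normalized identifier fields.

    The clear-text identifiers never leave this function; only the
    hexadecimal digest would be shared for matching.
    """
    message = "|".join(normalize(f) for f in fields).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


# Hypothetical key that all contributing sites are assumed to share.
SECRET = b"hypothetical-shared-key"

# The same (fictional) patient recorded with different formatting at two sites.
site_a = make_token(["Jane", "O'Neil", "1980-02-03"], SECRET)
site_b = make_token(["JANE ", "ONeil", "1980/02/03"], SECRET)

# A different (fictional) patient.
other = make_token(["John", "Smith", "1975-11-30"], SECRET)

print(site_a == site_b)  # tokens agree, so the records would be linked
print(site_a == other)   # tokens differ, so the records stay unlinked
```

In this sketch, matching relies on exact equality of tokens, so any difference that survives normalization (for example, a misspelled name) would prevent a match. Production systems typically generate several tokens from different field combinations to tolerate such errors.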

    The completion of the Mammalian Gene Collection

    Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide

    Pre-Clinical Common Data Elements for Traumatic Brain Injury Research: Progress and Use Cases

    Traumatic brain injury (TBI) is an extremely complex condition due to heterogeneity in injury mechanism, underlying conditions, and secondary injury. Pre-clinical and clinical researchers face challenges with reproducibility that negatively impact translation and therapeutic development for improved TBI patient outcomes. To address this challenge, TBI Pre-clinical Working Groups expanded upon previous efforts and developed common data elements (CDEs) to describe the most frequently used experimental parameters. The working groups created 913 CDEs to describe study metadata, animal characteristics, animal history, injury models, and behavioral tests. Use cases applied a set of commonly used CDEs to address and evaluate the degree of missing data resulting from combining legacy data from different laboratories for two different outcome measures (Morris water maze [MWM]; RotorRod/Rotarod). Data were cleaned and harmonized to Form Structures containing the relevant CDEs and subjected to missing value analysis. For the MWM dataset (358 animals from five studies, 44 CDEs), 50% of the CDEs contained at least one missing value, while for the Rotarod dataset (97 animals from three studies, 48 CDEs), over 60% of CDEs contained at least one missing value. Overall, 35% of values were missing across the MWM dataset, and 33% of values were missing for the Rotarod dataset, demonstrating both the feasibility and the challenge of combining legacy datasets using CDEs. The CDEs and the associated forms created here are available to the broader pre-clinical research community to promote consistent and comprehensive data acquisition, as well as to facilitate data sharing and formation of data repositories. In addition to addressing the challenge of standardization in TBI pre-clinical studies, this effort is intended to bring attention to the discrepancies in assessment and outcome metrics among pre-clinical laboratories and ultimately accelerate translation to clinical research

    Nonelective coronary artery bypass graft outcomes are adversely impacted by Coronavirus disease 2019 infection, but not altered processes of care: A National COVID Cohort Collaborative and National Surgery Quality Improvement Program analysis

    Objective: The effects of Coronavirus disease 2019 (COVID-19) infection and altered processes of care on nonelective coronary artery bypass grafting (CABG) outcomes remain unknown. We hypothesized that patients with COVID-19 infection would have longer hospital lengths of stay and greater mortality compared with COVID-negative patients, but that these outcomes would not differ between COVID-negative and pre-COVID controls. Methods: The National COVID Cohort Collaborative 2020-2022 was queried for adult patients undergoing CABG. Patients were divided into COVID-negative, COVID-active, and COVID-convalescent groups. Pre-COVID control patients were drawn from the National Surgical Quality Improvement Program database. Adjusted analysis of the 3 COVID groups was performed via generalized linear models. Results: A total of 17,293 patients underwent nonelective CABG, including 16,252 COVID-negative, 127 COVID-active, 367 COVID-convalescent, and 2254 pre-COVID patients. Compared to pre-COVID patients, COVID-negative patients had no difference in mortality, whereas COVID-active patients experienced increased mortality. Mortality and pneumonia were higher in COVID-active patients compared to COVID-negative and COVID-convalescent patients. Adjusted analysis demonstrated that COVID-active patients had higher in-hospital mortality, 30- and 90-day mortality, and pneumonia compared to COVID-negative patients. COVID-convalescent patients had a shorter length of stay but a higher rate of renal impairment. Conclusions: Traditional care processes were altered during the COVID-19 pandemic. Our data show that nonelective CABG in patients with active COVID-19 is associated with significantly increased rates of mortality and pneumonia. The equivalent mortality in COVID-negative and pre-COVID patients suggests that pandemic-associated changes in processes of care did not impact CABG outcomes. Additional research into optimal timing of CABG after COVID infection is warranted