Prediction of Human Transcriptional Biomarkers for Severe Infection with SARS-CoV-2

Abstract

Defining the human host factors associated with severe versus mild COVID-19 in infected individuals has become of increasing interest. Mining large numbers of public gene expression datasets is an effective way to identify genes that contribute to a given phenotype. Combining RNA-sequencing data with the associated clinical metadata describing disease severity can enable earlier identification of patients who are at higher risk of developing severe COVID-19. We therefore identified 356 public human RNA-seq transcriptome samples from the Gene Expression Omnibus database that had disease severity metadata. We then subjected these samples to a robust RNA-seq data processing workflow to quantify gene expression in each patient. This workflow used Salmon to map the reads to the reference transcriptomes, edgeR to identify significantly differentially expressed genes, and Camera to perform gene ontology enrichment. We then applied a machine learning algorithm to the read count data to identify the features that best differentiated samples by COVID-19 severity phenotype. Ultimately, we produced a list of genes ranked by their Gini importance values; it includes GIMAP7 and S1PR2, which are associated with immunity and inflammation, respectively. We expect that these results can lay the groundwork for the development of improved prognostics for severe COVID-19.
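
The ranking step can be illustrated with a minimal sketch. The abstract does not name the machine learning algorithm; Gini importance is usually reported by tree-based ensembles as the mean decrease in Gini impurity across all splits on a feature. As a simplifying assumption, the sketch below scores each gene by the largest impurity decrease that a single expression threshold can achieve, which is the quantity a one-split decision tree would report. The gene names, counts, and function names are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence, Tuple


def gini_impurity(labels: Sequence[str]) -> float:
    """Gini impurity of a label set: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    class_counts: Dict[str, int] = {}
    for label in labels:
        class_counts[label] = class_counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in class_counts.values())


def best_split_gain(values: Sequence[float], labels: Sequence[str]) -> float:
    """Largest Gini impurity decrease achievable with one threshold on one gene."""
    n = len(values)
    parent = gini_impurity(labels)
    order = sorted(range(n), key=lambda i: values[i])
    sorted_values = [values[i] for i in order]
    sorted_labels = [labels[i] for i in order]
    best = 0.0
    for k in range(1, n):
        if sorted_values[k] == sorted_values[k - 1]:
            continue  # no threshold can separate two equal values
        left, right = sorted_labels[:k], sorted_labels[k:]
        weighted = (k / n) * gini_impurity(left) + ((n - k) / n) * gini_impurity(right)
        best = max(best, parent - weighted)
    return best


def rank_genes(
    counts: Dict[str, Sequence[float]], labels: Sequence[str]
) -> List[Tuple[str, float]]:
    """Return (gene, score) pairs sorted from most to least discriminative."""
    scores = [(gene, best_split_gain(vals, labels)) for gene, vals in counts.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up expression values for six samples (hypothetical, for illustration only).
toy_counts = {
    "gene_A": [5.0, 7.0, 6.0, 50.0, 55.0, 60.0],   # separates the classes cleanly
    "gene_B": [5.0, 8.0, 30.0, 9.0, 32.0, 35.0],   # partially informative
    "gene_C": [20.0, 21.0, 19.0, 20.5, 22.0, 18.0],  # close to uninformative
}
toy_labels = ["mild", "mild", "mild", "severe", "severe", "severe"]

ranking = rank_genes(toy_counts, toy_labels)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

A full tree ensemble would aggregate such impurity decreases over many splits and many trees, so its importance values would differ from these single-split scores even though the underlying impurity measure is the same.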
