Influenza research database: an integrated bioinformatics resource for influenza research and surveillance.
Background: The recent emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about other important virus characteristics. Design: The Influenza Research Database (IRD, http://www.fludb.org) is a free, open, publicly-accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user-friendly interfaces for data retrieval, visualization and comparative genomics analysis, together with personal log in-protected 'workbench' spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature. Results: To demonstrate the utility of the data and analysis tools available in IRD, two scientific use cases are presented. A comparison of hemagglutinin sequence conservation and epitope coverage information revealed highly conserved protein regions that can be recognized by the human adaptive immune system as possible targets for inducing cross-protective immunity. Phylogenetic and geospatial analysis of sequences from wild bird surveillance samples revealed a possible evolutionary connection between influenza virus from Delaware Bay shorebirds and Alberta ducks. Conclusions: The IRD provides a wealth of integrated data and information about influenza virus to support research of the genetic determinants dictating virus pathogenicity, host range restriction and transmission, and to facilitate development of vaccines, diagnostics, and therapeutics.
ViPR: an open bioinformatics database and analysis resource for virology research
The Virus Pathogen Database and Analysis Resource (ViPR, www.ViPRbrc.org) is an integrated repository of data and analysis tools for multiple virus families, supported by the National Institute of Allergy and Infectious Diseases (NIAID) Bioinformatics Resource Centers (BRC) program. ViPR contains information for human pathogenic viruses belonging to the Arenaviridae, Bunyaviridae, Caliciviridae, Coronaviridae, Flaviviridae, Filoviridae, Hepeviridae, Herpesviridae, Paramyxoviridae, Picornaviridae, Poxviridae, Reoviridae, Rhabdoviridae and Togaviridae families, with plans to support additional virus families in the future. ViPR captures various types of information, including sequence records, gene and protein annotations, 3D protein structures, immune epitope locations, clinical and surveillance metadata and novel data derived from comparative genomics analysis. Analytical and visualization tools for metadata-driven statistical sequence analysis, multiple sequence alignment, phylogenetic tree construction, BLAST comparison and sequence variation determination are also provided. Data filtering and analysis workflows can be combined and the results saved in personal ‘Workbenches’ for future use. ViPR tools and data are available without charge as a service to the virology research community to help facilitate the development of diagnostics, prophylactics and therapeutics for priority pathogens and other viruses
Recombination in West Nile Virus: minimal contribution to genomic diversity
Recombination is known to play a role in the ability of various viruses to acquire sequence diversity. We consequently examined all available West Nile virus (WNV) whole genome sequences both phylogenetically and with a variety of computational recombination detection algorithms. We found the number of distinct lineages present on a phylogenetic tree reconstruction to be identical to the six previously reported. Statistically significant evidence for recombination was observed in only one whole genome sequence. This recombination event was within the NS5 polymerase coding region. All three viruses contributing to the recombination event were originally isolated in Africa at various times, with the major parent (SPU116_89_B), minor parent (KN3829), and recombinant sequence (AnMg798) belonging to WNV taxonomic lineages 2, 1a, and 2 respectively. This one isolated recombinant genome was out of a total of 154 sequences analyzed. It therefore does not seem likely that recombination contributes in any significant manner to the overall sequence variation within the WNV genome.
A comprehensive collection of systems biology data characterizing the host response to viral infection
The Systems Biology for Infectious Diseases Research program was established by the U.S. National Institute of Allergy and Infectious Diseases to investigate host-pathogen interactions at a systems level. This program generated 47 transcriptomic and proteomic datasets from 30 studies that investigate in vivo and in vitro host responses to viral infections. Human pathogens in the Orthomyxoviridae and Coronaviridae families, especially pandemic H1N1 and avian H5N1 influenza A viruses and severe acute respiratory syndrome coronavirus (SARS-CoV), were investigated. Study validation was demonstrated via experimental quality control measures and meta-analysis of independent experiments performed under similar conditions. Primary assay results are archived at the GEO and PeptideAtlas public repositories, while processed statistical results together with standardized metadata are publicly available at the Influenza Research Database (www.fludb.org) and the Virus Pathogen Resource (www.viprbrc.org). By comparing data from mutant versus wild-type virus and host strains, RNA versus protein differential expression, and infection with genetically similar strains, these data can be used to further investigate genetic and physiological determinants of host responses to viral infection.
The contribution of different mechanisms of viral sequence variation to the evolution of positive-sense single-stranded RNA viruses
The Flaviviridae family of positive-sense single-stranded RNA (+ssRNA) viruses includes viral taxa which greatly impact public health worldwide. To explore how the viruses within the Flaviviridae family evolve, we examined the extent to which these viral taxa use nucleotide covariance, spontaneous mutation, and/or homologous recombination to vary their genotype as well as the resulting phenotype. We developed and used CovarView to assist us in simultaneously viewing and inspecting the results from whole genome covariance analyses. This resulted in the identification of previously-characterized RNA functional structures in the genomes of hepatitis C virus (HCV), as well as a new RNA functional region in the gp120 coding region of human immunodeficiency virus type 1 (HIV-1). We observed two distinct clades within HCV subtype 1a genomes when we used phylogeny to examine the prevalence of mutations in this species. These clades were further characterized and many nucleotide positions that contributed significantly to the separation between the two clades were identified. Several of these positions were located at or near sites responsible for encoding antiviral resistance mutations. While assessing the homologous recombination results for these species we found that HCV and DENV use it most frequently, while the novel individual event that we found in West Nile virus (WNV) confirms its rarity. We compared and contrasted the results measuring the separate variation mechanisms to determine the extent to which the viral genera and species within the Flaviviridae family use one or more of these mechanisms more frequently than other(s) to obtain sequence variation. We observed that HCV frequently exhibits nucleotide covariance while DENV and WNV use it only at the 5’ and 3’ untranslated regions. Although mutations occur in all Flaviviridae species, including DENV and WNV, they seem to be tolerated better by HCV due to its replicating within a single host. 
Homologous recombination is used customarily within HCV and DENV genomes, but is extraordinarily rare in WNV. We conclude that although +ssRNA viruses use nucleotide covariance, mutation, and homologous recombination to acquire sequence variation, different viral species use these mechanisms in varying frequency as they continue to evolve within their own distinct environments. Keywords: covariance, mutation, recombination, bioinformatics, virology, +ssRNA viru
Transcriptomics secondary analysis of severe human infection with SARS-CoV-2 identifies gene expression changes and predicts three transcriptional biomarkers in leukocytes
SARS-CoV-2 is the causative agent of COVID-19, which has greatly affected human health since it first emerged. Defining the human factors and biomarkers that differentiate severe SARS-CoV-2 infection from mild infection has become of increasing interest to clinicians. To help address this need, we retrieved 269 public RNA-seq human transcriptome samples from GEO that had qualitative disease severity metadata. We then subjected these samples to a robust RNA-seq data processing workflow to calculate gene expression in PBMCs, whole blood, and leukocytes, as well as to predict transcriptional biomarkers in PBMCs and leukocytes. This process involved using Salmon for read mapping, edgeR to calculate significant differential expression levels, and Camera for gene ontology enrichment. We then performed a random forest machine learning analysis on the read counts data to identify genes that best classified samples based on the COVID-19 severity phenotype. This approach produced a ranked list of leukocyte genes based on their Gini values that includes TGFBI, TTYH2, and CD4, which are associated with both the immune response and inflammation. Our results show that these three genes can potentially classify samples with severe COVID-19 with an accuracy of ∼88% and an area under the receiver operating characteristic curve of 92.6, indicating acceptable specificity and sensitivity. We expect that our findings can help contribute to the development of improved diagnostics that may aid in identifying severe COVID-19 cases, guide clinical treatment, and improve mortality rates.
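As a rough illustration of the two performance figures quoted in this abstract, the sketch below computes accuracy and the area under the ROC curve for a binary severity classifier. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Counts the fraction of (severe, mild) pairs in which the severe sample
    receives the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at a fixed score threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


# Hypothetical toy data: 1 = severe, 0 = mild; scores are classifier outputs.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}, accuracy = {toy_acc:.3f}")
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for a few hundred samples but would normally be replaced by a sort-based version on larger data.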
Mutation in Hemagglutinin Antigenic Sites in Influenza A pH1N1 Viruses from 2015–2019 in the United States Mountain West, Europe, and the Northern Hemisphere
H1N1 influenza A virus is a respiratory pathogen that undergoes antigenic shift and antigenic drift to improve viral fitness. Tracking the evolutionary trends of H1N1 aids with the current detection and the future response to new viral strains as they emerge. Here, we characterize antigenic drift events observed in the hemagglutinin (HA) sequence of the pandemic H1N1 lineage from 2015–2019. We observed the substitutions S200P, K147N, and P154S, together with other mutations in structural, functional, and/or epitope regions in 2015–2019 HA protein sequences from the Mountain West region of the United States, the larger United States, Europe, and other Northern Hemisphere countries. We reconstructed multiple phylogenetic trees to track the relationships and spread of these mutations and tested for evidence of selection pressure on HA. We found that the prevalence of amino acid substitutions at positions 147, 154, 159, 200, and 233 significantly changed throughout the studied geographical regions between 2015 and 2019. We also found evidence of coevolution among a subset of these amino acid substitutions. The results from this study could be relevant for future epidemiological tracking and vaccine prediction efforts. Similar analyses in the future could identify additional sequence changes that could affect the pathogenicity and/or infectivity of this virus in its human host
Identification of diagnostic peptide regions that distinguish Zika virus from related mosquito-borne Flaviviruses
Zika virus (ZIKV) is a member of the Flavivirus genus of positive-sense single-stranded RNA viruses, which includes Dengue, West Nile, Yellow Fever, and other mosquito-borne arboviruses. Infection by ZIKV can be difficult to distinguish from infection by other mosquito-borne Flaviviruses due to high sequence similarity, serum antibody cross-reactivity, and virus co-circulation in endemic areas. Indeed, existing serological methods are not able to consistently differentiate ZIKV from other Flaviviruses, which makes it extremely difficult to accurately calculate the incidence rate of Zika-associated Guillain-Barré syndrome in adults, microcephaly in newborns, or asymptomatic infections within a geographical area. In order to identify Zika-specific peptide regions that could be used as serology reagents, we have applied comparative genomics and protein structure analyses to identify amino acid residues that distinguish each of 10 Flavivirus species and subtypes from each other by calculating the specificity, sensitivity, and surface exposure of each residue in relevant target proteins. For ZIKV we identified 104 and 116 15-mer peptides in the E glycoprotein and NS1 non-structural protein, respectively, that contain multiple diagnostic sites and are located in surface-exposed regions in the tertiary protein structure. These sensitive, specific, and surface-exposed peptide regions should serve as useful reagents for seroprevalence studies to better distinguish between prior infections with any of these mosquito-borne Flaviviruses. The development of better detection methods and diagnostic tools will enable clinicians and public health workers to more accurately estimate the true incidence rate of asymptomatic infections, neurological syndromes, and birth defects associated with ZIKV infection.
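The per-residue sensitivity and specificity calculation mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it takes the most common residue in the target species as the candidate diagnostic residue, defines sensitivity as the fraction of target sequences carrying it and specificity as the fraction of non-target sequences lacking it, and ignores surface exposure entirely. The function name `diagnostic_sites` and the three-column toy alignment are hypothetical.

```python
from collections import Counter


def diagnostic_sites(alignment, target):
    """Score each alignment column for how well it distinguishes `target`.

    alignment: dict mapping species name -> list of aligned, equal-length
               protein sequences (toy input format assumed for this sketch).
    Returns a list of (position, residue, sensitivity, specificity) tuples.
    """
    target_seqs = alignment[target]
    other_seqs = [s for sp, seqs in alignment.items() if sp != target for s in seqs]
    results = []
    for pos in range(len(target_seqs[0])):
        # Candidate diagnostic residue: the most common one in the target species.
        residue, count = Counter(s[pos] for s in target_seqs).most_common(1)[0]
        sensitivity = count / len(target_seqs)
        # Specificity: share of non-target sequences that do NOT carry the residue.
        specificity = sum(s[pos] != residue for s in other_seqs) / len(other_seqs)
        results.append((pos, residue, sensitivity, specificity))
    return results


# Hypothetical toy alignment, three columns wide.
toy_alignment = {
    "ZIKV": ["MKT", "MKT", "MRT"],
    "DENV": ["MAT", "MAS"],
    "WNV": ["MAT"],
}

sites = diagnostic_sites(toy_alignment, "ZIKV")
for pos, residue, sens, spec in sites:
    print(f"position {pos}: {residue} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy input only the middle column is useful: the residue K appears in most ZIKV sequences and in none of the others, while the first column is fully conserved across species and therefore has zero specificity.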