
    Bioinformatics tools for analysing viral genomic data

    The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing.
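Quality control of raw reads, the first step listed above, often includes trimming low-quality 3' ends. The sketch below is an illustrative sliding-window trimmer. It assumes Phred+33 FASTQ quality strings. The function names, the window of 4 bases and the Q20 threshold are hypothetical choices, not taken from the paper. Established tools, such as the SLIDINGWINDOW step in Trimmomatic, implement the same idea with many more options.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20, window: int = 4) -> tuple[str, str]:
    """Cut the read at the first window (scanning 5' to 3') whose mean quality is below min_q.

    Hypothetical helper for illustration; thresholds are assumptions, not recommendations.
    """
    scores = phred_scores(quality)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_q:
            return seq[:start], quality[:start]
    return seq, quality  # no low-quality window found: keep the whole read


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(trim_3prime("ACGTACGTAC", "IIIIII####"))  # ('ACGTA', 'IIIII')
```

Cutting at the start of the failing window is conservative: it can discard a few good bases just before the quality drop, in exchange for a simple single pass.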

    BMC Genomics

    Background: Deep sequencing makes it possible to observe low-frequency viral variants and sub-populations with greater accuracy and sensitivity than ever before. Existing platforms can be used to multiplex a large number of samples; however, analysis of the resulting data is complex and involves separating barcoded samples and various read manipulation processes ending in final assembly. Many assembly tools were designed with larger genomes and higher fidelity polymerases in mind and do not perform well with reads derived from highly variable viral genomes. Reference-based assemblers may leave gaps in viral assemblies while de novo assemblers may struggle to assemble unique genomes.
    Results: The IRMA (iterative refinement meta-assembler) pipeline solves the problem of viral variation by the iterative optimization of read gathering and assembly. As with all reference-based assembly, reads are included in assembly when they match consensus template sets; however, IRMA provides for on-the-fly reference editing, correction, and optional elongation without the need for additional reference selection. This increases both read depth and breadth. IRMA also focuses on quality control, error correction, indel reporting, variant calling and variant phasing. In fact, IRMA's ability to detect and phase minor variants is one of its most distinguishing features. We have built modules for influenza and ebolavirus. We demonstrate usage and provide calibration data from mixture experiments. Methods for variant calling, phasing, and error estimation/correction have been redesigned to meet the needs of viral genomic sequencing.
    Conclusion: IRMA provides a robust next-generation sequencing assembly solution that is adapted to the needs and characteristics of viral genomes. The software solves issues related to the genetic diversity of viruses while providing customized variant calling, phasing, and quality control. IRMA is freely available for non-commercial use on Linux and Mac OS X and has been parallelized for high-throughput computing.
    Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3030-6) contains supplementary material, which is available to authorized users.
    Date: 2016-09-05; PMID: 27595578; PMCID: PMC501193
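The abstract describes IRMA's central idea as alternating read gathering with reference editing until the template stops changing. The sketch below is a toy illustration of that loop under strong simplifying assumptions:

- reads are short strings placed by ungapped Hamming-distance matching;
- the consensus is a simple per-position majority;
- minor variants are reported above a fixed frequency.

All function names and thresholds are hypothetical. IRMA's actual read gathering, indel handling, error model and phasing are considerably more sophisticated.

```python
from collections import Counter


def place_read(reference: str, read: str, max_mismatches: int):
    """Offset of the best ungapped match of `read` on `reference`, or None if none is close enough."""
    best_start, best_mism = None, max_mismatches + 1
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mism = sum(a != b for a, b in zip(window, read))
        if mism < best_mism:  # strict comparison: the first offset wins ties
            best_start, best_mism = start, mism
    return best_start


def gather(reference: str, reads: list[str], max_mismatches: int) -> list[Counter]:
    """Pile up the bases of every read that can be placed on the current reference."""
    columns = [Counter() for _ in reference]
    for read in reads:
        start = place_read(reference, read, max_mismatches)
        if start is not None:
            for i, base in enumerate(read):
                columns[start + i][base] += 1
    return columns


def refine(reference: str, reads: list[str], max_mismatches: int = 1, max_rounds: int = 10) -> str:
    """Repeatedly re-gather reads and replace each covered position with its majority base."""
    for _ in range(max_rounds):
        columns = gather(reference, reads, max_mismatches)
        updated = "".join(
            col.most_common(1)[0][0] if col else ref_base  # uncovered positions keep the old base
            for ref_base, col in zip(reference, columns)
        )
        if updated == reference:  # converged: the template no longer changes
            break
        reference = updated
    return reference


def minor_variants(reference, reads, max_mismatches=1, min_freq=0.05, min_depth=10):
    """Report (position, base, frequency) for non-consensus bases at or above min_freq."""
    calls = []
    for pos, col in enumerate(gather(reference, reads, max_mismatches)):
        depth = sum(col.values())
        if depth < min_depth:
            continue
        consensus = col.most_common(1)[0][0]
        for base, n in sorted(col.items()):
            if base != consensus and n / depth >= min_freq:
                calls.append((pos, base, n / depth))
    return calls


# A starting template with one wrong base (T at position 6) is corrected by the reads.
print(refine("GATTACTCGT", ["GATTA", "TTACA", "ACACG", "CACGT"]))  # GATTACACGT
print(minor_variants("GATTACACGT", ["GATTA"] * 18 + ["GATCA"] * 2))  # [(3, 'C', 0.1)]
```

Re-gathering after every edit matters in the real setting because reads that were too divergent to place against the original template can be recruited once it has moved closer to the sample. This is how the abstract explains IRMA's gains in read depth and breadth.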

    Exploring the phylodynamics, genetic reassortment and RNA secondary structure formation patterns of orthomyxoviruses by comparative sequence analysis

    RNA viruses are among the most virulent microorganisms that threaten the health of humans and livestock. Among the most socio-economically important of the known RNA viruses are those in the family Orthomyxoviridae. In this era of rapid low-cost genome sequencing and advances in computational biology techniques, many previously difficult research questions relating to the molecular epidemiology and evolutionary dynamics of these viruses can now be answered with ease. Using sequence data together with associated metadata, in chapter two of this dissertation I tested the hypothesis that the 2009 pandemic influenza A/H1N1 virus was introduced multiple times into Africa and subsequently dispersed heterogeneously across the continent. Using a generalised linear model-based approach, I further tested to what degree factors such as road distances and air travel distances affected the observed pattern of spread of this virus in Africa. The results suggested that there were multiple simultaneous introductions of pandemic A/H1N1 2009 into Africa, and that geographical distance and human mobility through air travel played an important role in its dissemination. In chapter three, I set out to test two hypotheses: (1) that there is no difference in the frequency of reassortment among the segments that constitute influenza virus genomes; and (2) that there is epochal temporal reassortment among influenza viruses and that all geographical regions are equally likely sources of epidemiologically important influenza virus reassortant lineages. The findings suggested that surface segments are exchanged more frequently than internal genes, and that North America/Asia, Oceania, and Asia could be the most likely source locations for reassortant influenza A, B and C virus lineages, respectively.
In chapter four of this thesis, I explored the formation of RNA secondary structures within the genomes of orthomyxoviruses belonging to five genera (Influenza A, B and C, Infectious Salmon Anaemia Virus and Thogotovirus). I used in silico RNA folding predictions, together with additional molecular evolution and phylogenetic tests, to show that structured regions may be biologically functional. The presence of some conserved structures across the five genera likely reflects their biological importance, warranting further investigation of their role in evolution and in the possible development of antiviral resistance. The studies herein demonstrate that pathogen genomics-based analytical approaches are useful both for understanding the mechanisms that drive the evolution and spread of rapidly evolving viral pathogens such as orthomyxoviruses, and for illuminating how these approaches could be leveraged to improve the management of these pathogens.
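Folding predictions of this kind are commonly made with thermodynamic minimum-free-energy tools such as RNAfold from the ViennaRNA package. Which tool the dissertation used is not stated here. As a much simpler illustration of how dynamic programming recovers a secondary structure, the sketch below implements the textbook Nussinov base-pair maximisation algorithm. This is a swapped-in technique, not the dissertation's method. It assumes Watson-Crick plus G-U wobble pairs and a minimum hairpin loop of three unpaired bases, and the function name is hypothetical.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximise the number of nested base pairs; return (pair count, dot-bracket string)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    n = len(seq)
    # dp[i][j] = maximum number of pairs within seq[i..j]; spans <= min_loop stay 0.
    dp = [[0] * n for _ in range(n)]

    def pair_score(i: int, k: int, j: int) -> int:
        """Pairs obtained if seq[k] pairs with seq[j] and seq[i..k-1] folds independently."""
        left = dp[i][k - 1] if k > i else 0
        return left + dp[k + 1][j - 1] + 1

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: seq[j] is unpaired
            for k in range(i, j - min_loop):  # case 2: seq[j] pairs with some seq[k]
                if (seq[k], seq[j]) in PAIRS:
                    best = max(best, pair_score(i, k, j))
            dp[i][j] = best

    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS and dp[i][j] == pair_score(i, k, j):
                structure[k], structure[j] = "(", ")"
                stack.append((k + 1, j - 1))
                if k > i:
                    stack.append((i, k - 1))
                break
    return (dp[0][n - 1] if n else 0), "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Maximising pair count ignores stacking energies and loop penalties, so its structures can differ substantially from minimum-free-energy predictions. The example is only meant to show the recursion that energy-based folders refine.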

    Phylodynamic Patterns in Pathogen Ecology and Evolution

    The rapid evolution of viral pathogens requires us to consider epidemiological, ecological and evolutionary processes as coupled and as occurring on the same timescale. Rotavirus and influenza account for high levels of morbidity and mortality worldwide and are two important examples of such dynamics. In this work, I investigate the evolutionary and ecological processes that shape the antigenic structure and phylogenetic characteristics of these two viruses. In the first part of my work, I use a theoretical model of influenza A/H3N2 to identify the relative importance of antigenic novelty, competition between lineages, and changes in the susceptibility of the host population to circulating strains in determining the evolutionary and epidemiological trajectory of the virus. I then extend this model to match patterns of immunity and infection observed in rotavirus. With the extended model, I investigate how reassortment (the swapping of gene segments between viruses) influences the formation and replacement of rotavirus genotypes through immune-mediated processes. In the second part of my work, I use SeasMig, a tool I developed, to infer alternative stochastically generated migration and mutation events along phylogenetic trees within a Bayesian framework. Using SeasMig, I first show how the seasonality of A/H3N2 influenza incidence corresponds to rates of immigration and emigration of the virus. I then tease apart the evolutionary and ecological processes that drive changes in the US rotavirus population following the onset of routine vaccination. My work has implications for identifying likely evolutionary mechanisms that may lead to reduced vaccine efficacy, and for vaccine strain selection.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113494/1/dzinder_1.pd

    Gain-of-Function Experiments With Bacteriophage Lambda Uncover Residues Under Diversifying Selection in Nature

    Viral gain-of-function mutations frequently evolve during laboratory experiments. Whether the specific mutations that evolve in the lab also evolve in nature, and whether they have the same impact on evolution in the real world, is unknown. We studied a model virus, bacteriophage λ, that repeatedly evolves to exploit a new host receptor under typical laboratory conditions. Here, we demonstrate that two residues of λ's J protein are required for the new function. In natural λ variants, these amino acid sites are highly diverse and evolve at high rates. Insertions and deletions at these locations are associated with phylogenetic patterns indicative of ecological diversification. Our results show that viral evolution in the laboratory mirrors that in nature and that laboratory experiments can be coupled with protein sequence analyses to identify the causes of viral evolution in the real world. Furthermore, our results provide evidence for widespread host-shift evolution in lambdoid viruses.
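A simple descriptive way to flag amino acid sites that are "highly diverse" across natural variants is the per-column Shannon entropy of a protein alignment. The sketch below assumes an existing alignment of equal-length sequences with '-' marking gaps, and the function name is hypothetical. Entropy only measures diversity. It is not the evolutionary-rate or phylogenetic analysis the authors describe, and it does not by itself test for diversifying selection.

```python
import math
from collections import Counter


def site_entropy(alignment: list[str], gap: str = "-") -> list[float]:
    """Shannon entropy (in bits) of each alignment column, ignoring gap characters."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    entropies = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != gap)
        total = sum(counts.values())
        # H = sum over residues of p * log2(1/p); an all-gap column scores 0.0.
        entropies.append(sum(((n / total) * math.log2(total / n) for n in counts.values()), 0.0))
    return entropies


# Hypothetical toy alignment: only the middle column varies.
print(site_entropy(["MKV", "MRV", "MSV", "MTV"]))  # [0.0, 2.0, 0.0]
```

Ranking columns by entropy gives a quick shortlist of variable positions, which would then need formal rate or selection analyses of the kind the abstract reports.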

    Computational Methods for Assessment and Prediction of Viral Evolutionary and Epidemiological Dynamics

    The ability to understand the dynamics of viral transmission and evolution, even to a limited extent, can significantly enhance our capacity to predict and control the spread of infectious diseases. COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a clear example. In this dissertation, I propose computational models that offer more precise and comprehensive approaches to viral outbreak investigation and epidemiology. By facilitating the timely detection of viral variants, these models provide insights into the transmission dynamics of infectious diseases and into potential interventions. The first model is a mathematical framework based on population dynamics for calculating a numerical measure of the fitness of SARS-CoV-2 subtypes. The second is a Bayesian transmissibility estimation method that calculates the most likely fitness landscape for SARS-CoV-2 using a generalized logistic sub-epidemic model. Using this model, I estimate the epistatic interaction networks of the SARS-CoV-2 spike protein. Based on the community structure of these epistatic networks, I propose a computational framework that predicts emerging SARS-CoV-2 haplotypes with altered transmissibility. The last method proposed in this dissertation is a maximum likelihood framework that integrates phylogenetic and random graph models to infer transmission networks accurately without requiring case-specific data.
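Sub-epidemic models of the kind named above are built from generalized-logistic growth curves. The sketch below assumes the commonly used single-curve form dC/dt = r * C^p * (1 - C/K), where:

- C is cumulative cases;
- r is the growth rate;
- p in [0, 1] is the deceleration-of-growth parameter (p = 1 gives ordinary logistic growth);
- K is the final epidemic size.

It integrates one such curve with forward Euler steps. The parameter values are illustrative. The dissertation's exact parameterization, and its Bayesian fitting of the fitness landscape, are not reproduced here.

```python
import math


def generalized_logistic(r: float, p: float, K: float, c0: float,
                         t_max: float, dt: float = 0.01) -> list[float]:
    """Forward-Euler integration of dC/dt = r * C**p * (1 - C / K) from C(0) = c0."""
    steps = int(round(t_max / dt))
    c = c0
    trajectory = [c]
    for _ in range(steps):
        c += dt * r * c ** p * (1 - c / K)
        trajectory.append(c)
    return trajectory


def logistic_exact(r: float, K: float, c0: float, t: float) -> float:
    """Closed-form solution for the special case p = 1 (ordinary logistic growth)."""
    return K / (1 + (K / c0 - 1) * math.exp(-r * t))


# Compare the numerical p = 1 curve with the closed form, then slow the growth with p = 0.6.
fast = generalized_logistic(r=0.5, p=1.0, K=1000.0, c0=1.0, t_max=20.0)
slow = generalized_logistic(r=0.5, p=0.6, K=1000.0, c0=1.0, t_max=20.0)
print(fast[-1], logistic_exact(0.5, 1000.0, 1.0, 20.0), slow[-1])
```

Checking the p = 1 case against its closed form is a cheap way to validate the integrator before fitting sub-exponential (p < 1) curves. In a sub-epidemic model, several such curves would be combined to describe overlapping waves.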

    Molecular epidemiology of acute respiratory virus infections

    Acute respiratory virus infections are very common but can also cause severe disease. In my thesis, I have analysed the molecular epidemiology of acute respiratory virus infections caused by enterovirus D68 and coronaviruses. In Paper I, we used real-time PCR and Sanger sequencing to analyse the 2016 outbreak of enterovirus D68 in Stockholm. We found that the outbreak was caused by subclade B3, and we also described three patients with neurological manifestations. The virus sequences were closely related to concurrent sequences from North America. In Paper II, we developed an assay for whole-genome sequencing of enterovirus D68 on a next-generation sequencing platform. Applying the assay to samples from the 2016 outbreak, we found that the outbreak was caused by multiple independent introductions of the virus. We also estimated the time to the most recent common ancestor for subclades B1 and B3 to be 2009. In Paper III, we used the whole-genome sequencing assay in a European multicentre study of enterovirus D68 circulation in the 2018 season, also including sequences from public repositories. We found that the viruses in 2018 belonged to subclades A2 and B3 and that the subclade B3 sequences originated from the 2016 circulation. We also found that enterovirus D68 underwent rapid geographic mixing and that amino acid residues on the surface of the virus particle had an elevated substitution rate. Hence, we proposed asymptomatic reinfection of adults as an explanation for both the rapid geographical dispersal and the selective pressure on surface residues. In Paper IV, we analysed stored results from routine clinical diagnostics for the four common cold coronaviruses, covering September 2009 to April 2020. At the species level, we found a pattern of alternating biennial circulation, and we also found that circulation of Betacoronaviruses peaked earlier than that of Alphacoronaviruses.
In Paper V, we investigated Sweden's first SARS-CoV-2 pandemic wave in 2020. We analysed stored respiratory samples with real-time PCR for SARS-CoV-2 and found that community transmission started earlier than previously appreciated. We also sequenced stored SARS-CoV-2-positive samples. We linked these sequences to information from contact tracing records and combined them with data from public repositories. Among cases exposed abroad, we mainly found clades 20B and 20A, whereas clade 20C dominated domestic infections. Furthermore, we found the proportion of clade 20C to be correlated with the cumulative number of deaths due to COVID-19. We interpreted this as evidence that early, undetected introductions of clade 20C had a significant impact on the subsequent course of the pandemic in Sweden.
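This abstract does not describe the dating method behind Paper II's most-recent-common-ancestor estimate for subclades B1 and B3. Root-to-tip regression is a quick, widely used sanity check for such dates. Under a strict molecular clock, each tip's divergence from the root grows linearly with its sampling date, and the date at which the fitted line reaches zero divergence (its x-intercept) estimates the TMRCA. The sketch below assumes that root-to-tip distances have already been measured on a rooted tree. The function name and the example numbers are hypothetical.

```python
def root_to_tip_tmrca(dates: list[float], distances: list[float]) -> tuple[float, float]:
    """Least-squares fit of distance = rate * (date - tmrca); returns (rate, tmrca).

    dates are sampling dates in decimal years; distances are root-to-tip
    divergences (substitutions per site) measured on a rooted tree.
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two tips with matching dates and distances")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    if sxx == 0:
        raise ValueError("sampling dates must not all be identical")
    rate = sxy / sxx  # slope: substitutions per site per year
    if rate <= 0:
        raise ValueError("no positive temporal signal; TMRCA is undefined")
    intercept = mean_d - rate * mean_t
    return rate, -intercept / rate  # x-intercept: date at which divergence is zero


# Hypothetical tips diverging at 0.004 substitutions/site/year from a 2009 root.
rate, tmrca = root_to_tip_tmrca([2014.0, 2016.0, 2018.0], [0.020, 0.028, 0.036])
print(f"rate={rate:.4f}, tmrca={tmrca:.1f}")  # rate=0.0040, tmrca=2009.0
```

Root-to-tip regression treats tips as independent even though they share ancestry, so it is best used as an exploratory check of temporal signal before a full molecular-clock analysis.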