10 research outputs found

    From contigs towards chromosomes: automatic improvement of long read assemblies (ILRA)

    Recent advances in long read technologies not only enable large consortia to aim to sequence all eukaryotes on Earth, but they also allow individual laboratories to sequence their species of interest with relatively low investment. Long read technologies embody the promise of overcoming scaffolding problems associated with repeats and low complexity sequences, but the number of contigs often far exceeds the number of chromosomes and they may contain many insertion and deletion errors around homopolymer tracts. To overcome these issues, we have implemented the ILRA pipeline to correct long read-based assemblies. Contigs are first reordered, renamed, merged, circularized, or filtered if erroneous or contaminated. Illumina short reads are used subsequently to correct homopolymer errors. We successfully tested our approach by improving the genome sequences of Homo sapiens, Trypanosoma brucei, and Leptosphaeria spp., and by generating four novel Plasmodium falciparum assemblies from field samples. We found that correcting homopolymer tracts reduced the number of genes incorrectly annotated as pseudogenes, but an iterative approach seems to be required to correct more sequencing errors. In summary, we describe and benchmark the performance of our new tool, which improved the quality of novel long read assemblies up to 1 Gbp. The pipeline is available at GitHub: https://github.com/ThomasDOtto/ILRA

    Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research

    SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories.

    A Fréchet tree distance measure to compare phylogeographic spread paths across trees

    Phylogeographic methods reconstruct the origin and spread of taxa by inferring locations for internal nodes of the phylogenetic tree from sampling locations of genetic sequences. This is commonly applied to study pathogen outbreaks and spread. To evaluate such reconstructions, the inferred spread paths from root to leaf nodes should be compared to other methods or references. Usually, ancestral state reconstructions are evaluated by node-wise comparisons, therefore requiring the same tree topology, which is usually unknown. Here, we present a method for comparing phylogeographies across different trees inferred from the same taxa. We compare paths of locations by calculating discrete Fréchet distances. By correcting the distances by the number of paths going through a node, we define the Fréchet tree distance as a distance measure between phylogeographies. As an application, we compare phylogeographic spread patterns on trees inferred with different methods from hemagglutinin sequences of H5N1 influenza viruses, finding that both tree inference and ancestral reconstruction cause variation in phylogeographic spread that is not directly reflected by topological differences. The method is suitable for comparing phylogeographies inferred with different tree or phylogeographic inference methods to each other or to a known ground truth, thus enabling a quality assessment of such techniques.
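    The discrete Fréchet distance that the abstract builds on can be illustrated with a minimal sketch. This is not the authors' code: the function name `discrete_frechet`, the use of planar Euclidean distance between locations, and the example coordinates are assumptions made here for illustration, and the per-node path-count correction that defines the Fréchet tree distance is not reproduced.

```python
from math import dist


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two non-empty paths of points.

    Assumption: locations are tuples in a plane and are compared with
    Euclidean distance; a geographic metric could be substituted.
    """
    if not p or not q:
        raise ValueError("both paths must contain at least one point")
    n, m = len(p), len(q)
    # ca[i][j] holds the coupling distance of the prefixes p[:i+1], q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance on p, on q, or on both; keep the cheapest option.
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


# Hypothetical root-to-leaf location paths from two different trees.
path_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
path_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(discrete_frechet(path_a, path_b))
```

    Because the two paths may have different lengths, a measure of this kind can compare spread paths even when the underlying trees differ in topology.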

    hzi-bifo/Phylogeography_Paper

    This repository contains all data and code for the manuscript "Phylogeographic reconstruction using air transportation data and its application to the 2009 H1N1 influenza A pandemic"

    Phylogeographic reconstruction using air transportation data and its application to the 2009 H1N1 influenza A pandemic.

    Influenza A viruses cause seasonal epidemics and occasional pandemics in the human population. While the worldwide circulation of seasonal influenza is at least partly understood, the exact migration patterns between countries, states or cities are not well studied. Here, we use the Sankoff algorithm for parsimonious phylogeographic reconstruction together with effective distances based on a worldwide air transportation network. By first simulating geographic spread and then phylogenetic trees and genetic sequences, we confirmed that reconstructions with effective distances inferred phylogeographic spread more accurately than reconstructions with geographic distances and Bayesian reconstructions with BEAST that do not use any distance information, and led to results comparable to the Bayesian reconstruction using distance information via a generalized linear model. Our method extends Bayesian methods that estimate rates from the data by using fine-grained locations like airports and inferring intermediate locations not observed among sampled isolates. When applied to sequence data of the pandemic H1N1 influenza A virus in 2009, our approach correctly inferred the origin and proposed airports mainly involved in the spread of the virus. In the case of a novel outbreak, this approach allows rapid analysis of sequence data and inference of origin and spread routes to improve disease surveillance and control.
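    The bottom-up pass of the Sankoff algorithm mentioned in the abstract can be sketched as follows. This is an illustrative sketch only, not the code in the hzi-bifo/Phylogeography_Paper repository: the function name `sankoff`, the tree encoding, the location labels and the unit cost matrix are all hypothetical. In the paper's setting the cost matrix would hold effective distances derived from the air transportation network.

```python
INF = float("inf")


def sankoff(tree, root, leaf_states, states, cost):
    """Return (minimum total cost, per-state cost table at the root).

    tree        : dict mapping each internal node to a list of its children
    leaf_states : dict mapping each leaf to its observed location
    cost[s][t]  : cost of moving from location s (parent) to t (child)
    """
    def up(node):
        children = tree.get(node, [])
        if not children:
            # A leaf is free in its observed state and impossible otherwise.
            return {s: (0.0 if s == leaf_states[node] else INF) for s in states}
        child_tables = [up(child) for child in children]
        # For each parent state, every child picks its cheapest own state.
        return {
            s: sum(min(cost[s][t] + table[t] for t in states)
                   for table in child_tables)
            for s in states
        }

    root_table = up(root)
    return min(root_table.values()), root_table


# Hypothetical example: three sampled isolates, two locations, unit costs.
tree = {"root": ["inner", "leaf3"], "inner": ["leaf1", "leaf2"]}
leaf_states = {"leaf1": "X", "leaf2": "X", "leaf3": "Y"}
states = ["X", "Y"]
cost = {"X": {"X": 0.0, "Y": 1.0}, "Y": {"X": 1.0, "Y": 0.0}}

best, root_table = sankoff(tree, "root", leaf_states, states, cost)
print(best, root_table)
```

    A full reconstruction would add a top-down pass that assigns a concrete location to every internal node; only the cost computation is shown here.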

    hzi-bifo/SDplots: First release of SD plots data

    This repository contains all published data

    In Silico Vaccine Strain Prediction for Human Influenza Viruses.

    Vaccines preventing seasonal influenza infections save many lives every year; however, due to rapid viral evolution, they have to be updated frequently to remain effective. To identify appropriate vaccine strains, the World Health Organization (WHO) operates a global program that continually generates and interprets surveillance data. Over the past decade, sophisticated computational techniques, drawing from multiple theoretical disciplines, have been developed that predict viral lineages rising to predominance, assess their suitability as vaccine strains, link genetic to antigenic alterations, as well as integrate and visualize genetic, epidemiological, structural, and antigenic data. These could form the basis of an objective and reproducible vaccine strain-selection procedure utilizing the complex, large-scale data types from surveillance. To this end, computational techniques should already be incorporated into the vaccine-selection process in an independent, parallel track, and their performance continuously evaluated

    Sweep Dynamics (SD) plots: Computational identification of selective sweeps to monitor the adaptation of influenza A viruses

    Monitoring changes in influenza A virus genomes is crucial to understand its rapid evolution and adaptation to changing conditions, e.g., establishment within novel host species. Selective sweeps represent a rapid mode of adaptation and are typically observed in human influenza A viruses. We describe Sweep Dynamics (SD) plots, a computational method combining phylogenetic algorithms with statistical techniques to characterize the molecular adaptation of rapidly evolving viruses from longitudinal sequence data. SD plots facilitate the identification of selective sweeps, the time periods in which these occurred and associated changes providing a selective advantage to the virus. We studied the past genome-wide adaptation of the 2009 pandemic H1N1 influenza A (pH1N1) and seasonal H3N2 influenza A (sH3N2) viruses. The pH1N1 influenza virus showed simultaneous amino acid changes in various proteins, particularly in seasons of high pH1N1 activity. In part, these changes resulted in functional alterations facilitating sustained human-to-human transmission. In the evolution of sH3N2 influenza viruses, we detected changes characterizing vaccine strains, which were occasionally revealed in selective sweeps one season prior to the WHO recommendation. Taken together, SD plots allow monitoring and characterizing the adaptive evolution of influenza A viruses by identifying selective sweeps and their associated signatures.

    Cellular Importin-α3 Expression Dynamics in the Lung Regulate Antiviral Response Pathways against Influenza A Virus Infection.

    Importin-α adaptor proteins orchestrate dynamic nuclear transport processes involved in cellular homeostasis. Here, we show that importin-α3, one of the main NF-κB transporters, is the most abundantly expressed classical nuclear transport factor in the mammalian respiratory tract. Importin-α3 promoter activity is regulated by TNF-α-induced NF-κB in a concentration-dependent manner. High-level TNF-α-inducing highly pathogenic avian influenza A viruses (HPAIVs) isolated from fatal human cases harboring human-type polymerase signatures (PB2 627K, 701N) significantly downregulate importin-α3 mRNA expression in primary lung cells. Importin-α3 depletion is restored upon back-mutating the HPAIV polymerase into an avian-type signature (PB2 627E, 701D) that can no longer induce high TNF-α levels. Importin-α3-deficient mice show reduced NF-κB-activated antiviral gene expression and increased influenza lethality. Thus, importin-α3 plays a key role in antiviral immunity against influenza. Lifting the bottleneck in importin-α3 availability in the lung might provide a new strategy to combat respiratory virus infections