56 research outputs found

    Large-scale invasion of unicellular eukaryotic genomes by integrating DNA viruses

    Eukaryotic genomes contain a variety of endogenous viral elements (EVEs), which are mostly derived from RNA and ssDNA viruses that are no longer functional and are considered to be “genomic fossils.” Genomic surveys of EVEs, however, are strongly biased toward animals and plants, whereas protists, which represent the majority of eukaryotic diversity, remain poorly represented. Here, we show that protist genomes harbor tens to thousands of diverse, ~14 to 40 kbp long dsDNA viruses. These EVEs, composed of virophages, Polinton-like viruses, and related entities, have remained hitherto hidden owing to poor sequence conservation between virus groups and their repetitive nature that precluded accurate short-read assembly. We show that long-read sequencing technology is ideal for resolving virus insertions. Many protist EVEs appear intact, and most encode integrases, which suggests that they have actively colonized hosts across the tree of eukaryotes. We also found evidence for gene expression in host transcriptomes and that closely related virophage and Polinton-like virus genomes are abundant in viral metagenomes, indicating that many EVEs are probably functional viruses

    Targeted long-read sequencing of a locus under long-term balancing selection in Capsella

    Rapid advances in short-read DNA sequencing technologies have revolutionized population genomic studies, but there are genomic regions where this technology reaches its limits. Limitations mostly arise due to the difficulties in assembly or alignment to genomic regions of high sequence divergence and high repeat content, which are typical characteristics for loci under strong long-term balancing selection. Studying genetic diversity at such loci therefore remains challenging. Here, we investigate the feasibility and error rates associated with targeted long-read sequencing of a locus under balancing selection. For this purpose, we generated bacterial artificial chromosomes (BACs) containing the Brassicaceae S-locus, a region under strong negative frequency-dependent selection which has previously proven difficult to assemble in its entirety using short reads. We sequence S-locus BACs with single-molecule long-read sequencing technology and conduct de novo assembly of these S-locus haplotypes. By comparing repeated assemblies resulting from independent long-read sequencing runs on the same BAC clone we do not detect any structural errors, suggesting that reliable assemblies are generated, but we estimate an indel error rate of 5.7 × 10⁻⁵. A similar error rate was estimated based on comparison of Illumina short-read sequences and BAC assemblies. Our results show that, until de novo assembly of multiple individuals using long-read sequencing becomes feasible, targeted long-read sequencing of loci under balancing selection is a viable option with low error rates for single nucleotide polymorphisms or structural variation. We further find that short-read sequencing is a valuable complement, allowing correction of the relatively high rate of indel errors that result from this approach. This study was supported by a grant from the Swedish Research Council to T.S

    The Development and Application of Computational Methods for Genome Annotation

    Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, many genomes, especially eukaryotic genomes, have not yet been annotated. Ab initio gene prediction is notoriously hard in eukaryotic genomes due to the sparse gene content and introns interrupting genes. Two more promising strategies for annotating eukaryotic genomes are RNA-sequencing followed by transcriptome assembly and/or mapping genes from a closely related species. Current transcriptome assembly methods can assemble either short or long RNA-sequencing reads, which each have their own weaknesses that limit assembly accuracy. Additionally, there are no standalone tools that can accurately map gene annotations from one assembly to another. Therefore, in this work we first present hybrid-read transcriptome assembly with StringTie, where we combine long and short reads to mitigate the weaknesses of each data type. We show that hybrid-read assembly achieves better accuracy than long-read-only or short-read-only assembly on simulated as well as real RNA-sequencing data from human, Mus musculus, and Arabidopsis thaliana. We then introduce Liftoff, which is a standalone tool that can map gene annotations between assemblies of the same or closely related species. As a proof of concept, we map genes between two versions of the human reference genome and then between the human reference genome and the chimpanzee reference genome. We then describe the results of using Liftoff to annotate three new reference-quality human genome assemblies and a new assembly of the bread wheat genome. Lastly, we introduce LiftoffTools, which is a toolkit that compares the sequence, synteny, and copy number of genes lifted from one assembly to another

    Lost in plasmids: next generation sequencing and the complex genome of the tick-borne pathogen Borrelia burgdorferi

    Background: Borrelia (B.) burgdorferi sensu lato, including the tick-transmitted agents of human Lyme borreliosis, have particularly complex genomes, consisting of a linear main chromosome and numerous linear and circular plasmids. The number and structure of plasmids is variable even in strains within a single genospecies. Genes on these plasmids are known to play essential roles in virulence and pathogenicity as well as host and vector associations. For this reason, it is essential to explore methods for rapid and reliable characterisation of molecular level changes on plasmids. In this study we used three strains: a low passage isolate of B. burgdorferi sensu stricto strain B31(-NRZ) and two closely related strains (PAli and PAbe) that were isolated from human patients. Sequences of these strains were compared to the previously sequenced reference strain B31 (available in GenBank) to obtain proof-of-principle information on the suitability of next generation sequencing (NGS) library construction and sequencing methods for the assembly of bacterial plasmids. We tested the effectiveness of different short read assemblers on Illumina sequences, and of long read generation methods on sequence data from Pacific Biosciences single-molecule real-time (SMRT) and nanopore (Oxford Nanopore Technologies) sequencing technology. Results: Inclusion of mate pair library reads improved the assembly in some plasmids as did prior enrichment of plasmids. While cp32 plasmids remained refractory to assembly using only short reads, they were effectively assembled by long read sequencing methods. The long read SMRT and nanopore sequences came, however, at the cost of indels (insertions or deletions) appearing in an unpredictable manner. Using long and short read technologies together allowed us to show that the three B. burgdorferi s.s. strains investigated here, whilst having similar plasmid structures to each other (apart from fusion of cp32 plasmids), differed significantly from the reference strain B31-GB, especially in the case of cp32 plasmids. Conclusion: Short read methods are sufficient to assemble the main chromosome and many of the plasmids in B. burgdorferi. However, a combination of short and long read sequencing methods is essential for proper assembly of all plasmids including cp32 and thus for gaining an understanding of host or vector adaptations. An important conclusion from our work is that the evolution of Borrelia plasmids appears to be dynamic. This has important implications for the development of useful research strategies to monitor the risk of Lyme disease occurrence and how to medically manage it

    Engineering a feedback-based synthetic gene circuit for targeted continuous evolution of a gene in E. coli

    Directed evolution is an invaluable technique for engineering proteins to possess desired physical and chemical properties when very little structural and functional information is known. It is divided into two sequential steps: generating a library of protein variants using mutagenic techniques; and applying a screening or selection strategy to scan the library for variants displaying desired properties. Library generation is performed using either in vitro or in vivo techniques, while screening or selection typically occurs in a suitable host cell. Currently, in vitro methods like error-prone PCR are popular for library generation. However, these techniques can be labour intensive, prone to mutation biases, and generate limited library sizes for screening. In vivo mutagenic techniques overcome these limitations by enabling simultaneous library generation and selection within cells. By generating random mutations in the gene-of-interest within one cell cycle, each cell in a batch culture potentially represents a library variant. Such a continuous evolution system can run for weeks with minimal human intervention, greatly expanding the genetic search space for protein engineering. The challenge lies in developing a mutator system that specifically generates mutations in the target gene, while maintaining the cell’s genomic fidelity. With this goal in mind, a mutator system was engineered in E. coli that introduces targeted cytidine deamination damage and subsequently performs error-prone DNA repair by hijacking the base excision repair pathway. The targeted damage occurs via activation-induced cytidine deaminase fused to T7 RNA polymerase, while the error-prone DNA repair is performed by a three-protein fusion comprising a 5’-3’-exonuclease, an AP-endonuclease and an error-prone DNA polymerase. The mutagenic characteristics of this system were tested by knocking out GFP expression and analysing the mutant library using next generation sequencing techniques. The system was also experimentally shown to generate functionally active mutations that reverted inactivated β-lactamase gene variants to confer ampicillin resistance

    Transcriptional Regulation and Epigenetic Mechanisms Underlying Host-Parasite Interactions in Human Malaria

    Human malaria is one of the most important infectious diseases and a major cause of death and poverty worldwide. It is caused by protozoan parasites of the genus Plasmodium that are transmitted by the bites of mosquitoes of the genus Anopheles. The parasite Plasmodium falciparum and the mosquito Anopheles gambiae are the leading contributors to this global burden, which disproportionately affects Africa and children under the age of five. To complete development and to adapt to changing environments in the human and mosquito hosts, Plasmodium parasites are capable of drastic transcriptional switches. The Anopheles mosquitoes are the main vectors for human malaria, and they can display phenotypic variability in life history traits, including vector competence or responses against Plasmodium. Yet, the transcriptional regulation underlying host-parasite interactions in human malaria, particularly based on epigenetic mechanisms and regarding the life cycle in the mosquito, remains almost completely unknown. In this doctoral thesis, we have applied multi-omic approaches and bioinformatic analyses to investigate the regulatory genome of both P. falciparum and A. gambiae mosquitoes, associated with Plasmodium development and interactions within hosts, and with the responses of Anopheles mosquitoes to the parasitic infection. We have integrated genomic, epigenomic and transcriptomic approaches to unveil relevant cis-regulatory elements and to assay the relationship between gene expression levels and chromatin-related mechanisms, such as histone marks or chromatin accessibility levels. We applied different techniques to these organisms, integrating RNA-seq, ChIP-seq and ATAC-seq for the first time. We report a positive correlation between transcription and chromatin accessibility by ATAC-seq or active histone marks by ChIP-seq. We also identified thousands of active regulatory sequences, including enhancer candidates, that appeared to be linked to Plasmodium developmental transitions or clonally variant gene expression within humans, or that in the case of mosquitoes seemed to be specific to tissues or Plasmodium infection status. Ultimately, these allowed us to predict cognate transcription factors. Altogether, we provide evidence for genome-wide mechanisms and regulatory regions that may be involved in the dynamic transcriptional regulation underlying host-parasite interactions between malaria parasites and the human and mosquito hosts. This is much needed in the context of current efforts against malaria, to inform existing and new mosquito-control and anti-malaria strategies

    Annotation of marine eukaryotic genomes


    Bioinformatics and Next Generation Sequencing: Applications of Arthropod Genomes

    Over the past decade, Next Generation Sequencing (NGS) technology has been broadly applied in many areas such as genomics, medical diagnosis, biotechnology, virology, biological systematics, forensic biology, and anthropology. Taken together, it has offered us brilliant insights into life sciences. Most of the work presented in this thesis describes NGS applications in genome assembly, genome annotation, and comparative genomics, using arthropods as case studies: (1) by sequencing and analyzing the genomes of three Tetranychus spider mites with three completely different feeding behaviors, we uncovered variations in genomic signatures indicative of pest adaptations; (2) we sequenced, assembled and annotated five Brevipalpus flat mite genomes and their corresponding endosymbiont Cardinium genomes. Comparative genomics reveals herbivorous pest adaptations and parthenogenesis; (3) the complete genomic analysis of the parasitoid wasp Copidosoma floridanum indicates the mechanism of polyembryony in this primary parasite of moths. Using bioinformatics and genomics approaches, my study provides the genomic basis and establishes hypotheses for future biological research on pests and arthropods. These NGS applications of arthropod genomes will offer new insights into arthropod evolution and plant-herbivore interactions, open unique opportunities to develop novel plant protection strategies, and, additionally, provide arthropod genomic resources

    A population phylogenomic analysis of the origin and spread of Escherichia coli sequence type 131 (ST131)

    The incidence of infections caused by extraintestinal Escherichia coli (ExPEC) is rising globally due to their increasing resistance to standard antibiotics. This results in the use of broader-spectrum drugs, prolonged patient ill-health and more nosocomial infections. E. coli sequence type 131 (ST131) is the predominant ExPEC clone worldwide. The antimicrobial resistance (AMR) gene repertoire of ST131 is evolving rapidly due to the widespread use of β-lactam (bla) antibiotics. Here, we performed a genomic investigation of an ST131 outbreak in a long-term care facility (LTCF) to describe transmission, within-host clonal diversity, genetic diversity of antibiotic resistance and the evolution of ST131 in the LTCF over a seven-year period. We analyzed the population structure and inferred the genealogical history of the LTCF isolates in the context of local hospital and global collections of ST131 to elucidate the epidemiology of ST131. We confirmed our initial hypotheses by reconstructing the evolutionary history of a much larger population consisting of >4000 global ST131 genomes. This provided a deeper resolution of their evolutionary trajectories and the adaptive mechanisms of AMR driven by their ESBL genes, particularly cefotaximase (blaCTX). We further investigated the intersection of the AMR genes (AMRGs) found in ST131 with that of the human microbiome to understand the extent of their loss, gain and spread across different bacterial species. Across all strains, a large number of ST131’s AMRGs were found in a total of 794 genes in the human microbiome. Various gene families were represented, including transporters, transcription factors, β-lactamases and cell wall biosynthesis enzymes. To establish the main culprit for the dynamic nature of the blaCTX-M genes, we performed long read sequencing using a GridION X5 instrument. Analysis of long read-only assemblies gave a clear and robust picture of the genetic context flanking blaCTX-M genes in both plasmids and chromosomes. Overall, our findings underline the tremendous potential of high-throughput analysis of whole genome sequence data for improving our current treatment of bacterial infections