468 research outputs found

    Metassembler: merging and optimizing de novo genome assemblies

    Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We present our metassembler algorithm that merges multiple assemblies of a genome into a single superior sequence. We apply it to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for meta-assembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition. The software is open-source at http://metassembler.sourceforge.net
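
The figure of 120 matches the number of orderings of five assemblies (5! = 120), which suggests that merge order was treated as a variable. A minimal sketch of that enumeration follows. The labels A1 to A5 are hypothetical placeholders, not the actual Assemblathon entries, and reading each ordering as "start from the first assembly and merge the rest in sequence" is an assumption, not something the abstract states.

```python
from itertools import permutations

# Placeholder labels for five input assemblies (hypothetical names).
assemblies = ["A1", "A2", "A3", "A4", "A5"]

# Assumption: each ordering is one candidate merge order, with the first
# entry as the starting assembly and the others merged in sequence.
merge_orders = list(permutations(assemblies))

print(len(merge_orders))  # 120, i.e. 5!
```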

    Reference genome and comparative genome analysis for the WHO reference strain for Mycobacterium bovis BCG Danish, the present tuberculosis vaccine

    Background: Mycobacterium bovis bacillus Calmette-Guerin (M. bovis BCG) is the only vaccine available against tuberculosis (TB). In an effort to standardize vaccine production, three substrains, i.e. BCG Danish 1331, Tokyo 172-1 and Russia BCG-1, were established as the WHO reference strains. Reference genomes exist for both BCG Tokyo 172-1 and Russia BCG-1, but not for BCG Danish. In this study, we set out to determine the completely assembled genome sequence for BCG Danish and to establish a workflow for genome characterization of engineering-derived vaccine candidate strains. Results: By combining second (Illumina) and third (PacBio) generation sequencing in an integrated genome analysis workflow for BCG, we could construct the completely assembled genome sequence of BCG Danish 1331 (07/270) (and an engineered derivative that is studied as an improved vaccine candidate, a SapM KO), including the resolution of the analytically challenging long duplication regions. We report the presence of a DU1-like duplication in BCG Danish 1331, while this tandem duplication was previously thought to be exclusively restricted to BCG Pasteur. Furthermore, comparative genome analyses of publicly available data for BCG substrains showed the absence of a DU1 in certain BCG Pasteur substrains and the presence of a DU1-like duplication in some BCG China substrains. By integrating publicly available data, we provide an update to the genome features of the commonly used BCG strains. Conclusions: We demonstrate how this analysis workflow enables the resolution of genome duplications and of the genome of engineered derivatives of the BCG Danish vaccine strain. The BCG Danish WHO reference genome will serve as a reference for future engineered strains and the established workflow can be used to enhance BCG vaccine standardization.

    A chromosome-level genome resource for studying virulence mechanisms and evolution of the coffee rust pathogen Hemileia vastatrix

    Recurrent epidemics of coffee leaf rust, caused by the fungal pathogen Hemileia vastatrix, have constrained the sustainable production of Arabica coffee for over 150 years. The ability of H. vastatrix to overcome resistance in coffee cultivars and evolve new races is inexplicable for a pathogen that supposedly only utilizes clonal reproduction. Understanding the evolutionary complexity between H. vastatrix and its only known host, including determining how the pathogen evolves virulence so rapidly, is crucial for disease management. Achieving such goals relies on the availability of a comprehensive and high-quality genome reference assembly. To date, two reference genomes have been assembled and published for H. vastatrix that, while useful, remain fragmented and do not represent chromosomal scaffolds. Here, we present a complete scaffolded pseudochromosome-level genome resource for H. vastatrix strain 178a (Hv178a). Our initial assembly revealed an unusually high degree of gene duplication (over 50% BUSCO basidiomycota_odb10 genes). Upon inspection, this was predominantly due to a single scaffold that itself showed 91.9% BUSCO Completeness. Taxonomic analysis of predicted BUSCO genes placed this scaffold in Exobasidiomycetes and suggests it is a distinct genome, which we have named Hv178a associated fungal genome (Hv178a AFG). The high depth of coverage and close association with Hv178a raises the prospect of symbiosis, although we cannot completely rule out contamination at this time. The main ca. 546 Mbp Hv178a genome was primarily (97.7%) localised to 11 pseudochromosomes (51.5 Mb N50), building the foundation for future advanced studies of genome structure and organization.
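
For readers unfamiliar with the N50 statistic quoted above, the sketch below shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions and are unrelated to the Hv178a data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until
    # at least half of the total assembly size is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy scaffold lengths (arbitrary units): total is 290, half is 145.
# 80 + 70 = 150 is the first running sum to reach 145, so N50 is 70.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```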

    GAM-NGS: genomic assemblies merger for next generation sequencing

    Background: In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to use several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers lead to better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures. To limit these problems we developed GAM-NGS (Genomic Assemblies Merger for Next Generation Sequencing), whose primary goal is to merge two or more assemblies in order to enhance contiguity and correctness of both. GAM-NGS does not rely on global alignment: regions of the two assemblies representing the same genomic locus (called blocks) are identified through reads' alignments and stored in a weighted graph. The merging phase is carried out with the help of this weighted graph that allows an optimal resolution of local problematic regions. Results: GAM-NGS has been tested on six different datasets and compared to other assembly reconciliation tools. The availability of a reference sequence for three of them allowed us to show how GAM-NGS is a tool able to output an improved reliable set of sequences. GAM-NGS is also a very efficient tool able to merge assemblies using substantially less computational resources than comparable tools. In order to achieve such goals, GAM-NGS avoids global alignment between contigs, making its strategy unique among other assembly reconciliation tools. Conclusions: The difficulty to obtain correct and reliable assemblies using a single assembler is forcing the introduction of new algorithms able to enhance de novo assemblies. GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects and it shows its full potential with multi-library Illumina-based projects.
With more than 20 assemblers available, it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting the generated sequence that is most likely to be correct.

    PlantFuncSSR: Integrating first and next generation transcriptomics for mining of SSR-functional domains markers

    © 2016 Sablok, Pérez-Pulido, Do, Seong, Casimiro-Soriguer, La Porta, Ralph, Squartini, Muñoz-Merida and Harikrishna. Analysis of repetitive DNA sequence content and divergence among the repetitive functional classes is a well-accepted approach for estimation of inter- and intrageneric differences in plant genomes. Among these elements, microsatellites, or Simple Sequence Repeats (SSRs), have been widely demonstrated to be powerful genetic markers for discriminating species and varieties. We present the PlantFuncSSRs platform, which covers more than 364 plant species and more than 2 million functional SSRs. The SSRs are provided with detailed annotations for easy functional browsing and with information on primer pairs and associated functional domains. PlantFuncSSRs can be leveraged to identify function-based genic variability among the species of interest, which might be of particular interest in developing functional markers in plants. This comprehensive on-line portal unifies mining of SSRs from first and next generation sequencing datasets, corresponding primer pairs and associated in-depth functional annotation such as gene ontology annotation, gene interactions and its identification from reference protein databases. PlantFuncSSRs is freely accessible at: http://www.bioinfocabd.upo.es/plantssr
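
To make the idea of SSR mining concrete, here is a minimal sketch that finds perfect microsatellites with a regular expression. It is not the PlantFuncSSRs pipeline, whose tools and thresholds are not given in the abstract. The function name `find_ssrs`, the default thresholds, and the toy sequence are all assumptions chosen for illustration.

```python
import re


def find_ssrs(seq, min_repeats=4, min_motif=2, max_motif=6):
    """Yield (motif, repeat_count, start) for perfect tandem repeats.

    Thresholds are illustrative defaults, not those of any published tool.
    """
    # A lazily matched motif of min_motif..max_motif bases, followed by at
    # least (min_repeats - 1) further copies of that same motif.
    pattern = re.compile(
        rf"([ACGT]{{{min_motif},{max_motif}}}?)\1{{{min_repeats - 1},}}"
    )
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        yield motif, len(match.group(0)) // len(motif), match.start()


# Toy sequence: (AC)x5 and (ATG)x4 embedded in short flanking bases.
toy = "TTG" + "AC" * 5 + "CC" + "ATG" * 4 + "CC"
print(list(find_ssrs(toy)))  # [('AC', 5, 3), ('ATG', 4, 15)]
```

Note that this simple pattern reports a run of a single base as a repeat of a two-base motif such as "AA", and it ignores imperfect or compound repeats, which dedicated SSR tools usually handle.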

    Regulation of desiccation tolerance in Xerophyta seedlings and leaves

    A small, diverse group of angiosperms known as resurrection plants display vegetative desiccation tolerance and can survive loss of up to 95% of cellular water, a feat only seen in the seeds and pollen of other angiosperms. Xerophyta humilis is a resurrection plant native to Southern Africa that has been the target of previous transcriptomic and proteomic studies into the mechanisms of plant desiccation tolerance. The aim of this study was to investigate the hypothesis that vegetative desiccation tolerance is derived from the networks that control desiccation tolerance in seeds and germinating seedlings in angiosperms, particularly the epigenetically silenced seed maturation genes. Germinating seedlings of X. humilis and the related resurrection plant X. viscosa were found to exhibit vegetative desiccation tolerance (VDT) from the earliest stages of germination, and exhibited the characteristic vegetative trait of poikilochlorophylly as seen in mature leaves. The X. humilis desiccation transcriptome comprising 76,768 distinct gene clusters was successfully assembled from sequencing samples at five relative water contents (100%, 80%, 60%, 40% and 5%) to identify the networks activated in response to water loss. Desiccation was associated with successive waves of transcription factor induction, as well as widespread down-regulation of histone modification enzymes. Many seed-specific genes, such as late embryogenesis abundant (LEA) proteins, seed storage proteins and oleosins, were induced in vegetative tissue. LEA transcripts in particular were highly up-regulated during desiccation, and the large number of distinct LEA transcripts (over 150) suggests possible LEA gene expansion in Xerophyta compared to desiccation-sensitive plants. Components of the PYL/SnRK2/ABF ABA-signalling pathway were also induced, although the ABF transcription factors activated in response to desiccation were most similar to those induced by drought in A. thaliana rather than seed maturation.
Of the canonical seed master regulators (such as the LEC1/ABI3/FUS3/LEC2 network and ABI5) only three ABI3 transcripts were expressed, all of which encoded proteins lacking the seed motif-binding B3-domain. The results of this study suggest that vegetative desiccation tolerance in X. humilis is not associated with re-activation of seed master regulators in vegetative tissue, but may instead involve activation of seed genes by vegetative drought response regulators.

    Multi-Species Transcriptome Assemblies of Cultivated and Wild Lentils (Lens sp.) Provide a First Glimpse at the Lentil Pangenome

    Lentils (Lens sp.) are one of the main sources of protein for humans in many regions, in part because their rusticity allows them to withstand semi-dry climates and tolerate a wide spectrum of pests. Both are also highly sought-after attributes to face climate change. Wild accessions, rather than cultivated varieties, are typically the holders of most influential alleles for rusticity traits. However, most genomic and transcriptomic research conducted in lentils has been carried out on commercial accessions (L. culinaris), while wild relatives have been largely neglected. Herein, we assembled, annotated, and evaluated the transcriptomes of eight lentil accessions, including the cultivated Lens culinaris and the wild relatives: L. orientalis, L. tomentosus, L. ervoides, L. lamottei, L. nigricans, and two L. odemensis. The assemblies allowed, for the first time, a comparison among different lentil taxa at the coding sequence level, providing further insights into the evolutionary relationships between cultivated and wild germplasm and suggesting a grouping of the seven accessions into at least three conceivable gene pools. Moreover, orthologous clustering allowed a first estimation of the lentil pan-transcriptome. It is composed of 15,910 core genes, encoded in all accessions, and 24,226 accessory genes. The different pan-transcriptome clusters were also screened for Pfam-domain enrichment. The present study is novel in that it is the first pan-transcriptome analysis to use six wild species in addition to the cultivated species. Because of the amount of transcript sequences provided, our findings will greatly boost lentil research and assist breeding efforts.
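
The core versus accessory split described above can be sketched as a simple set operation over orthologous clusters: a cluster counts as core if every accession contributes at least one member, and as accessory otherwise. The function name `split_pan_transcriptome`, the cluster identifiers, and the three toy accessions are hypothetical; the study's actual clustering method and data are not reproduced here.

```python
def split_pan_transcriptome(clusters, accessions):
    """Split cluster IDs into (core, accessory) sets.

    clusters maps a cluster ID to the accessions that have a member in it.
    """
    required = set(accessions)
    # Core clusters are those represented in every accession.
    core = {cid for cid, members in clusters.items() if required <= set(members)}
    accessory = set(clusters) - core
    return core, accessory


# Toy example with three hypothetical accessions and four clusters.
accessions = ["acc1", "acc2", "acc3"]
clusters = {
    "OG1": {"acc1", "acc2", "acc3"},
    "OG2": {"acc1", "acc3"},
    "OG3": {"acc2"},
    "OG4": {"acc3", "acc2", "acc1"},
}
core, accessory = split_pan_transcriptome(clusters, accessions)
print(sorted(core), sorted(accessory))  # ['OG1', 'OG4'] ['OG2', 'OG3']
```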