468 research outputs found
Metassembler: merging and optimizing de novo genome assemblies
Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We present our metassembler algorithm that merges multiple assemblies of a genome into a single superior sequence. We apply it to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for meta-assembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition. The software is open-source at http://metassembler.sourceforge.net
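The figure of 120 matches the number of orderings of five items (5! = 120), which is what one would enumerate if the merge result depends on the order in which assemblies are supplied. A minimal sketch of that enumeration, assuming placeholder labels (A1 to A5 are hypothetical and are not names used in the study):

```python
from itertools import permutations

# Hypothetical labels standing in for the top 5 assemblies.
assemblies = ["A1", "A2", "A3", "A4", "A5"]

# Every ordering in which the five assemblies could be supplied to a merger.
merge_orders = list(permutations(assemblies))

print(len(merge_orders))   # number of orderings, 5!
print(merge_orders[0])     # first ordering, in input order
```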
Reference genome and comparative genome analysis for the WHO reference strain for Mycobacterium bovis BCG Danish, the present tuberculosis vaccine
Background: Mycobacterium bovis bacillus Calmette-Guerin (M. bovis BCG) is the only vaccine available against tuberculosis (TB). In an effort to standardize vaccine production, three substrains, i.e. BCG Danish 1331, Tokyo 172-1 and Russia BCG-1, were established as the WHO reference strains. Reference genomes exist for both BCG Tokyo 172-1 and Russia BCG-1, but not for BCG Danish. In this study, we set out to determine the completely assembled genome sequence for BCG Danish and to establish a workflow for genome characterization of engineering-derived vaccine candidate strains.
Results: By combining second (Illumina) and third (PacBio) generation sequencing in an integrated genome analysis workflow for BCG, we could construct the completely assembled genome sequence of BCG Danish 1331 (07/270) (and of an engineered derivative that is studied as an improved vaccine candidate, a SapM KO), including the resolution of the analytically challenging long duplication regions. We report the presence of a DU1-like duplication in BCG Danish 1331, although this tandem duplication was previously thought to be restricted to BCG Pasteur. Furthermore, comparative genome analyses of publicly available data for BCG substrains showed the absence of a DU1 in certain BCG Pasteur substrains and the presence of a DU1-like duplication in some BCG China substrains. By integrating publicly available data, we provide an update to the genome features of the commonly used BCG strains.
Conclusions: We demonstrate how this analysis workflow enables the resolution of genome duplications and of the genome of engineered derivatives of the BCG Danish vaccine strain. The BCG Danish WHO reference genome will serve as a reference for future engineered strains, and the established workflow can be used to enhance BCG vaccine standardization.
A chromosome-level genome resource for studying virulence mechanisms and evolution of the coffee rust pathogen Hemileia vastatrix
Recurrent epidemics of coffee leaf rust, caused by the fungal pathogen Hemileia vastatrix, have constrained the sustainable production of Arabica coffee for over 150 years. The ability of H. vastatrix to overcome resistance in coffee cultivars and evolve new races is inexplicable for a pathogen that supposedly only utilizes clonal reproduction. Understanding the evolutionary complexity between H. vastatrix and its only known host, including determining how the pathogen evolves virulence so rapidly, is crucial for disease management. Achieving such goals relies on the availability of a comprehensive and high-quality genome reference assembly. To date, two reference genomes have been assembled and published for H. vastatrix that, while useful, remain fragmented and do not represent chromosomal scaffolds. Here, we present a complete scaffolded pseudochromosome-level genome resource for H. vastatrix strain 178a (Hv178a). Our initial assembly revealed an unusually high degree of gene duplication (over 50% BUSCO basidiomycota_odb10 genes). Upon inspection, this was predominantly due to a single scaffold that itself showed 91.9% BUSCO Completeness. Taxonomic analysis of predicted BUSCO genes placed this scaffold in Exobasidiomycetes and suggests it is a distinct genome, which we have named Hv178a associated fungal genome (Hv178a AFG). The high depth of coverage and close association with Hv178a raise the prospect of symbiosis, although we cannot completely rule out contamination at this time. The main ca. 546 Mbp Hv178a genome was primarily (97.7%) localised to 11 pseudochromosomes (51.5 Mb N50), building the foundation for future advanced studies of genome structure and organization.
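For readers unfamiliar with the N50 statistic quoted above, it is conventionally the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that calculation, assuming a plain list of scaffold lengths (the `n50` helper and the toy lengths are illustrative and are not taken from the study):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from longest to shortest until half the total length is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (arbitrary units), not real data.
toy_lengths = [6, 5, 4, 3, 2]
print(n50(toy_lengths))  # 6 + 5 = 11 covers half of 20, so this prints 5
```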
GAM-NGS: genomic assemblies merger for next generation sequencing
Background: In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to use several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers lead to better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures. To limit these problems we developed GAM-NGS (Genomic Assemblies Merger for Next Generation Sequencing), whose primary goal is to merge two or more assemblies in order to enhance contiguity and correctness of both. GAM-NGS does not rely on global alignment: regions of the two assemblies representing the same genomic locus (called blocks) are identified through reads' alignments and stored in a weighted graph. The merging phase is carried out with the help of this weighted graph that allows an optimal resolution of local problematic regions.
Results: GAM-NGS has been tested on six different datasets and compared to other assembly reconciliation tools. The availability of a reference sequence for three of them allowed us to show how GAM-NGS is a tool able to output an improved reliable set of sequences. GAM-NGS is also a very efficient tool able to merge assemblies using substantially less computational resources than comparable tools. In order to achieve such goals, GAM-NGS avoids global alignment between contigs, making its strategy unique among other assembly reconciliation tools.
Conclusions: The difficulty of obtaining correct and reliable assemblies using a single assembler is forcing the introduction of new algorithms able to enhance de novo assemblies. GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects and it shows its full potential with multi-library Illumina-based projects.
With more than 20 available assemblers it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting the sequence that is most likely to be correct.
PlantFuncSSR: Integrating first and next generation transcriptomics for mining of SSR-functional domains markers
© 2016 Sablok, Pérez-Pulido, Do, Seong, Casimiro-Soriguer, La Porta, Ralph, Squartini, Muñoz-Merida and Harikrishna. Analysis of repetitive DNA sequence content and divergence among the repetitive functional classes is a well-accepted approach for estimation of inter- and intrageneric differences in plant genomes. Among these elements, microsatellites, or Simple Sequence Repeats (SSRs), have been widely demonstrated as powerful genetic markers for discriminating species and varieties. We present the PlantFuncSSRs platform, which covers more than 364 plant species with more than 2 million functional SSRs. They are provided with detailed annotations for easy functional browsing of SSRs and with information on primer pairs and associated functional domains. PlantFuncSSRs can be leveraged to identify functional-based genic variability among the species of interest, which might be of particular interest in developing functional markers in plants. This comprehensive on-line portal unifies mining of SSRs from first and next generation sequencing datasets, corresponding primer pairs and associated in-depth functional annotation such as gene ontology annotation, gene interactions and its identification from reference protein databases. PlantFuncSSRs is freely accessible at: http://www.bioinfocabd.upo.es/plantssr
Regulation of desiccation tolerance in Xerophyta seedlings and leaves
A small, diverse group of angiosperms known as resurrection plants display vegetative desiccation tolerance (VDT) and can survive the loss of up to 95% of cellular water, a feat only seen in the seeds and pollen of other angiosperms. Xerophyta humilis is a resurrection plant native to Southern Africa that has been the target of previous transcriptomic and proteomic studies into the mechanisms of plant desiccation tolerance. The aim of this study was to investigate the hypothesis that vegetative desiccation tolerance is derived from the networks that control desiccation tolerance in seeds and germinating seedlings in angiosperms, particularly the epigenetically silenced seed maturation genes. Germinating seedlings of X. humilis and the related resurrection plant X. viscosa were found to exhibit VDT from the earliest stages of germination, and exhibited the characteristic vegetative trait of poikilochlorophylly as seen in mature leaves. The X. humilis desiccation transcriptome comprising 76,768 distinct gene clusters was successfully assembled from sequencing samples at five relative water contents (100%, 80%, 60%, 40% and 5%) to identify the networks activated in response to water loss. Desiccation was associated with successive waves of transcription factor induction, as well as widespread down-regulation of histone modification enzymes. Many seed-specific genes, such as late embryogenesis abundant (LEA) proteins, seed storage proteins and oleosins, were induced in vegetative tissue. LEA transcripts in particular were highly up-regulated during desiccation, and the large number of distinct LEA transcripts (over 150) suggests possible LEA gene expansion in Xerophyta compared to desiccation-sensitive plants. Components of the PYL/SnRK2/ABF ABA-signalling pathway were also induced, although the ABF transcription factors activated in response to desiccation were most similar to those induced by drought in A. thaliana rather than seed maturation.
Of the canonical seed master regulators (such as the LEC1/ABI3/FUS3/LEC2 network and ABI5) only three ABI3 transcripts were expressed, all of which encoded proteins lacking the seed motif-binding B3-domain. The results of this study suggest that vegetative desiccation tolerance in X. humilis is not associated with re-activation of seed master regulators in vegetative tissue, but may instead involve activation of seed genes by vegetative drought response regulators.
Multi-Species Transcriptome Assemblies of Cultivated and Wild Lentils (Lens sp.) Provide a First Glimpse at the Lentil Pangenome
Lentils (Lens sp.) are one of the main sources of protein for humans in many regions, in part because their rusticity allows them to withstand semi-dry climates and tolerate a wide spectrum of pests. Both are also highly sought-after attributes for facing climate change. Wild accessions, rather than cultivated varieties, are typically the holders of the most influential alleles for rusticity traits. However, most genomic and transcriptomic research conducted in lentils has been carried out on commercial accessions (L. culinaris), while wild relatives have been largely neglected. Herein, we assembled, annotated, and evaluated the transcriptomes of eight lentil accessions, including the cultivated Lens culinaris and the wild relatives: L. orientalis, L. tomentosus, L. ervoides, L. lamottei, L. nigricans, and two L. odemensis. The assemblies allowed, for the first time, a comparison among different lentil taxa at the coding sequence level, providing further insights into the evolutionary relationships between cultivated and wild germplasm and suggesting a grouping of the seven accessions into at least three conceivable gene pools. Moreover, orthologous clustering allowed a first estimation of the lentil pan-transcriptome. It is composed of 15,910 core genes, encoded in all accessions, and 24,226 accessory genes. The different pan-transcriptome clusters were also screened for Pfam-domain enrichment. The present study is highly novel, as it is the first pan-transcriptome analysis using six wild species in addition to cultivated species. Because of the amount of transcript sequences provided, our findings will greatly boost lentil research and assist breeding efforts.
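The core/accessory split described above can be illustrated with a small presence/absence calculation: a cluster counts as core if it is found in every accession, and as accessory otherwise. This is a minimal sketch under that assumption only; the `split_core_accessory` helper, the cluster IDs and the accession labels are hypothetical and do not reproduce the study's clustering pipeline.

```python
def split_core_accessory(clusters, accessions):
    """Split ortholog clusters into core (present in all accessions) and accessory."""
    all_accessions = set(accessions)
    core, accessory = [], []
    for cluster_id, present_in in clusters.items():
        # A cluster is core only if every accession contributes a member.
        if all_accessions <= set(present_in):
            core.append(cluster_id)
        else:
            accessory.append(cluster_id)
    return core, accessory


# Toy data: three hypothetical accessions and three hypothetical clusters.
accessions = ["acc1", "acc2", "acc3"]
clusters = {
    "OG1": {"acc1", "acc2", "acc3"},
    "OG2": {"acc1", "acc3"},
    "OG3": {"acc2"},
}

core, accessory = split_core_accessory(clusters, accessions)
print(core)       # ['OG1']
print(accessory)  # ['OG2', 'OG3']
```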