38 research outputs found

    Scalable and versatile container-based pipelines for de novo genome assembly and bacterial annotation. [version 1; peer review: 2 approved, 1 approved with reservations]

    Background: Advancements in DNA sequencing technology have transformed the field of bacterial genomics, allowing for faster and more cost-effective chromosome-level assemblies compared to a decade ago. However, transforming raw reads into a complete genome model is a significant computational challenge due to the varying quality and quantity of data obtained from different sequencing instruments, as well as intrinsic characteristics of the genome and desired analyses. To address this issue, we have developed a set of container-based pipelines using Nextflow, offering both common workflows for inexperienced users and high levels of customization for experienced ones. Their processing strategies are adaptable based on the sequencing data type, and their modularity enables the incorporation of new components to address the community’s evolving needs. Methods: These pipelines consist of three parts: quality control, de novo genome assembly, and bacterial genome annotation. In particular, the genome annotation pipeline provides a comprehensive overview of the genome, including standard gene prediction and functional inference, as well as predictions relevant to clinical applications such as virulence and resistance gene annotation, secondary metabolite detection, prophage and plasmid prediction, and more. Results: The annotation results are presented in reports, genome browsers, and a web-based application that enables users to explore and interact with the genome annotation results. Conclusions: Overall, our user-friendly pipelines offer a seamless integration of computational tools to facilitate routine bacterial genomics research. Their effectiveness is illustrated by analyzing the sequencing data of a clinical sample of Klebsiella pneumoniae.

    Diversity of soil fungal communities of Cerrado and its closely surrounding agriculture fields

    Cerrado is a savanna-like region that covers a large area of Brazil. Despite its biological importance, the Cerrado has been the focus of few microbial diversity studies. A molecular approach was chosen to characterize the soil fungal communities in four areas of the Cerrado biome: a native Cerrado, a riverbank forest, an area converted to a soybean plantation, and an area converted to pasture. Global diversity of fungal communities in each area was assessed through Ribosomal intergenic spacer analysis which revealed remarkable differences among the areas studied. Sequencing of approximately 200 clones containing 18S rDNA sequences from each library was performed and, according to the genetic distance between sequences, these were assigned to operational taxonomic units (OTUs). A total of 75, 85, 85, and 70 OTUs were identified for the native Cerrado, riverbank forest, pasture, and soybean plantation, respectively. Analysis of sequences using a similarity cutoff value of 1% showed that the number of OTUs for the native Cerrado area was reduced by 35%; for the soybean plantation, a reduction by more than 50% was observed, indicating a reduction in fungal biodiversity associated with anthropogenic activity. This is the first study demonstrating the anthropogenic impact on Cerrado soil fungal diversity.

    Genome-wide correspondence of DArT markers and predicted gene models in the Eucalyptus grandis genome.

    The 11 pseudochromosomes of the Eucalyptus grandis genome (Version 1.0 in Phytozome 6.0) were partitioned into 122 bins of 5 Mbp. For each bin the numbers of DArT marker probe positions (blue bars), the number of genetically mapped DArT markers (red bars) and the number of predicted gene models (green bars) were plotted.
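The binning described in this caption can be sketched as follows. This is only an illustration, not the authors' code: the function name `count_per_bin`, the chromosome labels, and the marker positions are hypothetical, and positions are assumed to be 1-based base-pair coordinates.

```python
from collections import Counter

BIN_SIZE = 5_000_000  # 5 Mbp, the bin size stated in the caption


def count_per_bin(positions, bin_size=BIN_SIZE):
    """Count features per (chromosome, bin index).

    positions: iterable of (chromosome, position) tuples, where position
    is assumed to be a 1-based base-pair coordinate.
    """
    counts = Counter()
    for chrom, pos in positions:
        # Positions 1..bin_size fall in bin 0, the next bin_size in bin 1, etc.
        counts[(chrom, (pos - 1) // bin_size)] += 1
    return counts


# Hypothetical marker positions, for illustration only.
markers = [
    ("Chr01", 1),
    ("Chr01", 5_000_000),
    ("Chr01", 5_000_001),
    ("Chr02", 12_345_678),
]
bins = count_per_bin(markers)
for (chrom, index), n in sorted(bins.items()):
    print(chrom, index, n)
```

Running the same function separately on probe positions, mapped markers, and gene model start coordinates would give the three per-bin series that the caption says were plotted.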

    Correlations between DArT marker probes, mapped DArT markers and gene models.

    Spearman Rank correlations were estimated between: (A) the number of DArT marker probes and the number of gene models; and (B) the number of mapped DArT markers and the number of gene models, for every 5 Mbp genome bin.
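As a rough sketch of the statistic named in this caption, a Spearman rank correlation can be computed as the Pearson correlation of the ranks, with tied values sharing their mean rank. The per-bin counts below are made up for illustration and are not data from the study; the function names are likewise hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-bin counts, for illustration only.
marker_counts = [12, 30, 7, 22, 15]
gene_counts = [150, 410, 90, 300, 310]
rho = spearman(marker_counts, gene_counts)
print(round(rho, 3))
```

In practice a library routine such as `scipy.stats.spearmanr` would normally be used instead, since it also reports a p-value.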

    Mapping statistics of the DArT/microsatellite consensus maps of Eucalyptus grandis x E. urophylla.

    a Full map: all markers mapped at relaxed support for order.
    b Framework map: markers ordered with higher statistical support.
    c Framework to genome map: framework markers were positioned onto the assembled Eucalyptus grandis genome sequence to provide a correspondence between physical distance and recombination fraction for each pseudochromosome and at the genome-wide level.

    Framework DArT/microsatellite linkage map for Eucalyptus.

    The map includes 1,029 markers positioned with high confidence for locus order, involving 861 DArT (in black) and 168 microsatellites (in red) with a centiMorgan scale on the left.

    Alignment of the Framework map to the Eucalyptus grandis reference genome.

    Correspondence of the DArT and microsatellite marker positions on the Framework linkage map (green bars) with their location on the 11 Eucalyptus grandis pseudochromosome scaffolds (white bars). The scale on the left corresponds simultaneously to centiMorgan distances for the linkage map and to Mbp of sequence for the pseudochromosome scaffolds.

    Transcriptome Analysis in Cotton Boll Weevil (Anthonomus grandis) and RNA Interference in Insect Pests

    Cotton plants are subject to attack by several insect pests. In Brazil, the cotton boll weevil, Anthonomus grandis, is the most important cotton pest. Insecticidal proteins and gene silencing by RNA interference (RNAi) are promising insect-control strategies that have been applied in the last few years. For this insect, little molecular information is available in databases. Using 454 pyrosequencing, the transcriptome of all developmental stages of A. grandis was analyzed. The A. grandis transcriptome analysis resulted in more than 500,000 reads and a data set of 20,841 high-quality contigs. After sequence assembly and annotation, around 10,600 contigs had at least one BLAST hit against the NCBI non-redundant protein database, and 65.7% were similar to Tribolium castaneum sequences. A comparison of A. grandis, Drosophila melanogaster and Bombyx mori protein family data showed higher similarity to dipteran than to lepidopteran sequences. Several contigs of genes encoding proteins involved in the RNAi mechanism were found. PAZ domain sequences extracted from the transcriptome showed high similarity and conservation of the most important functional and structural motifs when compared to PAZ domains from 5 species. Two SID-like contigs were phylogenetically analyzed and grouped with T. castaneum SID-like proteins. No RdRP gene was found. A contig matching chitin synthase 1 was mined from the transcriptome. Microinjection of chitin synthase dsRNA into A. grandis female adults resulted in normal oviposition of unviable eggs and in malformed live larvae that were unable to develop on artificial diet. This is the first study to characterize the transcriptome of the coleopteran A. grandis. A new and representative transcriptome database for this insect pest is now available. All data support the current understanding of the RNAi mechanism in insects.