27 research outputs found

    Genome Update of the Dimorphic Human Pathogenic Fungi Causing Paracoccidioidomycosis

    <div><p>Paracoccidioidomycosis (PCM) is a clinically important fungal disease that can acquire serious systemic forms and is caused by the thermodimorphic fungi <i>Paracoccidioides</i> spp. PCM is a tropical disease that is endemic in Latin America, where up to ten million people are infected; 80% of reported cases occur in Brazil, followed by Colombia and Venezuela. To enable genomic studies and to better characterize the pathogenesis of this dimorphic fungus, two reference strains of <i>P. brasiliensis</i> (Pb03, Pb18) and one strain of <i>P. lutzii</i> (Pb01) were sequenced <a href="http://www.plosntds.org/article/info:doi/10.1371/journal.pntd.0003348#pntd.0003348-Desjardins1" target="_blank">[1]</a>. While the initial draft assemblies were accurate in large-scale structure and had high overall base quality, the sequences had frequent small-scale defects such as poor quality stretches, unknown bases (N's), and artifactual deletions or nucleotide duplications, all of which caused larger-scale errors in predicted gene structures. Since assembly consensus errors can now be addressed using next-generation sequencing (NGS) in combination with recent methods allowing systematic assembly improvement, we re-sequenced the three reference strains of <i>Paracoccidioides</i> spp. using Illumina technology. We utilized the high sequencing depth to re-evaluate and improve the original assemblies generated from Sanger sequence reads, and obtained more complete and accurate reference assemblies. The new assemblies led to improved transcript predictions for the vast majority of genes of these reference strains, and often substantially corrected gene structures. These include several genes that are central to virulence or expressed during the pathogenic yeast stage in <i>Paracoccidioides</i> and other fungi, such as <i>HSP90</i>, <i>RYP1-3</i>, <i>BAD1</i>, catalase B, alpha-1,3-glucan synthase and the beta glucan synthase target gene <i>FKS1</i>.
The improvement and validation of these reference sequences will now allow more accurate genome-based analyses. To our knowledge, this is one of the first reports of a fully automated and quality-assessed upgrade of a genome assembly and annotation for a non-model fungus.</p></div>

    Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement

    <div><p>Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired-end data from two Illumina libraries with small (<i>e.g.</i>, 180 bp) and large (<i>e.g.</i>, 3–5 Kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants including duplications and resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.</p></div>

    Improved consistency of gene annotation in v2 genomes.

    <p>The final predicted gene sets of the three <i>Paracoccidioides</i> strains were clustered using OrthoMCL, in v1 and v2. The scatterplots (A) compare, for each clustered group, the maximum length versus the minimum length of the three <i>Paracoccidioides</i> genes in the same cluster, for each of the two versions. The scatterplot contrasts the maximum-minimum pairs from annotation v1 (red points) and those from annotation v2 (blue points). The location of blue points closer to the diagonal illustrates that the annotation v2 was more consistent across the three genomes with smaller differences in gene length. In the same sense, the rank plots (B) show the difference between maximum and minimum length for each clustered group, for each of the two versions; again annotation v2 (blue line) showed fewer (later increase) and smaller (more gradual increase) differences, corresponding to the improvement of the genome annotation in v2.</p>
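
The comparison described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' analysis code: the function name `length_spread`, the cluster identifiers and all gene lengths are made up, and it assumes each ortholog cluster has already been reduced to a list of gene lengths.

```python
def length_spread(clusters):
    """For each cluster, report (id, max length, min length, difference).

    `clusters` maps a hypothetical cluster id to the lengths of the genes
    it contains (one per strain). Rows are sorted by the difference, which
    is the ordering a rank plot of max-minus-min would use.
    """
    rows = []
    for cluster_id, lengths in clusters.items():
        longest, shortest = max(lengths), min(lengths)
        rows.append((cluster_id, longest, shortest, longest - shortest))
    return sorted(rows, key=lambda row: row[3])


# Made-up lengths for two clusters under two annotation versions.
v1 = {"c1": [900, 1200, 1500], "c2": [600, 600, 630]}
v2 = {"c1": [1500, 1500, 1503], "c2": [630, 630, 630]}

for name, clusters in (("v1", v1), ("v2", v2)):
    for cluster_id, longest, shortest, diff in length_spread(clusters):
        print(name, cluster_id, longest, shortest, diff)
```

In a scatterplot of maximum against minimum length, a cluster with a difference of zero falls exactly on the diagonal, so smaller differences in the second version correspond to points closer to it.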

    Examples of an artifactual insertion and an artifactual deletion that were corrected during the update of the <i>P. brasiliensis</i> Pb03 genome sequence.

    <p>Screenshots of Pilon-generated genome browser tracks in GenomeView v1.0 <a href="http://www.plosntds.org/article/info:doi/10.1371/journal.pntd.0003348#pntd.0003348-Abeel1" target="_blank">[35]</a> show the evidence used by Pilon to recognize and correct an incorrect insertion in the gene PABG_00120 (left) and an incorrect deletion in the gene PABG_00790 (right). Tracks (top panels) depict paired-end reads (green) aligned to the corresponding region of the reference assembly v1, a subset of the total depth of ~150X or ~170X; these alignments were used by Pilon to refine the consensus sequence, generating the improved Pb03 assembly v2. Positions in the v1 assembly where aligned reads suggest a change due to either a gap (red box) or an insertion (black line) are indicated with dashed red boxes. The changes suggested by Pilon are also supported by conservation of the changed bases in a multiple alignment (bottom panels) with the corresponding region of <i>P. brasiliensis</i> Pb18 and <i>P. lutzii</i> Pb01.</p>

    Diverse error correction for the 90 kDa heat shock protein (HSP90 gene) of <i>Paracoccidioides</i> spp.

    <p>(A) In this example different annotation errors were present in v1 of all three <i>Paracoccidioides</i> reference strains, all of which were fixed in v2 after Pilon improvement and re-annotation. The example also illustrates how one or more single-nucleotide errors, unknown single nucleotides (N's), or single nucleotides that were erroneously reported as absent or duplicated by a Sanger sequencer can amplify across annotations, generating radically different gene structure (intron/exon and/or gene boundary) predictions. (B) Five changes are shown at assembly (DNA sequence) level, one of which was a single nucleotide error in a stop codon; as a result, the gene-calling program did not recognize the end of an exon and it was not reported.</p>

    Summary assembly statistics before and after Pilon improvement.

    <p>In all cases the assemblies were more contiguous, contained more bases, and had fewer gaps and errors after Pilon improvement.</p>

    Recall and precision metrics for <i>M. tuberculosis</i> F11 variants called against <i>M. tuberculosis</i> H37Rv by Pilon (with and without long insert library data), GATK UnifiedGenotyper and SAMtools.

    <p>The three rows marked with 'Single' indicate single nucleotide variants. The three rows marked with 'Multi' indicate variants involving two or more nucleotides, which also include very large events that span several Kb. Recall (R) is the fraction of curated events that were called by the program. Precision (P) is the fraction of calls that the program made that were also described in the curation. The F-measure is the harmonic mean of recall and precision and provides a measure of the trade-off between recall and precision. “N/A” indicates that all events of this type were captured in another variant category.</p>
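
The three metrics defined in this caption can be computed as in the minimal sketch below. The function name `recall_precision_f` and the toy event identifiers are hypothetical, and the sketch assumes curated events and program calls can be compared as exact set members, which simplifies how variant calls are matched in practice.

```python
def recall_precision_f(curated, called):
    """Return (recall, precision, F-measure) for two collections of events.

    recall    = matched events / curated events
    precision = matched events / called events
    F-measure = harmonic mean of recall and precision
    """
    curated, called = set(curated), set(called)
    matched = len(curated & called)
    recall = matched / len(curated) if curated else 0.0
    precision = matched / len(called) if called else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# Made-up identifiers: four curated events, five calls, three in common.
curated_events = {1, 2, 3, 4}
called_events = {2, 3, 4, 5, 6}
r, p, f = recall_precision_f(curated_events, called_events)
print(f"R={r:.2f} P={p:.2f} F={f:.2f}")
```

With these toy sets the recall is 3/4 and the precision is 3/5; the F-measure lies between the two and is pulled toward the lower value.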