
    MetaWorks workflow to produce taxonomically assigned exact sequence variants.

    To aid reproducibility, a Conda environment is provided. Although MetaWorks provides multiple Snakemake workflows, here we show the main workflow that generates taxonomically assigned ESVs. Input files are shown in the first panel (green), the ESV workflow is shown in the centre panel (blue), and output files are shown in the last panel (orange). The input files in white boxes are required by Snakemake to run the appropriate workflow. The input files in green need to be supplied by the user. Note that only custom-trained classifiers, such as the one for COI, need to be supplied by the user, whereas classifiers built into the RDP classifier are used automatically, for example to process prokaryote 16S assignments. The denoising step shown here includes the removal of rare clusters, sequences with putative errors, and chimeric sequences. The results are provided in a comma-separated value (CSV) file that shows each ESV per sample with read counts and taxonomic assignments. Abbreviations: demultiplexed Illumina paired-end reads (R1 + R2), internal transcribed spacer (ITS) region, open reading frame sequences (ORFs).

    RDP-trained reference sets that can be used with MetaWorks.


    Effect of primer choice on recovery and coverage.

    Results are shown for 200 bp fragments classified to the genus rank averaged across three methods. We used a ‘leave one out’ approach with BLAST + MEGAN and SAP. NBC was run ‘as is’ from the RDP website. Recovery (blue) and coverage (red) are shown for four primers. Bars indicate standard error of the mean using three classification methods.

    Taxonomic and sequence length breakdown for the ‘long’ LSU rDNA data set.


    Schematic diagram of large subunit ribosomal DNA (LSU rDNA).

    In the top frame, the LSU rDNA region for Saccharomyces cerevisiae (RDN25-1) NC_001144.5: 455181-451786 is shown. In the second frame, variable sequence regions from Schnare et al. [24] (top) and Hassouna et al. [20] (bottom) have been mapped with respect to the S. cerevisiae sequence. In the third frame, the positions of some commonly used LSU rDNA primers are shown. In the bottom frame, the positions and lengths of the fragments simulated for this study are shown.

    Comparison of classification methods using short read sequences while enforcing a statistical cutoff.

    Simulated read length is shown on the x-axis. In the top row, recovery is shown on the y-axis and refers to the proportion of queries with a correct taxonomic classification. In the middle row, erroneous recovery is shown on the y-axis and refers to the proportion of queries with an incorrect taxonomic classification. In the bottom row, coverage is shown on the y-axis and refers to the proportion of queries for which a classification could be made (correct or incorrect). The results for six taxonomic ranks are shown: kingdom (blue), phylum (red), class (green), order (purple), family (teal), and genus (orange). A ‘leave one out’ search approach was used with SAP. The asterisk indicates that NBC was run ‘as is’ from the Ribosomal Database Project website. Bars indicate standard error of the mean using four primers. The default statistical cutoffs for SAP (95% neighbor joining bootstrap proportion) and NBC (50% confidence for sequences shorter than 250 bp, otherwise 80% confidence) are enforced.
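The three measures defined in this caption can be illustrated with a minimal sketch. The function names, the use of `None` to mark a query that received no classification, and the example labels are assumptions made for illustration; none of them come from the study itself. The `nbc_cutoff` helper only encodes the default NBC rule as the caption states it.

```python
from typing import Optional, Sequence


def classification_metrics(
    truth: Sequence[str], predicted: Sequence[Optional[str]]
) -> dict:
    """Return recovery, erroneous recovery, and coverage as proportions.

    `predicted[i]` is None when no classification could be made for
    query i (hypothetical convention for this sketch).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    n = len(truth)
    if n == 0:
        raise ValueError("at least one query is required")

    # Queries that were classified and match the known taxon.
    correct = sum(1 for t, p in zip(truth, predicted) if p is not None and p == t)
    # Queries that were classified but do not match the known taxon.
    incorrect = sum(1 for t, p in zip(truth, predicted) if p is not None and p != t)

    return {
        "recovery": correct / n,
        "erroneous_recovery": incorrect / n,
        # Coverage counts every classified query, correct or incorrect.
        "coverage": (correct + incorrect) / n,
    }


def nbc_cutoff(read_length_bp: int) -> float:
    """Default NBC confidence cutoff as described in the caption."""
    return 0.5 if read_length_bp < 250 else 0.8


if __name__ == "__main__":
    truth = ["A", "B", "C", "D"]
    predicted = ["A", "X", None, "D"]  # one wrong, one unclassified
    print(classification_metrics(truth, predicted))
    print(nbc_cutoff(200), nbc_cutoff(250))
```

By these definitions, coverage is always the sum of recovery and erroneous recovery.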

    Comparison of methods to classify ‘long’ large subunit ribosomal DNA sequences.

    Classifications at the genus (G), family (F), and order (O) ranks are shown on the x-axis. Recovery on the y-axis refers to the percentage of queries recovered with a correct classification. Results from BLAST + MEGAN and SAP are directly compared using a ‘complete’ and a ‘leave one out’ search scenario. Results from SAP with the default 95% neighbor joining bootstrap cutoff enforced are also shown (SAP NJ 95). Results from NBC run ‘as is’ from the Ribosomal Database Project website are shown separately. Results from NBC with the recommended 80% confidence cutoff are also shown (NBC 80).

    Comparison of simulated communities using non-metric multidimensional scaling.

    The ‘reference set’ (black square) consisted of ‘long’ large subunit ribosomal DNA sequences (about 3,000 bp average length) that were classified either using MEGAN + BLAST against a complete database or using NBC run ‘as is’ from the Ribosomal Database Project website, with the NBC classifications imported into MEGAN (NBC + MEGAN). Four mock communities composed of 200 bp sequences were generated from four primers: LR0R (red), LR3 (blue), LR5 (green), and LR7 (orange). Communities were subjected to per-base error rates of 0% (square), 0.01% (circle), 0.1% (triangle), 1% (+), and 10% (×). Classifications were summarized at the order rank. Similarity of taxonomic composition was compared using Bray-Curtis dissimilarity and a simplified UniFrac measure in MEGAN.
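For readers unfamiliar with Bray-Curtis dissimilarity, the sketch below computes the standard formula, the sum of absolute differences in counts divided by the sum of all counts, for two communities summarized at the order rank. It does not reproduce MEGAN's implementation; the function name and the example counts are hypothetical.

```python
from typing import Mapping


def bray_curtis(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Bray-Curtis dissimilarity between two taxon-count profiles.

    0.0 means identical composition; 1.0 means no shared taxa.
    """
    taxa = set(a) | set(b)
    total = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    if total == 0:
        raise ValueError("both communities are empty")
    # A taxon missing from one community counts as zero there.
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    return diff / total


if __name__ == "__main__":
    # Hypothetical read counts per fungal order.
    reference = {"Agaricales": 6, "Hypocreales": 4}
    mock = {"Agaricales": 4, "Hypocreales": 4, "Pleosporales": 2}
    print(bray_curtis(reference, mock))
```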

    Ribosomal DNA and Plastid Markers Used to Sample Fungal and Plant Communities from Wetland Soils Reveals Complementary Biotas

    Though the use of metagenomic methods to sample below-ground fungal communities is common, the use of similar methods to sample plants from their underground structures is not. In this study we use high throughput sequencing of the ribulose-bisphosphate carboxylase large subunit (rbcL) plastid marker to study the plant community as well as the internal transcribed spacer and large subunit ribosomal DNA (rDNA) markers to investigate the fungal community from two wetland sites. Observed community richness and composition varied by marker. The two rDNA markers detected complementary sets of fungal taxa and total fungal composition clustered according to primer rather than by site. The composition of the most abundant plants, however, clustered according to sites as expected. We suggest that future studies consider using multiple genetic markers, ideally generated from different primer sets, to detect a more taxonomically diverse suite of taxa compared with what can be detected by any single marker alone. Conclusions drawn from the presence of even the most frequently observed taxa should be made with caution without corroborating lines of evidence.

    Rarefaction curves.

    Data are shown for 5’ and 3’ fragments sampled from two sites (A and B) for three loci: (a) ITS, (b) LSU, and (c) rbcL.