37 research outputs found

    VALIDATING APPROACHES FOR STUDYING MICROBIAL DIVERSITY TO CHARACTERIZE COMMUNITIES FROM ROOTS OF Populus deltoides

    Microbial (archaeal, bacterial, and fungal) communities associated with plant roots are central to plant health, survival, and growth. However, a robust understanding of root microbiota and the factors that govern their community structure and dynamics has remained elusive, especially in mature perennial plants from natural settings. Although the advent of Next Generation Sequencing (NGS) technologies has changed the scale of microbial ecological studies by enabling exhaustive characterization of microbial communities, the accuracy of taxonomic and quantitative inferences is affected by multiple experimental and computational steps and by lack of knowledge of the true ecological diversity. To test for inaccuracies and biases, I assembled diverse bacterial and archaeal ‘synthetic communities’ from genomic DNAs of sequenced organisms. I tested and compared different approaches that included metagenomic and small subunit rRNA (SSU rRNA) amplicon sequencing. The outcome was dependent on primer pairs, analysis parameters, and sequencing platforms. Nevertheless, new approaches in processing and classifying amplicons were able to recapitulate microbial diversity with high reproducibility within primer sets, even though all tested primer sets showed taxon-specific biases. Consequently, inferences from the ‘synthetic communities’ study were implemented in the experimental design and analysis of microbial communities from roots of naturally occurring mature riparian plants of Populus deltoides. Thaumarchaeota, Proteobacteria, and Ascomycota dominated the overall archaeal, bacterial, and fungal communities, respectively. Further, I investigated relationships of bacterial and fungal communities in the rhizosphere and endosphere with soil and environmental properties, host genotype, season, and geographic setting. Season, soil properties, and geographic setting explained 4% to 23% of the variation in bacterial and fungal communities among the sampled roots; most of the variation remains unexplained. I also tested whether the rhizosphere of P. deltoides, and of mature trees in general, selects for a higher diversity of archaea than the surrounding soil. I found a slightly higher diversity of archaea in the trees than in the corresponding bulk soil, but the results were not specific to P. deltoides. In summary, this dissertation validates current microbial diversity approaches, characterizes microbial communities of an important plant, and deciphers drivers that control root-associated community structure.

    A Comprehensive Benchmarking Study of Protocols and Sequencing Platforms for 16S rRNA Community Profiling

    In the last 5 years, the rapid pace of innovations and improvements in sequencing technologies has completely changed the landscape of metagenomic and metagenetic experiments. Therefore, it is critical to benchmark the various methodologies for interrogating the composition of microbial communities, so that we can assess their strengths and limitations. The most common phylogenetic marker for microbial community diversity studies is the 16S ribosomal RNA gene and in the last 10 years the field has moved from sequencing a small number of amplicons and samples to more complex studies where thousands of samples and multiple different gene regions are interrogated. Results: We assembled 2 synthetic communities with an even (EM) and uneven (UM) distribution of archaeal and bacterial strains and species, as metagenomic control material, to assess performance of different experimental strategies. The 2 synthetic communities were used in this study to highlight the limitations and the advantages of the leading sequencing platforms: MiSeq (Illumina), the Pacific Biosciences RSII, 454 GS-FLX/+ (Roche), and IonTorrent (Life Technologies). We describe an extensive survey based on synthetic communities using 3 experimental designs (fusion primers, universal tailed tag, ligated adaptors) across the 9 hypervariable 16S rDNA regions. We demonstrate that library preparation methodology can affect data interpretation due to different error and chimera rates generated during the procedure. The observed community composition was always biased, to a degree that depended on the platform, sequenced region and primer choice. However, crucially, our analysis suggests that 16S rRNA sequencing is still quantitative, in that relative changes in abundance of taxa between samples can be recovered, despite these biases. Conclusion: We have assessed a range of experimental conditions across several next generation sequencing platforms using the most up-to-date configurations. We propose that the choice of sequencing platform and experimental design needs to be taken into consideration in the early stage of a project by running a small trial consisting of several hypervariable regions to quantify the discriminatory power of each region. We also suggest that the use of a synthetic community as a positive control would be beneficial to identify the potential biases and procedural drawbacks that may lead to data misinterpretation. The results of this study will serve as a guideline for making decisions on which experimental condition and sequencing platform to consider to achieve the best microbial profiling.
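
    The claim that relative changes can be recovered despite biased composition can be illustrated with a small sketch. It assumes (this is not stated in the abstract) that each taxon's bias acts as a constant multiplicative factor on its true abundance before renormalization. Under that assumption, the change in the ratio between two taxa from one sample to another is unaffected by the bias, even though the observed proportions themselves are distorted. The taxon names, bias factors, and abundances below are invented for illustration only.

```python
# Hypothetical illustration of a multiplicative, taxon-specific bias model.
# The model itself and every number here are assumptions, not data from the study.

def observed_proportions(true_abundance, bias):
    """Multiply each taxon by its bias factor, then renormalize to proportions."""
    weighted = {taxon: true_abundance[taxon] * bias[taxon] for taxon in true_abundance}
    total = sum(weighted.values())
    return {taxon: value / total for taxon, value in weighted.items()}

def ratio_change(sample_a, sample_b, taxon_x, taxon_y):
    """Change in the x:y abundance ratio going from sample_a to sample_b."""
    ratio_a = sample_a[taxon_x] / sample_a[taxon_y]
    ratio_b = sample_b[taxon_x] / sample_b[taxon_y]
    return ratio_b / ratio_a

# Invented bias factors: taxon1 is under-detected, taxon3 is over-detected.
bias = {"taxon1": 0.2, "taxon2": 1.0, "taxon3": 3.0}

# Invented true compositions of two samples.
true_a = {"taxon1": 0.5, "taxon2": 0.3, "taxon3": 0.2}
true_b = {"taxon1": 0.25, "taxon2": 0.6, "taxon3": 0.15}

obs_a = observed_proportions(true_a, bias)
obs_b = observed_proportions(true_b, bias)

# The observed proportions differ from the true ones, but the change in the
# taxon1:taxon2 ratio between the two samples is identical in both views.
true_change = ratio_change(true_a, true_b, "taxon1", "taxon2")
obs_change = ratio_change(obs_a, obs_b, "taxon1", "taxon2")
print(obs_a["taxon1"], true_change, obs_change)
```

    Real biases need not be strictly multiplicative or constant across samples, so this is only a way to see why biased and quantitative are not contradictory.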

    NCBI’s virus discovery codeathon: building “FIVE” —the Federated Index of Viral Experiments API index

    Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, one consequence of previously decentralized (unfederated) data is a lack of consensus or comparison between feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was developed. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus–host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof-of-concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for optimal access to large quantities of data and is publicly accessible. Many database or index federation projects fail to provide easier alternatives for accessing or querying information. To this end, a Python API query system was developed to enhance the accessibility of FIVE.

    NCBI's Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements

    A wealth of viral data sits untapped in publicly available metagenomic data sets when it might be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams comprising over 40 participants from six countries assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus starting 9 January 2019. Prior to the hackathon, 141,676 metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) were pre-assembled into contiguous assemblies (contigs) by NCBI staff. During the hackathon, a subset consisting of 2953 SRA data sets (approximately 55 million contigs) was selected, which were further filtered for a minimal length of 1 kb. This resulted in 4.2 million (Mio) contigs, which were aligned using BLAST against all known virus genomes, phylogenetically clustered and assigned metadata. Out of the 4.2 Mio contigs, 360,000 contigs were labeled with domains and an additional subset containing 4400 contigs was screened for virus or virus-like genes. The work yielded valuable insights into both SRA data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds. Mainly: (i) conservative assemblies of SRA data improve initial analysis steps; (ii) existing bioinformatic software with weak multithreading/multicore support can be elevated by wrapper scripts to use all cores within a computing node; (iii) existing bioinformatic algorithms should be redesigned for a cloud infrastructure to facilitate its use by a wider audience; and (iv) a cloud infrastructure allows a diverse group of researchers to collaborate effectively. The scientific findings will be extended during a follow-up event. Here, we present the applied workflows, initial results, and lessons learned from the hackathon.
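
    The 1 kb length filter mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the hackathon's actual pipeline: the function names `read_fasta` and `filter_by_length`, the example contig names, and the choice to treat "1 kb" as at least 1000 bases are all assumptions made here.

```python
# Minimal sketch of a contig length filter; not the pipeline used at the event.
# Function names and example records are hypothetical.

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_by_length(records, min_length=1000):
    """Keep only records whose sequence has at least min_length bases."""
    return [(header, seq) for header, seq in records if len(seq) >= min_length]

# Two invented contigs: one of 40 bases and one of 1200 bases split over two lines.
example = [
    ">contig_short",
    "ACGT" * 10,
    ">contig_long",
    "ACGT" * 150,
    "ACGT" * 150,
]

kept = filter_by_length(read_fasta(example))
print([header for header, _ in kept])
```

    For real data, an established parser such as Biopython's `Bio.SeqIO` would likely be a better choice than hand-written parsing.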

    Supplementary data for Shakya et al. (2017)

    This data set contains sequences, sequence alignments and phylogenetic trees used in the bioinformatic analyses presented in:

    Shakya M, Soucy SM, and Zhaxybayeva O. "Insights into Origin and Evolution of α-proteobacterial Gene Transfer Agents", submitted.

    File Contents:

    Supplementary_Figures_final.pdf: Supplementary Figures S1-S9 referred to in the manuscript.
    SupplementaryTables.pdf and SupplementaryTables.xlsx: Supplementary Tables S1-S5 referred to in the manuscript.
    GTA_Rhodobacterales_queries.zip: FASTA-formatted files of RcGTA homologs from Rhodobacterales that were used in BLAST searches of RefSeq database and 255 α-proteobacterial genomes.
    RefSeq_bacterial_hits.zip: FASTA-formatted files of detected bacterial homologs of RcGTA genes in RefSeq database release 76. The filenames correspond to gene names listed in Supplementary Table S4.
    RefSeq_viral_hits.zip: FASTA-formatted files of detected viral homologs of RcGTA genes within RefSeq database release 76. The filenames correspond to gene names listed in Supplementary Table S4.
    StructuralClusterHomologs.xlsx: An Excel spreadsheet with information about RcGTA homologs found in small clusters (SC) and large clusters (LC) across α-proteobacterial genomes. The table contains the GI and accession numbers of each homolog, as well as accession number and taxonomic information of the source genome.
    SC_and_LC_homologs_per_genome.zip: FASTA-formatted files of RcGTA structural cluster homologs identified during the screen of 255 fully sequenced α-proteobacterial genomes. Each file represents an individual cluster found within a genome, and name of the file contains the source genome name, genome accession number and type of cluster (LC or SC). Within file, definition line of each FASTA header is augmented with the type of cluster (SC or LC) and RcGTA gene name of the homolog (see first column of Supplementary Table 4 for notations).
    individual_proteins_fa.zip: FASTA-formatted sets of individual RcGTA structural cluster genes and their large cluster (LC) homologs used to create the LC-locus alignment. The filenames correspond to gene names listed in Supplementary Table S4.
    individual_proteins_aln.zip: FASTA-formatted alignments of individual RcGTA structural cluster genes and their large cluster (LC) homologs used to create the LC-locus alignment. The filenames correspond to gene names listed in Supplementary Table S4.
    individual_trees.zip: NEWICK-formatted phylogenetic trees reconstructed from the alignments in individual_protein.zip file. These trees were used in analyses shown in Supplementary Table S3.
    LC_locus.zip: FASTA-formatted LC-locus alignment and NEWICK-formatted phylogenetic tree of the LC-locus (the right panel of Figure 6).
    PPD.zip: Pairwise phylogenetic distances (PPDs) of RcGTA homologs found in large clusters (LC), small clusters (SC), and viruses in tab-delimited text files, and FASTA-formatted alignments of RcGTA homologs used to calculate the PPDs. The data are shown in Supplementary Figure S4.
    flanking_genes.zip: FASTA-formatted alignments and NEWICK-formatted phylogenetic trees of three genes that were found flanking large clusters detected in non-alpha-proteobacterial genomes. The trees are shown in Supplementary Figure S8.
    reference_tree.zip: PHYLIP-formatted concatenated alignment of 99 alignments of genes conserved across α-proteobacteria (see Supplementary Table S2), and NEWICK-formatted phylogenetic trees reconstructed using this alignment (see Figure 6 and Supplementary Figure S3).

    Rhizospheric mycobiota associated with Populus deltoides

    Populus deltoides is a common riparian tree species in southeastern North America. Populus forms root associations with arbuscular, ectomycorrhizal, and endophytic fungi. To elucidate the structure of rhizospheric fungal assemblages on Populus deltoides, we carried out a series of field campaigns in natural P. deltoides populations in NC and TN. Field studies were coupled with trap-plant experiments using cuttings of Populus. Fungal root communities were characterized through 454 amplicon pyrosequencing. Efforts were also made to culture fungi from Populus roots and characterize the arbuscular mycorrhizal community through spore studies. Our results indicate that in addition to hosting mycorrhizal taxa, P. deltoides supports a high diversity of fungal root endophytes. In fact, endophytic fungi accounted for the majority of sequences in field and trap-plant samples, and of pure culture isolates from roots. There was considerable overlap between datasets. Well represented were ascomycete taxa belonging to the Hypocreales, Chaetothyriales, Pleosporales and Helotiales, and basidiomycete taxa belonging to the Agaricales, Polyporales, and Atractiellales. Spore and molecular data indicate that Glomerales and Paraglomerales are the main arbuscular mycorrhizal associates of P. deltoides. Although fruitbodies of ectomycorrhizal taxa were uncommon in the field, a new truffle species was collected under P. deltoides in TN and NC field sites and has been described (Tuber mexiusanum). Additionally, fruitbodies of Laccaria, Cortinarius, and Geopora spp. were produced in our trap-plant experiments. Other ectomycorrhizal taxa recovered in our molecular surveys included Inocybe, Hebeloma, Thelephora, and Russula spp. Pure cultures of Laccaria and Thelephora spp. have been established.

    Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities

    Next-generation sequencing has dramatically changed the landscape of microbial ecology, with large-scale and in-depth diversity studies now widely accessible. However, determining the accuracy of taxonomic and quantitative inferences and comparing results obtained with different approaches are complicated by incongruence of experimental and computational data types and also by lack of knowledge of the true ecological diversity. Here we used highly diverse bacterial and archaeal synthetic communities assembled from pure genomic DNAs to compare inferences from metagenomic and SSU rRNA amplicon sequencing. Both Illumina and 454 metagenomic data outperformed amplicon sequencing in quantifying the community composition, but the outcome was dependent on analysis parameters and platform. New approaches in processing and classifying amplicons can reconstruct the taxonomic composition of the community with high reproducibility within primer sets, but all tested primer sets lead to significant taxon-specific biases. Controlled synthetic communities assembled to broadly mimic the phylogenetic richness in target environments can provide important validation for fine-tuning experimental and computational parameters used to characterize natural communities.