
    CORE: A Phylogenetically-Curated 16S rDNA Database of the Core Oral Microbiome

    Comparing bacterial 16S rDNA sequences to GenBank and other large public databases via BLAST often provides results of little use for identification and taxonomic assignment of the organisms of interest. The human microbiome, and in particular the oral microbiome, includes many taxa, and accurate identification of sequence data is essential for studies of these communities. For this purpose, a phylogenetically curated 16S rDNA database of the core oral microbiome, CORE, was developed. The goal was to include a comprehensive and minimally redundant representation of the bacteria that regularly reside in the human oral cavity with computationally robust classification at the level of species and genus. Clades of cultivated and uncultivated taxa were formed based on sequence analyses using multiple criteria, including maximum-likelihood-based topology and bootstrap support, genetic distance, and previous naming. A number of classification inconsistencies for previously named species, especially at the level of genus, were resolved. The performance of the CORE database for identifying clinical sequences was compared to that of three publicly available databases, GenBank nr/nt, RDP and HOMD, using a set of sequencing reads that had not been used in creation of the database. CORE offered improved performance compared to other public databases for identification of human oral bacterial 16S sequences by a number of criteria. In addition, the CORE database and phylogenetic tree provide a framework for measures of community divergence, and the focused size of the database offers advantages of efficiency for BLAST searching of large datasets. The CORE database is available as a searchable interface and for download at http://microbiome.osu.edu.

    Numbers of S-OTUs by phylum in CORE.

    Number of S-OTUs assigned to each of the 14 phyla observed in the oral cavity and pharynx. A) Common phyla. B) Rare phyla (<10 S-OTUs). The fraction of S-OTUs for which a cultivated member has not been reported is indicated.

    Circular phylogenetic tree at level of genus.

    The tree was generated with RAxML and viewed in ITOL [27]. Genera are color-coded by phylum, except for the Firmicutes and Proteobacteria, which are shown at the level of class.

    Plot of the variability of the 16S gene within the oral microbiome.

    A total of 668 full-length 16S sequences, selected to comprehensively represent the oral microbiome, were aligned. The Shannon entropy index (H’) was calculated for each base position, and the mean information entropy for primer-sized and amplicon-sized windows along the length of the sequence was plotted. Variable and conserved regions can be visualized. (Because of gaps inserted in the alignment, the numbering does not correspond directly to E. coli numbering.)
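
The per-position entropy and windowed mean described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy alignment, the use of log base 2, the treatment of gaps as an ordinary symbol, and the window size are all assumptions, since the caption does not specify them.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column.

    Assumption: gap characters are counted as a symbol of their own.
    """
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def windowed_mean_entropy(alignment, window):
    """Mean per-position entropy for every window of `window` consecutive columns."""
    entropies = [column_entropy(col) for col in zip(*alignment)]
    return [
        sum(entropies[i:i + window]) / window
        for i in range(len(entropies) - window + 1)
    ]


# Hypothetical toy alignment: four aligned sequences of equal length.
toy_alignment = ["ACGT", "ACGA", "ACTA", "AC-A"]
per_position = [column_entropy(col) for col in zip(*toy_alignment)]
window_means = windowed_mean_entropy(toy_alignment, window=2)
```

Low window means would correspond to conserved stretches (candidate primer sites), and high means to variable regions. A real analysis would use window lengths matching the primer and amplicon sizes of interest.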

    Position of 1st named match in BLAST results.

    A 1000-sequence test set of clinical sequences was searched by BLAST against 4 databases. We ranked the results by sequence identity (more appropriate than e-value because of the presence of truncated database sequences in some cases) and scanned the lists above the 98% similarity level to find the position of the 1st match that included a full Latin name (genus plus species). A) Bar graph showing the results for queries for which a named match was found in at least one of the 4 databases. B) Box and whisker plots of the position of the 1st named match for queries that returned a >98% identical named match for all databases. The lower limit, middle line, and upper limit of the blue box indicate the 25th, 50th and 75th percentiles of the data, respectively. The whiskers are 1.5 times the inter-quartile distance, and jittered data points are shown. For CORE and HOMD, the boxes and whiskers are compressed at the 1 value because of the large number of named matches in the first result for these two databases.
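
The ranking step in this caption can be sketched as below. The hit format (description, percent identity), the function names, and especially the rule for deciding what counts as a "full Latin name" are assumptions made for illustration; the caption does not say how named matches were recognized.

```python
import re

# Hypothetical heuristic: a capitalized genus followed by a lowercase epithet
# that is not a placeholder such as "sp." or "bacterium".
NAME = re.compile(r"^([A-Z][a-z]+) ([a-z]+)\b")
PLACEHOLDERS = {"sp", "bacterium", "oral", "clone", "genomosp"}


def has_full_name(description):
    """True if the hit description starts with a genus plus species name."""
    match = NAME.match(description)
    if not match:
        return False
    genus, epithet = match.groups()
    return genus != "Uncultured" and epithet not in PLACEHOLDERS


def first_named_position(hits, min_identity=98.0):
    """Rank hits by percent identity and return the 1-based position of the
    first fully named hit above `min_identity`, or None if there is none."""
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    for position, (description, identity) in enumerate(ranked, start=1):
        if identity <= min_identity:
            break  # everything further down is at or below the cutoff
        if has_full_name(description):
            return position
    return None


# Hypothetical hits for one query: (subject description, percent identity).
toy_hits = [
    ("Streptococcus oralis", 97.0),
    ("Streptococcus mitis strain B", 99.1),
    ("Uncultured bacterium clone A", 99.8),
    ("Streptococcus sp. oral taxon 1", 99.5),
]
position = first_named_position(toy_hits)
```

For the toy hits, the two unnamed entries rank first and second by identity, so the first named match sits at position 3. A database dominated by named entries would return position 1 for most queries, which is the compression the caption describes for CORE and HOMD.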

    Cumulative distribution of clinical sequences against database entries.

    The frequency with which each sequence in CORE was encountered in the clinical datasets used for curation is shown as the cumulative percent of total sequences. Entries are ordered from most to least common. The majority of clinical sequences were accounted for by fewer than 1000 CORE entries.
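
A cumulative distribution of this kind can be computed as in the following sketch. The function name and the toy counts are hypothetical; the input is assumed to be the number of clinical sequences matching each database entry.

```python
from itertools import accumulate


def cumulative_percent(counts):
    """Cumulative percent of all sequences, with entries ordered from the
    most to the least frequently encountered."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    return [100.0 * running / total for running in accumulate(ordered)]


# Hypothetical counts of clinical sequences per database entry.
toy_counts = [5, 50, 20, 25]
curve = cumulative_percent(toy_counts)
```

Reading off the index at which the curve first exceeds a given percentage gives the number of entries needed to account for that share of the clinical sequences.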