15 research outputs found

    Additional file 1: Figure S1. of ezTree: an automated pipeline for identifying phylogenetic marker genes and inferring evolutionary relationships among uncultivated prokaryotic draft genomes

    The comparison of trees built for the set of Proteobacteria genomes provided by FastTree. Figure S2. The comparison of trees built for the set of Myxococcales genomes using different models provided by FastTree. Table S1. List of Proteobacteria genomes and their NCBI accession numbers used in the evaluation of ezTree. Table S2. List of Syntrophobacterales genomes and NCBI accession numbers used in inferring the tree for Smithella sp. SDB. Table S3. Single-copy marker genes identified for Syntrophobacterales genomes. Table S4. List of Methanomicrobia genomes and NCBI accession numbers used in inferring the tree for Methanoculleus sp. SDB, Methanolinea sp. SDB, and Methanosaeta sp. SDB. Table S5. Single-copy marker genes identified for Methanomicrobia genomes. Table S6. List of Myxococcales genomes and NCBI accession numbers used in inferring the tree for Sorangiineae bacterium NIC37A_2. Table S7. Single-copy marker genes identified for Myxococcales. (PDF 827 kb)

    List of selected CRISPRs discussed in the paper.

    a: The IDs of the CRISPRs are assigned using the following rules: 1) If a CRISPR (e.g., SmutaL36) is identified from a known complete/draft genome with a species name (for SmutaL36, the genome is Streptococcus mutans NN2025), its ID uses five letters from the species name (i.e., Smuta) followed by the length of the repeats (a length of 36 is shown as L36); 2) If a CRISPR (e.g., Neis_t014_L28) is identified from a known complete/draft genome that has only general genus information (e.g., Neisseria sp. oral taxon 014 str. F0314), its ID is four letters from the genus name, followed by the taxon ID and the length of the repeats; and 3) CRISPRs identified in the HMP datasets are named with the ID of the dataset followed by the length of the repeat.
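The three naming rules in this footnote can be sketched as small helper functions. This is only an illustration under stated assumptions: the function names are hypothetical, the split of the five species letters into one genus letter plus four species-epithet letters is inferred from the single example SmutaL36, and the exact separator used for HMP dataset IDs (rule 3) is not given in the caption, so plain concatenation is assumed.

```python
def crispr_id_from_species(genus: str, species: str, repeat_len: int) -> str:
    """Rule 1 (assumed split): genus initial + first four letters of the
    species epithet, followed by L<repeat length>."""
    return f"{genus[0].upper()}{species[:4].lower()}L{repeat_len}"


def crispr_id_from_genus(genus: str, taxon_id: str, repeat_len: int) -> str:
    """Rule 2: first four letters of the genus, the taxon ID, and the
    repeat length, joined in the style of the example Neis_t014_L28."""
    return f"{genus[:4].capitalize()}_t{taxon_id}_L{repeat_len}"


def crispr_id_from_dataset(dataset_id: str, repeat_len: int) -> str:
    """Rule 3 (separator is an assumption): dataset ID followed by the
    repeat length."""
    return f"{dataset_id}L{repeat_len}"


if __name__ == "__main__":
    print(crispr_id_from_species("Streptococcus", "mutans", 36))
    print(crispr_id_from_genus("Neisseria", "014", 28))
```

With the two examples quoted in the footnote, the first two helpers reproduce the IDs SmutaL36 and Neis_t014_L28.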

    Traces of viral sequences in the streptococcal CRISPRs in human microbiomes.

    (A) A two-way clustering of viral genomes and the HMP datasets based on the presence patterns of viral sequences in the CRISPR loci identified in the HMP datasets: the columns are the viral genomes, and the rows are the HMP datasets. It shows that the genome of Streptococcus phage PH10 (NC_012756) has the most regions that are similar to the spacers in streptococcal CRISPRs. This figure was prepared using the heatmap function in R, with the default clustering method (hclust) and distance measure (Euclidean). (B) Mapping of the spacers onto the 31,276-base genome of Streptococcus phage PH10. In this figure, each vertical line shows a potential proto-spacer, a region in the virus genome that is similar to a spacer found in the HMP datasets; lines of the same color show sets of proto-spacers identified from the same HMP dataset (other individual proto-spacers are shown as gray lines); the ORFs are shown as arrows (the red arrow is an integrase and the green arrow is annotated as endolysin).

    A potentially novel CRISPR array identified in a contig (9848 bases) from sample SRS012279.

    (A) This CRISPR array has 6 copies of the repeat (repeat sequences are shown in red font and spacers in blue). (B) Our annotation of this contig, in which the CRISPR array is highlighted in red. We first predicted ORFs in this contig using FragGeneScan [31], and then searched the predicted proteins against the nr protein database with BLAST to retrieve annotations; for example, the predicted Cas1 is similar to the Cas1 protein identified in Leptotrichia buccalis C-1013-b (accession ID: YP_003163976), with 60% sequence identity and 80% sequence similarity.

    Distribution of selected CRISPRs across body sites.

    a: the total number of datasets; b: the total number of datasets that have CRISPRs identified; c: L-Retroauricular crease; d: R-Retroauricular crease. Note that not all body sites are listed in this table.

    Comparison of CRISPR identification using whole-metagenome assembly and targeted assembly.

    a: the total number of samples that have streptococcal CRISPRs identified if using targeted assembly, and b: if using whole-metagenome assembly; c: the total number of spacers found in the longest CRISPR locus found in the given dataset; d: the total number of spacers found in all contigs assembled from the given dataset; e: the total number of sequences that contain the repeats of a given CRISPR, i.e., the recruited reads used for targeted assembly. See Table S1 (http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002441#pgen.1002441.s008) for a comparison of all the CRISPRs studied in this paper.

    Sharing of streptococcal CRISPR spacers among samples from 6 individuals.

    In this map, the rows are the 761 spacers (clustered at 98% identity) identified in one or more of these 6 individuals, and the columns are samples (e.g., Stool_v1_p1 indicates a sample from the stool of individual 1, in visit 1; Tongue_v2_p1 indicates a sample from the tongue of individual 1, in visit 2). Buccal stands for buccal mucosa, and SupraPlaque stands for supragingival plaque. The red lines indicate the presence of spacers in each of the samples. Multiple lines in the same row represent a spacer that is shared by multiple samples.