29 research outputs found

    CAMERA Fragment Recruitment Viewer

    No full text
    <p>This tool graphically displays the results of a BLASTN sequence comparison of an available microbial genome against selected sequence read datasets. The example shown displays the abundance and distribution of Synechococcus spp. genome sequence in the selected sampling sites. The Synechococcus spp. genome coordinates are shown on the <i>x</i>-axis, while the <i>y</i>-axis shows the percent identity scores of the alignment to the selected Sargasso Sea and GOS sequence reads. The viewer incorporates metadata associated with the reads, allowing a user to quickly identify data of interest for further examination. The utility of the plot is to examine the biogeography and genomic variation of abundant microbes when a close reference genome exists.</p>
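The plot described above (genome coordinate on the x-axis, percent identity on the y-axis) can be sketched as a small parsing step. This is only an illustration, not the viewer's actual implementation: it assumes the BLASTN hits are available in the standard 12-column tabular format, with reads as queries and the reference genome as the subject, and the function name `recruitment_points` is hypothetical.

```python
from typing import Iterable, List, Tuple


def recruitment_points(
    blast_lines: Iterable[str], min_identity: float = 0.0
) -> List[Tuple[float, float]]:
    """Turn tabular BLASTN hits into (genome coordinate, percent identity) points.

    Assumed column order (standard 12-column tabular BLAST output):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    points = []
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        pident = float(fields[2])
        sstart, send = int(fields[8]), int(fields[9])
        if pident >= min_identity:
            # Use the midpoint of the alignment on the genome as the x value;
            # this also handles reverse-strand hits where sstart > send.
            points.append(((sstart + send) / 2, pident))
    return points


if __name__ == "__main__":
    demo = [
        "read1\tgenome\t98.5\t100\t1\t0\t1\t100\t1000\t1099\t1e-50\t180",
        "read2\tgenome\t75.0\t100\t25\t0\t1\t100\t5100\t5001\t1e-10\t60",
    ]
    for x, y in recruitment_points(demo):
        print(x, y)
```

The resulting points can be passed to any scatter-plot routine; colouring them by read metadata (for example, sampling site) would correspond to the metadata overlay the caption mentions.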

    Wu_2011_Data

    No full text
    Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. Dongying Wu, Martin Wu, Aaron Halpern, Doug Rusch, Shibu Yooseph, Marvin Frazier, J. Craig Venter, Jonathan A. Eisen.

    Supplementary Data:

    (1) recA data: recA.tgz. recA.tgz contains the following files:
        recA_GOS.pep -- Amino acid sequences for GOS RecAs
        recA_ref.pep -- Amino acid sequences for RecAs from NRAA and genome sequences
        recA_cluster.txt -- Lek clusters of RecA sequences (Table 1)
        recA.ali -- Original alignment for the RecA tree (Figure 1)
        recA.trim.ali -- Trimmed RecA alignment that the RecA tree is built upon (Figure 1)
        recA.tre -- RecA tree in Newick format (Figure 1)
        recA_GOS_pepID_assemblyID.map -- The assembly IDs of the recA encoding GOS assemblies (Table 2)
        recA_linked.pep -- The amino acid sequences of the genes that share assemblies with the GOS novel recA (Table 2)

    (2) rpoB data: rpoB.tgz. rpoB.tgz contains the following files:
        rpoB_GOS.pep -- Amino acid sequences for GOS RpoBs
        rpoB_ref.pep -- Amino acid sequences for RpoBs from NRAA and genome sequences
        rpoB_cluster.txt -- Lek clusters of RpoB sequences (Table 3)
        rpoB.tre.ali -- Original alignment for the RpoB tree (Figure 3)
        rpoB.tre.trim -- Trimmed RpoB alignment that the RpoB tree is built upon (Figure 3)
        rpoB.tre -- RpoB tree in Newick format (Figure 3)

    (3) ss-rRNA data: ssu.tgz. ssu.tgz contains the following files:
        SSU_GOSreads.fa -- GOS ss-rRNA sequences
        SSU_GOSreads_deepbrach.fa -- Potential GOS deep-branching ss-rRNA

    (4) Lek Clustering Program: lek.tgz. lek.tgz contains scripts for the Lek clustering protocol. Instructions can be found in the included README file.

    Genes linked to sequences in the novel RecA subfamilies.

    No full text
    <p>Five RecA subfamilies were identified as being novel (i.e., only seen in metagenomic data) in our initial analyses. GOS metagenome assemblies that encode members of these subfamilies were identified and the genes neighboring the novel RecAs were characterized. The neighboring gene descriptions are based on the top BLASTP hits against the NRAA database; taxonomy assignments are based on their closest neighbor in phylogenetic trees built from the top NRAA BLASTP hits.</p>

    Phylogenetic tree of the RpoB superfamily.

    No full text
    <p>All RpoB sequences were grouped into clusters using the Lek algorithm. Representatives of each cluster that contained more than two members were then selected and aligned using MUSCLE. A phylogenetic tree was built from this alignment using PHYML; bootstrap values are based on 100 replicates. The Lek cluster ID precedes each sequence accession ID. Proposed subfamilies in the RpoB superfamily are shaded and given a name on the right. The two novel RpoB clades that contain only GOS sequences are highlighted by the colored panels.</p>

    RpoB subfamilies.

    No full text
    <p>A Lek clustering method was applied to all RpoB superfamily members retrieved from the NRAA database, microbial genome projects, and the GOS data set. Clusters that contain only sequences from the GOS data set are noted as “From GOS only.”</p><p>*Clusters 1, 9, 10, 11, 15, and 16 contain only sequence fragments from the GOS data set; though possibly novel, they were omitted from further analysis.</p><p>**Cluster 5 contains only two sequences. Though both are from the GOS data set (IDs 1096695464231 and 1096681823525) and may represent a novel RpoB subfamily, this group was excluded from further analysis because we restricted analyses to groups with three or more sequences.</p>

    The largest assembly from the GOS data that encodes a novel RecA subfamily member (a representative of subfamily Unknown 2).

    No full text
    <p>This GOS assembly (ID 1096627390330) encodes 33 annotated genes plus 16 hypothetical proteins, including several with similarity to known archaeal genes (e.g., DNA primase, translation initiation factor 2, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0018011#pone-0018011-t002" target="_blank">Table 2</a>). The arrow indicates a novel <i>recA</i> homolog from the Unknown 2 subfamily (cluster ID 9).</p>

    Phylogenetic tree of the RecA superfamily.

    No full text
    <p>All RecA sequences were grouped into clusters using the Lek algorithm. Representatives of each cluster that contained more than two members were then selected and aligned using MUSCLE. A phylogenetic tree was built from this alignment using PHYML; bootstrap values are based on 100 replicates. The Lek cluster ID precedes each sequence accession ID. Proposed subfamilies in the RecA superfamily are shaded and given a name on the right. Five of the proposed subfamilies contained only GOS sequences at the time of our initial analysis (RecA-like SAR, Phage SAR1, Phage SAR2, Unknown 1, and Unknown 2) and are highlighted by colored shading. As noted on the tree and in the text, sequences from two Archaea that were released after our initial analysis group in the <b>Unknown 2 subfamily.</b></p>

    RecA superfamily clusters.

    No full text
    <p>A Lek protein clustering method was applied to all RecA superfamily members retrieved from the NRAA database, microbial genomes, and the GOS data set. The 23 clusters containing more than two sequences are listed. Clusters that contain only sequences from the GOS data set are noted as “GOS only.” When a cluster can be mapped to a RecA subfamily identified by Lin <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0018011#pone.0018011-Lin1" target="_blank">[53]</a>, the family designation from that paper is shown in column 3.</p><p>*These clusters of RecA fragments from the GOS data set were not included in the phylogenetic tree (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0018011#pone-0018011-g001" target="_blank">Figure 1</a>).</p><p>**Although cluster 9 contained only GOS sequences at the time of the initial analysis, it was subsequently found to include marine archaeal homologs from more recent genome sequencing projects.</p>

    GOS-Only Clusters Are Enriched for Sequences of Viral Origin Independently of the Kingdom Assignment Method Employed

    No full text
    <p>For each panel, clusters are as in <a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0050016#pbio-0050016-g004" target="_blank">Figure 4</a>. For (A–C), a kingdom is assigned to each neighboring ORF within each cluster set; the percentage of all neighboring ORFs with a given kingdom assignment is plotted. For (D–F), a kingdom is assigned to each cluster if more than 50% of all that cluster's neighbors with a kingdom assignment share the same assignment; the percentage of clusters in each set with a given assignment is plotted. In (A) and (D), a kingdom is assigned to a neighboring ORF by a majority vote of the top four BLAST matches to a protein in NCBI-nr (<a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0050016#s3" target="_blank">Materials and Methods</a>). In (B) and (E), a kingdom is assigned if all eight highest-scoring BLAST matches agree in kingdom. In (C) and (F), all ORFs on a scaffold are assigned the same kingdom by voting among all ORFs with BLAST matches to NCBI-nr on that scaffold (<a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0050016#s3" target="_blank">Materials and Methods</a>). In all graphs, only clusters with at least one assignable neighbor are considered. When compared to the size-matched controls, in all cases the GOS-only clusters show enrichment for viral sequences.</p>
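The voting rules in this caption can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code. All function names are hypothetical. "Majority vote" is assumed to mean a strict majority (more than half) of the top four hits, and an ORF with fewer than eight hits is assumed to be unassignable under the unanimity rule. The caption itself does not specify either point.

```python
from collections import Counter
from typing import Optional, Sequence


def vote_top_hits(hit_kingdoms: Sequence[str], top_n: int = 4) -> Optional[str]:
    """Panels (A)/(D): vote among the top_n best BLAST hits (best first).

    Assumption: a strict majority of the hits considered is required.
    """
    top = list(hit_kingdoms[:top_n])
    if not top:
        return None
    kingdom, count = Counter(top).most_common(1)[0]
    return kingdom if count > len(top) / 2 else None


def unanimous_top_hits(hit_kingdoms: Sequence[str], top_n: int = 8) -> Optional[str]:
    """Panels (B)/(E): assign only if all top_n best hits agree.

    Assumption: fewer than top_n hits means no assignment.
    """
    top = list(hit_kingdoms[:top_n])
    if len(top) < top_n:
        return None
    return top[0] if len(set(top)) == 1 else None


def assign_cluster(neighbor_kingdoms: Sequence[Optional[str]]) -> Optional[str]:
    """Panels (D-F): a cluster gets a kingdom if more than 50% of its
    neighbors that have an assignment share the same one."""
    assigned = [k for k in neighbor_kingdoms if k is not None]
    if not assigned:
        return None  # clusters with no assignable neighbor are not considered
    kingdom, count = Counter(assigned).most_common(1)[0]
    return kingdom if count > 0.5 * len(assigned) else None


if __name__ == "__main__":
    orf_a = vote_top_hits(["Viruses", "Viruses", "Bacteria", "Viruses"])
    orf_b = vote_top_hits(["Viruses", "Bacteria", "Viruses", "Bacteria"])
    print(orf_a, orf_b, assign_cluster([orf_a, orf_b, "Viruses", "Bacteria"]))
```

The scaffold-level rule in panels (C) and (F) is not reproduced here, because the caption does not say how ties or thresholds are handled in that vote.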