47 research outputs found

    Heat trees allow for a better understanding of community structure than stacked bar charts.

    <p>The stacked bar chart on the left represents the abundance of organisms in two samples from the Human Microbiome Project [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005404#pcbi.1005404.ref005" target="_blank">5</a>]. The same data are displayed as heat trees on the right. In the heat trees, the size and color of nodes and edges are correlated with the abundance of organisms in each community. Both visualizations show communities dominated by Firmicutes, but the heat trees reveal that the two samples share no families within Firmicutes and are thus much more different than the stacked bar chart suggests.</p>

    Heat trees display up to four metrics in a taxonomic context and can plot multiple trees per graph.

    <p>Most graph components, such as the size and color of text, nodes, and edges, can be automatically mapped to arbitrary numbers, allowing for a quantitative representation of multiple statistics simultaneously. This graph depicts the uncertainty of OTU classifications from the TARA global oceans survey [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005404#pcbi.1005404.ref002" target="_blank">2</a>]. Each node represents a taxon used to classify OTUs and the edges determine where it fits in the overall taxonomic hierarchy. Node diameter is proportional to the number of OTUs classified as that taxon and edge width is proportional to the number of reads. Color represents the percent of OTUs assigned to each taxon that are somewhat similar to their closest reference sequence (>90% sequence identity). <b>a.</b> Metazoan diversity in detail. <b>b.</b> All taxonomic diversity found. Note that multiple trees are automatically created and arranged when there are multiple roots to the taxonomy.</p>

    Primary functions found in <i>metacoder</i>.


    Another alternate use example: Visualizing gene expression data in a GO hierarchy.

    <p>The gene ontology for all differentially expressed genes in a study on the effect of a glucocorticoid on airway smooth muscle tissue [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005404#pcbi.1005404.ref019" target="_blank">19</a>]. Color indicates the sign and intensity of averaged changes in gene expression and the size indicates the number of genes classified by a given gene ontology term.</p>

    Flexible parsing and digital PCR allow for comparisons of primers and databases.

    <p>Shown is a comparison of digital PCR results for three 16S reference databases. The plots on the left display the abundance of all bacterial 16S sequences. The plots on the right display all taxa with subtaxa not entirely amplified by digital PCR using universal 16S primers [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005404#pcbi.1005404.ref018" target="_blank">18</a>]. Node color and size display the proportion and number of sequences not amplified, respectively.</p>

    <i>Metacoder</i> has an intuitive and easy-to-use syntax.

    <p>The code in this example analysis parses the taxonomic data associated with sequences from the Ribosomal Database Project [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005404#pcbi.1005404.ref009" target="_blank">9</a>] 16S training set, filters and subsamples the data by sequence and taxon characteristics, conducts digital PCR, and displays the results as a heat tree. All functions in bold are from the <i>metacoder</i> package. Note how columns and functions in the taxmap object (green box) can be referenced within functions as if they were independent variables.</p>

    <i>Metacoder</i> can be used with any type of data that can be organized hierarchically.

    <p>This plot shows the results of the 2016 Democratic primary election organized by region, division, state, and county. The regions and divisions are those defined by the United States Census Bureau. Color corresponds to the difference in the percentage of votes for candidates Hillary Clinton (green) and Bernie Sanders (brown). Size corresponds to the total number of votes cast. Data were downloaded from <a href="https://www.kaggle.com/benhamner/2016-us-election/" target="_blank">https://www.kaggle.com/benhamner/2016-us-election/</a>.</p>

    There is no association between BMI and the <i>Bacteroidetes</i>:<i>Firmicutes</i> ratio in HMP stool microbiomes.


    The between-study variability in the relative abundance of <i>Bacteroidetes</i> and <i>Firmicutes</i> is greater than the within-study differences between lean and obese individuals.

    <p>The Ley data are from <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0084689#pone.0084689-Ley2" target="_blank">[6]</a>. The “Turnb.” data are from Turnbaugh et al. <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0084689#pone.0084689-Turnbaugh3" target="_blank">[7]</a>, from African Americans (AA) and European Americans (EA), from variable regions (V) 2 and 6. The MetaHIT data are from the Danish subjects in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0084689#pone.0084689-Qin1" target="_blank">[16]</a> who do not have inflammatory bowel disease. The HMP data are from V13 and V35. We note that the primary results from this manuscript were generated using data from HMP V35. All p-values by <i>t</i>-test.</p>

    Profile Hidden Markov Models for the Detection of Viruses within Metagenomic Sequence Data

    <div><p>Rapid, sensitive, and specific virus detection is an important component of clinical diagnostics. Massively parallel sequencing enables new diagnostic opportunities that complement traditional serological and PCR-based techniques. While massively parallel sequencing promises the benefits of being more comprehensive and less biased than traditional approaches, it presents new analytical challenges, especially with respect to detection of pathogen sequences in metagenomic contexts. To a first approximation, the initial detection of viruses can be achieved simply through alignment of sequence reads or assembled contigs to a reference database of pathogen genomes with tools such as BLAST. However, recognition of highly divergent viral sequences is problematic, and may be further complicated by the inherently high mutation rates of some viral types, especially RNA viruses. In these cases, increased sensitivity may be achieved by leveraging position-specific information during the alignment process. Here, we constructed HMMER3-compatible profile hidden Markov models (profile HMMs) from all the virally annotated proteins in RefSeq in an automated fashion using a custom-built bioinformatic pipeline. We then tested the ability of these viral profile HMMs (“vFams”) to accurately classify sequences as viral or non-viral. Cross-validation experiments with full-length gene sequences showed that the vFams were able to recall 91% of left-out viral test sequences without erroneously classifying any non-viral sequences into viral protein clusters. Thorough reanalysis of previously published metagenomic datasets with a set of the best-performing vFams showed that they were more sensitive than BLAST for detecting sequences originating from more distant relatives of known viruses. To facilitate the use of the vFams for rapid detection of remote viral homologs in metagenomic data, we provide two sets of vFams, comprising more than 4,000 vFams each, in the HMMER3 format. We also provide the software necessary to build custom profile HMMs or update the vFams as more viruses are discovered (<a href="http://derisilab.ucsf.edu/software/vFam" target="_blank">http://derisilab.ucsf.edu/software/vFam</a>).</p></div>