Manual curation: 500 gene alignments
The alignments for 500 single-copy, orthologous, nuclear genes across 21 representatives of the eupulmonates. Orthology was assessed through manual curation and gene tree assessment. Each alignment contains a mask in which 'x' denotes regions that were masked out (i.e. removed from further analyses). The alignments contain dummy sequences for missing taxa.
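As a rough illustration of how such a mask might be applied before downstream analysis, the sketch below drops every alignment column marked 'x'. It assumes the mask is available as a string of the same length as the aligned sequences; the function name `apply_mask`, the taxon names, and the toy sequences are hypothetical and are not taken from the deposited files.

```python
def apply_mask(alignment, mask, mask_char="x"):
    """Return a copy of the alignment without the masked columns.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    mask: string of the same length; mask_char marks columns to drop.
    """
    for name, seq in alignment.items():
        if len(seq) != len(mask):
            raise ValueError(f"{name}: sequence and mask lengths differ")
    # Indices of the columns that are NOT masked out.
    keep = [i for i, c in enumerate(mask) if c != mask_char]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Toy example (hypothetical data): columns 3 and 4 are masked.
toy_alignment = {"taxonA": "ATGCCA", "taxonB": "ATG-CA"}
toy_mask = "--xx--"
masked = apply_mask(toy_alignment, toy_mask)
```

How the mask is actually stored in each alignment file should be checked against the files themselves before reusing this.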
Trinity_assemblies
Transcriptome assemblies for 21 eupulmonate species. The transcriptomes were assembled using the program Trinity.
Camaenidae alignment
The concatenated alignment of the 2,648 exons that were sequenced from representatives of the family Camaenidae using exon capture. This alignment was used to produce the Camaenidae phylogeny presented in the paper.
Camaenidae_exon_capture_probe_set
This file contains the probes for the Camaenidae exon capture design. These probes target exons from 490 orthologous genes. The probes were designed for use with the MYcroarray MYbaits custom kit, which consists of 120 bp RNA probes.
Estimation of similarity between metagenome samples.
<p>We used kWIP to examine the 16S rDNA amplicon sequencing data of Edwards, <i>et al.</i> [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005727#pcbi.1005727.ref035" target="_blank">35</a>] and compared our kWIP result (“kWIP”) with the results as presented by Edwards, <i>et al.</i> (“Weighted UniFrac” and “UniFrac”). We find that kWIP replicates their observations of stratification of root-associated microbiomes by rhizo-compartment (PC1) and experiment site (PC2).</p>
Overview of the weighted inner product metric as implemented in kWIP.
<p>(A) <i>k</i>-mers are counted into sketches (using khmer [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005727#pcbi.1005727.ref028" target="_blank">28</a>]). Columns represent the “bins” in each sketch. The frequency of non-zero counts in each bin across a set of sketches is computed, forming the population frequency sketch (denoted <i>F</i>). We calculate the Shannon entropy of this frequency sketch as the weight vector for the WIP metric (denoted <i>H</i>, see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005727#pcbi.1005727.e009" target="_blank">Eq 2</a>). (B) Illustration of Shannon entropy as used in kWIP: the relationship between the population frequency (<i>F</i>) and the weight (<i>H</i>).</p>
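The weighting scheme described in this caption can be sketched in a few lines of Python. This is a minimal illustration and not kWIP's implementation: it assumes each sketch is a plain list of bin counts, and that the weight of a bin is the binary Shannon entropy of its population frequency (the exact form of Eq 2 is not reproduced in this listing). All function names and the toy counts are hypothetical.

```python
import math


def population_frequency(sketches):
    """F: fraction of sketches with a non-zero count in each bin."""
    n = len(sketches)
    bins = len(sketches[0])
    return [sum(1 for s in sketches if s[i] > 0) / n for i in range(bins)]


def entropy_weights(freqs):
    """H: binary Shannon entropy of each bin's population frequency.

    Assumed form: H = -(F*log2(F) + (1-F)*log2(1-F)), with H = 0 at F = 0 or 1.
    """
    weights = []
    for f in freqs:
        if f <= 0.0 or f >= 1.0:
            weights.append(0.0)  # bins seen in no sample or every sample carry no signal
        else:
            weights.append(-(f * math.log2(f) + (1 - f) * math.log2(1 - f)))
    return weights


def weighted_inner_product(a, b, weights):
    """Inner product of two sketches, with each bin scaled by its weight."""
    return sum(w * x * y for w, x, y in zip(weights, a, b))


# Toy sketches (hypothetical counts): four samples, four bins each.
sketches = [
    [3, 0, 1, 2],
    [1, 0, 0, 2],
    [2, 5, 0, 1],
    [4, 0, 0, 3],
]
F = population_frequency(sketches)
H = entropy_weights(F)
self_similarity = weighted_inner_product(sketches[2], sketches[2], H)
```

In this toy data, bins present in every sample receive zero weight, so only the two bins that vary between samples contribute to the inner product.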
'Agalma equivalent' alignments
The alignments representing a subset of the output of Agalma, run on 21 eupulmonate transcriptomes. This subset is the 635 orthologous clusters identified by the automated pipeline Agalma, which correspond to the 500 nuclear single-copy, orthologous genes identified by manual curation. The alignments contain dummy sequences for missing taxa.
'Agalma best' alignments
The alignments representing a subset of the output of Agalma, run on 21 eupulmonate transcriptomes. This subset is the 546 orthologous clusters identified by Agalma, where each orthologous cluster was the only one produced from the respective homolog cluster and had sequences for at least 18 taxa. The alignments contain dummy sequences for missing taxa.
The effect of (A) mean sequencing depth (genome coverage) and (B) average number of nucleotide differences per site (<i>π</i>) on accuracy of genetic similarity estimates in simulations.
<p>We plot mean ± standard deviation of Spearman’s <i>ρ</i> comparing each metric to known truth across 20 replicate runs. (A) Mean sequencing depth varies while the average number of nucleotide differences per site (<i>π</i>) is constant at 0.005. kWIP: At low to moderate mean sequencing depth (<30x), weighting increases accuracy. The weighted metric (“WIP”) reaches near-optimal accuracy at 10x, much earlier than the unweighted metric (“IP”). There is no noticeable decrease in accuracy with increasing coverage. mash: regardless of error correction, mash performs less well than WIP. mash shows accuracy maxima at 4x coverage without an abundance filter (“Mash”) and at 16x coverage with an abundance filter (“Mash (AF)”), at which point Mash (AF) performs almost as well as WIP. The accuracy of mash decreases dramatically when coverage is increased further. (B) Genome coverage is kept constant at 8x and the average number of nucleotide differences per site (<i>π</i>) varies. While all metrics perform equally at a <i>π</i> of 1 in 100 (0.01), the performance of IP, Mash and Mash (AF) decreases rapidly as <i>π</i> between samples decreases. This does not occur for the weighted metric (WIP).</p>
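For readers unfamiliar with the accuracy measure used in this caption, the following is a small, dependency-free sketch of Spearman's ρ, computed as the Pearson correlation of average ranks. It is illustrative only: the function names and the example values are hypothetical, and it is not the code used to produce the figure.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation between the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical example: known pairwise distances vs. estimated distances.
true_dist = [1, 2, 3, 4, 5]
estimated = [2, 1, 4, 3, 5]
rho = spearman_rho(true_dist, estimated)
```

Because ρ depends only on rank order, an estimate that preserves the ordering of the true distances scores 1 even if its absolute scale differs.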