12 research outputs found

    Manual curation: 500 gene alignments

    No full text
    Alignments for 500 single-copy, orthologous, nuclear genes across 21 representatives of the eupulmonates. Orthology was assessed through manual curation and gene tree assessment. Each alignment contains a mask, in which 'x' denotes regions that were masked out (i.e. removed from further analyses). The alignments contain dummy sequences for missing taxa.

    Trinity_assemblies

    No full text
    Transcriptome assemblies for 21 eupulmonate species. The transcriptomes were assembled using the program Trinity.

    Camaenidae alignment

    No full text
    The concatenated alignment of the 2,648 exons sequenced from representatives of the family Camaenidae using exon capture. This alignment was used to produce the Camaenidae phylogeny presented in the paper.

    Camaenidae_exon_capture_probe_set

    No full text
    This file contains the probes for the Camaenidae exon capture design. These probes target exons from 490 orthologous genes. The probes were designed for use with the Mycroarray Mybaits custom kit which consists of 120 bp RNA probes

    Estimation of similarity between metagenome samples.

    No full text
    We used kWIP to examine the 16S rDNA amplicon sequencing data of Edwards et al. [35] and compared our result (“kWIP”) with the results presented by Edwards et al. (“Weighted UniFrac” and “UniFrac”). We find that kWIP replicates their observations of stratification of root-associated microbiomes by rhizo-compartment (PC1) and experiment site (PC2).

    Overview of the weighted inner product metric as implemented in kWIP.

    No full text
    (A) k-mers are counted into sketches (using khmer [28]). Columns represent the “bins” in each sketch. The frequency of non-zero counts across a set of sketches is computed for each bin, forming the population frequency sketch (denoted F). We calculate the Shannon entropy of this frequency sketch as the weight vector for the WIP metric (denoted H, see Eq 2). (B) Illustration of Shannon entropy as used in kWIP: the relationship between the population frequency (F) and the weight (H).
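The weighting described in this caption can be illustrated with a minimal sketch. It is not kWIP's implementation: the CRC32 hashing, the toy bin count, the function names, and the binary-entropy form of the weight are all assumptions made for illustration, since Eq 2 is not reproduced in this listing.

```python
import math
import zlib


def sketch(seq, k=4, bins=64):
    """Count the k-mers of seq into a fixed-size vector of bins (a toy sketch).

    Hashing with CRC32 is an illustrative assumption, not what khmer does.
    """
    counts = [0] * bins
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[zlib.crc32(kmer.encode()) % bins] += 1
    return counts


def population_frequency(sketches):
    """F: for each bin, the fraction of sketches with a non-zero count."""
    n = len(sketches)
    n_bins = len(sketches[0])
    return [sum(1 for s in sketches if s[b] > 0) / n for b in range(n_bins)]


def entropy_weights(freqs):
    """H: Shannon entropy of each bin's population frequency.

    Assumed form: H = -(F log2 F + (1 - F) log2 (1 - F)), with H = 0
    when a bin is occupied in none or all of the sketches.
    """
    def h(f):
        if f <= 0.0 or f >= 1.0:
            return 0.0
        return -(f * math.log2(f) + (1 - f) * math.log2(1 - f))
    return [h(f) for f in freqs]


def weighted_inner_product(a, b, weights):
    """Inner product of two sketches with each bin scaled by its weight."""
    return sum(x * y * w for x, y, w in zip(a, b, weights))
```

Under this assumed weighting, bins that are occupied in every sample or in none receive zero weight, so only bins that vary across the population contribute to the similarity between two sketches.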

    'Agalma equivalent' alignments

    No full text
    Alignments representing a subset of the output of Agalma, run on 21 eupulmonate transcriptomes. This subset comprises the 635 orthologous clusters identified by the automated pipeline Agalma that correspond to the 500 nuclear, single-copy, orthologous genes identified by manual curation. The alignments contain dummy sequences for missing taxa.

    'Agalma best' alignments

    No full text
    Alignments representing a subset of the output of Agalma, run on 21 eupulmonate transcriptomes. This subset comprises the 546 orthologous clusters identified by Agalma in which each orthologous cluster was the only one produced from its respective homolog cluster and had sequences for at least 18 taxa. The alignments contain dummy sequences for missing taxa.

    The effect of (A) mean sequencing depth (genome coverage) and (B) average number of nucleotide differences per site (π) on accuracy of genetic similarity estimates in simulations.

    No full text
    We plot the mean ± standard deviation of Spearman’s ρ comparing each metric to known truth across 20 replicate runs. (A) Mean sequencing depth varies while the average number of nucleotide differences per site (π) is constant at 0.005. kWIP: at low to moderate mean sequencing depth (<30x), weighting increases accuracy. The weighted metric (“WIP”) obtains near-optimal accuracy already at 10x, much earlier than the unweighted metric (“IP”). There is no noticeable decrease in accuracy with increasing coverage. mash: regardless of error correction, mash performs less well than WIP. mash shows accuracy maxima at 4x coverage without (“Mash”) and at 16x coverage with the abundance filter (“Mash (AF)”), at which point Mash (AF) performs almost as well as WIP. The accuracy of mash decreases dramatically when coverage is increased further. (B) Genome coverage is kept constant at 8x and the average number of nucleotide differences per site (π) varies. While all metrics perform equally at a π of 1 in 100 (0.01), the performance of IP, Mash and Mash (AF) decreases rapidly as π between samples decreases. This does not occur for the weighted metric (WIP).