12 research outputs found

Manual curation: 500 gene alignments

Author: Adnan Moussalli (272589)
Frank Köhler (660792)
Kevin D. Murray (759583)
Luisa C. Teasdale (3311757)
Tim O'Hara (3322260)
Publication venue
Publication date: 24/05/2016
Field of study

The alignments for 500 single-copy, orthologous, nuclear genes across 21 representatives of the eupulmonates. Orthology was assessed through manual curation and gene tree assessment. Each alignment contains a mask; 'x' denotes regions that were masked out (i.e. removed from further analyses). The alignments contain dummy sequences for missing taxa.
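A minimal sketch of how such a mask might be applied before further analysis. The dataset description does not say how the mask is stored, so this assumes the mask is a string of the same length as the aligned sequences, with 'x' marking the columns to drop; the function name `apply_mask` and the example sequences are hypothetical.

```python
def apply_mask(sequences, mask):
    """Drop every alignment column whose mask character is 'x'.

    Assumption: `mask` has one character per alignment column and
    'x' (any case) marks a masked-out column. The real files may
    encode the mask differently.
    """
    for name, seq in sequences.items():
        if len(seq) != len(mask):
            raise ValueError(f"{name}: sequence and mask lengths differ")
    # Indices of the columns that survive masking.
    keep = [i for i, flag in enumerate(mask) if flag.lower() != "x"]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in sequences.items()}


# Hypothetical two-taxon alignment with columns 3 and 4 masked out.
alignment = {"taxon_a": "ATGCAT", "taxon_b": "A-GCTT"}
masked = apply_mask(alignment, "..xx..")
```

Dummy sequences for missing taxa pass through unchanged apart from losing the masked columns, so the alignment stays rectangular.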

Dryad Digital Repository (Duke University)

FigShare

Trinity_assemblies

Author: Adnan Moussalli (272589)
Frank Köhler (660792)
Kevin D. Murray (759583)
Luisa C. Teasdale (3311757)
Tim O'Hara (3322260)
Publication venue
Publication date: 24/05/2016
Field of study

Transcriptome assemblies for 21 eupulmonate species. The transcriptomes were assembled using the program Trinity.

Dryad Digital Repository (Duke University)

FigShare

Camaenidae alignment

Author: Adnan Moussalli (272589)
Frank Köhler (660792)
Kevin D. Murray (759583)
Luisa C. Teasdale (3311757)
Tim O'Hara (3322260)
Publication venue
Publication date: 24/05/2016
Field of study

The concatenated alignment of the 2,648 exons sequenced from representatives of the family Camaenidae using exon capture. This alignment was used to produce the Camaenidae phylogeny presented in the paper.

Dryad Digital Repository (Duke University)

FigShare

Camaenidae_exon_capture_probe_set

Author: Adnan Moussalli (272589)
Frank Köhler (660792)
Kevin D. Murray (759583)
Luisa C. Teasdale (3311757)
Tim O'Hara (3322260)
Publication venue
Publication date: 24/05/2016
Field of study

This file contains the probes for the Camaenidae exon capture design. These probes target exons from 490 orthologous genes. The probes were designed for use with the Mycroarray Mybaits custom kit, which consists of 120 bp RNA probes.

Dryad Digital Repository (Duke University)

FigShare

Computational performance of kWIP.

Author: Cheng Soon Ong (216284)
Christfried Webers (4413805)
Justin Borevitz (3384413)
Kevin D. Murray (759583)
Norman Warthmann (79177)
Publication venue
Publication date
Field of study

Computational performance of kWIP.

FigShare

Estimation of similarity between metagenome samples.

Author: Cheng Soon Ong (216284)
Christfried Webers (4413805)
Justin Borevitz (3384413)
Kevin D. Murray (759583)
Norman Warthmann (79177)
Publication venue
Publication date
Field of study

We used kWIP to examine the 16S rDNA amplicon sequencing data of Edwards, et al. [35] and compare our kWIP result (“kWIP”) with the results as presented by Edwards, et al. (“Weighted UniFrac” and “UniFrac”). We find that kWIP replicates their observations of stratification of root-associated microbiomes by rhizo-compartment (PC1) and experiment site (PC2).

FigShare

Overview of the weighted inner product metric as implemented in kWIP.

Author: Cheng Soon Ong (216284)
Christfried Webers (4413805)
Justin Borevitz (3384413)
Kevin D. Murray (759583)
Norman Warthmann (79177)
Publication venue
Publication date
Field of study

(A) k-mers are counted into sketches (using khmer [28]). Columns represent the “bins” in each sketch. The frequency of non-zero counts across a set of sketches is computed for each bin, forming the population frequency sketch (denoted F). We calculate the Shannon entropy of this frequency sketch as the weight vector for the WIP metric (denoted H, see Eq 2). (B) Illustration of Shannon entropy as used in kWIP: the relationship between the population frequency (F) and the weight (H).
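The steps in panel (A) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not kWIP's implementation: Eq 2 is not reproduced in this listing, so the weight is assumed here to be the binary Shannon entropy of each bin's population frequency, and the function names and toy sketches are hypothetical.

```python
import math


def population_frequency(sketches):
    """F: for each bin, the fraction of sketches with a non-zero count."""
    n = len(sketches)
    bins = len(sketches[0])
    return [sum(1 for s in sketches if s[i] > 0) / n for i in range(bins)]


def entropy_weights(freqs):
    """H: binary Shannon entropy of each bin's frequency (assumed form).

    Bins present in every sketch or in none carry no information
    and receive weight 0.
    """
    weights = []
    for f in freqs:
        if f <= 0.0 or f >= 1.0:
            weights.append(0.0)
        else:
            weights.append(-(f * math.log2(f) + (1 - f) * math.log2(1 - f)))
    return weights


def weighted_inner_product(a, b, weights):
    """Inner product of two sketches, with each bin scaled by its weight."""
    return sum(w * x * y for w, x, y in zip(weights, a, b))


# Four toy sketches with four bins each (hypothetical counts).
sketches = [
    [3, 0, 1, 2],
    [2, 0, 0, 2],
    [4, 1, 0, 2],
    [1, 0, 3, 2],
]
F = population_frequency(sketches)
H = entropy_weights(F)
similarity = weighted_inner_product(sketches[0], sketches[3], H)
```

In this toy data the first and last bins are non-zero in every sketch, so they get weight 0 and do not contribute to the similarity, while the bin present in half of the sketches gets the maximum weight of 1.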

FigShare

'Agalma equivalent' alignments

Author: Adnan Moussalli (272589)
Frank Köhler (660792)
Kevin D. Murray (759583)
Luisa C. Teasdale (3311757)
Tim O'Hara (3322260)
Publication venue
Publication date: 24/05/2016
Field of study

The alignments representing a subset of the output of Agalma, run on 21 eupulmonate transcriptomes. This subset is the 635 orthologous clusters identified by the automated pipeline Agalma that correspond to the 500 nuclear, single-copy, orthologous genes identified by manual curation. The alignments contain dummy sequences for missing taxa.

Dryad Digital Repository (Duke University)

FigShare

'Agalma best' alignments

Author: Adnan Moussalli (272589)
Frank Köhler (660792)
Kevin D. Murray (759583)
Luisa C. Teasdale (3311757)
Tim O'Hara (3322260)
Publication venue
Publication date: 24/05/2016
Field of study

The alignments representing a subset of the output of Agalma, run on 21 eupulmonate transcriptomes. This subset is the 546 orthologous clusters identified by Agalma for which the orthologous cluster was the only one produced from its homolog cluster and had sequences for at least 18 taxa. The alignments contain dummy sequences for missing taxa.

Dryad Digital Repository (Duke University)

FigShare

The effect of (A) mean sequencing depth (genome coverage) and (B) average number of nucleotide differences per site (π) on accuracy of genetic similarity estimates in simulations.

Author: Cheng Soon Ong (216284)
Christfried Webers (4413805)
Justin Borevitz (3384413)
Kevin D. Murray (759583)
Norman Warthmann (79177)
Publication venue
Publication date
Field of study

We plot mean ± standard deviation of Spearman’s ρ comparing each metric to known truth across 20 replicate runs. (A) Mean sequencing depth varies while the average number of nucleotide differences per site (π) is constant at 0.005. kWIP: At low to moderate mean sequencing depth (<30x), weighting increases accuracy. The weighted metric (“WIP”) obtains near-optimal accuracy already at 10x, and hence much earlier than the unweighted metric (“IP”). There is no noticeable decrease in accuracy with increasing coverage. mash: Regardless of error correction, mash performs less well than WIP. mash shows accuracy maxima at 4x coverage without (“Mash”) and at 16x coverage with an abundance filter (“Mash (AF)”), at which point Mash (AF) performs almost as well as WIP. The accuracy of mash decreases dramatically when coverage is further increased. (B) Genome coverage is kept constant at 8x and the average number of nucleotide differences per site (π) varies. While all metrics perform equally at a π of 1 in 100 (0.01), the performance of IP, Mash and Mash (AF) decreases rapidly as π between samples decreases. This does not occur for the weighted metric (WIP).
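For readers unfamiliar with the accuracy measure, the following is a minimal sketch of Spearman's ρ between a set of estimated distances and the known truth, computed as the Pearson correlation of average ranks. It is not the authors' evaluation code; the function names and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical pairwise distances: simulated truth vs. one metric's estimate.
true_distances = [0.10, 0.20, 0.30, 0.40, 0.50]
estimated = [0.12, 0.18, 0.35, 0.33, 0.60]
rho = spearman_rho(true_distances, estimated)
```

Because ρ depends only on rank order, a metric scores well whenever it orders sample pairs correctly, even if its distances are on a different scale from the truth.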

FigShare