27 research outputs found

    w4s Tech Wanted

    No full text
    An example of a blog post with a DOI

    What does Trinity's In Silico normalization do?

    No full text
    A discussion of the Trinity mRNAseq in silico normalization algorithm.

    Using the concept of informative genomic segment to investigate microbial diversity of metagenomics sample

    No full text
    In almost all metagenomics projects, diversity analysis plays an important role in supplying information about species richness, the species abundance distribution in a sample, and the similarities and differences between samples, all of which are crucial for drawing insightful and reliable conclusions. Traditionally, OTUs (operational taxonomic units) are used as the cornerstone of diversity analysis. Here we propose a novel concept, the IGS (informative genomic segment), and use IGSs in place of OTUs as the cornerstone of diversity analysis for whole shotgun metagenomics data sets. IGSs represent the unique information in a metagenomics data set, and the abundance of IGSs in different samples can be retrieved from read coverage through an efficient k-mer counting method. This samples-by-IGS abundance matrix is a promising replacement for the samples-by-OTU matrix used in 16S rRNA-based analysis, and all existing statistical methods can be applied to the samples-by-IGS matrix to investigate diversity. We applied the IGS-based method to the Global Ocean Sampling Expedition (GOS) data set, and the samples were clustered more accurately than with existing alignment-based methods. We also applied this method to MetaHIT data sets. Because the method is entirely binning-free, assembly-free, annotation-free, and reference-free, it is especially promising for highly diverse samples, such as soil, that contain large amounts of "dark matter".

    Channeling community contributions to scientific software: a sprint experience

    No full text
    Our submission for the 2nd Workshop on Sustainable Software for Science: Practice and Experiences

    Developing the informationscape approach to environmental change detection

    No full text
    Dr. Laurel G. Larsen's Moore Data Driven Discovery proposal; note, I am not an author.

    Marine Microbial Eukaryotic Transcriptome Sequencing Project, re-assemblies

    No full text
    The Marine Microbial Eukaryotic Transcriptome Sequencing Project (MMETSP) data set contains cultured samples of pelagic and endosymbiotic marine eukaryotic species representing more than 40 phyla (Keeling et al. 2014).
    Methods for the de novo transcriptome assembly are described in the Eel Pond khmer protocols (Brown et al. 2015).
    Scripts available on GitHub: https://github.com/dib-lab/dib-MMETSP
    C. Titus Brown, Camille Scott, and Leigh Sheneman. 2015. The Eel Pond mRNAseq Protocol. https://khmer-protocols.readthedocs.io/en/ctb/mrnaseq/
    Keeling et al. 2014. The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): Illuminating the Functional Diversity of Eukaryotic Life in the Oceans through Transcriptome Sequencing. http://dx.doi.org/10.1371/journal.pbio.1001889

    Benchmark soil metagenome data sets for k-mer counting performance, taken from [11].

    No full text
    Benchmark soil metagenome data sets for k-mer counting performance, taken from [11] (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0101271#pone.0101271-Howe1).

    Low-memory digital normalization.

    No full text
    The results of digitally normalizing a 5 million read E. coli data set (1.4 GB) to C = 20 with k = 20 under several memory usage/false positive rates. The false positive rate (column 1) is empirically determined. We measured the reads remaining, the number of "true" k-mers missing from the data at each step, and the total number of k-mers remaining. Note: at high false positive rates, reads are erroneously removed because k-mer counts are inflated.
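
A minimal sketch of what "normalizing to C = 20 with k = 20" can look like, assuming the usual digital normalization rule (keep a read only while the median count of its k-mers is below C) and a small Count-Min-style counter. The names `CountMinSketch` and `normalize`, the CRC32 hash, and the table sizes are hypothetical choices for illustration, not the interface of any particular tool.

```python
import zlib
from statistics import median


def kmers(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class CountMinSketch:
    """Fixed-memory counter: counts can be too high, never too low."""

    def __init__(self, table_sizes):
        self.tables = [[0] * size for size in table_sizes]

    def _slots(self, kmer):
        # One hash value, reduced modulo each table size (illustrative choice).
        h = zlib.crc32(kmer.encode())
        return [h % len(table) for table in self.tables]

    def add(self, kmer):
        for table, slot in zip(self.tables, self._slots(kmer)):
            table[slot] += 1

    def count(self, kmer):
        # The smallest counter is the least inflated estimate.
        return min(table[slot]
                   for table, slot in zip(self.tables, self._slots(kmer)))


def normalize(reads, k=20, cutoff=20, table_sizes=(100_003, 99_991)):
    """Keep a read only if its median k-mer count is still below cutoff."""
    sketch = CountMinSketch(table_sizes)
    kept = []
    for read in reads:
        words = kmers(read, k)
        if not words:          # read shorter than k: nothing to count
            continue
        if median(sketch.count(w) for w in words) < cutoff:
            kept.append(read)
            for w in words:    # only kept reads contribute to the counts
                sketch.add(w)
    return kept
```

Shrinking `table_sizes` makes collisions more frequent, which inflates counts and causes reads to be discarded too early; that is the effect the note about high false positive rates describes.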

    Iterative low-memory k-mer trimming.

    No full text
    The results of trimming reads at unique (erroneous) k-mers from a 5 million read E. coli data set (1.4 GB) in under 30 MB of RAM. After each iteration, we measured the total number of distinct k-mers in the data set, the total number of unique (and likely erroneous) k-mers remaining, and the number of unique k-mers present at the 3' end of reads.
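
A minimal sketch of one trimming pass, assuming that "trimming at a unique k-mer" means truncating a read just before the last base of the first k-mer that occurs exactly once in the data set. For clarity it uses an exact `Counter` rather than the low-memory counting the caption refers to; `trim_once` is a hypothetical name.

```python
from collections import Counter


def kmers(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def trim_once(reads, k=20):
    """Truncate each read at its first k-mer seen exactly once in reads."""
    counts = Counter(w for read in reads for w in kmers(read, k))
    trimmed = []
    for read in reads:
        keep = len(read)
        for i, word in enumerate(kmers(read, k)):
            if counts[word] == 1:
                # Drop the last base of the unique k-mer and everything after.
                keep = i + k - 1
                break
        if keep >= k:  # discard reads left with no complete k-mer
            trimmed.append(read[:keep])
    return trimmed
```

For example, with `k=4`, `trim_once(["ACGTACGT", "ACGTACGT", "ACGTACGA"], k=4)` leaves the first two reads unchanged and shortens the third to `"ACGTACG"`, because `ACGA` occurs only once. Calling the function repeatedly on its own output gives an iterative scheme in the spirit of the caption.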