5 research outputs found

Cloud-based uniform ChIP-Seq processing tools for modENCODE and ENCODE

Author: Ellen T Kephart
Fei-Yang Jen
Kar Chu
Lincoln D Stein
Marc D Perry
Peter Ruzanov
Quang M Trinh
Sergio Contrino
Ziru Zhou
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

modMine: flexible access to modENCODE data.

Author: Butano Daniela
Carbon Seth
Carr Adrian
Contrino Sergio
Hu Fengyuan
Kalderimis Alex
Kephart Ellen T
Lewis Suzanna E
Lloyd Paul
Lyne Rachel
Micklem Gos
Perry Marc D
Rutherford Kim
Ruzanov Peter
Smith Richard N
Stein Lincoln D
Stinson EO
Sullivan Julie
Washington Nicole L
Zha Zheng
Publication venue: Nucleic Acids Res
Publication date: 12/11/2011

In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database (http://intermine.modencode.org) described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) facilitate fine-grained analysis of the above data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.

Crossref

PubMed Central

Apollo (Cambridge)

Cloud-based uniform ChIP-Seq processing tools for modENCODE and ENCODE

Author: Chu Kar M
Contrino Sergio
Jen Fei-Yang A
Kephart Ellen T
Perry Marc D
Ruzanov Peter
Stein Lincoln D
Trinh Quang M
Zhou Ziru
Publication date: 26/03/2018

Background: Funded by the National Institutes of Health (NIH), the aim of the Model Organism ENCyclopedia of DNA Elements (modENCODE) project is to provide the biological research community with a comprehensive encyclopedia of functional genomic elements for both model organisms C. elegans (worm) and D. melanogaster (fly). With a total size of just under 10 terabytes of data collected and released to the public, one of the challenges faced by researchers is to extract biologically meaningful knowledge from this large data set. While the basic quality control, pre-processing, and analysis of the data has already been performed by members of the modENCODE consortium, many researchers will wish to reinterpret the data set using modifications and enhancements of the original protocols, or combine modENCODE data with other data sets. Unfortunately, this can be a time-consuming and logistically challenging proposition. Results: In recognition of this challenge, the modENCODE DCC has released uniform computing resources for analyzing modENCODE data on Galaxy (https://github.com/modENCODE-DCC/Galaxy), on the public Amazon Cloud (http://aws.amazon.com), and on the private Bionimbus Cloud for genomic research (http://www.bionimbus.org). In particular, we have released Galaxy workflows for interpreting ChIP-seq data which use the same quality control (QC) and peak calling standards adopted by the modENCODE and ENCODE communities. For convenience of use, we have created Amazon and Bionimbus Cloud machine images containing Galaxy along with all the modENCODE data, software and other dependencies. Conclusions: Using these resources provides a framework for running consistent and reproducible analyses on modENCODE data, ultimately allowing researchers to use more of their time using modENCODE data, and less time moving it around.

University of Toronto Research Repository

The case for using mapped exonic non-duplicate reads when reporting RNA-sequencing depth: examples from pediatric cancer datasets.

Author: Beale Holly C
Bjork Isabel
Cattle Matthew A
Currie Rob
Haussler David
Kephart Ellen T
Lam Du Linh
Learned Katrina
Lyle A Geoffrey
McKay Liam T
Pfeil Jacob
Roger Jacquelyn M
Salama Sofie R
Sanders Lauren
Thompson Drew KA
Vaske Olena M
Vivian John
Publication venue: eScholarship, University of California
Publication date: 01/03/2021

Background: The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) is dependent on the sequencing depth. While unmapped or non-exonic reads do not contribute to gene expression quantification, duplicate reads contribute to the quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of reproducibility of RNA-Seq datasets used for gene expression analysis. Findings: In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1-77% of all reads (median [IQR], 3% [3-6%]); duplicate reads constitute 3-100% of mapped reads (median [IQR], 27% [13-43%]); and non-exonic reads constitute 4-97% of mapped, non-duplicate reads (median [IQR], 25% [16-37%]). MEND reads constitute 0-79% of total reads (median [IQR], 50% [30-61%]). Conclusions: Because not all reads in an RNA-Seq dataset are informative for reproducibility of gene expression measurements and the fraction of reads that are informative varies, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth.

eScholarship - University of California
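
The three percentages in the MEND abstract above are nested: unmapped reads are a share of all reads, duplicates a share of mapped reads, and non-exonic reads a share of mapped, non-duplicate reads. The share of total reads that are MEND is therefore the product of the three complements. The following is a minimal Python sketch of that arithmetic only; it is not the authors' script (which the abstract says is built on RSeQC, sambamba, and samblaster), and the function names `mend_fraction` and `mend_depth` are hypothetical.

```python
def mend_fraction(unmapped_frac, duplicate_frac, non_exonic_frac):
    """Share of total reads that are mapped, exonic, and non-duplicate.

    unmapped_frac:   share of all reads that are unmapped
    duplicate_frac:  share of mapped reads that are duplicates
    non_exonic_frac: share of mapped, non-duplicate reads that are non-exonic
    (Hypothetical helper; the denominators follow the abstract's wording.)
    """
    fractions = {
        "unmapped_frac": unmapped_frac,
        "duplicate_frac": duplicate_frac,
        "non_exonic_frac": non_exonic_frac,
    }
    for name, value in fractions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Each filter removes its share of whatever survived the previous filter.
    return (1.0 - unmapped_frac) * (1.0 - duplicate_frac) * (1.0 - non_exonic_frac)


def mend_depth(total_reads, unmapped_frac, duplicate_frac, non_exonic_frac):
    """Sequencing depth in MEND reads for a dataset with total_reads reads."""
    if total_reads < 0:
        raise ValueError("total_reads must be non-negative")
    return round(total_reads * mend_fraction(unmapped_frac, duplicate_frac, non_exonic_frac))


# Illustrative input only: the abstract's reported medians (3%, 27%, 25%)
# applied to an assumed dataset of 100 million total reads.
example_fraction = mend_fraction(0.03, 0.27, 0.25)
example_depth = mend_depth(100_000_000, 0.03, 0.27, 0.25)
```

Plugging in the three medians gives a MEND share of roughly 53%, close to the reported median of 50%. Exact agreement is not expected, because medians taken over different per-sample distributions do not multiply exactly.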