
    The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details.

    The model organism Encyclopedia of DNA Elements (modENCODE) project is a National Human Genome Research Institute (NHGRI) initiative designed to characterize the genomes of Drosophila melanogaster and Caenorhabditis elegans. A Data Coordination Center (DCC) was created to collect, store and catalog modENCODE data. An effective DCC must gather, organize and provide all primary, interpreted and analyzed data, and ensure the community is supplied with the knowledge of the experimental conditions, protocols and verification checks used to generate each primary data set. We present here the design principles of the modENCODE DCC, and describe the ramifications of collecting thorough and deep metadata for describing experiments, including the use of a wiki for capturing protocol and reagent information, and the BIR-TAB specification for linking biological samples to experimental results. modENCODE data can be found at http://www.modencode.org.

    modMine: flexible access to modENCODE data.

    In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database (http://intermine.modencode.org) described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) carry out fine-grained analysis of the above data. These sophisticated search features are made possible by the extensive experimental metadata collected by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets, now available via modMine.

    The diversity and evolution of pollination systems in large plant clades: Apocynaceae as a case study

    Background and Aims Large clades of angiosperms are often characterized by diverse interactions with pollinators, but how these pollination systems are structured phylogenetically and biogeographically is still uncertain for most families. Apocynaceae is a clade of >5300 species with a worldwide distribution. A database representing >10 % of species in the family was used to explore the diversity of pollinators and evolutionary shifts in pollination systems across major clades and regions. Methods The database was compiled from published and unpublished reports. Plants were categorized into broad pollination systems and then subdivided to include bimodal systems. These were mapped against the five major divisions of the family, and against the smaller clades. Finally, pollination systems were mapped onto a phylogenetic reconstruction that included those species for which sequence data are available, and transition rates between pollination systems were calculated. Key Results Most Apocynaceae are insect-pollinated, with few records of bird pollination. Almost three-quarters of species are pollinated by a single higher taxon (e.g. flies or moths); 7 % have bimodal pollination systems, whilst the remaining approx. 20 % are insect generalists. The less phenotypically specialized flowers of the Rauvolfioids are pollinated by a more restricted set of pollinators than are the more complex flowers within the Apocynoids + Periplocoideae + Secamonoideae + Asclepiadoideae (APSA) clade. Certain combinations of bimodal pollination systems are more common than others. Some pollination systems are missing from particular regions, whilst others are over-represented. Conclusions Within Apocynaceae, interactions with pollinators are highly structured both phylogenetically and biogeographically. Variation in transition rates between pollination systems suggests constraints on their evolution, whereas regional differences point to environmental effects such as filtering of certain pollinators from habitats. This is the most extensive analysis of its type so far attempted and gives important insights into the diversity and evolution of pollination systems in large clades.

    Cloud-based uniform ChIP-Seq processing tools for modENCODE and ENCODE

    Background Funded by the National Institutes of Health (NIH), the Model Organism ENCyclopedia Of DNA Elements (modENCODE) project aims to provide the biological research community with a comprehensive encyclopedia of functional genomic elements for both model organisms C. elegans (worm) and D. melanogaster (fly). With just under 10 terabytes of data collected and released to the public, one of the challenges researchers face is extracting biologically meaningful knowledge from this large data set. While the basic quality control, pre-processing, and analysis of the data have already been performed by members of the modENCODE consortium, many researchers will wish to reinterpret the data set using modifications and enhancements of the original protocols, or to combine modENCODE data with other data sets. Unfortunately, this can be a time-consuming and logistically challenging proposition. Results In recognition of this challenge, the modENCODE DCC has released uniform computing resources for analyzing modENCODE data on Galaxy (https://github.com/modENCODE-DCC/Galaxy), on the public Amazon Cloud (http://aws.amazon.com), and on the private Bionimbus Cloud for genomic research (http://www.bionimbus.org). In particular, we have released Galaxy workflows for interpreting ChIP-seq data that use the same quality control (QC) and peak-calling standards adopted by the modENCODE and ENCODE communities. For convenience, we have created Amazon and Bionimbus Cloud machine images containing Galaxy along with all the modENCODE data, software and other dependencies. Conclusions These resources provide a framework for running consistent and reproducible analyses on modENCODE data, ultimately allowing researchers to spend more of their time using modENCODE data and less time moving it around.