19 research outputs found

    From scientific workflow patterns to 5-star linked open data

    Scientific workflow management systems have been widely adopted by data-intensive science communities. Many efforts have been dedicated to the representation and exploitation of provenance to improve reproducibility in data-intensive sciences. However, few works address the mining of provenance graphs to annotate the produced data with domain-specific context for better interpretation and sharing of results. In this paper, we propose PoeM, a lightweight framework for mining provenance in scientific workflows. PoeM produces linked in silico experiment reports based on workflow runs. PoeM leverages semantic web technologies and reference vocabularies (PROV-O, P-Plan) to generate provenance mining rules and finally assemble linked scientific experiment reports (Micropublications, Experimental Factor Ontology). Preliminary experiments demonstrate that PoeM enables the querying and sharing of Galaxy-processed genomic data as 5-star linked datasets.

    MADGene: retrieval and processing of gene identifier lists for the analysis of heterogeneous microarray datasets

    Summary: MADGene is a software environment comprising a web-based database and a Java application. This platform aims at unifying gene identifiers (ids) and performing gene set analysis. MADGene allows the user to convert between clone and gene ids over a large range of nomenclatures covering 17 species. We propose a set of 23 functions to facilitate the analysis of gene sets, and we give two microarray applications to show how MADGene can be used to conduct meta-analyses.

    Meta-analysis of muscle transcriptome data using the MADMuscle database reveals biologically relevant gene patterns

    Background: DNA microarray technology has had a great impact on muscle research, and microarray gene expression data have been widely used to identify gene signatures characteristic of the studied conditions. With the rapid accumulation of muscle microarray data, it is of great interest to understand how to compare and combine data across multiple studies. Meta-analysis of transcriptome data is a valuable way to achieve this: it highlights gene signatures conserved across multiple independent studies. However, it is made difficult by the diversity of the available data: different microarray platforms, different gene nomenclatures, different species studied, etc. Description: We have developed a system dedicated to muscle transcriptome data. It comprises a collection of microarray data and a query tool. The latter allows the user to extract similar clusters of co-expressed genes from the database, using an input gene list. Common and relevant gene signatures can thus be searched more easily. The dedicated database consists of a large compendium of public data (more than 500 data sets) related to muscle (skeletal and heart). These studies included seven different animal species, from invertebrates (Drosophila melanogaster, Caenorhabditis elegans) and vertebrates (Homo sapiens, Mus musculus, Rattus norvegicus, Canis familiaris, Gallus gallus). After a renormalization step, clusters of co-expressed genes were identified in each dataset. The lists of co-expressed genes were annotated using a unified re-annotation procedure. These gene lists were compared to find significant overlaps between studies. Conclusions: Applied to this large compendium of data sets, meta-analyses demonstrated that conserved patterns between species could be identified. Focusing on a specific pathology (Duchenne Muscular Dystrophy), we validated results across independent studies and revealed robust biomarkers and new pathways of interest. The meta-analyses performed with MADMuscle show the usefulness of this approach. Our method can be applied to all public transcriptome data.
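The abstract above does not say which statistic was used to judge whether two co-expressed gene lists overlap significantly. As a minimal sketch only, assuming a one-sided hypergeometric test (a common choice, not confirmed as the authors' method), the comparison of two gene lists could look like this. The function names, the gene lists, and the universe size of 20000 genes are all illustrative assumptions.

```python
from math import comb


def overlap_p_value(universe: int, list_a: int, list_b: int, shared: int) -> float:
    """Probability of seeing at least `shared` common genes when `list_b`
    genes are drawn at random from `universe` genes, of which `list_a`
    belong to the first list (one-sided hypergeometric tail)."""
    upper = min(list_a, list_b)
    favourable = sum(
        comb(list_a, i) * comb(universe - list_a, list_b - i)
        for i in range(shared, upper + 1)
    )
    return favourable / comb(universe, list_b)


# Hypothetical clusters of co-expressed genes from two studies.
cluster_1 = {"ACTA1", "MYH7", "TTN", "DES"}
cluster_2 = {"MYH7", "TTN", "DES", "GAPDH"}
common = cluster_1 & cluster_2

# Assumed size of the shared gene universe (illustrative value).
p = overlap_p_value(
    universe=20000,
    list_a=len(cluster_1),
    list_b=len(cluster_2),
    shared=len(common),
)
print(sorted(common), p)
```

A small p-value here would suggest that the two clusters share more genes than expected by chance; with many pairwise comparisons, a multiple-testing correction would also be needed.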

    MAGNETO: An Automated Workflow for Genome-Resolved Metagenomics

    Genome-resolved metagenomics has led to the discovery of previously untapped biodiversity within the microbial world. As the development of computational methods for the recovery of genomes from metagenomes continues, existing strategies need to be evaluated and compared to eventually lead to standardized computational workflows.

    XML4NGS : A XML-based description of a Next-Generation sequencing project allowing the generation of a ’Makefile’-driven workflow.

    [in French] Poster presented at JOBIM2013 https://colloque.inra.fr/jobim2013/layout/set/print/Soumission2/Liste-des-soumissions-retenues-pour-une-presentation-sous-forme-d-affiche XML4NGS is a schema describing an NGS experiment in XML. It provides an XSLT stylesheet that transforms the XML into a Makefile-driven workflow, allowing a parallel analysis (alignment, calling, annotation ...) on a cluster.
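To illustrate the idea of turning an XML project description into a Makefile, here is a minimal sketch. It uses Python in place of the XSLT stylesheet the poster describes, and the XML element names, attribute names, and the `align_tool` / `call_tool` commands are hypothetical placeholders, not the XML4NGS schema or its actual output.

```python
import xml.etree.ElementTree as ET

# Hypothetical project description; element and attribute names are
# illustrative and are NOT taken from the XML4NGS schema.
PROJECT_XML = """
<project reference="ref.fa">
  <sample name="S1" fastq="S1.fastq.gz"/>
  <sample name="S2" fastq="S2.fastq.gz"/>
</project>
"""


def project_to_makefile(xml_text: str) -> str:
    """Emit one alignment rule and one calling rule per sample."""
    root = ET.fromstring(xml_text)
    reference = root.get("reference")
    samples = [(s.get("name"), s.get("fastq")) for s in root.findall("sample")]

    # The default target depends on every per-sample result file.
    targets = " ".join(f"{name}.vcf" for name, _ in samples)
    lines = [".PHONY: all", f"all: {targets}", ""]

    for name, fastq in samples:
        # Placeholder commands: align_tool and call_tool are hypothetical.
        lines.append(f"{name}.bam: {fastq} {reference}")
        lines.append(f"\talign_tool {reference} {fastq} > {name}.bam")
        lines.append("")
        lines.append(f"{name}.vcf: {name}.bam {reference}")
        lines.append(f"\tcall_tool {reference} {name}.bam > {name}.vcf")
        lines.append("")
    return "\n".join(lines)


makefile_text = project_to_makefile(PROJECT_XML)
print(makefile_text)
```

Because each sample gets its own independent chain of targets, running `make -j` on such a file lets the per-sample steps execute in parallel, which is the property the poster relies on for cluster execution.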

    M@IA: a modular open-source application for microarray workflow and integrative datamining.

    Microarray technology is a widely used approach to gene expression analysis. Many tools for microarray management and data analysis have been developed, and recently new methods have been proposed for deciphering biological pathways by integrating microarray data with other data sources. However, to improve microarray analysis and provide meaningful gene interaction networks, integrated software solutions are still needed. Therefore, we developed M@IA, an environment for DNA microarray data analysis allowing gene network reconstruction. M@IA is a microarray integrated application which includes all of the steps of a microarray study, from MIAME-compliant raw data storage and processing to gene expression analysis. Furthermore, M@IA allows automatic gene annotation based on ontology, metabolic/signalling pathways, protein interaction, miRNA and transcription factor associations, as well as integrative analysis of gene interaction networks. Statistical and graphical methods facilitate analysis, yielding new hypotheses on gene expression data. To illustrate our approach, we applied M@IA modules to microarray data taken from an experiment on liver tissue. We integrated differentially expressed genes with additional biological information, thus identifying new molecular interaction networks that are associated with fibrogenesis. M@IA is a new application for microarray management and data analysis, offering functional insights into microarray data by combining gene expression data with biological knowledge annotation based on interactive graphs. M@IA is an interactive multi-user interface based on a flexible modular architecture, and it is freely available for academic users at http://maia.genouest.org

    A dynamic, web-accessible resource to process raw microarray scan data into consolidated gene expression values: importance of replication

    We propose a freely accessible web-based pipeline, which processes raw microarray scan data to obtain experimentally consolidated gene expression values. The tool MADSCAN, which stands for MicroArray Data Suites of Computed ANalysis, makes a practical choice among the numerous methods available for filtering, normalizing and scaling of raw microarray expression data in a dynamic and automatic way. Different statistical methods have been adapted to extract reliable information from replicate gene spots as well as from replicate microarrays for each biological situation under study. A carefully constructed experimental design thus makes it possible to detect outlying expression values and to identify statistically significant expression values, together with a list of quality controls with proposed threshold values. The integrated processing procedure described here, based on multiple measurements per gene, is decisive for reliably monitoring the subtle gene expression changes typical of most biological events.