20 research outputs found

    MADGene: retrieval and processing of gene identifier lists for the analysis of heterogeneous microarray datasets

    Summary: MADGene is a software environment comprising a web-based database and a Java application. The platform aims to unify gene identifiers (IDs) and to support gene set analysis. MADGene lets the user interconvert clone and gene IDs across a wide range of nomenclatures covering 17 species. We provide a set of 23 functions to facilitate the analysis of gene sets, and we present two microarray applications showing how MADGene can be used to conduct meta-analyses.

    Meta-analysis of muscle transcriptome data using the MADMuscle database reveals biologically relevant gene patterns

    Background: DNA microarray technology has had a great impact on muscle research, and microarray gene expression data have been widely used to identify gene signatures characteristic of the conditions studied. With the rapid accumulation of muscle microarray data, it is of great interest to understand how to compare and combine data across multiple studies. Meta-analysis of transcriptome data is a valuable way to do so: it highlights gene signatures that are conserved across multiple independent studies. However, it is made difficult by the diversity of the available data: different microarray platforms, different gene nomenclatures, different species studied, etc.

    Description: We have developed a system dedicated to muscle transcriptome data. It comprises a collection of microarray data and a query tool. The latter allows the user to extract similar clusters of co-expressed genes from the database, using an input gene list, so that common and relevant gene signatures can be searched more easily. The dedicated database consists of a large compendium of public data (more than 500 data sets) related to muscle (skeletal and heart). These studies cover seven animal species, both invertebrates (Drosophila melanogaster, Caenorhabditis elegans) and vertebrates (Homo sapiens, Mus musculus, Rattus norvegicus, Canis familiaris, Gallus gallus). After a renormalization step, clusters of co-expressed genes were identified in each data set. The lists of co-expressed genes were annotated using a unified re-annotation procedure. These gene lists were then compared to find significant overlaps between studies.

    Conclusions: Applied to this large compendium of data sets, meta-analyses demonstrated that conserved patterns between species could be identified. Focusing on a specific pathology (Duchenne Muscular Dystrophy), we validated results across independent studies and revealed robust biomarkers and new pathways of interest. The meta-analyses performed with MADMuscle show the usefulness of this approach. Our method can be applied to all public transcriptome data.
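
The abstract above says gene lists were compared to find significant overlaps between studies, but it does not name the statistical test. As a minimal sketch, assuming a one-sided hypergeometric test (a common choice for this kind of comparison, not necessarily the one used by MADMuscle), the overlap between two lists could be scored as follows. The function name `overlap_p_value` and the `universe_size` parameter are hypothetical.

```python
from math import comb


def overlap_p_value(list_a, list_b, universe_size):
    """Return (overlap size, one-sided hypergeometric p-value).

    Assumption: both lists are drawn from the same universe of
    `universe_size` genes that share one identifier scheme.
    """
    a, b = set(list_a), set(list_b)
    n_a, n_b = len(a), len(b)
    if universe_size < max(n_a, n_b):
        raise ValueError("universe_size is smaller than one of the gene lists")
    k = len(a & b)
    # P(X >= k): chance of seeing at least k shared genes when n_b genes
    # are drawn at random from a universe that contains n_a "marked" genes.
    total = comb(universe_size, n_b)
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return k, tail / total


if __name__ == "__main__":
    study_1 = ["g1", "g2", "g3", "g4", "g5"]
    study_2 = ["g4", "g5", "g6", "g7", "g8"]
    shared, p = overlap_p_value(study_1, study_2, universe_size=10)
    print(shared, round(p, 4))
```

The test is only meaningful once both lists use the same identifier scheme, which is why a unified re-annotation step has to come before the comparison.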

    An implicative method for the analysis of gene expression data (original title: Une méthode implicative pour l'analyse de données d'expression de gènes)


    Application of data mining techniques in bioinformatics (original title: Application de techniques de fouille de données en Bio-informatique)

    The research presented by the author concerns the application of knowledge discovery from data (KDD) techniques in biology. Two major bioinformatics research themes are addressed: the search for remote homologs in protein families and transcriptome analysis. Searching for remote homologs from protein sequences means discovering new members of a protein family. Because members of a family generally share a biological function, identifying the family makes it possible to investigate the role of a protein sequence. Classifiers were developed to discriminate one particular protein superfamily, the cytokines. These proteins are involved in the immune system, and their study is of crucial importance for therapeutics. Support vector machines (SVMs) were chosen because this technique has given the most promising results for this type of application. An original classification method was designed, based on a preliminary step that discovers words over-represented in the family of interest. The benefit of this approach is that it uses a restricted dictionary of discriminating motifs, in contrast to techniques that use a global space of k-words. A comparison with the latter methods shows the relevance of this approach in terms of classification performance. The second contribution on this theme concerns the aggregation of classifiers based on grammatical swarms. This method aims to optimize the combination of classifiers according to models of social behaviour, in the manner of genetic optimization algorithms.

    The second line of research deals with the analysis of transcriptome data. The study of the transcriptome is of considerable importance, both for understanding the mechanisms of life and for clinical and pharmacological applications. Implicative analysis of association rules, originally developed by Régis Gras, was applied to transcriptome data. An original approach based on observation ranks was proposed. Two applications illustrate the relevance of this method: the selection of informative genes and the classification of tumours. Finally, a close collaboration with an INSERM team led by Rémi Houlgatte led to the enrichment of a software suite dedicated to DNA microarray data. This collection of tools, named MADTOOLS, aims to integrate transcriptome data and to support meta-analysis. A major application of the suite uses public data on muscle pathologies. By relying on independent data sets, meta-analysis greatly improves the robustness of the results. The systematic study of these data revealed groups of genes that are recurrently co-expressed. These groups keep their discriminating power across data sets that differ widely in species, disease or experimental conditions. This study can readily be generalized to all public transcriptome data. It paves the way for a very large-scale approach to this type of data for the study of other human pathologies.

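    Aggregation of classifiers and experts for the search for homologs among four-alpha-helix cytokines (original title: Agrégation de classifieurs et d'experts pour la recherche d'homologues chez les cytokines à quatre hélices alpha)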

    The objective of this work is to develop a method for detecting still-unknown cytokine homologs in the human genome, focusing on one gene family: the four-helix cytokines. I first evaluated several SVM classifiers, which the literature reports as the best strategy for homolog search. I then proposed adding knowledge specific to the family under study, such as biological features, in the form of automatic experts. Finally, to maximize the effectiveness of their association, I compared different aggregation methods. The resulting method, based on combining these classifiers and experts, achieves better results than the best classifier alone and can easily be adapted to other protein families.

    Differential study of the cytokine network in the immune system: An evolutionary approach based on the Bayesian networks

    In this paper, we present a Bayesian network (BN) approach to infer how the involvement of cytokines differs across experimental conditions. We introduce an evolutionary method for BN structure learning that maintains a set of the best learned networks. Each of these networks is then evaluated with a statistical test on two populations of patient data: one receiving treatment (drugs) and one without treatment.

    Reconstruction of gene regulation networks from microarray data by Bayesian networks

    In this work, we reconstruct gene regulation networks from microarray experiment data using a Bayesian network approach. We use an evolutionary algorithm for search-and-score structure learning. The learned network is evaluated by hypothesis testing on two populations of patient data, one with treatment (drugs) and one without. The expected outcome is an answer to the question "How does the treatment influence gene regulation?"

    MADTools: management tool for the mining of microarray data


    FaDA: A web application for regular laboratory data analyses

    Web-based data analysis and visualization tools are mostly designed for specific purposes, such as the analysis of data from whole-transcriptome RNA sequencing or single-cell RNA sequencing. However, generic tools that let noncomputational scientists analyse common laboratory data are also needed. Their importance is underlined by the continuing increase in the sample capacity of conventional laboratory instruments such as quantitative PCR, flow cytometry or ELISA. We present FaDA, a web-based application developed with the R Shiny package that lets users perform statistical group comparisons, including parametric and nonparametric tests, with multiple testing corrections suitable for most standard wet-laboratory analyses. FaDA provides data visualizations such as heatmaps, principal component analysis (PCA) plots, correlograms and receiver operating characteristic (ROC) curves. Calculations are performed in the R language. FaDA offers a free and intuitive interface that allows biologists without bioinformatics skills to perform common laboratory data analyses easily and quickly. The application is freely accessible at https://shiny-bird.univ-nantes.fr/app/Fada
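
The abstract above mentions multiple testing corrections without saying which ones FaDA offers. As an illustration only, the sketch below implements the Benjamini-Hochberg adjustment, one widely used correction; treating it as representative of FaDA is an assumption. FaDA itself runs in R, and Python is used here purely for the example. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    print([round(q, 4) for q in benjamini_hochberg(raw)])
```

Each adjusted value can be compared directly with the chosen false discovery rate, for example 0.05, to decide which group comparisons remain significant.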