132 research outputs found

    FactoMineR: An R Package for Multivariate Analysis

    In this article, we present FactoMineR, an R package dedicated to multivariate data analysis. The main features of this package are the ability to take into account different types of variables (quantitative or categorical), different types of structure in the data (a partition on the variables, a hierarchy on the variables, a partition on the individuals) and, finally, supplementary information (supplementary individuals and variables). Moreover, the dimensions resulting from the different exploratory data analyses can be automatically described by quantitative and/or categorical variables. Numerous graphics are also available with various options. Finally, a graphical user interface is implemented within the Rcmdr environment to make the package user-friendly.

    A new unsupervised gene clustering algorithm based on the integration of biological knowledge into expression data

    BACKGROUND: Gene clustering algorithms are widely used by biologists when analysing omics data. Classical gene clustering strategies rely on expression data only, either directly, as in heatmaps, or indirectly, as in clustering based on coexpression networks. However, these classical strategies may not be sufficient to bring out all potential relationships amongst genes. RESULTS: We propose a new unsupervised gene clustering algorithm based on the integration of external biological knowledge, such as Gene Ontology annotations, into expression data. We introduce a new distance between genes that integrates biological knowledge into the analysis of expression data. Under this distance, two genes are close if they have both similar expression profiles and similar functional profiles. A classical algorithm (e.g. K-means) is then used to obtain gene clusters. In addition, we propose an automatic evaluation procedure for gene clusters. This procedure is based on two indicators which measure the global coexpression and the biological homogeneity of gene clusters. Each indicator is associated with a hypothesis test, so that it can be complemented with a p-value. Our clustering algorithm is compared to heatmap clustering and to clustering based on a gene coexpression network, on both simulated and real data. In both cases, it outperforms the other methodologies, as it provides the highest proportion of significantly coexpressed and biologically homogeneous gene clusters, which are good candidates for interpretation. CONCLUSION: Our new clustering algorithm provides a higher proportion of good candidates for interpretation. Therefore, we expect the interpretation of these clusters to help biologists formulate new hypotheses on the relationships amongst genes.
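
The idea in the abstract above (a gene-to-gene distance that mixes expression and functional similarity, followed by K-means) can be illustrated with a minimal sketch. This is not the authors' distance: the convex combination controlled by `alpha`, the toy data, and all function names are assumptions made purely for illustration.

```python
import math

def combined_features(expression, annotation, alpha=0.5):
    """Concatenate both profiles so that the squared Euclidean distance between
    two rows equals alpha * d_expr^2 + (1 - alpha) * d_func^2.
    The weighting scheme is an assumption, not the published distance."""
    w_e, w_f = math.sqrt(alpha), math.sqrt(1.0 - alpha)
    return [[w_e * x for x in e] + [w_f * a for a in f]
            for e, f in zip(expression, annotation)]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, n_iter=100):
    """Plain K-means with a deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy data (hypothetical): 4 genes, 3 expression conditions, 2 annotation terms.
expression = [[1.0, 2.0, 3.0], [1.1, 2.0, 2.9], [3.0, 2.0, 1.0], [2.9, 2.1, 1.0]]
annotation = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

features = combined_features(expression, annotation, alpha=0.5)
labels = kmeans(features, k=2)
print(labels)  # prints [0, 0, 1, 1]
```

Setting `alpha` to 1 falls back to expression-only clustering, while lower values give more weight to the functional annotations.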

    integrOmics: an R package to unravel relationships between two omics datasets

    Motivation: With the availability of many ‘omics’ data, such as transcriptomics, proteomics or metabolomics, the integrative or joint analysis of multiple datasets from different technology platforms is becoming crucial to unravel the relationships between different biological functional levels. However, developing such an analysis is a major computational and technical challenge, as most approaches suffer from high data dimensionality. New methodologies need to be developed and validated.

    Selection of biologically relevant genes with a wrapper stochastic algorithm

    We investigate an important issue of a meta-algorithm for selecting variables in the framework of microarray data. This wrapper method starts from any classification algorithm and weights each variable (i.e. gene) according to its efficiency for classification. An optimization procedure is then derived which highlights the genes that are important for the studied biological process. Theory and application with the SVM classifier were presented in Gadat and Younes (2007), and we extend this method with CART. The classification error rates are computed on three well-known public databases (Leukemia, Colon and Prostate) and compared with those from other wrapper methods (RFE, l0-norm SVM, Random Forests). This allows the statistical relevance of the proposed algorithm to be assessed. Furthermore, a biological interpretation with the Ingenuity Pathway Analysis software clearly shows that the gene selections from the different wrapper methods yield highly relevant biological information, compared to a classical filter gene selection with a t-test.

    Simultaneous analysis of distinct Omics data sets with integration of biological knowledge: Multiple Factor Analysis approach

    BACKGROUND: Genomic analysis will greatly benefit from considering, in a global way, various sources of molecular data together with the related biological knowledge. It is thus of great importance to provide useful integrative approaches dedicated to easing the interpretation of microarray data. RESULTS: Here, we introduce a data-mining approach, Multiple Factor Analysis (MFA), to combine multiple data sets and to add formalized knowledge. MFA is used to jointly analyse the structure emerging from genomic and transcriptomic data sets. The common structures are highlighted and graphical outputs are provided such that biological meaning becomes easily retrievable. Gene Ontology terms are used to build gene modules that are superimposed on the experimentally interpreted plots. Functional interpretations are then supported by a step-by-step sequence of graphical representations. CONCLUSION: When applied to genomic and transcriptomic data and associated Gene Ontology annotations, our method prioritizes the biological processes linked to the experimental settings. Furthermore, it reduces the time and effort needed to analyze large amounts of 'Omics' data.
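
For readers unfamiliar with MFA, its balancing step can be sketched as follows: each data set (group of variables measured on the same individuals) is centred and divided by the square root of the first eigenvalue of its own PCA, so that no single group dominates the global analysis run on the concatenated table. The code below is only a minimal sketch of that standard weighting step, not the implementation used in the paper; the toy tables, the function names, and the use of power iteration are assumptions.

```python
import math

def center(table):
    """Subtract the column means from every row."""
    n = len(table)
    means = [sum(col) / n for col in zip(*table)]
    return [[x - m for x, m in zip(row, means)] for row in table]

def first_eigenvalue(table, n_iter=500):
    """Largest eigenvalue of (1/n) X^T X for a centred table X (power iteration)."""
    n, p = len(table), len(table[0])
    cov = [[sum(row[i] * row[j] for row in table) / n for j in range(p)]
           for i in range(p)]
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero starting vector
    eig = 0.0
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        eig = norm
    return eig

def mfa_weight(groups):
    """Centre each group and divide it by the square root of its first eigenvalue."""
    weighted = []
    for table in groups:
        centred = center(table)
        lam = first_eigenvalue(centred)
        if lam == 0.0:
            raise ValueError("group has no variance")
        scale = math.sqrt(lam)
        weighted.append([[x / scale for x in row] for row in centred])
    return weighted

# Toy data (hypothetical): two data sets measured on the same 5 individuals.
genomic = [[1, 2, 0], [2, 4, 1], [3, 6, 0], [4, 8, 1], [5, 10, 0]]
transcriptomic = [[0.1, 0.3], [0.2, 0.1], [0.3, 0.4], [0.4, 0.2], [0.5, 0.5]]

weighted = mfa_weight([genomic, transcriptomic])
# Concatenate the weighted groups side by side; a global PCA would be run on this.
global_table = [sum(rows, []) for rows in zip(*weighted)]
```

After weighting, the first eigenvalue of every group equals 1, which is what lets the global PCA reveal structure common to the data sets rather than the structure of the group with the largest variance.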

    Hardware-in-the-Loop Platform for Performance Evaluation of Energy Production, Storage and Distribution Systems for Buildings

    The current study, carried out within the framework of the PEPSE (Semi-virtual Platform for performance Evaluation of Energy Production, Storage and distribution systems for buildings) project, aims at designing, developing and setting up the infrastructure and equipment of a laboratory for evaluating the energy performance of heating and cooling production, storage and distribution systems in buildings. The platform is semi-virtual, i.e. energy sources and loads can be real or simulated. The virtual environment (a numerical program that simulates energy sources and loads) controls the inlet conditions and the operation of the tested device by means of one or several satellite units (i.e. physical interfaces) located on the distribution and return lines of the hydraulic/air-flow loops. These interfaces also send the equipment outlet conditions back to the simulation program. They are supplied with hot and cold water by two energy production and distribution systems. The maximum power of the devices to be tested can be up to 200 kW (heating or cooling). This capacity allows the laboratory to test a relatively wide range of devices, from the heating or cooling appliances of a single-family house to the energy equipment or system of a multi-family residential building or the equipment of a district heating system.

    Analysis of the real EADGENE data set: Comparison of methods and guidelines for data normalisation and selection of differentially expressed genes (Open Access publication)

    A large variety of methods has been proposed in the literature for microarray data analysis. The aim of this paper was to present the techniques used by the EADGENE (European Animal Disease Genomics Network of Excellence) WP1.4 participants for data quality control, normalisation and detection of differentially expressed genes, in order to provide more general data analysis guidelines. All the workshop participants were given a real data set obtained in an EADGENE-funded microarray study looking at the gene expression changes following artificial infection with two different mastitis-causing bacteria: Escherichia coli and Staphylococcus aureus. It was reassuring to see that most of the teams found the same main biological results. In fact, most of the differentially expressed genes were found for infection by E. coli between uninfected and 24 h challenged udder quarters. Very little transcriptional variation was observed for S. aureus. The lists of differentially expressed genes found by the different research teams were, however, quite dependent on the method used, especially for the data quality control step. These analyses also highlighted a biological problem of cross-talk between infected and uninfected quarters, which will have to be dealt with in further microarray studies.

    Analyzing sensory data with jamovi and R
