    integrOmics: an R package to unravel relationships between two omics datasets

    Motivation: With the availability of many ‘omics’ data, such as transcriptomics, proteomics or metabolomics, the integrative or joint analysis of multiple datasets from different technology platforms is becoming crucial to unravel the relationships between different biological functional levels. However, the development of such an analysis is a major computational and technical challenge as most approaches suffer from high data dimensionality. New methodologies need to be developed and validated

    Selection of biologically relevant genes with a wrapper stochastic algorithm

    We investigate an important issue of a meta-algorithm for selecting variables in the framework of microarray data. This wrapper method starts from any classification algorithm and weights each variable (i.e. gene) relative to its efficiency for classification. An optimization procedure is then inferred which exhibits important genes for the studied biological process. Theory and application with the SVM classifier were presented in Gadat and Younes (2007), and we extend this method with CART. The classification error rates are computed on three well-known public databases (Leukemia, Colon and Prostate) and compared with those from other wrapper methods (RFE, l0-norm SVM, Random Forests). This allows the assessment of the statistical relevance of the proposed algorithm. Furthermore, a biological interpretation with the Ingenuity Pathway Analysis software outputs clearly shows that the gene selections from the different wrapper methods yield highly relevant biological information, compared to a classical filter gene selection with a t-test

    Model-based joint visualization of multiple compositional omics datasets

    The integration of multiple omics datasets measured on the same samples is a challenging task: data come from heterogeneous sources and vary in signal quality. In addition, some omics data are inherently compositional, e.g. sequence count data. Most integrative methods are limited in their ability to handle covariates, missing values, compositional structure and heteroscedasticity. In this article we introduce a flexible model-based approach to data integration to address these current limitations: COMBI. We combine concepts, such as compositional biplots and log-ratio link functions with latent variable models, and propose an attractive visualization through multiplots to improve interpretation. Using real data examples and simulations, we illustrate and compare our method with other data integration techniques. Our algorithm is available in the R-package combi
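
The compositional structure mentioned in this abstract can be illustrated with a centered log-ratio (clr) transform, a standard tool for compositional data. This is only a minimal sketch of the general idea of log-ratios, not the COMBI model or the API of the combi package; the function name `clr` and the `pseudocount` parameter are assumptions introduced for illustration.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples-by-features count matrix.

    Each row (sample) is log-transformed and centered by its own mean log
    value, so the result depends only on ratios between features and not
    on the total count of the sample. The pseudocount is an illustrative
    way to avoid log(0); other zero-handling strategies exist.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)
```

With `pseudocount=0`, multiplying a sample by a constant (for example, a deeper sequencing run) leaves its clr values unchanged, which is the property that makes log-ratios suitable for sequence count data.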

    A sparse PLS for variable selection when integrating omics data

    Recent biotechnology advances allow for multiple types of omics data, such as transcriptomic, proteomic or metabolomic data sets to be integrated. The problem of feature selection has been addressed several times in the context of classification, but needs to be handled in a specific manner when integrating data. In this study, we focus on the integration of two-block data that are measured on the same samples. Our goal is to combine integration and simultaneous variable selection of the two data sets in a one-step procedure using a Partial Least Squares regression (PLS) variant to facilitate the biologists' interpretation. A novel computational methodology called "sparse PLS" is introduced for a predictive analysis to deal with these newly arisen problems. The sparsity of our approach is achieved with a Lasso penalization of the PLS loading vectors when computing the Singular Value Decomposition. Sparse PLS is shown to be effective and biologically meaningful. Comparisons with classical PLS are performed on a simulated data set and on real data sets. On one data set, a thorough biological interpretation of the obtained results is provided. We show that sparse PLS provides a valuable variable selection tool for highly dimensional data sets. Copyright ©2008 The Berkeley Electronic Press. All rights reserved
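
The idea of a Lasso penalty on the PLS loading vectors can be sketched as alternating soft-thresholding on the cross-product matrix of the two blocks. The code below is a minimal illustration for a single component, assuming fixed penalties `lam_x` and `lam_y` and omitting the deflation step needed for further components; the function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def soft_threshold(a, lam):
    """Lasso soft-thresholding: shrink each entry toward zero by lam."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_pls_component(X, Y, lam_x=0.0, lam_y=0.0, n_iter=100, tol=1e-8):
    """One pair of sparse loading vectors (u for X, v for Y).

    Assumed scheme (illustrative): start from the leading singular
    vectors of M = X^T Y, then alternately soft-threshold and
    renormalize each loading vector until u stops changing.
    """
    X = X - X.mean(axis=0)            # center both blocks column-wise
    Y = Y - Y.mean(axis=0)
    M = X.T @ Y                       # cross-product matrix of the two blocks
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0, :]          # dense PLS loadings as starting point
    for _ in range(n_iter):
        u_old = u
        u_new = soft_threshold(M @ v, lam_x)
        norm_u = np.linalg.norm(u_new)
        if norm_u == 0.0:             # penalty too strong: keep last loadings
            break
        u = u_new / norm_u
        v_new = soft_threshold(M.T @ u, lam_y)
        norm_v = np.linalg.norm(v_new)
        if norm_v == 0.0:
            break
        v = v_new / norm_v
        if np.linalg.norm(u - u_old) < tol:
            break
    return u, v
```

With both penalties at zero the function returns the ordinary leading singular vectors of the cross-product matrix; increasing `lam_x` or `lam_y` sets small loadings exactly to zero, which is what performs the variable selection. In practice the penalties would be tuned, for example by cross-validation.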

    Integrative mixture of experts to combine clinical factors and gene markers

    Motivation: Microarrays are being increasingly used in cancer research to better characterize and classify tumors by selecting marker genes. However, as very few of these genes have been validated as predictive biomarkers so far, it is mostly conventional clinical and pathological factors that are being used as prognostic indicators of clinical course. Combining clinical data with gene expression data may add valuable information, but it is a challenging task due to their categorical versus continuous characteristics. We have further developed the mixture of experts (ME) methodology, a promising approach to tackle complex non-linear problems. Several variants are proposed in integrative ME as well as the inclusion of various gene selection methods to select a hybrid signature

    Anthropology on Economic Development in Hanoi, Capital of Vietnam Analysis of Commercial Activities of Hanghom Paint Shops Street

    The killer immunoglobulin-like receptors (KIRs), found predominantly on the surface of natural killer (NK) cells and some T-cells, are a collection of highly polymorphic activating and inhibitory receptors with variable specificity for class I human leukocyte antigen (HLA) ligands. Fifteen KIR genes are inherited in haplotypes of diverse gene content across the human population, and the repertoire of independently inherited KIR and HLA alleles is known to alter risk for immune-mediated and infectious disease by shifting the threshold of lymphocyte activation. We have conducted the largest disease-association study of KIR-HLA epistasis to date, enabled by the imputation of KIR gene and HLA allele dosages from genotype data for 12,214 healthy controls and 8,107 individuals with the HLA-B*27-associated immune-mediated arthritis, ankylosing spondylitis (AS). We identified epistatic interactions between KIR genes and their ligands (at both HLA subtype and allele resolution) that increase risk of disease, replicating analyses in a semi-independent cohort of 3,497 cases and 14,844 controls. We further confirmed that the strong AS-association with a pathogenic variant in the endoplasmic reticulum aminopeptidase gene ERAP1, known to alter the HLA-B*27 presented peptidome, is not modified by carriage of the canonical HLA-B receptor KIR3DL1/S1. Overall, our data suggest that AS risk is modified by the complement of KIRs and HLA ligands inherited, beyond the influence of HLA-B*27 alone, which collectively alter the proinflammatory capacity of KIR-expressing lymphocytes to contribute to disease immunopathogenesis

    Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems

    Background: Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. Results: A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. Conclusions: sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets

    Multiparameter analysis of naevi and primary melanomas identifies a subset of naevi with elevated markers of transformation

    Here we have carried out a multiparameter analysis using a panel of 28 immunohistochemical markers to identify markers of transformation from benign and dysplastic naevus to primary melanoma in three separate cohorts totalling 279 lesions. We have identified a set of eight markers that distinguish naevi from melanoma. None of the markers or parameters assessed differentiated benign from dysplastic naevi. Indeed, the naevi clustered tightly in terms of their immunostaining patterns whereas primary melanomas showed more diverse staining patterns. A small subset of histopathologically benign lesions had elevated levels of multiple markers associated with melanoma, suggesting that these represent naevi with an increased potential for transformation to melanoma

    The EADGENE Microarray Data Analysis Workshop (Open Access publication)

    Microarray analyses have become an important tool in animal genomics. While their use is becoming widespread, there is still a lot of ongoing research regarding the analysis of microarray data. In the context of a European Network of Excellence, 31 researchers representing 14 research groups from 10 countries performed and discussed the statistical analyses of real and simulated 2-colour microarray data that were distributed among participants. The real data consisted of 48 microarrays from a disease challenge experiment in dairy cattle, while the simulated data consisted of 10 microarrays from a direct comparison of two treatments (dye-balanced). While there was broader agreement with regards to methods of microarray normalisation and significance testing, there were major differences with regards to quality control. The quality control approaches varied from none, through using statistical weights, to omitting a large number of spots or omitting entire slides. Surprisingly, these very different approaches gave quite similar results when applied to the simulated data, although not all participating groups analysed both real and simulated data. The workshop was very successful in facilitating interaction between scientists with a diverse background but a common interest in microarray analyses