20 research outputs found

    Probabilistic principal component analysis for metabolomic data

    Background: Data from metabolomic studies are typically complex and high-dimensional. Principal component analysis (PCA) is currently the most widely used statistical technique for analyzing metabolomic data. However, PCA is limited by the fact that it is not based on a statistical model. Results: Here, probabilistic principal component analysis (PPCA), which addresses some of the limitations of PCA, is reviewed and extended. A novel extension of PPCA, called probabilistic principal component and covariates analysis (PPCCA), is introduced; it provides a flexible approach to jointly modeling metabolomic data and additional covariate information. The use of a mixture of PPCA models for discovering the number of inherent groups in metabolomic data is demonstrated. The jackknife technique is employed throughout to construct confidence intervals for the estimated model parameters. The optimal number of principal components is determined using the Bayesian Information Criterion (BIC), modified to address the high dimensionality of the data. Conclusions: The methods presented are illustrated through an application to metabolomic data sets. Jointly modeling metabolomic data and covariates was successfully achieved and has the potential to provide deeper insight into the underlying data structure. Examination of confidence intervals for the model parameters, such as loadings, allows for principled and clear interpretation of the underlying data structure. A software package called MetabolAnalyze, freely available through the R statistical software, has been developed to facilitate implementation of the presented methods in the metabolomics field. Funding: Irish Research Council for Science, Engineering and Technology; Health Research Board.
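    Fitting a PPCA model and choosing the number of components with BIC, as described above, can be sketched as follows. This is a minimal NumPy sketch, assuming the standard closed-form maximum-likelihood PPCA solution of Tipping and Bishop and the ordinary (unmodified) BIC; it does not implement the authors' modified BIC, PPCCA, the mixture model or the jackknife intervals, and the names `fit_ppca` and `bic` are hypothetical, not part of MetabolAnalyze.

```python
import numpy as np

def fit_ppca(X, q):
    """Closed-form ML fit of PPCA (hypothetical helper, not MetabolAnalyze's API).

    X: (n, p) data matrix; q: number of principal components, q < p.
    Returns the mean, loadings W (p, q), noise variance sigma2, log-likelihood.
    """
    n, p = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    S = Xc.T @ Xc / n                         # ML sample covariance
    evals, evecs = np.linalg.eigh(S)          # eigh returns ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]
    sigma2 = evals[q:].mean()                 # average variance of discarded directions
    # Scale each leading eigenvector by sqrt(lambda_j - sigma2).
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    C = W @ W.T + sigma2 * np.eye(p)          # model covariance W W^T + sigma2 I
    _, logdet = np.linalg.slogdet(C)
    loglik = -0.5 * n * (p * np.log(2 * np.pi) + logdet
                         + np.trace(np.linalg.solve(C, S)))
    return mu, W, sigma2, loglik

def bic(loglik, n, p, q):
    """Ordinary BIC (larger is better here), NOT the paper's modified version.
    Free parameters: W up to rotation, sigma2 and the mean."""
    k = p * q - q * (q - 1) / 2 + 1 + p
    return 2 * loglik - k * np.log(n)

# Usage: two latent factors in 10 dimensions plus small isotropic noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(200, 10))
scores = {q: bic(fit_ppca(X, q)[3], X.shape[0], X.shape[1], q) for q in range(1, 6)}
best_q = max(scores, key=scores.get)
```

    On data like this, generated from two strong latent factors, the ordinary BIC should favor q = 2. Because PPCA loadings are only identified up to rotation in general, it is safer to compare the subspace spanned by W than individual columns.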

    The consensus molecular subtypes of colorectal cancer

    Colorectal cancer (CRC) is a frequently lethal disease with heterogeneous outcomes and drug responses. To resolve inconsistencies among the reported gene expression-based CRC classifications and facilitate clinical translation, we formed an international consortium dedicated to large-scale data sharing and analytics across expert groups. We show marked interconnectivity between six independent classification systems coalescing into four consensus molecular subtypes (CMSs) with distinguishing features: CMS1 (microsatellite instability immune, 14%), hypermutated, microsatellite unstable and strong immune activation; CMS2 (canonical, 37%), epithelial, marked WNT and MYC signaling activation; CMS3 (metabolic, 13%), epithelial and evident metabolic dysregulation; and CMS4 (mesenchymal, 23%), prominent transforming growth factor-beta activation, stromal invasion and angiogenesis. Samples with mixed features (13%) possibly represent a transition phenotype or intratumoral heterogeneity. We consider the CMS groups the most robust classification system currently available for CRC, with clear biological interpretability, and the basis for future clinical stratification and subtype-based targeted interventions.

    The Consensus Molecular Subtypes of Colorectal Cancer

    Colorectal cancer (CRC) is a frequently lethal disease with heterogeneous outcomes and drug responses. To resolve inconsistencies among the reported gene expression-based CRC classifications and facilitate clinical translation, we formed an international consortium dedicated to large-scale data sharing and analytics across expert groups. We show marked interconnectivity between six independent classification systems coalescing into four consensus molecular subtypes (CMS) with distinguishing features: CMS1 (MSI Immune, 14%), hypermutated, microsatellite unstable, strong immune activation; CMS2 (Canonical, 37%), epithelial, chromosomally unstable, marked WNT and MYC signaling activation; CMS3 (Metabolic, 13%), epithelial, evident metabolic dysregulation; and CMS4 (Mesenchymal, 23%), prominent transforming growth factor β activation, stromal invasion, and angiogenesis. Samples with mixed features (13%) possibly represent a transition phenotype or intra-tumoral heterogeneity. We consider the CMS groups the most robust classification system currently available for CRC, with clear biological interpretability, and the basis for future clinical stratification and subtype-based targeted interventions.

    polyClustR: defining communities of reconciled cancer subtypes with biological and prognostic significance

    Background: To ensure cancer patients are stratified towards treatments that are optimally beneficial, it is a priority to define robust molecular subtypes using clustering methods applied to high-dimensional biological data. When different clustering methods produce different numbers of clusters for the same data, it is difficult to reach an optimal solution. Here, we introduce “polyClustR”, a tool that reconciles clusters identified by different methods into subtype “communities” using either a hypergeometric test or a measure of the relative proportion of common samples. Results: The polyClustR pipeline was first tested on a breast cancer dataset to demonstrate that its results are compatible with, and add to, the understanding of this well-characterised cancer. Two uveal melanoma datasets were then used to identify and validate novel subtype communities with significant differences in metastasis-free prognosis and associations with known chromosomal aberrations. Conclusion: We demonstrate the value of the polyClustR approach, which applies multiple consensus clustering algorithms and systematically reconciles their results, in identifying novel subtype communities in two cancer types that are nevertheless compatible with the established understanding of these diseases. An R implementation of the pipeline is available at: https://github.com/syspremed/polyClust
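    The hypergeometric reconciliation step can be illustrated in pure Python. This is a sketch under stated assumptions, not polyClustR's R code: each pair of clusters from different methods is linked when the upper-tail hypergeometric probability of their overlap falls below a hypothetical threshold `alpha` (no multiple-testing correction), and linked clusters are merged by simple connected components as a stand-in for however the tool actually forms communities. The names `hypergeom_sf` and `reconcile` are hypothetical.

```python
from itertools import combinations
from math import comb

def hypergeom_sf(k, N, a, b):
    """P(overlap >= k) when b of N samples are drawn at random and a of the
    N samples belong to the other cluster (hypergeometric upper tail)."""
    upper = sum(comb(a, i) * comb(N - a, b - i) for i in range(k, min(a, b) + 1))
    return upper / comb(N, b)

def reconcile(clusterings, alpha=0.05):
    """clusterings: dict mapping method name -> list of cluster labels, one per
    sample, all in the same sample order. Returns groups of (method, label)
    clusters joined by significant pairwise overlap (hypothetical helper)."""
    clusters = {}
    for method, labels in clusterings.items():
        for idx, lab in enumerate(labels):
            clusters.setdefault((method, lab), set()).add(idx)
    N = len(next(iter(clusterings.values())))

    parent = {c: c for c in clusters}  # union-find over clusters
    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for c1, c2 in combinations(clusters, 2):
        if c1[0] == c2[0]:
            continue  # only compare clusters produced by different methods
        s1, s2 = clusters[c1], clusters[c2]
        if hypergeom_sf(len(s1 & s2), N, len(s1), len(s2)) < alpha:
            parent[find(c1)] = find(c2)

    groups = {}
    for c in clusters:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())

# Usage: two methods that agree on the split but number the clusters differently.
calls = {"kmeans": [0] * 10 + [1] * 10, "nmf": [1] * 10 + [0] * 10}
communities = reconcile(calls)
```

    Here `communities` pairs k-means cluster 0 with NMF cluster 1 and k-means cluster 1 with NMF cluster 0, since each matched pair overlaps completely (p = 1/C(20, 10)) while mismatched pairs share no samples.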

    Product partition latent variable model for multiple change-point detection in multivariate data

    The product partition model (PPM) is a well-established, efficient statistical method for detecting multiple change points in time-evolving univariate data. In this article, we refine the PPM to detect multiple change points in correlated, multivariate time-evolving data. Our model detects distributional changes in both the mean and covariance structures of multivariate Gaussian data by exploiting a lower-dimensional representation of the correlated multiple time series. The utility of the proposed method is demonstrated through experiments on simulated and real datasets.
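    To make the product partition idea concrete, here is a sketch of the classic univariate PPM, not the authors' multivariate latent variable model. It assumes Gaussian observations with known variance `sigma2`, a conjugate normal prior N(`m0`, `tau2`) on each segment mean, and a geometric cohesion p(1 - p)^(L - 1) for a block of length L. The posterior over partitions is then proportional to the product of cohesions and segment marginal likelihoods, and the most probable partition can be found by dynamic programming. All function names and default values are hypothetical.

```python
import math

def segment_log_ml(s1, s2, n, sigma2, m0, tau2):
    """Log marginal likelihood of n points with sum s1 and sum of squares s2,
    assuming y_i ~ N(mu, sigma2) with mu ~ N(m0, tau2) integrated out."""
    ybar = s1 / n
    ss = s2 - n * ybar ** 2  # within-segment sum of squares
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - 0.5 * math.log(1 + n * tau2 / sigma2)
            - ss / (2 * sigma2)
            - n * (ybar - m0) ** 2 / (2 * (sigma2 + n * tau2)))

def map_partition(y, p=0.05, sigma2=1.0, m0=0.0, tau2=10.0):
    """Most probable partition under a univariate product partition model
    (hypothetical helper). Returns the indices at which new segments start."""
    n = len(y)
    c1, c2 = [0.0], [0.0]  # prefix sums of y and y**2
    for v in y:
        c1.append(c1[-1] + v)
        c2.append(c2[-1] + v * v)
    best = [0.0] + [-math.inf] * n  # best[t]: max log posterior score for y[:t]
    back = [0] * (n + 1)            # back[t]: start of the last block ending at t
    for t in range(1, n + 1):
        for s in range(t):
            L = t - s
            # Geometric cohesion p * (1 - p)**(L - 1) for block y[s:t]. Giving the
            # final block the same factor p rescales every partition equally,
            # so the MAP partition is unaffected.
            score = (best[s] + math.log(p) + (L - 1) * math.log(1 - p)
                     + segment_log_ml(c1[t] - c1[s], c2[t] - c2[s], L, sigma2, m0, tau2))
            if score > best[t]:
                best[t], back[t] = score, s
    cps, t = [], n
    while t > 0:  # trace block boundaries back from the end of the series
        t = back[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)
```

    For example, `map_partition([0.0] * 20 + [5.0] * 20)` returns `[20]`. The search is O(n^2) in the series length; the multivariate model in the article additionally tracks changes in covariance, which this sketch does not attempt.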

    Legislative Documents

    Also variously referred to as: House bills; House documents; House legislative documents; legislative documents; General Court documents

    Additional file 3: of polyClustR: defining communities of reconciled cancer subtypes with biological and prognostic significance

    Figure S3. Comparison of community classifications from each reconciliation method with intrinsic breast cancer subtypes. (A-B) Heatmaps showing the hypergeometric-test overlap between the subtype communities (from polyClustR) and the known subtypes for the A) hypergeometric and B) PMI reconciliation methods. Norm – normal-like subtype; LumA – luminal A subtype; LumB – luminal B subtype. (PDF 50 kb)

    Additional file 2: of polyClustR: defining communities of reconciled cancer subtypes with biological and prognostic significance

    Figure S2. Gene Set Enrichment Analysis (GSEA) between subtypes in breast cancer. (A-F) GSEA between the A) bNMF2 and B) bNMF6 breast cancer clusters, showing enrichment of metaplastic breast cancer and 17q21–25 amplicon gene signatures, and C-F) between the two basal-subtype k-means clusters (bKM1 and bKM4), showing enrichment of invasive and immune-related gene sets in the bKM4 cluster. (PDF 119 kb)
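    The GSEA comparisons in this figure rest on a running-sum enrichment score. A minimal sketch, assuming the weighted running-sum statistic commonly used by GSEA with a weight exponent of 1: walking down the ranked gene list, the sum steps up at genes in the set (in proportion to their score) and steps down uniformly otherwise, and the enrichment score is its largest signed deviation from zero. It omits the permutation test, normalization and multiple-testing steps of a full GSEA run, and the function name is hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score in the style of GSEA (hypothetical helper).

    ranked_genes: gene names sorted by decreasing score (e.g. correlation with
    the class label); scores: the matching scores; gene_set: a set of names.
    Returns the running sum's maximum deviation from zero, with its sign.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking without covering all of it")
    # Hits step up in proportion to |score|**weight; misses step down uniformly.
    norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    running, es = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** weight / norm if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es
```

    For genes ranked `a, b, c, d` with scores `4, 3, 2, 1`, the set `{a, b}` at the top of the list gives an enrichment score of 1.0 and the set `{c, d}` at the bottom gives -1.0 (up to floating-point rounding).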

    Additional file 4: of polyClustR: defining communities of reconciled cancer subtypes with biological and prognostic significance

    Figure S4. Silhouette width of each sample and community in breast cancer for each reconciliation method: hypergeometric (HYP; left) and PMI (right). Colors represent distinct subtype communities. (PDF 21 kb)
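    Silhouette widths like those plotted in Figure S4 follow the usual definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from sample i to the other members of its own community and b(i) is the smallest mean distance from i to the members of any other community. A minimal sketch, assuming Euclidean distance (the distance used for the figure is not stated here) and at least two communities; the function names are hypothetical.

```python
import math

def silhouette_widths(points, labels):
    """Per-sample silhouette width with Euclidean distance (hypothetical helper)."""
    communities = sorted(set(labels))
    widths = []
    for i, (x, li) in enumerate(zip(points, labels)):
        dists = {c: [] for c in communities}  # distances from sample i, by community
        for j, (y, lj) in enumerate(zip(points, labels)):
            if i != j:
                dists[lj].append(math.dist(x, y))
        if not dists[li]:
            widths.append(0.0)  # singleton community: s(i) = 0 by convention
            continue
        a = sum(dists[li]) / len(dists[li])
        b = min(sum(d) / len(d) for c, d in dists.items() if c != li and d)
        widths.append((b - a) / max(a, b))
    return widths

def mean_width_by_community(widths, labels):
    """Average silhouette width per community."""
    grouped = {}
    for w, lab in zip(widths, labels):
        grouped.setdefault(lab, []).append(w)
    return {lab: sum(ws) / len(ws) for lab, ws in grouped.items()}
```

    Two well-separated pairs of points, such as `[(0, 0), (0, 1), (10, 0), (10, 1)]` labeled `[0, 0, 1, 1]`, each get a width of about 0.90; values near 0 or below indicate samples that sit between communities.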