
    Bayesian hierarchical clustering for studying cancer gene expression data with unknown statistics

    Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC on 11 cancer and 3 synthetic datasets. The results on the cancer datasets show that, in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters that is close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than those of several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc.
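The role of the normal-gamma conjugate prior can be illustrated with a minimal sketch. This is not the GBHC implementation: it handles one-dimensional data only, the names `NormalGamma`, `posterior`, `log_marginal`, and `merge_score` are hypothetical, and the merge score is a simplified stand-in for the full BHC merge probability, which also involves a prior over tree partitions. The formulas are the standard closed-form updates for a Gaussian with unknown mean and precision.

```python
import math
from typing import NamedTuple, Sequence


class NormalGamma(NamedTuple):
    """Hypothetical container for normal-gamma hyperparameters."""
    mu: float     # prior mean of the Gaussian mean
    kappa: float  # prior strength on the mean (pseudo-observations)
    alpha: float  # shape of the gamma prior on the precision
    beta: float   # rate of the gamma prior on the precision


def posterior(prior: NormalGamma, xs: Sequence[float]) -> NormalGamma:
    """Closed-form conjugate update after observing xs."""
    n = len(xs)
    if n == 0:
        return prior
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations
    kappa_n = prior.kappa + n
    mu_n = (prior.kappa * prior.mu + n * xbar) / kappa_n
    alpha_n = prior.alpha + n / 2
    beta_n = (prior.beta + 0.5 * ss
              + prior.kappa * n * (xbar - prior.mu) ** 2 / (2 * kappa_n))
    return NormalGamma(mu_n, kappa_n, alpha_n, beta_n)


def log_marginal(prior: NormalGamma, xs: Sequence[float]) -> float:
    """Log evidence of xs with mean and precision integrated out."""
    post = posterior(prior, xs)
    n = len(xs)
    return (math.lgamma(post.alpha) - math.lgamma(prior.alpha)
            + prior.alpha * math.log(prior.beta)
            - post.alpha * math.log(post.beta)
            + 0.5 * (math.log(prior.kappa) - math.log(post.kappa))
            - 0.5 * n * math.log(2 * math.pi))


def merge_score(prior: NormalGamma, a: Sequence[float],
                b: Sequence[float]) -> float:
    """Simplified log evidence ratio: one shared Gaussian vs. two separate.

    Assumption: this omits the tree-structured prior used by full BHC.
    """
    merged = list(a) + list(b)
    return log_marginal(prior, merged) - (log_marginal(prior, a)
                                          + log_marginal(prior, b))


if __name__ == "__main__":
    prior = NormalGamma(mu=0.0, kappa=1.0, alpha=1.0, beta=1.0)  # illustrative values
    close = merge_score(prior, [0.0, 0.1, -0.1], [0.05, -0.05, 0.0])
    far = merge_score(prior, [0.0, 0.1, -0.1], [10.0, 10.1, 9.9])
    print(f"merge score, similar clusters:   {close:.3f}")
    print(f"merge score, separated clusters: {far:.3f}")
```

Under these assumptions, two clusters drawn from similar values receive a higher merge score than two well-separated clusters, which is the kind of evidence comparison that lets a Bayesian hierarchical method decide where to stop merging rather than requiring the number of clusters in advance.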

    Detailed Molecular and Immune Marker Profiling of Archival Prostate Cancer Samples Reveals an Inverse Association between TMPRSS2:ERG Fusion Status and Immune Cell Infiltration

    Prostate cancer is a significant global health issue, and limitations of current patient management pathways often result in over- or under-treatment. New ways to stratify patients are urgently needed. We conducted a feasibility study of such novel assessments, looking for associations between genomic changes and lymphocyte infiltration. An innovative workflow utilizing an in-house targeted sequencing panel, immune cell profiling using an image analysis pipeline, RNA-seq, and exome sequencing in select cases was tested. Gene fusions were profiled by RNA-seq in 27/27 cases, and a significantly higher TIL count was noted in tumors without a TMPRSS2:ERG fusion compared to those with the fusion (P = 0.01). Although this finding was not replicated in a larger validation set (n = 436) of The Cancer Genome Atlas images, there was a trend in the same direction. Differential expression analysis of TIL-High and TIL-Low tumors revealed the enrichment of both innate and adaptive immune response pathways. Mutations in mismatch repair genes (MLH1 and MSH6 mutations in 1/27 cases) were identified. We describe a potential immune escape mechanism in TMPRSS2:ERG fusion positive tumors. Detailed profiling, as shown here, can provide novel insights into tumor biology. Differences from findings in other cohorts are likely related to the methods used to define the region of interest, but this warrants further study in a larger cohort.

    Why rankings of biomedical image analysis competitions should be interpreted with care

    International challenges have become the standard for validation of biomedical image analysis methods. Given their scientific impact, it is surprising that a critical analysis of common practices related to the organization of challenges has not yet been performed. In this paper, we present a comprehensive analysis of biomedical image analysis challenges conducted up to now. We demonstrate the importance of challenges and show that the lack of quality control has critical consequences. First, reproducibility and interpretation of the results are often hampered, as only a fraction of relevant information is typically provided. Second, the rank of an algorithm is generally not robust to a number of variables, such as the test data used for validation, the ranking scheme applied, and the observers that make the reference annotations. To overcome these problems, we recommend best practice guidelines and define open research questions to be addressed in the future.