
33 research outputs found

Cluster stability scores for microarray data in cancer studies

Author: Ghosh Debashis
Smolkin Mark
Publication venue: BioMed Central
Publication date: 01/01/2003

BACKGROUND: A potential benefit of profiling tissue samples using microarrays is the generation of molecular fingerprints that will define subtypes of disease. Hierarchical clustering has been the primary analytical tool used to define disease subtypes from microarray experiments in cancer settings. Assessing cluster reliability poses a major complication in analyzing output from clustering procedures. While most work has focused on estimating the number of clusters in a dataset, the question of stability of individual-level clusters has not been addressed. RESULTS: We address this problem by developing cluster stability scores using subsampling techniques. These scores exploit the redundancy in biologically discriminatory information on the chip. Our approach is generic and can be used with any clustering method. We propose procedures for calculating cluster stability scores for situations involving both known and unknown numbers of clusters. We also develop cluster-size adjusted stability scores. The method is illustrated by application to data from three cancer studies: one involving childhood cancers, the second involving B-cell lymphoma, and the third from a malignant melanoma study. AVAILABILITY: Code implementing the proposed analytic method can be obtained at the second author's website.
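The subsampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' code or their exact score: the function names, the 'exact recovery' criterion, the `keep_fraction` parameter, and the toy clustering rule are all assumptions made for illustration. The sketch re-clusters the samples on random subsets of features and reports, for each original cluster, the fraction of subsamples in which that same cluster reappears.

```python
import random

def stability_scores(data, cluster_fn, n_subsamples=100, keep_fraction=0.8, seed=0):
    """Hypothetical stability score: fraction of feature subsamples in which
    each reference cluster is recovered exactly.

    data: list of samples, each a list of feature values.
    cluster_fn: any function mapping such a list to one label per sample.
    """
    rng = random.Random(seed)
    n_features = len(data[0])
    k = max(1, int(keep_fraction * n_features))

    def as_sets(labels):
        # Convert a label vector into a set of clusters (sets of sample indices).
        groups = {}
        for i, lab in enumerate(labels):
            groups.setdefault(lab, set()).add(i)
        return {frozenset(g) for g in groups.values()}

    reference = as_sets(cluster_fn(data))
    hits = {c: 0 for c in reference}
    for _ in range(n_subsamples):
        cols = rng.sample(range(n_features), k)      # random subset of features
        sub = [[row[j] for j in cols] for row in data]
        found = as_sets(cluster_fn(sub))
        for c in reference:
            if c in found:
                hits[c] += 1
    return {c: hits[c] / n_subsamples for c in reference}

def sign_of_mean(rows):
    # Toy stand-in for a real clustering method (assumption, for the demo only).
    return [int(sum(r) / len(r) > 0) for r in rows]

toy = [[1.0, 1.2, 0.9, 1.1], [0.8, 1.0, 1.3, 0.7],
       [-1.0, -0.9, -1.2, -1.1], [-0.7, -1.3, -1.0, -0.8]]
scores = stability_scores(toy, sign_of_mean, n_subsamples=50, keep_fraction=0.5)
```

Because `cluster_fn` is passed in as a callable, any clustering method can be substituted, which mirrors the abstract's statement that the approach is generic.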

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Collection Of Biostatistics Research Archive

Deep Blue Documents at the University of Michigan

New resampling method for evaluating stability of clusters

Author: A Bhattacharjee
A Thalamuthu
B Efron
F Tschentscher
GC Tseng
H Pruscha
H Schneider
Irina M Gana Dresen
J Handl
J Quackenbush
JC Gower
JH Ward
Johannes Huesing
K Zhang
Karl-Heinz Joeckel
L Hubert
LM McShane
M Bittner
M Smolkin
Markus Neuhaeuser
MB Eisen
MK Kerr
PHA Sneath
RR Sokal
S Datta
S Datta
S Datta
S Dudoit
S Monti
T Margush
T Sørensen
Tanja Boes
WM Rand
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Hierarchical clustering is a widely applied tool in the analysis of microarray gene expression data. The assessment of cluster stability is a major challenge in clustering procedures. Statistical methods are required to distinguish between real and random clusters. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap. We propose a new resampling method based on continuous weights to assess the stability of clusters in hierarchical clustering. While in bootstrapping approximately one third of the original items are lost, continuous weights avoid zero elements and instead allow non-integer diagonal elements, which leads to retention of the full dimensionality of the space, i.e. each variable of the original data set is represented in the resampled data. Results: Comparison of continuous weights and bootstrapping using real datasets and simulation studies reveals the advantage of continuous weights, especially when the dataset has only a few observations, few differentially expressed genes, and a low fold change of the differentially expressed genes. Conclusion: We recommend the use of continuous weights in small as well as in large datasets, because according to our results they produce results at least as good as those of conventional bootstrapping and in some cases surpass it.
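The 'approximately one third' figure for the bootstrap can be checked with a short calculation: when n items are drawn with replacement n times, the chance that a given item is never drawn is (1 - 1/n)^n, which tends to 1/e (about 0.368). The sketch below computes this and contrasts integer bootstrap counts, which contain zeros, with a set of strictly positive weights. The weight scheme shown (uniform draws rescaled to sum to n) is purely an assumption for illustration and is not the scheme proposed in the paper.

```python
import math
import random

def expected_fraction_missing(n):
    # Probability that a given item is absent from a bootstrap sample of size n.
    return (1.0 - 1.0 / n) ** n

def bootstrap_counts(n, seed=0):
    # Integer weights: how often each item is drawn; some will be zero.
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts

def positive_weights(n, seed=0):
    # Illustrative continuous weights (assumption): all strictly positive,
    # rescaled to sum to n so they are comparable with bootstrap counts.
    rng = random.Random(seed)
    raw = [rng.uniform(0.5, 1.5) for _ in range(n)]
    total = sum(raw)
    return [n * w / total for w in raw]

limit = math.exp(-1)                      # about 0.368
approx = expected_fraction_missing(1000)  # close to the limit for large n
```

With continuous weights of this kind every item keeps a non-zero weight, which is the property the abstract describes as retaining the full dimensionality of the space.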

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Consensus clustering and functional interpretation of gene-expression data

Author: Kellam P.
Liu X.
Martin Nigel
Orengo C.A.
Swift S.
Tucker A.
Vinciotti V.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2004

Microarray analysis using clustering algorithms can suffer from lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NFκB and the unfolded protein response in certain B-cell lymphomas

Springer - Publisher Connector

UCL Discovery

PubMed Central

Birkbeck Institutional Research Online

Spiral - Imperial College Digital Repository

Brunel University Research Archive

Validation of Gene Expression Profiles in Genomic Data through Complementary Use of Cluster Analysis and PCA-Related Biplots

Author: Ambrogi Federico
Bassani Niccolò
Biganzoli Elia
Boracchi Patrizia
Coradini Danila
Publication venue: Lifescience Global
Publication date: 20/12/2012

High-throughput genomic assays are used in molecular biology to explore patterns of joint expression of thousands of genes. These methodologies have developed considerably in the last decade, and concurrently there has been a need for appropriate methods for analyzing the massive data generated. Identifying sets of genes and samples characterized by similar values of expression and validating these results are two critical issues related to these investigations because of their clinical implications. From a statistical perspective, unsupervised class discovery methods like Cluster Analysis are generally adopted. However, Cluster Analysis in practice relies mainly on hierarchical techniques, without considering the possible use of other methods. This is partially due to software availability and to the ease of representing results through a heatmap, which allows simultaneous visualization of the clustering of genes and samples on the same graphical device. One drawback of this strategy is that cluster stability is often neglected, thus leading to over-interpretation of results. Moreover, validation of results using external datasets is still a subject of discussion, since it is well known that batch effects may condition gene expression results even after normalization. In this paper we compared several clustering algorithms (hierarchical, k-means, model-based, Affinity Propagation) and stability indices to discover common patterns of expression and to assess clustering reliability, and we propose a rank-based passive projection of Principal Components for validation purposes. Results from a study involving 23 tumor cell lines and 76 genes related to a specific biological pathway, derived from a publicly available dataset, are presented.

Publication Management System

Genomic approaches in the management and treatment of breast cancer

Author: A Adeyinka
AM Snijders
B Fisher
CL Carter
CM Perou
CM Perou
D Porter
DC Sgroi
ER Fisher
ER Fisher
GM Clark
GM Clark
I Hedenfalk
J C Chang
J Chang
JC Chang
JR Pollack
K Aoyagi
LC Verhoog
LJ van't Veer
LJ van't Veer
LM McShane
M Ayers
M Ellis
MJ van de Vijver
P O'Connell
P O'Connell
R Simon
S A W Fuqua
S G Hilsenbeck
T Sorlie
TA Buchholz
WF Symmans
WL McGuire
Publication venue: Nature Publishing Group
Publication date

Breast cancer is the most common malignancy afflicting women from Western cultures. It has been estimated that approximately 211 000 women will be diagnosed with breast cancer in 2003 in the United States alone, and each year over 40 000 women will die of this disease. Developments in breast cancer molecular and cellular biology research have brought us closer to understanding the genetic basis of this disease. Unfortunately, this information has not yet been incorporated into the routine diagnosis and treatment of breast cancer in the clinic. Recent advancements in microarray technology hold the promise of further increasing our understanding of the complexity and heterogeneity of this disease, and providing new avenues for the prognostication and prediction of breast cancer outcomes. The most recent application of microarray genomic technologies to studying breast cancer will be the focus of this review

Crossref

PubMed Central

Microarray-Based Class Discovery for Molecular Classification of Breast Cancer: Analysis of Interobserver Agreement

Author: Alan Ashworth
Alan Mackay
Alizadeh
Anita Grigoriadis
Bas Kreike
Bertucci
Bertucci
Bittner
Brennan
Britta Weigelt
Carey
David S.P. Tan
Desmedt
Desmedt
Dobbin
Dupuy
Ein-Dor
Eisen
Garber
Gusterson
Haibe-Kains
He
Hu
Iorns
Jorge S. Reis-Filho
Khan
Kreike
Landis
Liedtke
Liu
Livasy
Lusa
McShane
Michiels
Mitch Dowsett
Nielsen
Paik
Parker
Parker
Peppercorn
Perou
Popovici
Pusztai
Rachael Natrajan
Rakha
Randolph
Roger A’Hern
Rouzier
Smid
Sorlie
Sorlie
Sotiriou
Stephens
Suzuki
Teschendorff
Turashvili
Turbin
van de Vijver
Van Laere
Wang
Wasielewski
Weigelt
Weigelt
Weigelt
Weigelt
Wirapati
Publication venue: Oxford University Press
Publication date: 01/04/2011

Background: Breast cancers can be classified by hierarchical clustering using an "intrinsic" gene list into one of at least five molecular subtypes: basal-like, HER2, luminal A, luminal B, and normal breast-like. Five different intrinsic gene lists composed of varying numbers of genes have been used for molecular subtype identification and classification of breast cancers. The aim of this study was to determine the objectivity and interobserver reproducibility of the assignment of molecular subtype classes by hierarchical cluster analysis. Methods: Three publicly available breast cancer datasets (n = 779) were subjected to two-way average-linkage hierarchical cluster analysis using five distinct intrinsic gene lists. We used free-marginal Kappa statistics to analyze interobserver agreement among five breast cancer researchers for the whole classification and for each molecular subtype separately according to each intrinsic gene list for each breast cancer dataset. Results: None of the classification systems tested produced almost perfect agreement (Kappa >= 0.81) among observers. However, substantial interobserver agreement (70.8% to 76.1% of the samples and free-marginal Kappa scores from 0.635 to 0.701) was consistently observed in all datasets for four molecular subtypes (luminal, basal-like, HER2, and normal breast-like). When luminal cancers were subdivided (luminal A, B, and C), none of the classification systems produced substantial agreement (Kappa >= 0.61) in all the datasets analyzed. Analysis of each subtype separately revealed that only two (basal-like and HER2) could be reproducibly identified by independent observers (Kappa >= 0.81). Conclusions: Assignment of molecular subtype classes of breast cancer based on the analysis of dendrograms obtained with hierarchical cluster analysis is subjective and shows modest interobserver reproducibility. For the development of a molecular taxonomy, objective definitions for each molecular subtype and standardized methods for their identification are required.
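As a rough guide to how a free-marginal Kappa relates to raw agreement, the sketch below uses the standard free-marginal form, in which chance agreement is fixed at 1/k for k categories, together with the usual multi-rater proportion of agreeing rater pairs. The function names are hypothetical, and the number of categories the authors used is not stated in the abstract, so the value k = 5 in the example is an assumption. Under that assumption the reported pairs (70.8% with 0.635, and 76.1% with 0.701) are arithmetically consistent with the formula.

```python
def free_marginal_kappa(observed_agreement, n_categories):
    # Chance agreement is 1/k when raters are free to use any category.
    chance = 1.0 / n_categories
    return (observed_agreement - chance) / (1.0 - chance)

def multirater_agreement(ratings):
    """Mean proportion of agreeing rater pairs per item.

    ratings: list of items, each a list of category labels (one per rater).
    """
    total = 0.0
    for item in ratings:
        n = len(item)
        counts = {}
        for label in item:
            counts[label] = counts.get(label, 0) + 1
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        total += agreeing_pairs / (n * (n - 1))
    return total / len(ratings)

# Assumed k = 5 categories; the agreement values are those quoted in the abstract.
kappa_low = free_marginal_kappa(0.708, 5)
kappa_high = free_marginal_kappa(0.761, 5)
```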

Crossref

PubMed Central

King's Research Portal

Institute of Cancer Research Repository

Epigenetic expansion of VHL-HIF signal output drives multiorgan metastasis in renal cancer.

Inactivation of the von Hippel-Lindau tumor suppressor gene, VHL, is an archetypical tumor-initiating event in clear cell renal carcinoma (ccRCC) that leads to the activation of hypoxia-inducible transcription factors (HIFs). However, VHL mutation status in ccRCC is not correlated with clinical outcome. Here we show that during ccRCC progression, cancer cells exploit diverse epigenetic alterations to empower a branch of the VHL-HIF pathway for metastasis, and the strength of this activation is associated with poor clinical outcome. By analyzing metastatic subpopulations of VHL-deficient ccRCC cells, we discovered an epigenetically altered VHL-HIF response that is specific to metastatic ccRCC. Focusing on the two most prominent pro-metastatic VHL-HIF target genes, we show that loss of Polycomb repressive complex 2 (PRC2)-dependent histone H3 Lys27 trimethylation (H3K27me3) activates HIF-driven chemokine (C-X-C motif) receptor 4 (CXCR4) expression in support of chemotactic cell invasion, whereas loss of DNA methylation enables HIF-driven cytohesin 1 interacting protein (CYTIP) expression to protect cancer cells from death cytokine signals. Thus, metastasis in ccRCC is based on an epigenetically expanded output of the tumor-initiating pathway

Identification of a SOX2-dependent subset of tumor- and sphere-forming glioblastoma cells with a distinct tyrosine kinase inhibitor sensitivity profile

Author: Al-Hajj
Alcantara Llaguno
Arne Östman
Avilion
Bao
Bao
Bass
Beier
Buchdunger
Calabrese
Calbo
Chakraborty
Chearwae
Chen
Cho
Clement
Collins
Cunningham
Dai
Daniel Hägerstrand
Desbois-Mouthon
Doetsch
Dresemann
Dresemann
Du
Ehtesham
Esparis-Ogando
Fan
Gal
Galli
Gangemi
Garcia-Echeverria
Godlewski
Gunther
Göran Hesselager
Hagerstrand
Hagerstrand
Hermanson
Holmberg
Huang
Ikushima
Inda
Kilic
Li
Maja Bradic Lindh
Malatesta
Mazzoleni
McShane
Monica Nistér
Noushmehr
Nutt
Parsons
Phi
Phillips
Piccirillo
Polyak
Reardon
Reardon
Ricci-Vitiani
Rubin
Sakariassen
Saskia Hoefs
Seidel
Shawver
Singh
Singh
Stommel
Takahashi
Tan
TCGA
Tropepe
Tunici
Uhrbom
Verhaak
Wang
Wang
Warshamana-Greene
Wen
Xiaobing He
Zheng
Zheng
Publication venue: Oxford University Press
Publication date: 21/09/2012

Putative cancer stem cells have been identified in glioblastomas and are associated with radio- and chemo-resistance. Further knowledge about these cells is thus highly warranted for the development of better glioblastoma therapies

Crossref

Publications from Karolinska Institutet

PubMed Central

Post hoc pattern matching: assigning significance to statistically defined expression patterns in single channel microarray data

Background: Researchers using RNA expression microarrays in experimental designs with more than two treatment groups often identify statistically significant genes with ANOVA approaches. However, the ANOVA test does not discriminate which of the multiple treatment groups differ from one another. Thus, post hoc tests, such as linear contrasts, template correlations, and pairwise comparisons, are used. Linear contrasts and template correlations work extremely well, especially when the researcher has a priori information pointing to a particular pattern/template among the different treatment groups. Further, all pairwise comparisons can be used to identify particular, treatment group-dependent patterns of gene expression. However, these approaches are biased by the researcher's assumptions, and some treatment-based patterns may fail to be detected using these approaches. Finally, different patterns may have different probabilities of occurring by chance, importantly influencing researchers' conclusions about a pattern and its constituent genes. Results: We developed a four-step post hoc pattern matching (PPM) algorithm to automate the identification of single-channel gene expression patterns and the assessment of their significance. First, one-way Analysis of Variance (ANOVA) and post hoc 'all pairwise' comparisons are calculated for all genes. Second, for each ANOVA-significant gene, all pairwise contrast results are encoded to create unique pattern ID numbers. The number of genes found in each pattern in the data is taken as that pattern's 'actual' frequency. Third, using Monte Carlo simulations, those patterns' frequencies are estimated in random data ('random' gene pattern frequency). Fourth, a Z-score for overrepresentation of the pattern is calculated ('actual' against 'random' gene pattern frequencies). We wrote a Visual Basic program (StatiGen) that automates the PPM procedure and constructs an Excel workbook with standardized graphs of overrepresented patterns and lists of the genes comprising each pattern. The Visual Basic code, installation files for StatiGen, and sample data are available as supplementary material. Conclusion: The PPM procedure is designed to augment current microarray analysis procedures by allowing researchers to incorporate all of the information from post hoc tests to establish unique, overarching gene expression patterns in which there is no overlap in gene membership. In our hands, PPM works well for studies using from three to six treatment groups in which the researcher is interested in treatment-related patterns of gene expression. Hardware/software limitations and the extreme number of theoretical expression patterns limit utility for larger numbers of treatment groups. Applied to a published microarray experiment, the StatiGen program successfully flagged patterns that had been manually assigned in prior work, and further identified other gene expression patterns that may be of interest. Thus, over a moderate range of treatment groups, PPM appears to work well. It allows researchers to assign statistical probabilities to patterns of gene expression that fit a priori expectations/hypotheses, it preserves the data's ability to show the researcher interesting, yet unanticipated gene expression patterns, and it assigns the majority of ANOVA-significant genes to non-overlapping patterns.
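Steps two and four of the procedure described above can be sketched as follows. This is not the StatiGen implementation: the base-3 encoding of pairwise outcomes, the function names, and the input layout are assumptions chosen for illustration, and the ANOVA, post hoc tests, and Monte Carlo simulation themselves are taken as already computed inputs.

```python
from collections import Counter
from statistics import mean, stdev

def pattern_id(pairwise_results):
    """Encode pairwise contrast outcomes as one integer (assumed scheme).

    pairwise_results: outcomes for the group pairs in a fixed order, each
    -1 (first group lower), 0 (no significant difference) or +1 (higher).
    """
    pid = 0
    for outcome in pairwise_results:
        pid = pid * 3 + (outcome + 1)   # one base-3 digit per pair
    return pid

def pattern_z_scores(actual_ids, random_runs):
    """Z-score of each pattern's actual frequency against simulated frequencies.

    actual_ids: pattern IDs of the ANOVA-significant genes in the real data.
    random_runs: list of simulated runs, each a list of pattern IDs.
    """
    actual = Counter(actual_ids)
    sims = [Counter(run) for run in random_runs]
    z = {}
    for pid, count in actual.items():
        freqs = [s.get(pid, 0) for s in sims]
        sd = stdev(freqs)
        if sd > 0:                      # skip patterns with no spread in the simulations
            z[pid] = (count - mean(freqs)) / sd
    return z

example_id = pattern_id([1, 0, -1])     # three pairs, as with three treatment groups
example_z = pattern_z_scores([21, 21, 21, 21], [[21], [21, 21], [21, 21, 21]])
```

With g treatment groups there are g(g-1)/2 pairs and therefore up to 3 to that power distinct IDs under this encoding, which gives a sense of why the abstract reports that the number of theoretical patterns becomes limiting for larger numbers of groups.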

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Kentucky