    Statistical modeling for selecting housekeeper genes

    There is a need for statistical methods to identify genes that have minimal variation in expression across a variety of experimental conditions. These 'housekeeper' genes are widely employed as controls for quantification of test genes using gel analysis and real-time RT-PCR. Using real-time quantitative RT-PCR, we analyzed 80 primary breast tumors for variation in expression of six putative housekeeper genes (MRPL19 (mitochondrial ribosomal protein L19), PSMC4 (proteasome (prosome, macropain) 26S subunit, ATPase, 4), SF3A1 (splicing factor 3a, subunit 1, 120 kDa), PUM1 (pumilio homolog 1 (Drosophila)), ACTB (actin, beta) and GAPD (glyceraldehyde-3-phosphate dehydrogenase)). We present appropriate models for selecting the best housekeepers to normalize quantitative data within a given tissue type (for example, breast cancer) and across different types of tissue samples.
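One simple way to make "minimal variation in expression" concrete is to rank candidate reference genes by how much their Ct values vary across samples. The sketch below does this with the sample standard deviation. It is a simplified proxy under stated assumptions, not the statistical models presented in this paper, and the function name `rank_reference_genes` and all Ct values are hypothetical.

```python
from __future__ import annotations

from statistics import stdev


def rank_reference_genes(ct_values: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank candidate housekeeper genes from most to least stable.

    Stability is approximated by the sample standard deviation of each gene's
    Ct values across samples (lower = more stable). stdev() raises
    StatisticsError if a gene has fewer than two values. This is a simplified
    proxy, not the models described in the paper.
    """
    scores = {gene: stdev(cts) for gene, cts in ct_values.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical Ct values for four tumors (illustrative only, not study data).
example = {
    "PUM1": [24.1, 24.3, 24.2, 24.0],
    "ACTB": [18.0, 19.5, 17.2, 20.1],
    "GAPD": [19.0, 20.0, 18.5, 19.5],
}
ranking = rank_reference_genes(example)
print([gene for gene, _ in ranking])  # → ['PUM1', 'GAPD', 'ACTB']
```

A per-gene standard deviation ignores correlation between candidate genes and sample-level effects such as RNA input amount, which model-based approaches can account for.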

    Correction: Statistical modeling for selecting housekeeper genes

    A correction to "Statistical modeling for selecting housekeeper genes" by Aniko Szabo, Charles M Perou, Mehmet Karaca, Laurent Perreard, John F Quackenbush, and Philip S Bernard. Genome Biology 2004, 5:R5.

    DNA 5-hydroxymethylcytosine in pediatric central nervous system tumors may impact tumor classification and is a positive prognostic marker

    Background: Nucleotide-specific 5-hydroxymethylcytosine (5hmC) remains understudied in pediatric central nervous system (CNS) tumors. 5hmC is abundant in the brain, and alterations to 5hmC in adult CNS tumors have been reported. However, traditional approaches to measuring DNA methylation, including those used to build CNS tumor DNA methylation classification systems, do not distinguish between 5-methylcytosine (5mC) and its oxidized counterpart 5hmC. Using tandem bisulfite (BS) and oxidative bisulfite (oxBS) treatments followed by hybridization to the Illumina MethylationEPIC array, which interrogates over 860,000 CpG loci, we measured 5hmC and 5mC epigenome-wide at nucleotide resolution in glioma, ependymoma, and embryonal tumors from children, as well as in control pediatric brain tissues. Results: Linear mixed effects models adjusted for age and sex tested for CpG-specific differences in 5hmC between tumor and non-tumor samples, as well as between tumor subtypes. Results from model-based clustering of tumors were used to test the relation between cluster membership and patient survival through multivariable Cox proportional hazards regression. We also assessed the robustness of multiple epigenetic CNS tumor classification methods to 5mC-specific data in both pediatric and adult CNS tumors. Compared to non-tumor samples, tumors were hypohydroxymethylated across the epigenome, and tumor 5hmC localized to regulatory elements crucial to cell identity, including transcription factor binding sites and super-enhancers. Differentially hydroxymethylated loci among tumor subtypes tended to be hypermethylated and were disproportionately found in CTCF binding sites and in genes related to posttranscriptional RNA regulation, such as DICER1. Model-based clustering results indicated that patients with low-5hmC tumor profiles have poorer overall survival and an increased risk of recurrence.
Our results suggest that 5mC-specific data from oxBS-treated samples impact methylation-based tumor classification systems, creating new opportunities to further refine classifiers for both pediatric and adult tumors. Conclusions: We found that 5hmC localizes to super-enhancers and that genes commonly implicated in pediatric CNS tumors were differentially hypohydroxymethylated. We demonstrated that distinguishing methylation from hydroxymethylation is critical for identifying tumor-related epigenetic changes. These results have implications for patient prognostication, for considering epigenetic therapy in CNS tumors, and for emerging molecular neuropathology classification approaches.
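The tandem BS/oxBS design works because bisulfite conversion reads both 5mC and 5hmC as "methylated", whereas oxidative bisulfite reads only 5mC as methylated. The simplest per-CpG estimate of 5hmC is therefore the difference between the two beta values. The sketch below shows that subtraction as an illustration only: the abstract does not say which estimator the authors used (maximum-likelihood methods that keep 5mC, 5hmC and unmethylated fractions jointly consistent also exist), and the function name `split_methylation` and the beta values are hypothetical.

```python
from __future__ import annotations


def split_methylation(beta_bs: float, beta_oxbs: float) -> tuple[float, float]:
    """Naively split one CpG's signal into (5mC, 5hmC) fractions.

    Bisulfite (BS) beta  ~ 5mC + 5hmC
    oxBS beta            ~ 5mC only
    so 5mC ~ beta_oxbs and 5hmC ~ beta_bs - beta_oxbs. Negative differences,
    which arise from measurement noise, are clipped to 0.
    """
    for beta in (beta_bs, beta_oxbs):
        if not 0.0 <= beta <= 1.0:
            raise ValueError("beta values must lie in [0, 1]")
    five_mc = beta_oxbs
    five_hmc = max(beta_bs - beta_oxbs, 0.0)
    return five_mc, five_hmc


# Hypothetical beta values for a single CpG.
mc, hmc = split_methylation(beta_bs=0.80, beta_oxbs=0.65)
print(f"5mC={mc:.2f}, 5hmC={hmc:.2f}")  # → 5mC=0.65, 5hmC=0.15
```

Clipping keeps the naive estimate non-negative, but it also biases low-5hmC loci upward on average, which is one motivation for model-based estimators.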

    Classification and risk stratification of invasive breast carcinomas using a real-time quantitative RT-PCR assay

    INTRODUCTION: Predicting the clinical course of breast cancer is often difficult because it is a diverse disease composed of many biological subtypes. Gene expression profiling by microarray analysis has identified breast cancer signatures that are important for prognosis and treatment. In the current article, we use microarray analysis and a real-time quantitative reverse-transcription (qRT)-PCR assay to risk-stratify breast cancers based on biological 'intrinsic' subtypes and proliferation. METHODS: Gene sets were selected from microarray data to assess proliferation and to classify breast cancers into four different molecular subtypes, designated Luminal, Normal-like, HER2+/ER-, and Basal-like. One hundred twenty-three breast samples (117 invasive carcinomas, one fibroadenoma and five normal tissues) and three breast cancer cell lines were prospectively analyzed using a microarray (Agilent) and a qRT-PCR assay composed of 53 genes. Biological subtypes were assigned from the microarray and qRT-PCR data by hierarchical clustering. A proliferation signature was used as a single meta-gene (log2 average of 14 genes) to predict outcome within the context of estrogen receptor status and biological 'intrinsic' subtype. RESULTS: We found that the qRT-PCR assay could determine the intrinsic subtype (93% concordance with microarray-based assignments) and that the intrinsic subtypes were predictive of outcome. The proliferation meta-gene provided additional prognostic information for patients with the Luminal subtype (P = 0.0012), and for patients with estrogen receptor-positive tumors (P = 3.4 × 10^-6). High proliferation in the Luminal subtype conferred a 19-fold relative risk of relapse (confidence interval = 95%) compared with Luminal tumors with low proliferation. CONCLUSION: A real-time qRT-PCR assay can recapitulate microarray classifications of breast cancer and can risk-stratify patients using the intrinsic subtype and proliferation.
The proliferation meta-gene offers an objective and quantitative measurement for grade and adds significant prognostic information to the biological subtypes.
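A meta-gene collapses a gene set into a single per-sample score. The sketch below reads "log2 average" as the mean of the log2-transformed expression values of the signature genes; that reading is an assumption (it could also mean the log2 of the arithmetic mean), and it assumes expression is supplied on a positive, linear scale that has already been normalized, for example to housekeeper genes. The function name `proliferation_metagene` and the gene names are hypothetical placeholders, since the abstract does not list the 14 genes.

```python
from __future__ import annotations

from math import log2
from statistics import mean


def proliferation_metagene(expression: dict[str, float], genes: list[str]) -> float:
    """Collapse a gene set into one score: the mean of log2 expression values.

    `expression` maps gene names to positive, linear-scale expression values
    for a single sample (assumed already normalized).
    """
    missing = [g for g in genes if g not in expression]
    if missing:
        raise KeyError(f"missing genes: {missing}")
    if any(expression[g] <= 0 for g in genes):
        raise ValueError("expression values must be positive to take log2")
    return mean(log2(expression[g]) for g in genes)


# Hypothetical three-gene set; the paper's signature uses 14 genes.
sample = {"GENE_A": 2.0, "GENE_B": 8.0, "GENE_C": 32.0}
print(proliferation_metagene(sample, ["GENE_A", "GENE_B", "GENE_C"]))  # → 3.0
```

Averaging on the log scale keeps one very highly expressed gene from dominating the score. Calling "high" versus "low" proliferation then needs a cut-off, which the abstract does not give.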

    The molecular portraits of breast tumors are conserved across microarray platforms

    BACKGROUND: Validation of a novel gene expression signature in independent data sets is a critical step in the development of a clinically useful test for cancer patient risk-stratification. However, validation is often unconvincing because the test set is typically small. To overcome this problem, we used publicly available breast cancer gene expression data sets and a novel approach to data fusion to validate a new breast tumor intrinsic list. RESULTS: A 105-tumor training set containing 26 sample pairs was used to derive a new breast tumor intrinsic gene list. This intrinsic list contained 1300 genes and a proliferation signature that was not present in previous breast intrinsic gene sets. We tested this list as a survival predictor on a data set of 311 tumors compiled from three independent microarray studies that were fused into a single data set using Distance Weighted Discrimination. When the new intrinsic gene set was used to hierarchically cluster this combined test set, tumors were grouped into the LumA, LumB, Basal-like, HER2+/ER-, and Normal Breast-like subtypes that we had identified in previous data sets. These subtypes were associated with significant differences in relapse-free and overall survival. Multivariate Cox analysis of the combined test set showed that the intrinsic subtype classifications added significant prognostic information that was independent of standard clinical predictors. From the combined test set, we developed an objective and unchanging classifier based upon five intrinsic subtype mean expression profiles (i.e. centroids), which is designed for single sample predictions (SSP). The SSP approach was applied to two additional independent data sets and consistently predicted survival in both systemically treated and untreated patient groups.
CONCLUSION: This study validates the "breast tumor intrinsic" subtype classification as an objective means of tumor classification that should be translated into a clinical assay for further retrospective and prospective validation. In addition, our method of combining existing data sets can be used to robustly validate the potential clinical value of any new gene expression profile.
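Single sample prediction compares one tumor's expression profile with each fixed subtype centroid and assigns the most similar subtype. The sketch below uses Pearson correlation as the similarity measure. That choice is an assumption, because the abstract does not name the metric. It also uses two made-up four-gene centroids in place of the five real centroids, and the function names `pearson` and `predict_subtype` are hypothetical.

```python
from __future__ import annotations

from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (>= 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def predict_subtype(
    sample: list[float], centroids: dict[str, list[float]]
) -> tuple[str, dict[str, float]]:
    """Assign one sample to the centroid it correlates with most strongly."""
    scores = {name: pearson(sample, c) for name, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical 4-gene centroids (genes in the same order as the sample vector).
centroids = {
    "LumA": [1.8, 1.2, -0.8, -1.5],
    "Basal-like": [-1.5, -1.0, 1.2, 2.0],
}
label, scores = predict_subtype([2.0, 1.5, -1.0, -2.0], centroids)
print(label)  # → LumA
```

Because the centroids are fixed, a sample's call does not depend on which other samples are analyzed alongside it, which is what makes such a classifier usable on a single tumor.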

    Inferring spatial transcriptomics markers from whole slide images to characterize metastasis-related spatial heterogeneity of colorectal tumors: A pilot study

    Over 150 000 Americans are diagnosed with colorectal cancer (CRC) every year, and over 50 000 die of CRC annually, necessitating improvements in screening, prognostication, disease management, and therapeutic options. Tumor metastasis is the primary factor related to the risk of recurrence and mortality. Yet screening for nodal and distant metastasis is costly and invasive, and incomplete resection may hamper adequate assessment. Signatures of the tumor-immune microenvironment (TIME) at the primary site can provide valuable insights into the aggressiveness of the tumor and the effectiveness of various treatment options. Spatially resolved transcriptomics technologies offer an unprecedented characterization of the TIME through high multiplexing, yet their scope is constrained by cost. Meanwhile, it has long been suspected that histological, cytological, and macroarchitectural tissue characteristics correlate well with molecular information (e.g., gene expression). Thus, a method for predicting transcriptomics data by inferring RNA patterns from whole slide images (WSI) is a key step toward studying metastasis at scale. In this work, we collected tissue from four matched stage III (pT3) colorectal cancer patients for spatial transcriptomics profiling. The Visium spatial transcriptomics (ST) assay was used to measure transcript abundance for 17 943 genes at up to 5000 spots per patient, each 55 µm in diameter (i.e., 1–10 cells) and arranged in a honeycomb pattern, co-registered with hematoxylin and eosin (H&E)-stained WSI. The Visium ST assay measures expression at these spots by permeabilizing the tissue so that released mRNAs are captured by gene-specific oligonucleotide probes whose barcodes encode each spot's x–y position. WSI subimages were extracted around each co-registered Visium spot and used to predict the expression at these spots with machine learning models.
We prototyped and compared several convolutional, transformer, and graph convolutional neural networks to predict spatial RNA patterns at the Visium spots, under the hypothesis that the transformer- and graph-based approaches better capture relevant spatial tissue architecture. We further analyzed the models' ability to recapitulate spatial autocorrelation statistics using SPARK and SpatialDE. Overall, the results indicate that the transformer- and graph-based approaches were unable to outperform the convolutional neural network architecture, though they performed best for some relevant disease-associated genes. Initial findings suggest that neural networks operating at different scales are relevant for capturing distinct disease pathways (e.g., epithelial-to-mesenchymal transition). We add further evidence that deep learning models can accurately predict gene expression from whole slide images and comment on understudied factors (e.g., tissue context) that may increase their external applicability. Our preliminary work will motivate further investigation of inferring molecular patterns from whole slide images, both as metastasis predictors and in other applications.
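SPARK and SpatialDE test for spatially variable genes with statistical models. As a much simpler illustration of what "spatial autocorrelation" measures, the sketch below computes Moran's I for one gene across spots, using binary weights for spots within a chosen radius. This is not what SPARK or SpatialDE compute. The function name `morans_i`, the `radius` parameter and the toy coordinates are hypothetical.

```python
from __future__ import annotations

from math import dist


def morans_i(coords: list[tuple[float, float]], values: list[float], radius: float) -> float:
    """Moran's I for one gene across spots, with binary neighbor weights.

    Two distinct spots are neighbors (w_ij = 1) if their centers lie within
    `radius` of each other; all other pairs get w_ij = 0.
    I = (N / W) * sum_ij w_ij (x_i - x_bar)(x_j - x_bar) / sum_i (x_i - x_bar)^2
    """
    n = len(values)
    if n != len(coords):
        raise ValueError("coords and values must have the same length")
    x_bar = sum(values) / n
    dev = [v - x_bar for v in values]
    numerator = 0.0
    total_weight = 0.0  # W: sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and dist(coords[i], coords[j]) <= radius:
                numerator += dev[i] * dev[j]
                total_weight += 1.0
    denominator = sum(d * d for d in dev)
    if total_weight == 0 or denominator == 0:
        raise ValueError("no neighbor pairs or constant expression")
    return (n / total_weight) * numerator / denominator


# Hypothetical row of four spots, one unit apart.
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(round(morans_i(line, [1.0, 2.0, 3.0, 4.0], radius=1.0), 3))  # smooth gradient → 0.333
print(round(morans_i(line, [1.0, 4.0, 1.0, 4.0], radius=1.0), 3))  # alternating → -1.0
```

Computing the same statistic on measured and on image-predicted expression for a gene gives a quick check of whether a model preserves that gene's spatial structure. The radius would need to be chosen so that it captures each spot's immediate neighbors in the honeycomb.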