79 research outputs found

    Statistical modeling for selecting housekeeper genes

    There is a need for statistical methods to identify genes that have minimal variation in expression across a variety of experimental conditions. These 'housekeeper' genes are widely employed as controls for quantification of test genes using gel analysis and real-time RT-PCR. Using real-time quantitative RT-PCR, we analyzed 80 primary breast tumors for variation in expression of six putative housekeeper genes (MRPL19 (mitochondrial ribosomal protein L19), PSMC4 (proteasome (prosome, macropain) 26S subunit, ATPase, 4), SF3A1 (splicing factor 3a, subunit 1, 120 kDa), PUM1 (pumilio homolog 1 (Drosophila)), ACTB (actin, beta) and GAPD (glyceraldehyde-3-phosphate dehydrogenase)). We present appropriate models for selecting the best housekeepers to normalize quantitative data within a given tissue type (for example, breast cancer) and across different types of tissue samples
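As a rough illustration of the selection idea in the abstract above, the sketch below ranks candidate housekeepers by how much their expression varies across samples. It is not the model from the paper: it substitutes a plain sample standard deviation, and the function name `rank_by_variability` and all expression values are made up for illustration.

```python
from statistics import stdev


def rank_by_variability(expression):
    """Return gene names sorted from least to most variable.

    `expression` maps a gene name to its per-sample expression values
    (for example, log2 relative quantities from qRT-PCR).
    """
    spread = {gene: stdev(values) for gene, values in expression.items()}
    return sorted(spread, key=spread.get)


# Hypothetical values for three of the genes named above (not real data).
example = {
    "PUM1": [5.0, 5.1, 4.9, 5.0],
    "ACTB": [8.0, 9.5, 7.0, 8.8],
    "GAPD": [7.0, 7.6, 6.5, 7.2],
}
ranking = rank_by_variability(example)
print(ranking)
```

A gene near the front of the ranking varies least in this toy data and would be the more plausible normalization control under this simplified criterion.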

    Correction: Statistical modeling for selecting housekeeper genes

    A correction to Statistical modeling for selecting housekeeper genes by Aniko Szabo, Charles M Perou, Mehmet Karaca, Laurent Perreard, John F Quackenbush, and Philip S Bernard. Genome Biology 2004, 5:R5

    attract: A Method for Identifying Core Pathways That Define Cellular Phenotypes

    attract is a knowledge-driven analytical approach for identifying and annotating the gene-sets that best discriminate between cell phenotypes. attract finds distinguishing patterns within pathways, decomposes pathways into meta-genes representative of these patterns, and then generates synexpression groups of highly correlated genes from the entire transcriptome dataset. attract can be applied to a wide range of biological systems and is freely available as a Bioconductor package and has been incorporated into the MeV software system

    Classification and risk stratification of invasive breast carcinomas using a real-time quantitative RT-PCR assay

    INTRODUCTION: Predicting the clinical course of breast cancer is often difficult because it is a diverse disease comprised of many biological subtypes. Gene expression profiling by microarray analysis has identified breast cancer signatures that are important for prognosis and treatment. In the current article, we use microarray analysis and a real-time quantitative reverse-transcription (qRT)-PCR assay to risk-stratify breast cancers based on biological 'intrinsic' subtypes and proliferation. METHODS: Gene sets were selected from microarray data to assess proliferation and to classify breast cancers into four different molecular subtypes, designated Luminal, Normal-like, HER2+/ER-, and Basal-like. One-hundred and twenty-three breast samples (117 invasive carcinomas, one fibroadenoma and five normal tissues) and three breast cancer cell lines were prospectively analyzed using a microarray (Agilent) and a qRT-PCR assay comprised of 53 genes. Biological subtypes were assigned from the microarray and qRT-PCR data by hierarchical clustering. A proliferation signature was used as a single meta-gene (log2 average of 14 genes) to predict outcome within the context of estrogen receptor status and biological 'intrinsic' subtype. RESULTS: We found that the qRT-PCR assay could determine the intrinsic subtype (93% concordance with microarray-based assignments) and that the intrinsic subtypes were predictive of outcome. The proliferation meta-gene provided additional prognostic information for patients with the Luminal subtype (P = 0.0012), and for patients with estrogen receptor-positive tumors (P = 3.4 × 10^-6). High proliferation in the Luminal subtype conferred a 19-fold relative risk of relapse (confidence interval = 95%) compared with Luminal tumors with low proliferation. CONCLUSION: A real-time qRT-PCR assay can recapitulate microarray classifications of breast cancer and can risk-stratify patients using the intrinsic subtype and proliferation.
The proliferation meta-gene offers an objective and quantitative measurement for grade and adds significant prognostic information to the biological subtypes
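The meta-gene described in the abstract above collapses a gene set into one number per sample. A minimal sketch, assuming the "log2 average" means the arithmetic mean of log2-transformed expression values; the gene names, values, and the function name `meta_gene_score` are hypothetical, and the real signature uses 14 genes that the abstract does not list.

```python
from math import log2
from statistics import mean


def meta_gene_score(sample, gene_set):
    """Average the log2 expression of the genes in `gene_set` for one sample."""
    return mean(log2(sample[gene]) for gene in gene_set)


# Hypothetical linear-scale expression values for one sample (not real data).
sample = {"GENE_A": 8.0, "GENE_B": 2.0, "GENE_C": 4.0, "OTHER": 100.0}
proliferation_set = ["GENE_A", "GENE_B", "GENE_C"]

score = meta_gene_score(sample, proliferation_set)
print(score)
```

Genes outside the set (here `OTHER`) do not affect the score, so samples can then be compared or split into high and low groups on this single value.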

    The molecular portraits of breast tumors are conserved across microarray platforms

    BACKGROUND: Validation of a novel gene expression signature in independent data sets is a critical step in the development of a clinically useful test for cancer patient risk-stratification. However, validation is often unconvincing because the size of the test set is typically small. To overcome this problem we used publicly available breast cancer gene expression data sets and a novel approach to data fusion, in order to validate a new breast tumor intrinsic list. RESULTS: A 105-tumor training set containing 26 sample pairs was used to derive a new breast tumor intrinsic gene list. This intrinsic list contained 1300 genes and a proliferation signature that was not present in previous breast intrinsic gene sets. We tested this list as a survival predictor on a data set of 311 tumors compiled from three independent microarray studies that were fused into a single data set using Distance Weighted Discrimination. When the new intrinsic gene set was used to hierarchically cluster this combined test set, tumors were grouped into LumA, LumB, Basal-like, HER2+/ER-, and Normal Breast-like tumor subtypes that we demonstrated in previous datasets. These subtypes were associated with significant differences in Relapse-Free and Overall Survival. Multivariate Cox analysis of the combined test set showed that the intrinsic subtype classifications added significant prognostic information that was independent of standard clinical predictors. From the combined test set, we developed an objective and unchanging classifier based upon five intrinsic subtype mean expression profiles (i.e. centroids), which is designed for single sample predictions (SSP). The SSP approach was applied to two additional independent data sets and consistently predicted survival in both systemically treated and untreated patient groups.
CONCLUSION: This study validates the breast tumor intrinsic subtype classification as an objective means of tumor classification that should be translated into a clinical assay for further retrospective and prospective validation. In addition, our method of combining existing data sets can be used to robustly validate the potential clinical value of any new gene expression profile
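The centroid-based single sample predictor mentioned in the abstract above can be pictured as a nearest-centroid lookup. The sketch below is only an illustration: the abstract does not say which similarity measure the authors used, so Euclidean distance is an assumption here, and the three-gene centroids, the sample values, and the function name `predict_subtype` are hypothetical.

```python
from math import dist


def predict_subtype(profile, centroids):
    """Return the label of the centroid closest to `profile`.

    `centroids` maps a subtype label to its mean expression profile;
    every profile must list the same genes in the same order.
    """
    return min(centroids, key=lambda label: dist(profile, centroids[label]))


# Hypothetical three-gene centroids for two of the five subtypes (not real data).
centroids = {
    "LumA": [2.0, 0.5, -1.0],
    "Basal-like": [-1.5, 2.0, 1.0],
}
new_sample = [1.8, 0.2, -0.7]

predicted = predict_subtype(new_sample, centroids)
print(predicted)
```

Because the centroids are fixed once derived, each new sample can be classified on its own without re-clustering the whole data set, which is the property the abstract describes as "objective and unchanging".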

    A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB

    BACKGROUND: Sharing of microarray data within the research community has been greatly facilitated by the development of the disclosure and communication standards MIAME and MAGE-ML by the MGED Society. However, the complexity of the MAGE-ML format has made its use impractical for laboratories lacking dedicated bioinformatics support. RESULTS: We propose a simple tab-delimited, spreadsheet-based format, MAGE-TAB, which will become a part of the MAGE microarray data standard and can be used for annotating and communicating microarray data in a MIAME compliant fashion. CONCLUSION: MAGE-TAB will enable laboratories without bioinformatics experience or support to manage, exchange and submit well-annotated microarray data in a standard format using a spreadsheet. The MAGE-TAB format is self-contained, and does not require an understanding of MAGE-ML or XML

    Transcript Annotation in FANTOM3: Mouse Gene Catalog Based on Physical cDNAs

    The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species