    Genetic determinants of the molecular portraits of epithelial cancers

    The ability to characterize and predict tumor phenotypes is crucial to precision medicine. In this study, we present an integrative computational approach using a genome-wide association analysis and an Elastic Net prediction method to analyze the relationship between DNA copy number alterations and an archive of gene expression signatures. Across breast cancers, we are able to quantitatively predict many gene signature levels within individual tumors with high accuracy based upon DNA copy number features alone, including proliferation status and estrogen-signaling pathway activity. We can also predict many other key phenotypes, including intrinsic molecular subtypes, estrogen receptor status, and TP53 mutation. We also apply this approach to TCGA Pan-Cancer data, identifying repeatedly predictable signatures across tumor types, including immune features in lung squamous and basal-like breast cancers. These Elastic Net DNA predictors could also be called from DNA-based gene panels, thus facilitating their use as biomarkers to guide therapeutic decision making.

    Joint and individual analysis of breast cancer histologic images and genomic covariates

    A key challenge in modern data analysis is understanding connections between complex and differing modalities of data. For example, two of the main approaches to the study of breast cancer are histopathology (analyzing visual characteristics of tumors) and genetics. While histopathology is the gold standard for diagnostics and there have been many recent breakthroughs in genetics, there is little overlap between these two fields. We aim to bridge this gap by developing methods based on Angle-based Joint and Individual Variation Explained (AJIVE) to directly explore similarities and differences between these two modalities. Our approach exploits Convolutional Neural Networks (CNNs) as a powerful, automatic method for image feature extraction to address some of the challenges presented by statistical analysis of histopathology image data. CNNs raise issues of interpretability, which we address by developing novel methods to explore visual modes of variation captured by statistical algorithms (e.g. PCA or AJIVE) applied to CNN features. Our results provide many interpretable connections and contrasts between histopathology and genetics.

    Cost analysis for cancer subgroups

    PAM50 assay and the three-gene model for identifying the major and clinically relevant molecular subtypes of breast cancer

    It has recently been proposed that a three-gene model (SCMGENE) that measures ESR1, ERBB2, and AURKA identifies the major breast cancer intrinsic subtypes and provides robust discrimination for clinical use in a manner very similar to a 50-gene subtype predictor (PAM50). However, the clinical relevance of both predictors was not fully explored, which is needed given that a ~30% discordance rate between these two predictors was observed. Using the same datasets and subtype calls provided by Haibe-Kains and colleagues, we compared the SCMGENE assignments and the research-based PAM50 assignments in terms of their ability to (1) predict patient outcome, (2) predict pathological complete response (pCR) after anthracycline/taxane-based chemotherapy, and (3) capture the main biological diversity displayed by all genes from a microarray. In terms of survival predictions, both assays provided prognostic information independent of each other and beyond the data provided by standard clinical–pathological variables; however, the amount of prognostic information was found to be significantly greater with the PAM50 assay than the SCMGENE assay. In terms of chemotherapy response, the PAM50 assay was the only assay to provide independent predictive information of pCR in multivariate models. Finally, compared to the SCMGENE predictor, the PAM50 assay explained a significantly greater amount of gene expression diversity as captured by the two main principal components of the breast cancer microarray data. Our results show that classification of the major and clinically relevant molecular subtypes of breast cancer is best captured using larger gene panels. Electronic supplementary material: The online version of this article (doi:10.1007/s10549-012-2143-0) contains supplementary material, which is available to authorized users.

    Virus expression detection reveals RNA-sequencing contamination in TCGA

    Background: Contamination of reagents and cross-contamination across samples is a long-recognized issue in molecular biology laboratories. While often innocuous, contamination can lead to inaccurate results. Cantalupo et al., for example, found HeLa-derived human papillomavirus 18 (H-HPV18) in several of The Cancer Genome Atlas (TCGA) RNA-sequencing samples. This work motivated us to assess a greater number of samples and determine the origin of possible contaminations using viral sequences. To detect viruses with high specificity, we developed the publicly available workflow, VirDetect, that detects virus and laboratory vector sequences in RNA-seq samples. We applied VirDetect to 9143 RNA-seq samples sequenced at one TCGA sequencing center (28/33 cancer types) over 5 years. Results: We confirmed that H-HPV18 was present in many samples and determined that viral transcripts from H-HPV18 significantly co-occurred with those from xenotropic mouse leukemia virus-related virus (XMRV). Using laboratory metadata and viral transcription, we determined that the likely contaminant was a pool of cell lines known as the "common reference", which was sequenced alongside TCGA RNA-seq samples as a control to monitor quality across technology transitions (i.e. microarray to GAII to HiSeq), and to link RNA-seq to previous generation microarrays that standardly used the "common reference". One of the cell lines in the pool was a laboratory isolate of MCF-7, which we discovered was infected with XMRV; another constituent of the pool was likely HeLa cells. Conclusions: Altogether, this indicates a multi-step contamination process. First, MCF-7 was infected with an XMRV. Second, this infected cell line was added to a pool of cell lines, which contained HeLa. Finally, RNA from this pool of cell lines contaminated several TCGA tumor samples, most likely during library construction. Thus, these human tumors with H-HPV or XMRV reads were likely not infected with H-HPV18 or XMRV.

    The tissue microarray data exchange specification: A community-based, open source tool for sharing tissue microarray data

    BACKGROUND: Tissue Microarrays (TMAs) allow researchers to examine hundreds of small tissue samples on a single glass slide. The information held in a single TMA slide may easily involve gigabytes of data. To benefit from TMA technology, the scientific community needs an open source TMA data exchange specification that will convey all of the data in a TMA experiment in a format that is understandable to both humans and computers. A data exchange specification for TMAs allows researchers to submit their data to journals and to public data repositories and to share or merge data from different laboratories. In May 2001, the Association for Pathology Informatics (API) hosted the first in a series of four workshops, co-sponsored by the National Cancer Institute, to develop an open, community-supported TMA data exchange specification. METHODS: A draft tissue microarray data exchange specification was developed through workshop meetings. The first workshop confirmed community support for the effort and urged the creation of an open XML-based specification. This was to evolve in steps, with approval for each step coming from the stakeholders in the user community during open workshops. By the fourth workshop, held in October 2002, a set of Common Data Elements (CDEs) was established as well as a basic strategy for organizing TMA data in self-describing XML documents. RESULTS: The TMA data exchange specification is a well-formed XML document with four required sections: 1) Header, containing the specification Dublin Core identifiers, 2) Block, describing the paraffin-embedded array of tissues, 3) Slide, describing the glass slides produced from the Block, and 4) Core, containing all data related to the individual tissue samples contained in the array. Eighty CDEs, conforming to the ISO-11179 specification for data elements, constitute XML tags used in the TMA data exchange specification. A set of six simple semantic rules describes the complete data exchange specification. Anyone using the data exchange specification can validate their TMA files using a software implementation written in Perl and distributed as a supplemental file with this publication. CONCLUSION: The TMA data exchange specification is now available in a draft form with community-approved Common Data Elements and a community-approved general file format and data structure. The specification can be freely used by the scientific community. Efforts sponsored by the Association for Pathology Informatics to refine the draft TMA data exchange specification are expected to continue for at least two more years. The interested public is invited to participate in these open efforts. Information on future workshops will be posted at (API web site).
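
The four-section layout described in this abstract can be illustrated with a minimal Python sketch. This is not the Perl validator distributed with the publication, and it does not implement the six semantic rules. The tag names (`tma`, `header`, `block`, `slide`, `core`) and the child elements in the sample document are hypothetical, chosen only to mirror the section names in the abstract; the real specification's CDE tags may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names mirroring the four required sections named in the
# abstract; the actual specification's tags may be spelled differently.
REQUIRED_SECTIONS = ("header", "block", "slide", "core")


def missing_sections(xml_text):
    """Return the required section names that do not appear in the document.

    ET.fromstring raises ParseError if the text is not well-formed XML,
    which covers the "well-formed XML document" requirement.
    """
    root = ET.fromstring(xml_text)
    present = {element.tag.lower() for element in root.iter()}
    return [name for name in REQUIRED_SECTIONS if name not in present]


# Illustrative document only; element names and values are made up.
example = """<tma>
  <header><title>Example array</title></header>
  <block>
    <block_id>B1</block_id>
    <slide><slide_id>S1</slide_id></slide>
    <core><core_id>C1</core_id></core>
  </block>
</tma>"""

print(missing_sections(example))
```

A document that passes this check is only known to be well-formed and to contain the four sections; conformance to the full specification would still need the published validator.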

    Identification of a stable molecular signature in mammary tumor endothelial cells that persists in vitro

    Long-term, in vitro propagation of tumor-specific endothelial cells (TEC) allows for functional studies and genome-wide expression profiling of clonally derived, well-characterized subpopulations. Using a genetically engineered mouse model (GEMM) of mammary adenocarcinoma, we have optimized an isolation procedure and defined growth conditions for long-term propagation of mammary TEC. The isolated TEC maintain their endothelial specification and phenotype in culture. Furthermore, gene expression profiling of multiple TEC subpopulations revealed striking, persistent overexpression of several candidate genes including Irx2 and Zfp503 (transcription factors), Alcam and Cd133 (cell surface markers), Ccl4 and neurotensin (Nts) (angiocrine factors), and Gpr182 and Cnr2 (G protein-coupled receptors, GPCRs). Taken together, we have developed an effective method for isolating and culture-expanding mammary TEC, and uncovered several new TEC-selective genes whose overexpression persists even after long-term in vitro culture. These results suggest that the tumor microenvironment may induce changes in vascular endothelium in vivo that are stably transmittable in vitro.

    Radiation-Induced Gene Signature Predicts Pathologic Complete Response to Neoadjuvant Chemotherapy in Breast Cancer Patients

    The identification of biomarkers predictive of neoadjuvant chemotherapy response in breast cancer patients would be an important advancement in personalized cancer therapy. In this study, we hypothesized that due to similarities between radiation- and chemotherapy-induced cellular response mechanisms, radiation-responsive genes may be useful in predicting response to neoadjuvant chemotherapy. Murine p53 null breast cancer cell lines representative of the luminal, basal-like and claudin-low human breast cancer subtypes were irradiated to identify radiation-responsive genes across subtypes. These murine tumor radiation-induced genes were then converted to their human orthologs, and subsequently tested as a predictor of pathologic complete response (pCR), which was validated on two independent published neoadjuvant chemotherapy datasets of genomic data with chemotherapy response. A radiation-induced gene signature consisting of 30 genes was identified on a training set of 337 human primary breast cancer tumor samples that was prognostic for survival. Mean expression of this signature was calculated for individual samples on two independent published datasets and was found to be significantly predictive of pCR. Multivariate logistic regression analysis in both independent datasets showed that this 30-gene signature added significant predictive information independent of that provided by standard clinical predictors and other gene expression-based predictors of pCR. This study provides new information for radiation-induced biology, as well as information regarding response to neoadjuvant chemotherapy and a possible means of improving the prediction of pCR.
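
The per-sample score mentioned in this abstract (mean expression of the signature genes) can be sketched in a few lines of Python. The gene names and expression values below are placeholders, not the published 30-gene signature, and the handling of unmeasured genes (they are skipped) is an assumption rather than something the abstract specifies.

```python
from statistics import mean


def signature_score(expression, signature_genes):
    """Mean expression over the signature genes measured in one sample.

    Genes absent from the sample are skipped; this is an assumed
    convention, not one stated in the abstract.
    """
    values = [expression[gene] for gene in signature_genes if gene in expression]
    if not values:
        raise ValueError("no signature genes measured in this sample")
    return mean(values)


# Placeholder gene names and values, for illustration only.
signature = ["GENE_A", "GENE_B", "GENE_C"]
samples = {
    "tumor_1": {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 6.0, "GENE_D": 9.0},
    "tumor_2": {"GENE_A": 1.0, "GENE_C": 3.0},
}

scores = {name: signature_score(expr, signature) for name, expr in samples.items()}
print(scores)
```

In an analysis like the one described, such scores would then enter a logistic regression alongside clinical covariates; that modeling step is not shown here.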