An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics
For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program leads to the generation of TCGA Clinical Data Resource, which provides recommendations of clinical outcome endpoint usage for 33 cancer types
Genomic, Pathway Network, and Immunologic Features Distinguishing Squamous Carcinomas
This integrated, multiplatform PanCancer Atlas study co-mapped and identified distinguishing
molecular features of squamous cell carcinomas (SCCs) from five sites associated with smokin
Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context
Long noncoding RNAs (lncRNAs) are commonly dysregulated in tumors, but only a handful are known to play pathophysiological roles in cancer. We inferred lncRNAs that dysregulate cancer pathways, oncogenes, and tumor suppressors (cancer genes) by modeling their effects on the activity of transcription factors, RNA-binding proteins, and microRNAs in 5,185 TCGA tumors and 1,019 ENCODE assays. Our predictions included hundreds of candidate onco- and tumor-suppressor lncRNAs (cancer lncRNAs) whose somatic alterations account for the dysregulation of dozens of cancer genes and pathways in each of 14 tumor contexts. To demonstrate proof of concept, we showed that perturbations targeting OIP5-AS1 (an inferred tumor suppressor) and TUG1 and WT1-AS (inferred onco-lncRNAs) dysregulated cancer genes and altered proliferation of breast and gynecologic cancer cells. Our analysis indicates that, although most lncRNAs are dysregulated in a tumor-specific manner, some, including OIP5-AS1, TUG1, NEAT1, MEG3, and TSIX, synergistically dysregulate cancer pathways in multiple tumor contexts
Pan-cancer Alterations of the MYC Oncogene and Its Proximal Network across the Cancer Genome Atlas
Although the MYC oncogene has been implicated in cancer, a systematic assessment of alterations of MYC, related transcription factors, and co-regulatory proteins, forming the proximal MYC network (PMN), across human cancers is lacking. Using computational approaches, we define genomic and proteomic features associated with MYC and the PMN across the 33 cancers of The Cancer Genome Atlas. Pan-cancer, 28% of all samples had at least one of the MYC paralogs amplified. In contrast, the MYC antagonists MGA and MNT were the most frequently mutated or deleted members, proposing a role as tumor suppressors. MYC alterations were mutually exclusive with PIK3CA, PTEN, APC, or BRAF alterations, suggesting that MYC is a distinct oncogenic driver. Expression analysis revealed MYC-associated pathways in tumor subtypes, such as immune response and growth factor signaling; chromatin, translation, and DNA replication/repair were conserved pan-cancer. This analysis reveals insights into MYC biology and is a reference for biomarkers and therapeutics for cancers with alterations of MYC or the PMN
Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images
Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images
of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumor-infiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL
maps are derived through computational staining using a convolutional neural network trained to
classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns and
correlation with overall survival. TIL map structural patterns were grouped using standard
histopathological parameters. These patterns are enriched in particular T cell subpopulations
derived from molecular measures. TIL densities and spatial structure were differentially enriched
among tumor types, immune subtypes, and tumor molecular subtypes, implying that spatial
infiltrate state could reflect particular tumor cell aberration states. Obtaining spatial lymphocytic
patterns linked to the rich genomic characterization of TCGA samples demonstrates one use for
the TCGA image archives with insights into the tumor-immune microenvironment
Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma
BACKGROUND Papillary renal-cell carcinoma, which accounts for 15 to 20% of renal-cell carcinomas, is a heterogeneous disease that consists of various types of renal cancer, including tumors with indolent, multifocal presentation and solitary tumors with an aggressive, highly lethal phenotype. Little is known about the genetic basis of sporadic papillary renal-cell carcinoma, and no effective forms of therapy for advanced disease exist. METHODS We performed comprehensive molecular characterization of 161 primary papillary renal-cell carcinomas, using whole-exome sequencing, copy-number analysis, messenger RNA and microRNA sequencing, DNA-methylation analysis, and proteomic analysis. RESULTS Type 1 and type 2 papillary renal-cell carcinomas were shown to be different types of renal cancer characterized by specific genetic alterations, with type 2 further classified into three individual subgroups on the basis of molecular differences associated with patient survival. Type 1 tumors were associated with MET alterations, whereas type 2 tumors were characterized by CDKN2A silencing, SETD2 mutations, TFE3 fusions, and increased expression of the NRF2–antioxidant response element (ARE) pathway. A CpG island methylator phenotype (CIMP) was observed in a distinct subgroup of type 2 papillary renal-cell carcinomas that was characterized by poor survival and mutation of the gene encoding fumarate hydratase (FH). CONCLUSIONS Type 1 and type 2 papillary renal-cell carcinomas were shown to be clinically and biologically distinct. Alterations in the MET pathway were associated with type 1, and activation of the NRF2-ARE pathway was associated with type 2; CDKN2A loss and CIMP in type 2 conveyed a poor prognosis. Furthermore, type 2 papillary renal-cell carcinoma consisted of at least three subtypes based on molecular and phenotypic features
'Omic approaches to preventing or managing metastatic breast cancer
Early detection of metastasis-prone breast cancers and characterization of residual metastatic cancers are important in efforts to improve management of breast cancer. Applications of genome-scale molecular analysis technologies are making these complementary approaches possible by revealing molecular features uniquely associated with metastatic disease. Assays that reveal these molecular features will facilitate development of anatomic, histological and blood-based strategies that may enable detection prior to metastatic spread. Knowledge of these features also will guide development of therapeutic strategies that can be applied when metastatic disease burden is low, thereby increasing the probability of a curative response
Driver Fusions and Their Implications in the Development and Treatment of Human Cancers
Gene fusions represent an important class of somatic alterations in cancer. We systematically investigated fusions in 9,624 tumors across 33 cancer types using multiple fusion calling tools. We identified a total of 25,664 fusions, with a 63% validation rate. Integration of gene expression, copy number, and fusion annotation data revealed that fusions involving oncogenes tend to exhibit increased expression, whereas fusions involving tumor suppressors have the opposite effect. For fusions involving kinases, we found 1,275 with an intact kinase domain, the proportion of which varied significantly across cancer types. Our study suggests that fusions drive the development of 16.5% of cancer cases and function as the sole driver in more than 1% of them. Finally, we identified druggable fusions involving genes such as TMPRSS2, RET, FGFR3, ALK, and ESR1 in 6.0% of cases, and we predicted immunogenic peptides, suggesting that fusions may provide leads for targeted drug and immune therapy
Sloan Digital Sky Survey IV: Mapping the Milky Way, Nearby Galaxies, and the Distant Universe
We describe the Sloan Digital Sky Survey IV (SDSS-IV), a project encompassing three major spectroscopic programs. The Apache Point Observatory Galactic Evolution Experiment 2 (APOGEE-2) is observing hundreds of thousands of Milky Way stars at high resolution and high signal-to-noise ratios in the near-infrared. The Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey is obtaining spatially resolved spectroscopy for thousands of nearby galaxies (median z ~ 0.03). The extended Baryon Oscillation Spectroscopic Survey (eBOSS) is mapping the galaxy, quasar, and neutral gas distributions between z ~ 0.6 and 3.5 to constrain cosmology using baryon acoustic oscillations, redshift space distortions, and the shape of the power spectrum. Within eBOSS, we are conducting two major subprograms: the SPectroscopic IDentification of eROSITA Sources (SPIDERS), investigating X-ray AGNs and galaxies in X-ray clusters, and the Time Domain Spectroscopic Survey (TDSS), obtaining spectra of variable sources. All programs use the 2.5 m Sloan Foundation Telescope at the Apache Point Observatory; observations there began in Summer 2014. APOGEE-2 also operates a second near-infrared spectrograph at the 2.5 m du Pont Telescope at Las Campanas Observatory, with observations beginning in early 2017. Observations at both facilities are scheduled to continue through 2020. In keeping with previous SDSS policy, SDSS-IV provides regularly scheduled public data releases; the first one, Data Release 13, was made available in 2016 July
The Fifteenth Data Release of the Sloan Digital Sky Surveys: First Release of MaNGA-derived Quantities, Data Visualization Tools, and Stellar Library
Twenty years have passed since first light for the Sloan Digital Sky Survey (SDSS). Here, we release data taken by the fourth phase of SDSS (SDSS-IV) across its first three years of operation (2014 July–2017 July). This is the third data release for SDSS-IV, and the 15th from SDSS (Data Release Fifteen; DR15). New data come from MaNGA—we release 4824 data cubes, as well as the first stellar spectra in the MaNGA Stellar Library (MaStar), the first set of survey-supported analysis products (e.g., stellar and gas kinematics, emission-line and other maps) from the MaNGA Data Analysis Pipeline, and a new data visualization and access tool we call "Marvin." The next data release, DR16, will include new data from both APOGEE-2 and eBOSS; those surveys release no new data here, but we document updates and corrections to their data processing pipelines. The release is cumulative; it also includes the most recent reductions and calibrations of all data taken by SDSS since first light. In this paper, we describe the location and format of the data and tools and cite technical references describing how it was obtained and processed. The SDSS website (www.sdss.org) has also been updated, providing links to data downloads, tutorials, and examples of data use. Although SDSS-IV will continue to collect astronomical data until 2020, and will be followed by SDSS-V (2020–2025), we end this paper by describing plans to ensure the sustainability of the SDSS data archive for many years beyond the collection of data