229 research outputs found

    PIE - the protein inference engine

    Get PDF
    Posttranslational modifications are vital to protein function but are hard to study, especially since several modification isoforms may be present simultaneously. Mass spectrometers are a great tool for investigating modified proteins, but the data they generate are often incomplete, ambiguous, and difficult to interpret. Combining data from multiple experimental techniques provides complementary information. Having both top-down (intact protein mass data) and bottom-up (peptide data) is especially valuable. In the context of background knowledge, combined data is used by human experts to interpret what modifications are present and where they are located. However, this process is arduous and for high-throughput applications needs to be automated. To explore a data integration methodology based on Markov chain Monte Carlo and simulated annealing, I developed the PIE (Protein Inference Engine). This java application integrates information using a modular approach which allows different types of data to be considered simultaneously and for new data types to be added as needed. Validation of the PIE was carried out using two realistically imperfect theoretical data sets. The first, based on the L7/L12 ribosomal protein, tested the limits of PIEs performance as intact mass accuracy and peptide coverage decreases. The second set, based on a mix of two modification variants of the H23c Histone protein, tested PIEs ability to handle isoform mixtures and up to eight simultaneous modifications. The PIE was then applied to analysis of experimental data from an investigation of the modification state of the L7/L12 ribosomal protein. This data consisted of a set of peptides identified as associated with some L7/L12 modification variant and nine intact masses measurements identified as an L7/ L12 modification variant. From this data, PIE was able to make consistent predictions, comparable to expert manual interpretation. Software, source code, user manuals, and demo projects replicating the analyses described in the following can be downloaded from http://pie.giddingslab.org/

    Baking a mass-spectrometry data PIE with McMC and simulated annealing: predicting protein post-translational modifications from integrated top-down and bottom-up data

    Get PDF
    Motivation: Post-translational modifications are vital to the function of proteins, but are hard to study, especially since several modified isoforms of a protein may be present simultaneously. Mass spectrometers are a great tool for investigating modified proteins, but the data they provide is often incomplete, ambiguous and difficult to interpret. Combining data from multiple experimental techniques—especially bottom-up and top-down mass spectrometry—provides complementary information. When integrated with background knowledge this allows a human expert to interpret what modifications are present and where on a protein they are located. However, the process is arduous and for high-throughput applications needs to be automated

    An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics

    Get PDF
    For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program leads to the generation of TCGA Clinical Data Resource, which provides recommendations of clinical outcome endpoint usage for 33 cancer types

    Genomic, Pathway Network, and Immunologic Features Distinguishing Squamous Carcinomas

    Get PDF
    This integrated, multiplatform PanCancer Atlas study co-mapped and identified distinguishing molecular features of squamous cell carcinomas (SCCs) from five sites associated with smokin

    Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context

    Get PDF
    Long noncoding RNAs (lncRNAs) are commonly dys-regulated in tumors, but only a handful are known toplay pathophysiological roles in cancer. We inferredlncRNAs that dysregulate cancer pathways, onco-genes, and tumor suppressors (cancer genes) bymodeling their effects on the activity of transcriptionfactors, RNA-binding proteins, and microRNAs in5,185 TCGA tumors and 1,019 ENCODE assays.Our predictions included hundreds of candidateonco- and tumor-suppressor lncRNAs (cancerlncRNAs) whose somatic alterations account for thedysregulation of dozens of cancer genes and path-ways in each of 14 tumor contexts. To demonstrateproof of concept, we showed that perturbations tar-geting OIP5-AS1 (an inferred tumor suppressor) andTUG1 and WT1-AS (inferred onco-lncRNAs) dysre-gulated cancer genes and altered proliferation ofbreast and gynecologic cancer cells. Our analysis in-dicates that, although most lncRNAs are dysregu-lated in a tumor-specific manner, some, includingOIP5-AS1, TUG1, NEAT1, MEG3, and TSIX, synergis-tically dysregulate cancer pathways in multiple tumorcontexts

    Pan-cancer Alterations of the MYC Oncogene and Its Proximal Network across the Cancer Genome Atlas

    Get PDF
    Although theMYConcogene has been implicated incancer, a systematic assessment of alterations ofMYC, related transcription factors, and co-regulatoryproteins, forming the proximal MYC network (PMN),across human cancers is lacking. Using computa-tional approaches, we define genomic and proteo-mic features associated with MYC and the PMNacross the 33 cancers of The Cancer Genome Atlas.Pan-cancer, 28% of all samples had at least one ofthe MYC paralogs amplified. In contrast, the MYCantagonists MGA and MNT were the most frequentlymutated or deleted members, proposing a roleas tumor suppressors.MYCalterations were mutu-ally exclusive withPIK3CA,PTEN,APC,orBRAFalterations, suggesting that MYC is a distinct onco-genic driver. Expression analysis revealed MYC-associated pathways in tumor subtypes, such asimmune response and growth factor signaling; chro-matin, translation, and DNA replication/repair wereconserved pan-cancer. This analysis reveals insightsinto MYC biology and is a reference for biomarkersand therapeutics for cancers with alterations ofMYC or the PMN

    Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images

    Get PDF
    Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumorinfiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL maps are derived through computational staining using a convolutional neural network trained to classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns and correlation with overall survival. TIL map structural patterns were grouped using standard histopathological parameters. These patterns are enriched in particular T cell subpopulations derived from molecular measures. TIL densities and spatial structure were differentially enriched among tumor types, immune subtypes, and tumor molecular subtypes, implying that spatial infiltrate state could reflect particular tumor cell aberration states. Obtaining spatial lymphocytic patterns linked to the rich genomic characterization of TCGA samples demonstrates one use for the TCGA image archives with insights into the tumor-immune microenvironment

    Driver Fusions and Their Implications in the Development and Treatment of Human Cancers.

    Get PDF
    Gene fusions represent an important class of somatic alterations in cancer. We systematically investigated fusions in 9,624 tumors across 33 cancer types using multiple fusion calling tools. We identified a total of 25,664 fusions, with a 63% validation rate. Integration of gene expression, copy number, and fusion annotation data revealed that fusions involving oncogenes tend to exhibit increased expression, whereas fusions involving tumor suppressors have the opposite effect. For fusions involving kinases, we found 1,275 with an intact kinase domain, the proportion of which varied significantly across cancer types. Our study suggests that fusions drive the development of 16.5% of cancer cases and function as the sole driver in more than 1% of them. Finally, we identified druggable fusions involving genes such as TMPRSS2, RET, FGFR3, ALK, and ESR1 in 6.0% of cases, and we predicted immunogenic peptides, suggesting that fusions may provide leads for targeted drug and immune therapy

    Integrative Genomic Analysis of Cholangiocarcinoma Identifies Distinct IDH -Mutant Molecular Profiles

    Get PDF
    Cholangiocarcinoma (CCA) is an aggressive malignancy of the bile ducts, with poor prognosis and limited treatment options. Here, we describe the integrated analysis of somatic mutations, RNA expression, copy number, and DNA methylation by The Cancer Genome Atlas of a set of predominantly intrahepatic CCA cases and propose a molecular classification scheme. We identified an IDH mutant-enriched subtype with distinct molecular features including low expression of chromatin modifiers, elevated expression of mitochondrial genes, and increased mitochondrial DNA copy number. Leveraging the multi-platform data, we observed that ARID1A exhibited DNA hypermethylation and decreased expression in the IDH mutant subtype. More broadly, we found that IDH mutations are associated with an expanded histological spectrum of liver tumors with molecular features that stratify with CCA. Our studies reveal insights into the molecular pathogenesis and heterogeneity of cholangiocarcinoma and provide classification information of potential therapeutic significance

    Analyses of non-coding somatic drivers in 2,658 cancer whole genomes

    Get PDF
    The discovery of drivers of cancer has traditionally focused on protein-coding genes1–4. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium5 of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers6,7, raise doubts about others and identify novel candidates, including point mutations in the 5′ region of TP53, in the 3′ untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available
    corecore