34 research outputs found

    Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides

    We have implemented the pypgatk package and the pgdb workflow to create proteogenomics databases based on ENSEMBL resources. The tools allow the generation of protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs and other non-canonical transcripts, such as those produced by alternative splicing events. They also include exonic out-of-frame translation from otherwise canonical protein-coding mRNAs. Moreover, the tools enable the generation of variant protein sequences from multiple sources of genomic variants including COSMIC, cBioPortal, gnomAD and mutations detected from sequencing of patient samples. pypgatk and pgdb provide multiple functionalities for database handling, including optimized target/decoy generation by the algorithm DecoyPyrat. Finally, we have reanalyzed six public datasets in PRIDE by generating cell-type specific databases for 65 cell lines using the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides amounting to >5% of the total number of peptides identified.
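The three-frame translation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the pypgatk implementation or its API; the function names `translate` and `three_frame_translation` are hypothetical, and the sketch assumes the standard genetic code, forward-strand frames only, and `X` for any codon containing an ambiguous base.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a nucleotide sequence from its first base; '*' marks a stop codon."""
    seq = seq.upper().replace("U", "T")
    # Step through complete codons only; a trailing partial codon is ignored.
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        for i in range(0, len(seq) - 2, 3)
    )


def three_frame_translation(seq):
    """Return the translations of the three forward reading frames."""
    return [translate(seq[offset:]) for offset in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(three_frame_translation("ATGGCCTAA"), start=1):
        print(f"frame {frame}: {protein}")
```

In a database-building setting, each frame would typically be split at stop codons and short fragments discarded before the sequences are written out; that step is left out here to keep the sketch short.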

    Therapeutic Cancer Vaccination with Immunopeptidomics-Discovered Antigens Confers Protective Antitumor Efficacy

    Simple Summary: Immunotherapy has revolutionized cancer treatment, yet many tumors remain resistant to current immuno-oncology therapies. Here we explore a novel, customized oncolytic adenovirus vaccine platform as immunotherapy in a resistant tumor model. We present a workflow for customizing the oncolytic vaccine for improved tumor targeting. This targeting is based on experimentally discovered tumor antigens, which are incorporated as active components of the vaccine formulation. The pipeline may be further applied for designing personalized therapeutic cancer vaccines. Abstract: Knowledge of clinically targetable tumor antigens is becoming vital for broader design and utility of therapeutic cancer vaccines. This information is obtained reliably by directly interrogating the MHC-I presented peptide ligands, the immunopeptidome, with state-of-the-art mass spectrometry. Our manuscript describes direct identification of novel tumor antigens for an aggressive triple-negative breast cancer model. Immunopeptidome profiling revealed 2481 unique antigens, among them a novel ERV antigen originating from an endogenous retrovirus element. The clinical benefit and tumor control potential of the identified tumor antigens and ERV antigen were studied in a preclinical model using two vaccine platforms and therapeutic settings. Prominent control of established tumors was achieved using an oncolytic adenovirus platform designed for flexible and specific tumor targeting, namely PeptiCRAd. Our study presents a pipeline integrating immunopeptidome analysis-driven antigen discovery with a therapeutic cancer vaccine platform for improved personalized oncolytic immunotherapy. Peer reviewed.

    Glioblastoma stem cells express non-canonical proteins and exclusive mesenchymal-like or non-mesenchymal-like protein signatures

    Glioblastoma (GBM) cancer stem cells (GSCs) contribute to GBM's origin, recurrence, and resistance to treatment. However, the understanding of how mRNA expression patterns of GBM subtypes are reflected at the global proteome level in GSCs is limited. To characterize protein expression in GSCs, we performed in-depth proteogenomic analysis of patient-derived GSCs by RNA-sequencing and mass-spectrometry. We quantified > 10 000 proteins in two independent GSC panels and propose a GSC-associated proteomic signature characterizing two distinct phenotypic conditions: one defined by proteins upregulated in proneural and classical GSCs (GPC-like), and another by proteins upregulated in mesenchymal GSCs (GM-like). The GM-like protein set in GBM tissue was associated with necrosis, recurrence, and worse overall survival. Through proteogenomics, we discovered 252 non-canonical peptides in the GSCs, i.e., protein sequences that are variant or derive from genome regions previously considered non-protein-coding, including variants of the heterogeneous ribonucleoproteins implicated in RNA splicing. In summary, GSCs express two protein sets that have an inverse association with clinical outcomes in GBM. The discovery of non-canonical protein sequences questions existing gene models and pinpoints new protein targets for research in GBM.

    Proteogenomic analysis of acute myeloid leukemia associates relapsed disease with reprogrammed energy metabolism both in adults and children

    Despite improvement of current treatment strategies and novel targeted drugs, relapse and treatment resistance largely determine the outcome for acute myeloid leukemia (AML) patients. To identify the underlying molecular characteristics, numerous studies have aimed to decipher the genomic and transcriptomic landscape of AML. Nevertheless, further molecular changes allowing malignant cells to escape treatment remain to be elucidated. Mass spectrometry is a powerful tool enabling detailed insights into proteomic changes that could explain AML relapse and resistance. Here, we investigated AML samples from 47 adult and 22 pediatric patients at serial time-points during disease progression using mass spectrometry-based in-depth proteomics. We show that the proteomic profile at relapse is enriched for mitochondrial ribosomal proteins and subunits of the respiratory chain complex, indicative of reprogrammed energy metabolism from diagnosis to relapse. Further, higher levels of granzymes and lower levels of the anti-inflammatory protein CR1/CD35 suggest an inflammatory signature promoting disease progression. Finally, through a proteogenomic approach, we detected novel peptides, which present a promising repertoire in the search for biomarkers and tumor-specific druggable targets. Altogether, this study highlights the importance of proteomic studies in holistic approaches to improve treatment and survival of AML patients. Peer reviewed.

    Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis.

    Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires strong functional or genetic evidence. CLC genes are enriched amongst driver genes predicted from somatic mutations, and display characteristic genomic features. Strikingly, CLC genes are enriched for driver mutations from unbiased, genome-wide transposon-mutagenesis screens in mice. We identified 10 tumour-causing mutations in orthologues of 8 lncRNAs, including LINC-PINT and NEAT1, but not MALAT1. Thus CLC represents a dataset of high-confidence cancer lncRNAs. Mutagenesis maps are a novel means for identifying deeply-conserved roles of lncRNAs in tumorigenesis.

    Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples

    Funder: NCI U24CA211006. Abstract: The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side from 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and un-observed mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.

    Data from: Reliability assessment of null allele detection: inconsistencies between and within different methods

    Microsatellite loci are widely used in population genetic studies, but the presence of null alleles may lead to biased results. Here, we assessed five methods that indirectly detect null alleles and found large inconsistencies among them. Our analysis was based on 20 microsatellite loci genotyped in a natural population of Microtus oeconomus sampled during 8 years, together with 1200 simulated populations without null alleles, but experiencing bottlenecks of varying duration and intensity, and 120 simulated populations with known null alleles. In the natural population, 29% of positive results were consistent between the methods in pairwise comparisons, and in the simulated data set, this proportion was 14%. The positive results were also inconsistent between different years in the natural population. In the null-allele-free simulated data set, the number of false positives increased with increased bottleneck intensity and duration. We also found a low concordance in null allele detection between the original simulated populations and their 20% random subsets. In the populations simulated to include null alleles, between 22% and 42% of true null alleles remained undetected, which highlighted that detection errors are not restricted to false positives. None of the evaluated methods clearly outperformed the others when both false-positive and false-negative rates were considered. Accepting only the positive results consistent between at least two methods should considerably reduce the false-positive rate, but this approach may increase the false-negative rate. Our study demonstrates the need for novel null allele detection methods that could be reliably applied to natural populations.
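The consensus rule described in the abstract above (accepting a positive result only when at least two methods agree) can be sketched as follows. This is an illustrative sketch, not the authors' code; the function name `consensus_positives`, the method labels, and the locus identifiers are all hypothetical.

```python
from collections import Counter


def consensus_positives(calls, min_methods=2):
    """Return the loci flagged as carrying null alleles by at least `min_methods` methods.

    `calls` maps a method name to the collection of loci that method flagged.
    """
    # Count, for each locus, how many distinct methods flagged it.
    counts = Counter(locus for loci in calls.values() for locus in set(loci))
    return {locus for locus, n in counts.items() if n >= min_methods}


if __name__ == "__main__":
    # Hypothetical per-method results for four loci.
    calls = {
        "method_a": {"locus1", "locus2"},
        "method_b": {"locus2", "locus3"},
        "method_c": {"locus2", "locus3", "locus4"},
    }
    print(sorted(consensus_positives(calls)))
```

Raising `min_methods` makes the filter stricter, which mirrors the trade-off the abstract notes: fewer false positives at the cost of more missed null alleles.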

    Maps of context-dependent putative regulatory regions and genomic signal interactions

    Gene transcription is regulated mainly by transcription factors (TFs). ENCODE and Roadmap Epigenomics provide global binding profiles of TFs, which can be used to identify regulatory regions. To this end we implemented a method to systematically construct cell-type and species-specific maps of regulatory regions and TF-TF interactions. We illustrated the approach by developing maps for five human cell-lines and two other species. We detected ~144k putative regulatory regions among the human cell-lines, with the majority of them being ~300 bp. We found ~20k putative regulatory elements in the ENCODE heterochromatic domains, suggesting a large regulatory potential in the regions presumed transcriptionally silent. Among the most significant TF interactions identified in the heterochromatic regions were CTCF and the cohesin complex, which is in agreement with previous reports. Finally, we investigated the enrichment of the obtained putative regulatory regions in the 3D chromatin domains. More than 90% of the regions were discovered in the 3D contacting domains. We found a significant enrichment of GWAS SNPs in the putative regulatory regions. These significant enrichments provide evidence that the regulatory regions play a crucial role in the genomic structural stability. Additionally, we generated maps of putative regulatory regions for prostate and colorectal cancer human cell-lines. The first two authors share first authorship.