369 research outputs found

    Sequence analysis methods for the design of cancer vaccines that target tumor-specific mutant antigens (neoantigens)

    The human adaptive immune system is programmed to distinguish between self and non-self proteins; if it is trained to recognize markers unique to a cancer, it may be possible to stimulate the selective destruction of cancer cells. Therapeutic cancer vaccines aim to boost the immune system by selectively increasing the population of T cells specifically targeted to the tumor-unique antigens, thereby initiating cancer cell death. In the past, this approach has primarily focused on targeted selection of ‘shared’ tumor antigens, found across many patients. The advent of massively parallel sequencing and specialized analytical approaches has enabled more efficient characterization of tumor-specific mutant antigens, or neoantigens. Specifically, methods to predict which tumor-specific mutant peptides (neoantigens) can elicit anti-tumor T cell recognition improve predictions of immune checkpoint therapy response and identify one or more neoantigens as targets for personalized vaccines. Selecting the most immunogenic neoantigens from a large number of mutations is an important challenge, in particular in cancers with a high mutational load, such as melanomas and smoker-associated lung cancers. To address this task, Chapter 1 of this thesis describes a genome-guided in silico approach to identifying tumor neoantigens that integrates tumor mutation and expression data (DNA- and RNA-Seq). The cancer vaccine design process, from read alignment to variant calling and neoantigen prediction, typically assumes that the genotype of the Human Reference Genome sequence surrounding each somatic variant is representative of the patient’s genome sequence, and does not account for the effect of nearby variants (somatic or germline) on the neoantigenic peptide sequence. 
Because the accuracy of neoantigen identification has important implications for many clinical trials and studies of basic cancer immunology, Chapter 2 describes and supports the need for patient-specific inclusion of proximal variants to address this previously oversimplified assumption in the identification of neoantigens. The method of neoantigen identification described in Chapter 1 was subsequently extended (Chapter 3) and improved by the addition of a modular workflow that aids in each component of the neoantigen prediction process, from neoantigen identification and prioritization to data visualization and DNA vaccine design. These chapters describe massively parallel sequence analysis methods that will help in the identification and subsequent refinement of patient-specific antigens for use in personalized immunotherapy.
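    The effect of proximal variants described above can be illustrated with a minimal sketch. It assumes a single missense substitution at a known protein position and a hypothetical `proximal` mapping of nearby in-phase amino-acid changes; the function name, parameters, and the 9-mer default are illustrative choices and are not taken from the thesis.

```python
def mutant_peptides(protein, pos, alt, length=9, proximal=None):
    """Return every peptide of the given length that spans the mutated residue.

    protein  : amino-acid sequence translated from the reference genome
    pos      : 0-based index of the somatic missense change
    alt      : amino acid introduced by the somatic variant
    proximal : optional {index: amino_acid} of nearby germline or somatic
               changes on the same haplotype (hypothetical input format)
    """
    seq = list(protein)
    # Apply patient-specific proximal changes first, then the somatic change.
    for i, aa in (proximal or {}).items():
        seq[i] = aa
    seq[pos] = alt
    seq = "".join(seq)

    # Slide a window so that the mutated residue occupies each position once.
    first_start = max(0, pos - length + 1)
    last_start = min(pos, len(seq) - length)
    return [seq[s:s + length] for s in range(first_start, last_start + 1)]


protein = "ACDEFGHIKLMNPQRSTVWY"
reference_based = mutant_peptides(protein, pos=10, alt="A")
patient_specific = mutant_peptides(protein, pos=10, alt="A", proximal={8: "W"})
changed = sum(a != b for a, b in zip(reference_based, patient_specific))
print(f"{changed} of {len(reference_based)} candidate peptides differ")
```

    In this toy case most candidate peptides change once the proximal variant is included, which is the kind of discrepancy a reference-only pipeline would miss.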

    The Era of Next-Generation Sequencing in Clinical Oncology


    Distance-based methods for the analysis of Next-Generation sequencing data

    The analysis of NGS data is a central aspect of modern genomic research. However, extracting data from the two most commonly used source organisms poses a variety of problems. The first chapter presents a novel approach that determines a distance between cancer cell line cultures based on their small genomic variants in order to identify the cultures. A whole-exome-sequenced culture is identified through pairwise comparisons with reference datasets when a measured distance is smaller than would be expected for unrelated cultures. The effectiveness of the method was verified, but a limitation remains because only the whole-exome sequencing format is supported. The second chapter therefore presents a published modification of the approach that adds support for the widely used bulk RNA sequencing and panel sequencing. Broadening the technology base, however, amplifies confounding effects, which lead to violations of the mathematical conditions of a distance metric. These violations are therefore first quantified by statistical methods and then successfully compensated for by dynamic threshold adjustments. The third chapter introduces a novel data augmentation method that enables machine-learning models to be trained in the absence of neoplastic training data. An abstract distance measure between neoplastic entities and entities of healthy origin is established by means of transcriptomic deconvolution. The output of the deconvolution then allows the effective prediction of clinical characteristics of rare yet biologically diverse cancer types, with the predictive power of the method matching that of the established gold standard.
    The analysis of NGS data is a central aspect of modern Molecular Genetics and Oncology. The first scientific contribution is the development of a method which identifies whole-exome-sequenced cancer cell lines (CCL) by quantifying a distance between their sets of small genomic variants. A distinguishing aspect of the method is that it was designed for the computer-based identification of NGS-sequenced CCL. An unknown CCL is identified when its abstract distance to a known CCL is smaller than expected by chance. The method performed favorably in benchmarks but supported only whole-exome sequencing. The second contribution therefore extended the identification method to bulk mRNA sequencing and panel sequencing. However, this technological extension introduced predictive biases that impaired the quantification of abstract distances. Hence, statistical methods were introduced to quantify and compensate for confounding factors. The extended method showed heterogeneity-robust benchmark performance at the cost of slightly reduced sensitivity compared to the whole-exome-sequencing method. The third contribution is a method which trains machine-learning models for rare and diverse cancer types. An abstract distance between neoplastic entities and entities of healthy origin is established by transcriptomic deconvolution, and machine-learning models are subsequently trained on these distances to predict clinically relevant characteristics. The performance of such-trained models was comparable to that of models trained on both the substituted neoplastic data and the gold-standard biomarker Ki-67. No proliferation-rate-indicative features were used to predict clinical characteristics, which is why the method can complement the proliferation-rate-oriented pathological assessment of biopsies. The thesis showed that quantifying an abstract distance can address sources of erroneous NGS data analysis.
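    The idea of identifying a cell line by its distance to reference variant sets can be sketched as follows. The abstract does not specify the distance or the threshold, so this sketch substitutes the Jaccard distance and a fixed cutoff purely for illustration; `identify`, the variant tuple layout, and the 0.5 threshold are all assumptions.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two variant sets (0 = identical, 1 = disjoint)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def identify(query, references, threshold=0.5):
    """Return (name, distance) of the closest reference, or None.

    A match is reported only if the closest distance is below `threshold`,
    a stand-in for "smaller than expected for unrelated cell lines".
    """
    scored = [(name, jaccard_distance(query, variants))
              for name, variants in references.items()]
    if not scored:
        return None
    best = min(scored, key=lambda item: item[1])
    return best if best[1] < threshold else None


# Variants as (chromosome, position, reference allele, alternate allele).
v1, v2, v3 = ("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "T", "G")
v4, v5 = ("chr4", 400, "C", "A"), ("chr5", 1, "A", "C")
references = {"line_A": {v1, v2, v3}, "line_B": {v5}}

match = identify({v1, v2, v3, v4}, references)
no_match = identify({("chrX", 5, "C", "T")}, references)
print(match, no_match)
```

    A fixed cutoff is the simplest possible choice; the abstract indicates that the published method instead adjusts thresholds dynamically to compensate for confounding effects across sequencing technologies.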

    Validation of biomarkers predictive of tumor location in coloadenocarcinoma, an analysis of the TCGA COAD dataset

    Tumor localization correlates with prognosis in coloadenocarcinoma, with aboral tumors having a better overall survival. This can be attributed to their better response to biologicals such as the anti-EGFR (epidermal growth factor receptor) drug cetuximab. Since the localization of a tumor is trivially determined in a clinical setting, it remains a valuable surrogate parameter for predicting patient outcomes, though it is not a mechanistic explanation. Some possible explanations have been offered: it could be that aboral colonic epithelial cells respond differently to mutagenic stimuli, or that the variation in gut flora from adoral to aboral plays a role in tumor development or behavior. So far, there has been no consensus. Because some aboral tumors behave and develop more like adoral tumors and vice versa, eliminating tumor localization as a confounder would allow better treatment decisions. While slightly more complicated than simply determining the tumor location, testing for a handful of mutations in a tumor specimen is a routine procedure, and the increased predictive power of such a model would be of great value for making difficult treatment decisions. It would also represent a starting point for better understanding possible underlying molecular mechanisms. It was hypothesized that, regardless of the causal relationships, this “sidedness” of coloadenocarcinomas could be reconstructed on a genomic and transcriptomic level. To test this hypothesis, data from the TCGA (The Cancer Genome Atlas) database was used in a case-control study design to create expression profiles by training two distinct machine-learning algorithms to predict tumor location. The algorithms identified PRAC1, HOXB13, HOXC9, HOXC6, HOTAIR, PRAC2, and HOXC8 (all members of the homeobox gene family) as well as BST2, PLTP, FN1, ITLN1, and AREG as predictors of localization. These findings corroborate previous research using various other methods and fit well into the framework of previously published literature, which supports the validity of the machine-learning models as implemented. As an additional benefit, the workflow for creating the genomic and transcriptomic profiles is very flexible and can be used for further analysis of the TCGA dataset.
    Tumor localization correlates with the prognosis of coloadenocarcinomas, with aboral tumors showing better overall survival. This can be attributed to their better response to biologicals such as the anti-EGFR (epidermal growth factor receptor) drug cetuximab. Because the localization of a tumor can be determined with comparatively little effort in everyday clinical practice, it remains an important surrogate parameter for predicting therapeutic success, although it is not a mechanistic explanation. Some possible explanations have already been proposed: aboral colonic epithelial cells may respond differently to mutagenic stimuli, or the variation in colonic flora from adoral to aboral may play a role in tumor development and behavior. Unfortunately, there is no consensus yet. It was hypothesized that, regardless of the causal relationships, this “sidedness” of coloadenocarcinomas can be reconstructed at the genomic and transcriptomic level. Eliminating tumor localization as a confounder could improve treatment decisions, since some aboral tumors develop and behave like adoral tumors and vice versa. Although somewhat more complicated than merely determining tumor localization, searching for a handful of mutations in a tumor specimen is a routine clinical procedure, and the improved predictive power of such a model would be valuable for difficult treatment decisions. It would also be a starting point for further investigations into the underlying molecular mechanisms. To test the hypothesis, data from the TCGA (The Cancer Genome Atlas) database were used in a case-control study to derive, by means of machine-learning algorithms, expression profiles that can predict tumor localization. The algorithms identified PRAC1, HOXB13, HOXC9, HOXC6, HOTAIR, PRAC2, and HOXC8 (all members of the homeobox gene family) as well as BST2, PLTP, FN1, ITLN1, and AREG as predictors of tumor localization. These results confirm previously published findings and thus support the accuracy of the machine-learning algorithms as implemented here. As an additional benefit, the workflow for deriving the genomic and transcriptomic profiles is very flexible and can be used for further analyses of the TCGA data.

    Computational Methods towards Personalized Cancer Vaccines and their Application through a Web-based Platform

    Cancer immunotherapy is a treatment option that involves or uses components of a patient’s immune system. Today, it is becoming an integral part of treatment plans together with chemotherapy, surgery, and radiotherapy. Epitope-based vaccines (EVs) are one strategy that is truly personalized. Each patient possesses a distinct immune system, and each tumor is unique, rendering the design of a potent vaccine challenging and dependent on the patient and the tumor. The potency of a vaccine is reliant on the ability of its constituent epitopes – short, immunogenic antigen fragments – to trigger an immune response. To assess this ability, one has to take into account the individuality of the immune system, which is conditioned, among other factors, by the variability of the human leukocyte antigen (HLA) gene cluster. Determining the HLA genotype with traditional experimental techniques can be time- and cost-intensive. We proposed a novel HLA genotyping algorithm based on integer linear programming that is independent of dedicated data generation for the sole purpose of HLA typing. On publicly available next-generation sequencing (NGS) data, our method outperformed previously published approaches. HLA binding is a prerequisite for T-cell recognition, and precise prediction algorithms exist. However, this information is not sufficient to assess the immunogenic potential of a peptide. To induce an immune response, reactive T-cell clones with receptors specific for a peptide-HLA complex have to be present. We suggested a method for the prediction of immunogenicity that includes peripheral tolerance models, based on gut microbiome data, in addition to central tolerance, previously shown to increase performance. The comparison to a previously published method suggests that the incorporation of gut microbiome data and HLA-binding stability estimates does not enhance prediction performance. 
High-throughput sequencing provides the basis for the design of personalized EVs. Through genome and transcriptome sequencing of tumor and matched non-malignant tissue samples, cancer-specific mutations can be identified, which can be further validated using other technologies such as mass spectrometry (MS). Multi-omics approaches can result in the acquisition of several hundreds of gigabytes of data. Handling and analysis of such data usually require data management solutions and high-performance computing (HPC) infrastructures. We developed the web-based platform qPortal for data-driven biomedical research that allows users to manage and analyze quantitative biological data intuitively. To emphasize the advantages of our data-driven approach with an integrated workflow system, we conducted a comparison to Galaxy. Building on qPortal, we implemented the web-based platform iVacPortal for the design of personalized EVs to facilitate data management and data analysis in such projects. Further, we applied the implemented methods through iVacPortal in two studies of two distinct cancer entities, indicating the added value of our platform for the assessment of personalized EV candidates and alternative targets for cancer immunotherapy

    Low-frequency variant detection in viral populations using massively parallel sequencing data
