445 research outputs found

    Cancer risk prediction with whole exome sequencing and machine learning

    Accurate cancer risk and survival time prediction are important problems in personalized medicine, where disease diagnosis and prognosis are tuned to individuals based on their genetic material. Cancer risk prediction informs decisions about regular screening, which helps detect disease at an early stage and therefore increases the probability of successful treatment. Cancer risk prediction is a challenging problem: lifestyle, environment, family history, and genetic predisposition are among the factors that influence disease onset. Cancer risk prediction based on predisposing genetic variants has been studied extensively. Most studies have examined the predictive ability of variants in known mutated genes for specific cancers. However, previous studies have not explored the predictive ability of collective genomic variants from whole-exome sequencing data. It is crucial to train a model on one study and predict on another related, independent study to ensure that the predictive model generalizes to other datasets. Survival time prediction allows patients and physicians to evaluate treatment feasibility and helps chart treatment plans. Many studies have concluded that clinicians are inaccurate and often optimistic in predicting patients’ survival time; this increases the need for automated survival time prediction from genomic and medical imaging data. For cancer risk prediction, this dissertation explores how ranking genomic variants in whole-exome sequencing data with univariate feature selection methods affects the predictive capability of machine learning classifiers. The dissertation performs cross-study evaluation in chronic lymphocytic leukemia, glioma, and kidney cancers, showing that the top-ranked variants achieve better accuracy than the full set of genomic variants.
For survival time prediction, many studies have devised 3D convolutional neural networks (CNNs) to improve the accuracy of classifying glioma patients into survival categories from structural magnetic resonance imaging (MRI) volumes. This dissertation proposes a new multi-path convolutional neural network with SNP and demographic features to predict glioblastoma survival groups with a one-year threshold that improves upon existing machine learning methods. The dissertation also proposes a multi-path neural network system to predict glioblastoma survival categories with a 14-year threshold from a heterogeneous combination of genomic variations, messenger ribonucleic acid (mRNA) expressions, 3D post-contrast T1 MRI volumes, and 2D post-contrast T1 MRI modality scans that show the malignancy. In 10-fold cross-validation, the mean accuracy of the proposed network with handpicked 2D MRI slices (that manifest the tumor), mRNA expressions, and SNPs slightly improves upon each data source individually.
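The variant-ranking idea in the abstract above can be illustrated with a minimal sketch. It assumes a chi-square score as the univariate criterion and a binary carrier/non-carrier encoding; the abstract names neither, and the functions `chi2_score` and `rank_variants` and the toy data are hypothetical, not the dissertation's implementation.

```python
from typing import List, Sequence


def chi2_score(carrier: Sequence[int], label: Sequence[int]) -> float:
    """Chi-square statistic of the 2x2 table: carrier (0/1) vs. case status (0/1)."""
    n = len(label)
    # Observed counts, indexed as obs[carrier][label].
    obs = [[0, 0], [0, 0]]
    for c, y in zip(carrier, label):
        obs[c][y] += 1
    row = [sum(obs[0]), sum(obs[1])]
    col = [obs[0][0] + obs[1][0], obs[0][1] + obs[1][1]]
    score = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:  # skip empty margins to avoid division by zero
                score += (obs[i][j] - expected) ** 2 / expected
    return score


def rank_variants(X: List[List[int]], y: Sequence[int], top_k: int) -> List[int]:
    """Return the column indices of the top_k variants, best score first."""
    n_variants = len(X[0])
    scores = [chi2_score([r[j] for r in X], y) for j in range(n_variants)]
    order = sorted(range(n_variants), key=lambda j: scores[j], reverse=True)
    return order[:top_k]


# Toy "training study": rows are samples, columns are variants (1 = carrier).
X_train = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 0],
]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = case, 0 = control

top = rank_variants(X_train, y_train, top_k=1)

# Toy "independent study": keep only the variants ranked on the training study.
X_test = [[1, 1, 0], [0, 0, 1]]
X_test_selected = [[r[j] for j in top] for r in X_test]
```

Ranking only on the training study and then reusing the selected columns on the independent study keeps the test data out of feature selection, which is the point of a cross-study evaluation.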

    Risk prediction with genomic data

    Genome-wide association study (GWAS) data are widely used with various machine learning algorithms to predict disease risk. This thesis investigates this widely used GWAS approach, based on Single Nucleotide Polymorphism (SNP) genotype data, and a novel approach to disease risk prediction with whole exome sequencing data, namely the Whole Exome Wide Association Study (WEWAS). It further applies a discriminative machine learning algorithm, namely a Support Vector Machine (SVM) with different kernel functions. For this study, only SNPs generated using genotyping technology, which focuses more on common variants, are used initially for disease prediction. Later, the whole exome data generated using Next Generation Sequencing (NGS) technology is used in the prediction. Another distinction between traditional GWAS and the new approach, WEWAS, presented in this thesis is the use of insertions and deletions in the genomic sequence (INDELs) together with SNPs as features for prediction. A substantial improvement in prediction accuracy is achieved using the latter approach. The success of the approach using NGS data shows that it contains valuable information which the SNP genotyping method is unable to capture.
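To make "different kernel functions" concrete, the sketch below computes three common SVM kernels on two feature vectors. It is an illustration under stated assumptions: the abstract does not say which kernels or encoding the thesis used, so the 0/1/2 minor-allele coding for SNPs, the 0/1 INDEL presence flag, and the parameter defaults are all hypothetical.

```python
import math
from typing import Sequence


def linear_kernel(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def polynomial_kernel(a: Sequence[float], b: Sequence[float],
                      degree: int = 2, coef0: float = 1.0) -> float:
    """(a . b + coef0) ** degree; degree and coef0 are illustrative defaults."""
    return (linear_kernel(a, b) + coef0) ** degree


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float = 0.5) -> float:
    """exp(-gamma * ||a - b||^2); gamma is an illustrative default."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


# Two hypothetical samples: three SNPs coded 0/1/2, then one INDEL flag coded 0/1.
sample_a = [0, 1, 2, 1]
sample_b = [0, 2, 2, 0]

k_lin = linear_kernel(sample_a, sample_b)
k_poly = polynomial_kernel(sample_a, sample_b)
k_rbf = rbf_kernel(sample_a, sample_b)
```

An SVM only ever sees the samples through these pairwise similarity values, so swapping the kernel changes the decision boundary without changing the SNP and INDEL features themselves.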

    Chromatin accessibility maps of chronic lymphocytic leukemia identify subtype-specific epigenome signatures and associated transcription regulatory networks

    Chronic lymphocytic leukemia (CLL) is characterized by substantial clinical heterogeneity, despite relatively few genetic alterations. To provide a basis for studying epigenome deregulation in CLL, we established genome-wide chromatin accessibility maps for 88 CLL samples from 55 patients using the ATAC-seq assay. These data were further complemented by ChIPmentation and RNA-seq profiling in ten samples. Based on this dataset, we devised and applied a bioinformatic method that links chromatin profiles to clinical annotations. Our analysis identified sample-specific variation on top of a shared core of CLL regulatory regions. IGHV mutation status – which distinguishes the two major subtypes of CLL – was accurately predicted by the chromatin profiles, and gene regulatory networks inferred for IGHV-mutated vs. IGHV-unmutated samples identified characteristic regulatory differences between these two disease subtypes. In summary, we found widespread heterogeneity in the CLL chromatin landscape, established a community resource for studying epigenome deregulation in leukemia, and demonstrated the feasibility of chromatin accessibility mapping in cancer cohorts and clinical research

    Developing an Integrated Genomic Profile for Cancer Patients with the Use of NGS Data

    Next Generation Sequencing (NGS) technologies have revolutionized genomics data research by facilitating high-throughput sequencing of genetic material from different sources, such as Whole Exome Sequencing (WES) and RNA Sequencing (RNAseq). The exploitation and integration of this wealth of heterogeneous sequencing data remains a major challenge. There is a clear need for approaches that process and combine these sources in order to create an integrated profile of a patient that allows us to build the complete picture of a disease. This work introduces such an integrated profile using Chronic Lymphocytic Leukemia (CLL) as the exemplary cancer type. The approach described in this paper links the various NGS sources with the patients’ clinical data. The resulting profile efficiently summarizes the large-scale datasets, links the results with the clinical profile of the patient, and correlates indicators arising from different data types. By applying state-of-the-art machine learning techniques and associating the clinical information with these indicators, which served as the feature pool for the classification, it has been possible to build efficient predictive models. To ensure reproducibility of the results, open data were used exclusively in the classification assessment. The final goal is to design a complete genomic profile of a cancer patient. The profile includes summarization and visualization of the results of WES and RNAseq analysis (specific variants and significantly expressed genes, respectively) and the clinical profile, integration and comparison of these results, and a prediction regarding the disease trajectory. In conclusion, this work has produced a comprehensive clinico-genetic profile of a patient by successfully integrating heterogeneous data sources. The proposed profile can contribute to medical research by providing new possibilities in personalized medicine and prognostic views.

    Development and application of methodologies and infrastructures for cancer genome analysis within Personalized Medicine

    Next-generation sequencing (NGS) has revolutionized biomedical sciences, especially in the area of cancer. It has nourished genomic research with extensive collections of sequenced genomes that are investigated to untangle the molecular bases of disease, as well as to identify potential targets for the design of new treatments. To exploit all this information, several initiatives have emerged worldwide, among which the Pan-Cancer project of the ICGC (International Cancer Genome Consortium) stands out. This project has jointly analyzed thousands of tumor genomes of different cancer types in order to elucidate the molecular bases of the origin and progression of cancer. To accomplish this task, new emerging technologies, including virtualization systems such as virtual machines or software containers, were used and had to be adapted to various computing centers. The porting of this system to the supercomputing infrastructure of the BSC (Barcelona Supercomputing Center) was carried out during the first phase of the thesis. In parallel, other projects promote the application of genomics discoveries in the clinic. This is the case of MedPerCan, a national initiative to design a pilot project for the implementation of personalized medicine in oncology in Catalonia. In this context, we have centered our efforts on the methodological side, focusing on the detection and characterization of somatic variants in tumors. This step is challenging, due to the heterogeneity of the different methods, and essential, as it lies at the basis of all downstream analyses. Beyond the methodological section of the thesis, we moved on to the biological interpretation of the results to study the evolution of chronic lymphocytic leukemia (CLL) in close collaboration with the group of Dr. Elías Campo from the Hospital Clínic/IDIBAPS.
In the first study, we focused on the Richter transformation (RT), a transformation of CLL into a high-grade lymphoma that carries a very poor prognosis and unmet clinical needs. We found that RT has greater genomic, epigenomic and transcriptomic complexity than CLL. Its genome may reflect the imprint of therapies that the patients received prior to RT, indicating the presence of cells exposed to these mutagenic treatments which later expand, giving rise to the clinical manifestation of the disease. Multiple NGS-based techniques, including whole-genome sequencing and single-cell DNA and RNA sequencing, among others, confirmed the pre-existence of cells with the RT characteristics years before their manifestation, as far back as the time of CLL diagnosis. The transcriptomic profile of RT is remarkably different from that of CLL. Of particular importance is the overexpression of the OXPHOS pathway, which could be exploited as a therapeutic vulnerability. Finally, in a second study, the analysis of a case of CLL in a young adult, based on whole-genome and single-cell sequencing at different times of the disease, revealed that the founder clone of CLL did not present any somatic driver mutations and was characterized by germline variants in ATM, suggesting its role in the origin of the disease and highlighting the possible contribution of germline variants or other non-genetic mechanisms in the initiation of CLL.

    Advancing immunopeptidomics: validation of the method, improved epitope prediction, peptide-based HLA typing and discrimination of healthy and malignant tissue

    For almost 30 years, the immunopeptidome has been analyzed by eluting peptides from HLA molecules. This method is established in several institutes and companies worldwide and is now used for a wide range of investigations, from the simple identification of HLA peptide motifs for different organisms to the detection of cryptic disease-specific peptides. The field of immunopeptidomics is more popular than ever, as drug development has focused on the positive modulation of the immune system in recent years. The approval of the first checkpoint antibodies ushered in the era of immunotherapy, and specific immunotherapies with fewer side effects are now in focus. The range of applications is wide, yet the immunopeptidome still contains a great wealth of information waiting to be deciphered.
Currently, immunopeptidomics is limited in that the large number of peptides, with different affinities and stabilities of the peptide-HLA complexes, cannot be captured optimally. Therefore, among other factors, only limited recovery rates are possible. When this doctoral thesis started, there were several unresolved questions in the field of immunopeptidomics that this thesis set out to address: Is it possible to validate immunopeptidomics and use it reliably for clinical studies and drug development? Is there now a reliable method to identify the peptide motifs of peptide-presenting MHC class I allotypes, the cornerstone for epitope prediction and active substance identification? Is it possible to use peptides to classify HLA allotypes or to differentiate between healthy and malignant tissue? Can tumor-specific peptides be reliably characterized with this omics technology? In this doctoral thesis, the immunopeptidomic method was validated to ensure the reliability of LC-MS/MS peptide identification, and all parameters required by the European Medicines Agency (EMA) and the U.S. Food and Drug Administration (FDA) were investigated. In addition, an updated protocol for the identification of MHC ligands, the deconvolution of peptide motifs and the generation of matrices for epitope prediction was established, which can be used for monoallelic cells as well as multiallelic tissue. Finally, a method was developed to identify allotypic peptides that allow HLA typing. These peptides can also be used as an internal standard for the semi-quantitative investigation of the tumor specificity of peptides. The developed method was successfully implemented to identify tissue- and malignancy-specific patterns in the immunopeptidome and to determine the malignancy status of immunopeptidomic samples.

    Development and application of methodologies and infrastructures for cancer genome analysis within Personalized Medicine

    Doctoral Programme in Biomedicine / Thesis carried out at the Barcelona Supercomputing Center (BSC). Next-generation sequencing (NGS) has revolutionized biomedical sciences, especially in the area of cancer. It has nourished genomic research with extensive collections of sequenced genomes that are investigated to untangle the molecular bases of disease, as well as to identify potential targets for the design of new treatments. To exploit all this information, several initiatives have emerged worldwide, among which the Pan-Cancer project of the ICGC (International Cancer Genome Consortium) stands out. This project has jointly analyzed thousands of tumor genomes of different cancer types in order to elucidate the molecular bases of the origin and progression of cancer. To accomplish this task, new emerging technologies, including virtualization systems such as virtual machines or software containers, were used and had to be adapted to various computing centers. The porting of this system to the supercomputing infrastructure of the BSC was carried out during the first phase of the thesis. In parallel, other projects promote the application of genomics discoveries in the clinic. This is the case of MedPerCan, a national initiative to design a pilot project for the implementation of personalized medicine in oncology in Catalonia. In this context, we have centered our efforts on the methodological side, focusing on the detection and characterization of somatic variants in tumors. This step is challenging, due to the heterogeneity of the different methods, and essential, as it lies at the basis of all downstream analyses. Beyond the methodological section of the thesis, we moved on to the biological interpretation of the results to study the evolution of chronic lymphocytic leukemia (CLL) in close collaboration with the group of Dr. Elías Campo from the Hospital Clínic/IDIBAPS.
In the first study, we focused on the Richter transformation (RT), a transformation of CLL into a high-grade lymphoma that carries a very poor prognosis and unmet clinical needs. We found that RT has greater genomic, epigenomic and transcriptomic complexity than CLL. Its genome may reflect the imprint of therapies that the patients received prior to RT, indicating the presence of cells exposed to these mutagenic treatments which later expand, giving rise to the clinical manifestation of the disease. Multiple NGS-based techniques, including whole-genome sequencing and single-cell DNA and RNA sequencing, among others, confirmed the pre-existence of cells with the RT characteristics years before their manifestation, as far back as the time of CLL diagnosis. The transcriptomic profile of RT is remarkably different from that of CLL. Of particular importance is the overexpression of the OXPHOS pathway, which could be exploited as a therapeutic vulnerability. Finally, in a second study, the analysis of a case of CLL in a young adult, based on whole-genome and single-cell sequencing at different times of the disease, revealed that the founder clone of CLL did not present any somatic driver mutations and was characterized by germline variants in ATM, suggesting its role in the origin of the disease and highlighting the possible contribution of germline variants or other non-genetic mechanisms in the initiation of CLL.

    ADAPTIVE IMMUNITY AND THE TUMOR IMMUNE MICROENVIRONMENT

    The adaptive immune system is essential for the production of anti-tumor immune responses, and the majority of current immunotherapeutics are designed to modulate the interaction between adaptive immunity and tumor cells within the tumor-immune microenvironment. This dissertation addresses three translational goals regarding our understanding and modulation of anti-tumor adaptive immunity: 1) improved understanding of existing immunotherapies such as checkpoint inhibitor therapy (Chapter 2.1); 2) improved efficacy of novel immunotherapeutics currently in development, including tumor neoantigen vaccines (Chapter 4); and 3) development of next-generation immunotherapies through identification of novel anti-tumor vaccine targets (Chapter 3), as well as development of diagnostic tools including biomarkers of immunotherapy response (Chapter 3) and immune-imaging modalities (Chapter 2.1). Doctor of Philosophy