509 research outputs found

    Inference of Tumor Phylogenies from Genomic Assays on Heterogeneous Samples

    Tumorigenesis can in principle result from many combinations of mutations, but only a few roughly equivalent sequences of mutations, or “progression pathways,” seem to account for most human tumors. Phylogenetics provides a promising way to identify common progression pathways and markers of those pathways. This approach, however, can be confounded by the high heterogeneity within and between tumors, which makes it difficult to identify conserved progression stages or organize them into robust progression pathways. To tackle this problem, we previously developed methods for inferring progression stages from heterogeneous tumor profiles through computational unmixing. In this paper, we develop a novel pipeline for building trees of tumor evolution from the unmixed tumor data. The pipeline implements a statistical approach for identifying robust progression markers from unmixed tumor data and calling those markers in inferred cell states. The result is a set of phylogenetic characters and their assignments in progression states to which we apply maximum parsimony phylogenetic inference to infer tumor progression pathways. We demonstrate the full pipeline on simulated and real comparative genomic hybridization (CGH) data, validating its effectiveness and making novel predictions of major progression pathways and ancestral cell states in breast cancers

    Robustness Evaluation for Phylogenetic Reconstruction Methods and Evolutionary Models Reconstruction of Tumor Progression

    Over evolutionary history, genomes change through DNA mutation, genome rearrangement, duplication, and gene loss. Phylogenetic and ancestral genome inference has long been an active area of study. Advances in technology have made genomic information grow exponentially, which makes it possible to address the problem. A great number of algorithms, following different kinds of principles, have been developed over the past decades in attempts to tackle it. However, difficulties and limits in performance and capacity, as well as low consistency, largely prevent us from confidently stating that the problem is solved. To know the detailed evolutionary history, we need to infer the phylogeny (the Big Phylogeny Problem) and also the information at the internal nodes (the Small Phylogeny Problem). The work presented in this thesis focuses on assessing methods designed for the Small Phylogeny Problem and on designing algorithms and models for inferring genome evolutionary history from FISH data for cancer. In recent decades, a number of evolutionary models and related algorithms have been designed to infer ancestral genome sequences or gene orders. Because the true ancestral genomes are difficult to know, tools are needed to test the robustness of the adjacencies found by various methods. For the Big Phylogeny Problem, previous work tested bootstrapping, jackknifing, and isolating as ways to assess confidence in inferred branches and found them to be good resampling tools for the corresponding phylogenetic inference methods. However, no systematic work has yet tried to tackle this problem for the small phylogeny. We tested the earlier resampling schemes and a new method, inversion, on different ancestral genome reconstruction methods and showed that different resampling methods are appropriate for their corresponding methods. Cancer is known for its heterogeneity, which develops through an evolutionary process driven by mutations in tumor cells. Rapid, simultaneous linear and branching evolution has been observed and analyzed in earlier research. Such a process can be modeled by a phylogenetic tree using different methods. Previous phylogenetic research used various kinds of datasets, such as FISH data, genome sequences, and gene orders. FISH data is quite clean because it comes from single cells, and it has been shown to be sufficient to infer the evolutionary process of cancer development. RSMT was shown to be a good model for phylogenetic analysis using FISH cell count pattern data, but it needs efficient heuristics because it is an NP-hard problem. To attack this problem, we proposed an iterative approach to approximate solutions to the Steiner tree in the small phylogeny tree. It is shown to give better results than the earlier method on both real and simulated data. In this thesis, we continued the investigation by designing new methods to better approximate the evolutionary process of tumors and by applying our method to other kinds of data, such as data from high-throughput technology. Our thesis work can be divided into two parts. First, we designed new algorithms that give the same parsimony tree as the exact method in most situations and modified them into a general phylogeny-building tool. Second, we applied our methods to different kinds of data, such as copy number variation information inferred from next-generation sequencing technology, and predicted key changes during evolution.

    Inferring clonal evolution of tumors from single nucleotide somatic mutations

    High-throughput sequencing allows the detection and quantification of frequencies of somatic single nucleotide variants (SNV) in heterogeneous tumor cell populations. In some cases, the evolutionary history and population frequency of the subclonal lineages of tumor cells present in the sample can be reconstructed from these SNV frequency measurements. However, automated methods to do this reconstruction are not available and the conditions under which reconstruction is possible have not been described. We describe the conditions under which the evolutionary history can be uniquely reconstructed from SNV frequencies from single or multiple samples from the tumor population and we introduce a new statistical model, PhyloSub, that infers the phylogeny and genotype of the major subclonal lineages represented in the population of cancer cells. It uses a Bayesian nonparametric prior over trees that groups SNVs into major subclonal lineages and automatically estimates the number of lineages and their ancestry. We sample from the joint posterior distribution over trees to identify evolutionary histories and cell population frequencies that have the highest probability of generating the observed SNV frequency data. When multiple phylogenies are consistent with a given set of SNV frequencies, PhyloSub represents the uncertainty in the tumor phylogeny using a partial order plot. Experiments on a simulated dataset and two real datasets comprising tumor samples from acute myeloid leukemia and chronic lymphocytic leukemia patients demonstrate that PhyloSub can infer both linear (or chain) and branching lineages and its inferences are in good agreement with ground truth, where it is available

    Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data

    Background. A large number of algorithms are being developed to reconstruct evolutionary models of individual tumours from genome sequencing data. Most methods can analyze multiple samples collected either through bulk multi-region sequencing experiments or the sequencing of individual cancer cells. However, the same method can rarely support both data types. Results. We introduce TRaIT, a computational framework to infer mutational graphs that model the accumulation of multiple types of somatic alterations driving tumour evolution. Compared to other tools, TRaIT supports multi-region and single-cell sequencing data within the same statistical framework, and delivers expressive models that capture many complex evolutionary phenomena. TRaIT improves accuracy, robustness to data-specific errors and computational complexity compared to competing methods. Conclusions. We show that the application of TRaIT to single-cell and multi-region cancer datasets can produce accurate and reliable models of single-tumour evolution, quantify the extent of intra-tumour heterogeneity and generate new testable experimental hypotheses.

    The role of chromosomal instability and parallel evolution in cancer

    Although chromosomal instability (CIN) is recognised as an initiating process in cancer, the extent and relevance of ongoing somatic copy number alterations (SCNAs) that result from it later in tumour development is unclear. In this thesis I describe a comprehensive analysis, including 1421 tumour samples (394 patients; 22 tumour types), to evaluate ongoing CIN and SCNAs in tumour evolution and show that intratumour heterogeneity mediated through chromosomal instability is associated with an increased risk of recurrence or death in non-small cell lung cancer (NSCLC), a finding that supports the potential value of CIN as a prognostic predictor. I also uncover pervasive SCNA intratumour heterogeneity across cancers, with recurrent clonal and subclonal events identified and found to demonstrate enrichment for cancer genes. I develop novel techniques for obtaining a phasing of heterozygous SNPs from multi-region next generation sequencing data and apply them to observe recurrent parallel evolutionary events converging upon disruption to the same genes in distinct subclones within 146 individual tumours. The most prevalent recurrent parallel loss event involved chromosome 14, including HIF1A and HIF1B. In addition, chromosome 5p, including TERT, was recurrently gained and subject to parallel evolution in 7 tumour types. Tumour type-specific constraints to early tumour development were identified in the form of obligatory clonal LOH, including LOH of 3p in clear cell renal cell carcinoma, lung squamous cell carcinoma (LUSC) and triple-negative breast cancer and LOH of 17p in LUSC, colorectal adenocarcinoma, triple-negative and HER2+ breast cancer. Whole-genome doubling (WGD) was generally an early event in tumour evolution, associated with an increased acquisition of both clonal and subclonal SCNAs. For instance, CCNE1 amplifications, which occurred exclusively in WGD tumours, were subclonal in 45% of these cases, suggesting this event may be selected following a WGD event. Mathematical modelling of subclonal SCNA evolution demonstrated that models that incorporate ongoing selection with respect to SCNAs significantly outperform evolutionary neutral models, particularly in the context of WGD. This thesis highlights the importance of ongoing CIN and recurrent subclonal chromosomal alterations in tumour evolution, reveals parallel evolution of SCNAs, and sheds light on the dynamics and order of events that influence metastasis.

    The Genomic and Immune Landscapes of Lethal Metastatic Breast Cancer

    Keywords: TCR repertoire; Breast cancer; Clade mutations. The detailed molecular characterization of lethal cancers is a prerequisite to understanding resistance to therapy and escape from cancer immunoediting. We performed extensive multi-platform profiling of multi-regional metastases in autopsies from 10 patients with therapy-resistant breast cancer. The integrated genomic and immune landscapes show that metastases propagate and evolve as communities of clones, reveal their predicted neo-antigen landscapes, and show that they can accumulate HLA loss of heterozygosity (LOH). The data further identify variable tumor microenvironments and reveal, through analyses of T cell receptor repertoires, that adaptive immune responses appear to co-evolve with the metastatic genomes. These findings reveal in fine detail the landscapes of lethal metastatic breast cancer.

    Inferring the clonal identity of single cells from RNA-seq data with Unique Molecular Identifiers

    Cancer is an evolutionary disease, in which heterogeneous populations of tumor cells can emerge, proliferate, and disappear depending on selective and neutral processes. This principle has been observed in many studies of acute myeloid leukemia (AML), which is the most common blood cancer in adults. Clonal heterogeneity and evolution have been proposed to play a role in the high relapse rate of this type of cancer. In order to understand this feature, it is crucial to have adequate clinical and experimental models that can provide enough data to elucidate the evolutionary history of a tumor, such as patient-derived xenografts (PDX). These models can be combined with high-resolution sequencing technologies, such as single-cell RNA-seq, to provide a detailed view of the heterogeneity and molecular features of the tumor. However, adequate analytical tools have to be applied and developed in order to fully exploit such datasets. Here I present the analysis of the clonal heterogeneity of an AML patient and the corresponding PDX model, which was treated with multiple rounds of chemotherapy. This model made it possible to study the response of the tumor populations to the pressure induced by the therapy, and the possible evolutionary forces behind it. Datasets for these AML samples were generated with multiple types of sequencing methods, one of which was single-cell RNA sequencing. To enable the analysis of somatic mutations and clonal populations in this kind of data, I developed a software package that extracts and proofreads variant sequences by making use of Unique Molecular Identifiers (UMIs), sequence barcodes that make it possible to distinguish reads that come from PCR amplification duplicates. The benefits of employing this proofreading approach for variant calling and for inferring the clonal identity of single cells were demonstrated. Finally, I applied it to the analysis of the single-cell data of the AML PDX samples that were treated with chemotherapy, as well as other datasets with UMI-based sequencing.