934 research outputs found

    Robustness Evaluation for Phylogenetic Reconstruction Methods and Evolutionary Models Reconstruction of Tumor Progression

    During evolutionary history, genomes evolve through DNA mutation, genome rearrangement, duplication, and gene loss. Considerable effort has gone into the study of phylogenetic and ancestral genome inference. Advances in technology have made genomic information grow exponentially, which makes it possible to address the problem. Over the past decades, a great number of algorithms following different principles have been developed to tackle it. However, limits in performance and capacity, together with low consistency, still prevent us from stating confidently that the problem is solved. To know the detailed evolutionary history, we need to infer the phylogeny (the Big Phylogeny Problem) and also the information at the internal nodes (the Small Phylogeny Problem). The work presented in this thesis focuses on assessing methods designed for the Small Phylogeny Problem, and on designing algorithms and models for inferring genome evolution history from FISH data for cancer. In recent decades, a number of evolutionary models and related algorithms have been designed to infer ancestral genome sequences or gene orders. Because the true ancestral genomes are difficult to know, tools are needed to test the robustness of the adjacencies found by the various methods. For the Big Phylogeny Problem, previous work has tested bootstrapping, jackknifing, and isolating as ways to assess the confidence of inferred branches, and found them to be good resampling tools for the corresponding phylogenetic inference methods. However, until now no systematic work has been done to tackle this problem for the small phylogeny. We tested the earlier resampling schemes and a new method, inversion, on different ancestral genome reconstruction methods and showed that different resampling methods are appropriate for their corresponding methods. Cancer is known for its heterogeneity, which develops through an evolutionary process driven by mutations in tumor cells. Rapid, simultaneous linear and branching evolution has been observed and analyzed in earlier research. Such a process can be modeled by a phylogenetic tree using different methods. Previous phylogenetic research used various kinds of datasets, such as FISH data, genome sequences, and gene orders. FISH data is quite clean because it comes from single cells, and it has been shown to be sufficient to infer the evolutionary process of cancer development. RSMT was shown to be a good model for phylogenetic analysis using FISH cell count pattern data, but it needs efficient heuristics because it is an NP-hard problem. To attack this problem, we proposed an iterative approach to approximate solutions to the Steiner tree in the small phylogeny tree. It is shown to give better results than the earlier method on both real and simulated data. In this thesis, we continued the investigation by designing new methods to better approximate the evolutionary process of tumors and by applying our method to other kinds of data, such as data from high-throughput technology. Our thesis work can be divided into two parts. First, we designed new algorithms that give the same parsimony tree as the exact method in most situations and modified them into a general phylogeny building tool. Second, we applied our methods to different kinds of data, such as copy number variation information inferred from next-generation sequencing technology, and predicted key changes during evolution.
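    As a rough illustration of the tree model mentioned in the abstract above, the sketch below connects FISH cell count patterns by a minimum spanning tree under the L1 (rectilinear) distance. This is not the thesis's iterative heuristic: it adds no Steiner nodes, so its weight is only an upper bound on the weight of a rectilinear Steiner minimum tree, which is how RSMT is read here (an assumption). The function names and the toy patterns are hypothetical.

```python
def l1_distance(a, b):
    """Rectilinear (L1) distance between two copy number count patterns."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rectilinear_mst(patterns):
    """Prim's algorithm over the observed patterns only (no Steiner nodes).

    Returns (total_weight, edges), where each edge is (i, j, weight) and
    i, j index into `patterns`.
    """
    n = len(patterns)
    in_tree = {0}
    edges = []
    total = 0
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        weight, i, j = min(
            (l1_distance(patterns[i], patterns[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        in_tree.add(j)
        edges.append((i, j, weight))
        total += weight
    return total, edges


# Hypothetical cell count patterns: copy numbers of two probes per cell type.
toy_patterns = [(2, 2), (3, 2), (2, 4), (4, 4)]
total_weight, tree_edges = rectilinear_mst(toy_patterns)
print(total_weight, tree_edges)
```

    A Steiner tree heuristic would go further by inserting unobserved intermediate patterns wherever that lowers the total weight.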

    Analysis of gene copy number changes in tumor phylogenetics

    Phylogenetic Reconstruction Analysis on Gene Order and Copy Number Variation

    Genome rearrangement is known as one of the main evolutionary mechanisms at the genomic level. Phylogenetic analysis based on rearrangement has played a crucial role in biological research in the past decades, especially with the increasing availability of fully sequenced genomes. In general, phylogenetic analysis aims to solve two problems: the Small Parsimony Problem (SPP) and the Big Parsimony Problem (BPP). Maximum parsimony is a popular approach for SPP and BPP which relies on iteratively solving an NP-hard problem, the median problem. As a result, current median solvers and phylogenetic inference methods based on the median problem all face serious scalability problems and cannot be applied to datasets with large and distant genomes. In this thesis, we propose a new median solver for gene order data that combines double-cut-join (DCJ) sorting with the Simulated Annealing algorithm (SA-Median). Based on this median solver, we built a new phylogenetic inference method to solve both SPP and BPP. Our experimental results show that the new median solver achieves excellent performance on simulated datasets and that the phylogenetic inference tool built on it performs better than other existing methods. Cancer is known for its heterogeneity and is regarded as an evolutionary process driven by somatic mutations and clonal expansions. This evolutionary process can be modeled by a phylogenetic tree, and phylogenetic analysis of multiple subclones of cancer cells can facilitate the study of tumor variant progression. Copy-number aberration occurs frequently in many types of tumors in the form of segmental amplifications and deletions. In this thesis, we developed a distance-based method for reconstructing phylogenies from copy-number profiles of cancer cells. We demonstrate the importance of distance correction from the edit (minimum) distance to the estimated actual number of events. Experimental results show that our approaches provide accurate and scalable results in estimating the actual number of evolutionary events between copy number profiles and in reconstructing phylogenies. High-throughput sequencing of tumor samples has reported various degrees of genetic heterogeneity between primary tumors and their distant subpopulations. The clonal theory of cancer evolution holds that tumor cells are descended from a common origin cell. This origin cell carries an advantageous mutation that causes a clonal expansion, producing a large population of cells descended from it. To further investigate cancer progression, phylogenetic analysis of the tumor cells is imperative. In this thesis, we developed a novel approach to infer the phylogeny from both Next-Generation Sequencing and Long-Read Sequencing data. Experimental results show that our newly proposed method can infer the entire phylogenetic progression very accurately on both Next-Generation Sequencing and Long-Read Sequencing data. In this thesis, we focused on phylogenetic analysis of both gene order sequences and copy number variations. Our thesis work can be categorized into three parts. First, we developed a new median solver for the median problem and phylogeny inference under the DCJ model and applied our method to both simulated data and real yeast data. Second, we explored a new approach to infer the phylogeny of copy number profiles for a wide range of parameters (e.g., different numbers of leaf genomes, different numbers of positions in the genome, and different tree diameters). Third, we concentrated on phylogeny inference from high-throughput sequencing data and proposed a novel approach to further investigate and phylogenetically analyze the entire expansion process of cancer cells on both Next-Generation Sequencing and Long-Read Sequencing data.
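    To make the idea of a distance-based phylogeny from copy number profiles concrete, here is a minimal sketch. It rests on two assumptions that are not taken from the thesis: the Manhattan distance stands in for the edit distance and its correction, and UPGMA stands in for whichever distance-based tree builder the authors used. The sample names and profiles are hypothetical.

```python
def manhattan(p, q):
    """Stand-in distance between two copy number profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def upgma(names, profiles):
    """Build a rooted tree topology (nested tuples) by UPGMA clustering."""
    n = len(names)
    # cluster id -> (subtree, number of leaves)
    clusters = {i: (names[i], 1) for i in range(n)}
    # Distances keyed by (smaller id, larger id).
    dist = {
        (i, j): manhattan(profiles[i], profiles[j])
        for i in range(n)
        for j in range(i + 1, n)
    }
    next_id = n
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            # Size-weighted average distance to the merged cluster.
            dist[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        del dist[(i, j)]
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical copy numbers at four genomic positions for four samples.
sample_names = ["A", "B", "C", "D"]
sample_profiles = [[2, 2, 2, 2], [2, 2, 3, 2], [4, 4, 2, 1], [4, 5, 2, 1]]
print(upgma(sample_names, sample_profiles))
```

    The abstract's point about distance correction applies one step earlier than the clustering: a minimum (edit) distance tends to underestimate the true number of events between distant profiles, so a corrected estimate would replace `manhattan` before the tree is built.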

    CNETML: Maximum likelihood inference of phylogeny from copy number profiles of spatio-temporal samples

    Phylogenetic trees based on copy number alterations (CNAs) for multi-region samples of a single cancer patient are helpful to understand the spatio-temporal evolution of cancers, especially in tumours driven by chromosomal instability. Due to the high cost of deep sequencing data, low-coverage data are more accessible in practice, but their lower resolution only allows the calling of (relative) total copy numbers. However, methods to reconstruct sample phylogenies from CNAs often use allele-specific copy numbers, and those using total copy numbers are mostly distance matrix or maximum parsimony methods, which do not handle temporal data or estimate mutation rates. In this work, we developed a new maximum likelihood method based on a novel evolutionary model of CNAs, CNETML, to infer phylogenies from spatio-temporal samples taken within a single patient. CNETML is the first program to jointly infer the tree topology, node ages, and mutation rates from total copy numbers when samples were taken at different time points. Our extensive simulations suggest CNETML performed well even on relative copy numbers with subclonal whole genome doubling events and under slight violation of model assumptions. The application of CNETML to real data from Barrett’s esophagus patients also generated results consistent with previous discoveries and revealed novel early CNAs for further investigation.

    CNETML: maximum likelihood inference of phylogeny from copy number profiles of multiple samples

    Phylogenetic trees based on copy number profiles from multiple samples of a patient are helpful to understand cancer evolution. Here, we develop a new maximum likelihood method, CNETML, to infer phylogenies from such data. CNETML is the first program to jointly infer the tree topology, node ages, and mutation rates from total copy numbers of longitudinal samples. Our extensive simulations suggest CNETML performs well on copy numbers relative to ploidy and under slight violation of model assumptions. The application of CNETML to real data generates results consistent with previous discoveries and provides novel early copy number events for further investigation

    From Birds to Drug-Resistant Cancer, a novel In situ Methodology to Explore Divergent Genome Evolution

    Fluorescent hybridisation methodologies have not changed in principle over the past 30 years, and the rise of computational sequencing technologies has led to the replacement of in situ hybridisations. Fluorescence in situ hybridisation (FISH) needs a refresh to remain a worthwhile tool in a modern cytogenetic laboratory and to overcome the shortcomings of these new methods. The novel multilayer FISH protocol effectively eliminates many negative aspects of classic FISH-based experiments: it greatly reduces cost and is no longer as limited by fluorophore availability. This thesis presents the creation of this methodology and its application to a wide variety of cytogenetic hypotheses. Key species from the order Galliformes were investigated in order to detect previously missed intrachromosomal rearrangements within their macrochromosomes, a premise formerly overlooked. Rearrangements were found within chromosomes of the galliform species used, such as E. chinensis, which displays an intrachromosomal inversion on the p-arm of chromosome 2. Furthermore, a newly created interphase-state folding prediction tool was used to assess the arrangement of macrochromosomes during cellular growth stages within G. gallus. Particular arrangements were identified that are similar across the chromosomes studied. The chicken lymphoma cell line DT40 is of great importance in B-cell receptor studies and gene disruption experiments. An updated karyotype for the cell line is presented here, showing contrasting and more in-depth evidence of aberrations to further develop our understanding of the genomic arrangement of this useful cell line. The level of tumour heterogeneity in a cancer is a diagnostic tool allowing clinicians to comment on therapeutic choices and prognosis of the disease. Found to be dominant in recurrent cancers, cytotoxic-resistant tumour cell populations may indeed exist within initial primary tumours at low frequency, to be positively selected during chemotherapy. Within a neuroblastoma cell line and its cytotoxic-resistant derivative lines, a level of genomic heterogeneity was identified which may give clues towards the generation of drug resistance mechanisms.

    Leveraging single cell sequencing to unravel intra-tumour heterogeneity and tumour evolution in human cancers

    Intra-tumour heterogeneity and tumour evolution are well-documented phenomena in human cancers. While the advent of next-generation sequencing technologies has facilitated the large-scale capture of genomic data, the field of single cell genomics is nascent but rapidly advancing and generating many new insights into the complex molecular mechanisms of tumour biology. In this review, we provide an overview of current single cell DNA sequencing technologies, exploring how recent methodological advancements have enumerated new insights into intra-tumour heterogeneity and tumour evolution. Areas highlighted include the potential power of single cell genome sequencing studies to explore evolutionary dynamics contributing to tumourigenesis through to progression, metastasis and therapy resistance. We also explore the use of in-situ sequencing technologies to study intra-tumour heterogeneity in a spatial context, as well as examining the use of single cell genomics to perform lineage tracing in both normal and malignant tissues. Finally, we consider the use of multi-modal single cell sequencing technologies. Taken together, it is hoped that these many facets of single cell genome sequencing will improve our understanding of tumourigenesis, progression and lethality in cancer leading to the development of novel therapies.

    Identification of molecular events associated with evolution of multifocal and metastatic urothelial cancer

    Urothelial cell carcinoma of the bladder is characterized by multifocality. Identification of molecular events associated with multifocal disease will increase understanding of disease pathogenesis, aid development of targeted therapies and ultimately lead to reduced morbidity, mortality and healthcare costs. In the first part of the project, I determined genome-wide copy number alterations and mutations in key genes in synchronous multifocal non-invasive tumours in order to:
    • Assess monoclonal or oligoclonal origin.
    • Assess molecular heterogeneity of tumours within one patient as this might imply differential response to treatment.
    • Identify specific features of multifocality i.e. is multifocal disease different from solitary disease matched for grade and stage.
    One-way hierarchical cluster analysis of copy number and mutational data of FGFR3, PIK3CA and RAS genes from 66 tumours was performed to assess the relationships between the individual tumours of each patient. Tumours separated into 3 main clusters with tumours from the same patient tending to group together. The majority of tumours from the same patient shared at least a few copy number alterations indicating monoclonal origin. Comparison of multifocal and solitary tumours of the same grade and stage revealed that multifocal tumours exhibited higher frequencies of chromosomal alterations than solitary counterparts. In the second part of the project I used immunohistochemistry on tissue microarrays to assess whether the FGFR3 expression status of a primary bladder tumour can serve as a surrogate for the related metastases. Expression levels in two evaluable tissue spots from the primary tumour (n = 97) or the lymph node metastases (n = 90) showed a high level of concordance (primary tumour: OR=8.6, p=0.000003; metastases: OR=16.7, p=0.0000002). With few exceptions, the levels of FGFR protein expression were the same in matched primary and metastatic lesions (p=0.78), suggesting that expression in the primary tumour can be used to select FGFR-targeted therapy for disseminated disease.
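    The concordance figures in the abstract above are odds ratios (OR). As a reminder of how an odds ratio is computed from a 2x2 table of paired tissue spots, here is a small sketch. The counts are hypothetical and do not come from the study, and the function name is illustrative only.

```python
def odds_ratio(both_high, first_only, second_only, both_low):
    """Odds ratio of a 2x2 table: (a * d) / (b * c).

    Rows: expression in spot 1 (high / low).
    Columns: expression in spot 2 (high / low).
    """
    if first_only == 0 or second_only == 0:
        raise ValueError("odds ratio is undefined when an off-diagonal cell is zero")
    return (both_high * both_low) / (first_only * second_only)


# Hypothetical counts, chosen only to illustrate the arithmetic.
print(odds_ratio(both_high=40, first_only=10, second_only=10, both_low=30))
```

    An odds ratio well above 1 means the two spots agree far more often than they disagree, which is the sense in which the abstract speaks of a high level of concordance.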

    Statistical Methods For Genomic And Transcriptomic Sequencing

    Part 1: High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but CNV profiling from whole-exome sequencing (WES) is challenging due to the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for WES data. CODEX includes a Poisson latent factor model, which includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based segmentation procedure that explicitly models the count-based WES data. CODEX is compared to existing methods on germline CNV detection in HapMap samples using microarray-based gold standard and is further evaluated on 222 neuroblastoma samples with matched normal, with focus on somatic CNVs within the ATRX gene. Part 2: Cancer is a disease driven by evolutionary selection on somatic genetic and epigenetic alterations. We propose Canopy, a method for inferring the evolutionary phylogeny of a tumor using both somatic copy number alterations and single nucleotide alterations from one or more samples derived from a single patient. Canopy is applied to bulk sequencing datasets of both longitudinal and spatial experimental designs and to a transplantable metastasis model derived from human cancer cell line MDA-MB-231. Canopy successfully identifies cell populations and infers phylogenies that are in concordance with existing knowledge and ground truth. Through simulations, we explore the effects of key parameters on deconvolution accuracy, and compare against existing methods. Part 3: Allele-specific expression is traditionally studied by bulk RNA sequencing, which measures average expression across cells. 
Single-cell RNA sequencing (scRNA-seq) allows the comparison of expression distribution between the two alleles of a diploid organism and thus the characterization of allele-specific bursting. We propose SCALE to analyze genome-wide allele-specific bursting, with adjustment of technical variability. SCALE detects genes exhibiting allelic differences in bursting parameters, and genes whose alleles burst non-independently. We apply SCALE to mouse blastocyst and human fibroblast cells and find that, globally, cis control in gene expression overwhelmingly manifests as differences in burst frequency