345 research outputs found

    Genealogy Reconstruction: Methods and applications in cancer and wild populations

    Genealogy reconstruction is widely used in biology when relationships among entities are studied. Phylogenies, or evolutionary trees, show the differences between species. They are of profound importance because they help to obtain a better understanding of evolutionary processes. Pedigrees, or family trees, on the other hand, visualize the relatedness between individuals in a population. The reconstruction of pedigrees and the inference of parentage in general are now a cornerstone of molecular ecology. Applications include the direct inference of gene flow and the estimation of the effective population size and of parameters describing the population’s mating behaviour, such as rates of inbreeding. In the first part of this thesis, we construct genealogies of various types of cancer. Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective, systematic method to categorize tumor subtypes by maturation. We introduce a novel algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data of leukemia and liposarcoma subtypes and then apply it to a broader group of sarcomas and of breast cancer subtypes. The ranking of tumor subtypes resulting from our methodology allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors. In contrast to asexually reproducing cancer cell populations, pedigrees of sexually reproducing populations cannot be represented by phylogenetic trees.
Pedigrees are directed acyclic graphs (DAGs) and therefore more closely resemble phylogenetic networks, in which reticulate events are indicated by vertices with two incoming arcs. In the second part of the thesis, we present a software package for pedigree reconstruction in natural populations using co-dominant genomic markers such as microsatellites and single nucleotide polymorphisms (SNPs). If available, the algorithm makes use of prior information such as known relationships (sub-pedigrees) or the age and sex of individuals. Statistical confidence is estimated by Markov chain Monte Carlo (MCMC) sampling. The accuracy of the algorithm is demonstrated for simulated data as well as an empirical data set with known pedigree. The parentage inference is robust even in the presence of genotyping errors. We further demonstrate the accuracy of the algorithm on simulated clonal populations. We show that the joint estimation of parameters of interest, such as the rate of self-fertilization or clonality, is possible with high accuracy even with marker panels of moderate power. Classical methods can assign only a very limited number of statistically significant parentages in this case and would therefore fail. The method is implemented in fast, easy-to-use open-source software that scales to large datasets with many thousands of individuals.
Contents: Abstract; Acknowledgments; 1 Introduction; 2 Cancer Phylogenies (2.1 Introduction; 2.2 Background; 2.3 Methods; 2.4 Results; 2.5 Discussion); 3 Wild Pedigrees (3.1 Introduction; 3.2 The molecular ecologist’s tools of the trade; 3.3 Background; 3.4 Methods; 3.5 Results; 3.6 Discussion); 4 Conclusions; A FRANz (A.1 Availability; A.2 Input files; A.3 Output files; A.4 Web 2.0 Interface); List of Figures; List of Tables; List of Abbreviations; Bibliography; Curriculum Vitae
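
Parentage inference with co-dominant markers rests on Mendelian segregation: each parent transmits one of its two alleles with probability 1/2. The sketch below is a minimal textbook calculation for a single locus, not the thesis software's implementation. The function name `mendelian_prob` and the tuple encoding of genotypes are assumptions made for illustration, and genotyping errors and missing values are ignored.

```python
from itertools import product


def mendelian_prob(child, mother, father):
    """P(child genotype | parental genotypes) at one diploid, co-dominant locus.

    Genotypes are unordered pairs of allele labels, e.g. (120, 124).
    Assumes error-free genotypes and equal transmission of both alleles
    (hypothetical helper for illustration only).
    """
    target = tuple(sorted(child))
    # Enumerate the four equally likely (maternal allele, paternal allele) pairs.
    matches = sum(
        1 for m, f in product(mother, father) if tuple(sorted((m, f))) == target
    )
    return matches / 4.0


print(mendelian_prob((120, 124), (120, 120), (124, 124)))  # 1.0
print(mendelian_prob((120, 120), (120, 124), (120, 124)))  # 0.25
print(mendelian_prob((128, 128), (120, 124), (120, 124)))  # 0.0
```

A probability of zero at any locus excludes the candidate pair under this error-free model, which is why practical methods add a genotyping-error term before multiplying across loci.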

    Fast and scalable inference of multi-sample cancer lineages.

    Somatic variants can be used as lineage markers for the phylogenetic reconstruction of cancer evolution. Since somatic phylogenetics is complicated by sample heterogeneity, novel specialized tree-building methods are required for cancer phylogeny reconstruction. We present LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a novel method that automates the phylogenetic inference of cancer progression from multiple somatic samples. LICHeE uses variant allele frequencies of somatic single nucleotide variants obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples. LICHeE is open source and available at http://viq854.github.io/lichee
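
To see how variant allele frequencies can constrain a multi-sample lineage tree, consider the common assumption that a descendant subclone inherits all mutations of its ancestor. A mutation group can then precede another only if its VAF is at least as high in every sample. The sketch below checks that generic ordering constraint. It is not LICHeE's code: the function name, the dictionary layout, the example values, and the tolerance `eps` are all hypothetical, and copy-number effects are ignored.

```python
def can_precede(ancestor_vaf, descendant_vaf, eps=0.05):
    """True if the ancestor's VAF is >= the descendant's (minus eps) in every sample."""
    return all(a >= d - eps for a, d in zip(ancestor_vaf, descendant_vaf))


# Hypothetical mean VAFs of three mutation groups across three samples.
vafs = {
    "trunk": [0.48, 0.50, 0.45],
    "A": [0.30, 0.00, 0.20],
    "B": [0.00, 0.35, 0.10],
}

# Candidate ancestor -> descendant edges that satisfy the constraint.
edges = [
    (parent, child)
    for parent in vafs
    for child in vafs
    if parent != child and can_precede(vafs[parent], vafs[child])
]
print(edges)  # [('trunk', 'A'), ('trunk', 'B')]
```

Groups A and B cannot be nested in either direction here, because each is absent from a sample in which the other is present.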

    Clonality and timing of relapsing colorectal cancer metastasis revealed through whole-genome single-cell sequencing

    Funded for open-access publication: Universidade de Vigo/CISUG. Recurrence of tumor cells following local and systemic therapy is a significant hurdle in cancer. Most patients with metastatic colorectal cancer (mCRC) will relapse, despite resection of the metastatic lesions. A better understanding of the evolutionary history of recurrent lesions is required to identify the spatial and temporal patterns of metastatic progression and expose the genetic and evolutionary determinants of therapeutic resistance. With this goal in mind, we leveraged a unique single-cell whole-genome sequencing dataset from recurrent hepatic lesions of an mCRC patient. Our phylogenetic analysis confirms that the treatment induced a severe demographic bottleneck in the liver metastasis but also that a previously diverged lineage survived this surgery, possibly after migration to a different site in the liver. This lineage evolved very slowly for two years under adjuvant drug therapy and diversified again in a very short period. We identified several non-silent mutations specific to this lineage and inferred a substantial contribution of chemotherapy to the overall, genome-wide mutational burden. All in all, our study suggests that mCRC subclones can migrate locally and evade resection, keep evolving despite rounds of chemotherapy, and re-expand explosively. Funding: Ministerio de Ciencia e Innovación (Ref. PID2019-106247GB-I00); AXA Research Fund; Asociación Española Contra el Cáncer; Xunta de Galicia (Ref. ED481A-2018/30).

    Clonal architecture of secondary acute myeloid leukemia defined by single-cell sequencing

    Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions (the population frequency of individual clones, their genetic composition, and their evolutionary relationships), which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole-genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.

    How many samples are needed to infer truly clonal mutations from heterogenous tumours?

    BACKGROUND: Modern cancer treatment strategies aim to target tumour-specific genetic (or epigenetic) alterations. Treatment response improves if these alterations are clonal, i.e. present in all cancer cells within tumours. However, the identification of truly clonal alterations is impaired by the tremendous intra-tumour genetic heterogeneity and unavoidable sampling biases. METHODS: Here, we investigate the underlying causes of these spatial sampling biases and how the distribution and sizes of biopsies in sampling protocols can be optimised to minimise such biases. RESULTS: We find that in the ideal case, fewer than a handful of samples can be enough to infer truly clonal mutations. The frequency of the largest sub-clone at diagnosis is the main factor determining the accuracy of truncal mutation estimation in structured tumours. If the first sub-clone dominates the tumour, higher spatial dispersion of samples and larger sample size can increase the accuracy of the estimation. In such an improved sampling scheme, fewer samples will enable the detection of truly clonal alterations with the same probability. CONCLUSIONS: Taking spatial tumour structure into account will decrease the probability of misclassifying a sub-clonal mutation as clonal and promises better-informed treatment decisions.
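
As a rough baseline for the sampling question, suppose each biopsy independently contains a given sub-clonal mutation with probability equal to the sub-clone's frequency. That is a well-mixed assumption, and it is exactly what spatial tumour structure violates, so the sketch below is only a reference point and not the model used in the study. The function names and the 5% threshold are hypothetical.

```python
import math


def misclassification_prob(subclone_fraction, n_samples):
    """Probability that a sub-clonal mutation shows up in all n independent samples."""
    return subclone_fraction ** n_samples


def samples_needed(subclone_fraction, alpha=0.05):
    """Smallest n with misclassification probability <= alpha (well-mixed baseline)."""
    if not 0.0 < subclone_fraction < 1.0:
        raise ValueError("subclone_fraction must lie strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(subclone_fraction))


print(misclassification_prob(0.5, 3))  # 0.125
print(samples_needed(0.5))             # 5
print(samples_needed(0.9))             # 29
```

Under this baseline the required number of samples grows quickly as the largest sub-clone approaches fixation, which is consistent with the abstract's point that the frequency of the largest sub-clone drives accuracy.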

    The role of chromosomal instability and parallel evolution in cancer

    Although chromosomal instability (CIN) is recognised as an initiating process in cancer, the extent and relevance of the ongoing somatic copy number alterations (SCNAs) that result from it later in tumour development are unclear. In this thesis I describe a comprehensive analysis, including 1421 tumour samples (394 patients; 22 tumour types), to evaluate ongoing CIN and SCNAs in tumour evolution. I show that intratumour heterogeneity mediated through chromosomal instability is associated with an increased risk of recurrence or death in non-small cell lung cancer (NSCLC), a finding that supports the potential value of CIN as a prognostic predictor. I also uncover pervasive SCNA intratumour heterogeneity across cancers, with recurrent clonal and subclonal events identified and found to demonstrate enrichment for cancer genes. I develop novel techniques for obtaining a phasing of heterozygous SNPs from multi-region next-generation sequencing data and apply them to observe recurrent parallel evolutionary events converging upon disruption to the same genes in distinct subclones within 146 individual tumours. The most prevalent recurrent parallel loss event involved chromosome 14, including HIF1A and HIF1B. In addition, chromosome 5p, including TERT, was recurrently gained and subject to parallel evolution in 7 tumour types. Tumour type-specific constraints to early tumour development were identified in the form of obligatory clonal LOH, including LOH of 3p in clear cell renal cell carcinoma, lung squamous cell carcinoma (LUSC) and triple-negative breast cancer, and LOH of 17p in LUSC, colorectal adenocarcinoma, and triple-negative and HER2+ breast cancer. Whole-genome doubling (WGD) was generally an early event in tumour evolution, associated with an increased acquisition of both clonal and subclonal SCNAs.
For instance, CCNE1 amplifications, which occurred exclusively in WGD tumours, were subclonal in 45% of these cases, suggesting this event may be selected following a WGD event. Mathematical modelling of subclonal SCNA evolution demonstrated that models incorporating ongoing selection with respect to SCNAs significantly outperform evolutionarily neutral models, particularly in the context of WGD. This thesis highlights the importance of ongoing CIN and recurrent subclonal chromosomal alterations in tumour evolution, reveals parallel evolution of SCNAs, and sheds light on the dynamics and order of events that influence metastasis.

    Evaluation of simulation methods for tumor subclonal reconstruction

    Most neoplastic tumors originate from a single cell, and their evolution can be genetically traced through lineages characterized by common alterations such as small somatic mutations (SSMs), copy number alterations (CNAs), structural variants (SVs), and aneuploidies. Due to the complexity of these alterations in most tumors and the errors introduced by sequencing protocols and calling algorithms, tumor subclonal reconstruction algorithms are necessary to recapitulate the DNA sequence composition and tumor evolution in silico. With a growing number of these algorithms available, there is a pressing need for consistent and comprehensive benchmarking, which relies on realistic tumor sequencing data generated by simulation tools. Here, we examine the current simulation methods, identifying their strengths and weaknesses, and provide recommendations for their improvement. Our review also explores potential new directions for research in this area. This work aims to serve as a resource for understanding and enhancing tumor genomic simulations, contributing to the advancement of the field.

    GENOMIC AND TRANSCRIPTOMIC LANDSCAPE OF COLORECTAL PREMALIGNANCY

    Colorectal cancer (CRC) is the third most commonly diagnosed cancer among men and women in the United States, with 3 to 5 percent of cases diagnosed in the background of a hereditary form of the disease. Biologically, CRC is divided into two groups: microsatellite unstable (MSI) and chromosomally unstable (CIN). Genomic and transcriptomic characterization of CRC has emerged from large-scale studies in recent years owing to advances in next-generation sequencing technologies. These studies have identified key genes and pathways altered in CRC and provided insights into the discovery of therapeutic targets. Despite the wealth of knowledge acquired about the carcinoma stage, there have been insufficient efforts to systematically characterize premalignant lesions at the molecular level, which could lead to a better understanding of neoplastic initiation, risk prediction, and the development of targeted chemoprevention strategies. The challenge in characterizing premalignancy has always been the limited availability of sample material. This challenge is tackled by obtaining more samples, integrating public datasets, deploying better technologies that use smaller amounts of nucleic acids, and using in silico tools to extract multi-layer information from the same experiment. My genomic study consisted of whole exome sequencing (WES) and high-depth targeted sequencing on 80 premalignant lesions (bulk tissue and crypts) to assess clonality and mutational heterogeneity. WES results showed the presence of multiple clones in premalignancy, based on clustering of somatic mutation allele frequencies. In addition, I determined that multiple clones originate from independent crypts harboring distinct APC and KRAS alterations. In my second study, I performed immune expression profiling and an assessment of the mutation and neoantigen rates of 28 premalignant lesions with DNA mismatch repair (MMR) deficient and proficient backgrounds using RNAseq.
My results showed an activated immune profile despite low mutational and neoantigen rates, which challenges the canonical view, derived from the MMR-deficient carcinoma stage, that immune activation is largely due to high mutation and neoantigen rates. In the last study, I performed transcriptomic sub-classification of 398 premalignant lesions, associating them with different carcinoma subtypes and with clinical and histopathological features. My results revealed two major findings: prominent immune activation, and WNT and MYC activation in premalignancy. In summary, my large-scale genomic and transcriptomic analyses of colorectal adenomas have identified key molecular characteristics of early colorectal tumorigenesis and provide a foundation for discovering novel preventive strategies.

    Inferring structural variant cancer cell fraction.

    We present SVclone, a computational method for inferring the cancer cell fraction of structural variant (SV) breakpoints from whole-genome sequencing data. SVclone accurately determines the variant allele frequencies of both SV breakends, then simultaneously estimates the cancer cell fraction and SV copy number. We assess performance using in silico mixtures of real samples, at known proportions, created from two clonal metastases from the same patient. We find that SVclone's performance is comparable to single-nucleotide variant-based methods, despite having an order of magnitude fewer data points. As part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) consortium, which aggregated whole-genome sequencing data from 2658 cancers across 38 tumour types, we use SVclone to reveal a subset of liver, ovarian and pancreatic cancers with subclonally enriched copy-number neutral rearrangements that show decreased overall survival. SVclone enables improved characterisation of SV intra-tumour heterogeneity
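
The link between a variant allele frequency and a cancer cell fraction can be illustrated with the relation commonly used for point mutations: the expected VAF equals purity × multiplicity × CCF divided by the average number of copies of the locus per cell in the sample. The sketch below simply inverts that relation. It is a generic illustration under stated assumptions, not SVclone's model (which, per the abstract, estimates cancer cell fraction and SV copy number simultaneously), and the function and parameter names are hypothetical.

```python
def cancer_cell_fraction(vaf, purity, tumour_cn=2, multiplicity=1, normal_cn=2):
    """Invert VAF = purity * multiplicity * CCF / average copies per cell.

    purity:       fraction of cells in the sample that are tumour cells
    tumour_cn:    total copy number of the locus in tumour cells
    multiplicity: number of mutated copies per mutated tumour cell
    normal_cn:    copy number of the locus in normal cells
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell across the whole sample.
    avg_copies = purity * tumour_cn + (1.0 - purity) * normal_cn
    return vaf * avg_copies / (purity * multiplicity)


print(cancer_cell_fraction(0.5, purity=1.0))   # clonal heterozygous variant in a pure sample
print(cancer_cell_fraction(0.15, purity=0.6))  # variant in about half of the tumour cells
```

Values slightly above 1 can occur with noisy VAFs, so in practice estimates are usually capped or handled by a probabilistic model rather than read off directly.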