45 research outputs found

    Genomic characterization of malignant progression in neoplastic pancreatic cysts

    Intraductal papillary mucinous neoplasms (IPMNs) and mucinous cystic neoplasms (MCNs) are non-invasive neoplasms that are often observed in association with invasive pancreatic cancers, but their origins and evolutionary relationships are poorly understood. In this study, we analyze 148 samples from IPMNs, MCNs, and small associated invasive carcinomas from 18 patients using whole exome or targeted sequencing. Using evolutionary analyses, we establish that both IPMNs and MCNs are direct precursors to pancreatic cancer. Mutations in SMAD4 and TGFBR2 are frequently restricted to invasive carcinoma, while RNF43 alterations are largely in non-invasive lesions. Genomic analyses suggest an average window of over three years between the development of high-grade dysplasia and pancreatic cancer. Taken together, these data establish non-invasive IPMNs and MCNs as origins of invasive pancreatic cancer, identifying potential drivers of invasion, highlighting the complex clonal dynamics prior to malignant transformation, and providing opportunities for early detection and intervention

    Computational Approaches to Reconstruction of Cancer Evolutionary Histories from Multi-region Next Generation Sequencing Data

    The clonal evolution model of cancer provides a conceptual framework for interpreting spatial and temporal intra-tumor heterogeneity, and has survived several decades of experimental validation. The increasing accuracy and decreasing cost of next generation sequencing technology over the past decade have enabled high-throughput profiling of cancer genomes and characterization of intra-tumor heterogeneity at the genetic level. The multi-region sequencing approach, which involves comparative analysis of a collection of longitudinal or spatial tumor samples, has motivated a large body of work on quantifying intra-tumor heterogeneity and reconstructing cancer evolutionary histories. The data sets resulting from this approach typically contain the status of individual tumor samples with respect to the array of somatic alterations identified in a patient. Reconstruction of cancer evolutionary histories from these data sets often involves rigorous quantitative modeling and has been a subject of active investigation over the past few years. Here, I introduce a computational method for inference of subclonal hierarchies from somatic mutations (SCHISM). The method evaluates the temporal order of individual mutations using a generalized likelihood ratio statistical hypothesis test. Next, it uses a genetic algorithm heuristic search to identify the cancer phylogenies with the highest level of support. I evaluate the performance of this method using extensive simulation experiments, and by re-analysis of data from a set of recent multi-region sequencing studies where cancer phylogenies were generated by expert manual curation. In the following chapters, I describe the application of SCHISM to two studies of cancer clonal evolution. First, I present a multi-region sequencing study of primary pancreatic ductal adenocarcinomas in a cohort of 12 genetically engineered mouse models, which constitutes the first comprehensive genomic characterization of these tumors. In addition to identifying somatic alterations in biological processes of established significance in human pancreatic cancer and a candidate recurrent focal deletion, the evolutionary analyses suggest an ongoing process of somatic evolution. Together, these observations extend the potential utility of two established pancreatic cancer models to the study of pancreatic cancer evolution in a controlled microenvironment. Second, I apply SCHISM to analyze tumor samples from 5 patients with high grade serous ovarian carcinoma to elucidate the cellular origin of ovarian cancer. In addition to small scale somatic mutations, large scale structural alterations are identified in individual tumor samples and included in the analysis. The resulting evolutionary models suggest the presence of the ancestral clone of ovarian cancer in serous tubal intra-epithelial neoplasia lesions dissected from the fallopian tube, and support the notion that ovarian cancer is in fact a disease of the fallopian tubes.

    SubClonal Hierarchy Inference from Somatic Mutations: Automatic Reconstruction of Cancer Evolutionary Trees from Multi-region Next Generation Sequencing

    Recent improvements in next-generation sequencing of tumor samples and the ability to identify somatic mutations at low allelic fractions have opened the way for new approaches to model the evolution of individual cancers. The power and utility of these models are increased when tumor samples from multiple sites are sequenced. Temporal ordering of the samples may provide insight into the etiology of both primary and metastatic lesions and rationalizations for tumor recurrence and therapeutic failures. Additional insights may be provided by temporal ordering of evolving subclones (cellular subpopulations with unique mutational profiles). Current methods for subclone hierarchy inference tightly couple the problem of temporal ordering with that of estimating the fraction of cancer cells harboring each mutation. We present a new framework that includes a rigorous statistical hypothesis test and a collection of tools that make it possible to decouple these problems, which we believe will enable substantial progress in the field of subclone hierarchy inference. The methods presented here can be flexibly combined with methods developed by others addressing either of these problems. We provide tools to interpret hypothesis test results, which inform phylogenetic tree construction, and we introduce the first genetic algorithm designed for this purpose. The utility of our framework is systematically demonstrated in simulations. For most tested combinations of tumor purity, sequencing coverage, and tree complexity, good power (≥ 0.8) can be achieved and Type 1 error is well controlled when at least three tumor samples are available from a patient. Using data from three published multi-region tumor sequencing studies of (murine) small cell lung cancer, acute myeloid leukemia, and chronic lymphocytic leukemia, in which the authors reconstructed subclonal phylogenetic trees by manual expert curation, we show how different configurations of our tools can identify either a single tree in agreement with the authors, or a small set of trees that includes the authors’ preferred tree. Our results have implications for improved modeling of tumor evolution and the importance of multi-region tumor sequencing.

    Crossover operation.

    A reproductive crossover operation involving a pair of parental trees is used to generate diversity among the topologies of the members of each generation produced by the genetic algorithm.
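As a rough illustration of what a crossover between two parental trees can look like, here is a minimal sketch. It is not the operator used by SCHISM, whose details are not given in this caption. It assumes each tree is stored as a child -> parent map over the same set of subclones, with a root label (here "GL") that appears only as a parent; the function names `descendants` and `crossover` are hypothetical.

```python
import random


def descendants(parent_of, node):
    """Return `node` plus every node below it in a child -> parent map."""
    found = {node}
    changed = True
    while changed:
        changed = False
        for child, parent in parent_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def crossover(tree_a, tree_b, rng):
    """Copy tree_a, then reattach one randomly chosen node (with its subtree)
    to the parent that node has in tree_b.

    The move is skipped when it would place the node under one of its own
    descendants, since that would create a cycle instead of a tree.
    """
    child = dict(tree_a)
    node = rng.choice(sorted(child))
    new_parent = tree_b[node]
    if new_parent not in descendants(tree_a, node):
        child[node] = new_parent
    return child


# Illustrative parents over subclones A, B, C (made-up topologies).
chain = {"A": "GL", "B": "A", "C": "B"}
fork = {"A": "GL", "B": "A", "C": "A"}
offspring = crossover(chain, fork, random.Random(0))
```

The offspring always stays a valid rooted tree because the moved subtree is only attached to a node outside itself. Other crossover designs, such as exchanging whole subtrees between parents, are equally plausible.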

    Power and Type 1 error of hypothesis test on simulated data.

    For each combination of coverage and purity, results are shown for trees with node counts from three to eight. Each curve was computed by taking the mean over all instances and all replicates for each node count. Curves with circular marks show power and curves with triangular marks show Type 1 error. Transparent coloring indicates ±1 SE. Dotted line indicates power = 0.8.
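For readers unfamiliar with the two quantities plotted, the sketch below shows how power and Type 1 error can be tallied from simulated test outcomes. The input format, a list of (null_is_true, rejected) pairs, and the function name are assumptions made for illustration and are not taken from the study.

```python
def power_and_type1(outcomes):
    """Summarize simulated hypothesis tests.

    `outcomes` is a list of (null_is_true, rejected) boolean pairs.
    Power is the fraction of false nulls that were rejected;
    Type 1 error is the fraction of true nulls that were rejected.
    """
    true_null = [rejected for null_is_true, rejected in outcomes if null_is_true]
    false_null = [rejected for null_is_true, rejected in outcomes if not null_is_true]
    power = sum(false_null) / len(false_null)
    type1 = sum(true_null) / len(true_null)
    return power, type1
```

Both lists must be non-empty, so a simulation needs at least one case where the null holds and one where it does not.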

    Overview of SCHISM framework.

    The framework decouples estimation of somatic mutation cellularities and reconstruction of subclone phylogenies. Given somatic mutation read counts from next generation sequencing data and somatic copy number calls if available, any tools for mutation cellularity estimation and mutation clustering can be applied. Their output is used to estimate the statistical support for temporal ordering of mutation or mutation cluster pairs, using a generalized likelihood ratio test (GLRT). Other approaches to tree reconstruction can be applied, by using the fitness function as the objective for optimization. GA = genetic algorithm, WGS = whole genome sequencing, WES = whole exome sequencing, DS = (targeted) deep sequencing. KDE = kernel density estimation. POV = precedence order violation.
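To make the idea of testing temporal order concrete, the following is a simplified sketch of a one-sided statistic for the null hypothesis that mutation I could precede mutation J, meaning that the cellularity of I is at least that of J in every sample. It assumes independent, approximately Gaussian cellularity estimates with known standard errors; the function names and the caller-supplied critical value are hypothetical, and the exact statistic and null distribution used by SCHISM may differ.

```python
def precedence_statistic(cell_i, cell_j, se_i, se_j):
    """Sum of squared standardized violations of c_I >= c_J across samples.

    Samples where the estimated cellularity of I is at least that of J are
    consistent with the null and contribute nothing to the statistic.
    """
    stat = 0.0
    for ci, cj, si, sj in zip(cell_i, cell_j, se_i, se_j):
        diff = ci - cj
        if diff < 0:
            stat += diff ** 2 / (si ** 2 + sj ** 2)
    return stat


def reject_precedence(cell_i, cell_j, se_i, se_j, critical_value):
    """Reject 'I could precede J' when the statistic exceeds a chosen cutoff.

    `critical_value` is an assumption left to the caller, for example a
    chi-square quantile with one degree of freedom per sample.
    """
    return precedence_statistic(cell_i, cell_j, se_i, se_j) > critical_value
```

Running the test on every ordered pair of mutations or clusters yields the pairwise rejections that a tree search can then use as constraints.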

    Performance of the genetic algorithm evaluated in two stages.

    Stage 1: Fraction of simulation runs where the genetic algorithm’s fitness function identified either a single maximum fitness tree (A1) or two maximum fitness trees (B1). Stage 2: Given success in stage 1, fraction of simulation runs where the correct tree was either the single maximum fitness tree (A2) or one of the top two maximum fitness trees (B2). For each combination of coverage and purity, results are shown for trees with node counts from three to eight. Simulations where (sample count) ≥ (node count) are marked by a double circle.

    Reconstruction of subclonal phylogenies in murine models of SCLC.

    A. Animal 3588. SCHISM identified a single maximum fitness 8-node tree using one primary and two metastatic tumors. B. Animal 3151. Six maximum fitness 9-node trees were identified using one primary and two metastatic tumors. C. Animal 984. Six maximum fitness 7-node trees were identified using one primary and one metastatic tumor. Solid arrows represent lineage relationships shared by all six trees and dashed arrows represent lineage relationships shared by only a subset of the trees. Each arrow is labeled with the fraction of maximum fitness trees that include the lineage relationship. Highlighted arrows indicate the tree manually constructed by the study authors. GL = germline state. Cluster precedence order violation (CPOV) matrices are shown to the left of each tree. Columns and rows represent subclones (or Clones in the terminology of [13]). Each red square represents a pair of subclones (I,J) for which the null hypothesis that I could be the parent of J was rejected. Each blue square represents a pair for which the null hypothesis could not be rejected.
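The relationship between a CPOV matrix, candidate trees, and the arrow labels can be sketched as follows. This is an illustrative simplification, not SCHISM's fitness function: it assumes trees are child -> parent maps, represents the red squares as a set of rejected (parent, child) pairs, and scores a tree only by how many of its edges are rejected. All names are hypothetical.

```python
from collections import Counter


def count_violations(parent_of, rejected_pairs):
    """Number of tree edges (parent, child) whose parenthood was rejected."""
    return sum((parent, child) in rejected_pairs
               for child, parent in parent_of.items())


def best_trees(trees, rejected_pairs):
    """Keep the candidate trees with the fewest rejected edges."""
    scores = [count_violations(tree, rejected_pairs) for tree in trees]
    fewest = min(scores)
    return [tree for tree, score in zip(trees, scores) if score == fewest]


def edge_support(trees):
    """Fraction of the given trees that contain each (parent, child) edge."""
    counts = Counter((parent, child)
                     for tree in trees
                     for child, parent in tree.items())
    return {edge: n / len(trees) for edge, n in counts.items()}
```

Under this sketch, an edge with support 1.0 would correspond to a solid arrow shared by all retained trees, and an edge with lower support to a dashed arrow.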

    MuPIT interactive: webserver for mapping variant positions to annotated, interactive 3D structures

    Mutation position imaging toolbox (MuPIT) interactive is a browser-based application for single-nucleotide variants (SNVs), which automatically maps the genomic coordinates of SNVs onto the coordinates of available three-dimensional (3D) protein structures. The application is designed for interactive browser-based visualization of the putative functional relevance of SNVs by biologists who are not necessarily experts either in bioinformatics or protein structure. Users may submit batches of several thousand SNVs and review all protein structures that cover the SNVs, including available functional annotations such as binding sites, mutagenesis experiments, and common polymorphisms. Multiple SNVs may be mapped onto each structure, enabling 3D visualization of SNV clusters and their relationship to functionally annotated positions. We illustrate the utility of MuPIT interactive in rationalizing the impact of selected polymorphisms in the PharmGKB database, somatic mutations identified in the Cancer Genome Atlas study of invasive breast carcinomas, and rare variants identified in the exome sequencing project. MuPIT interactive is freely available for non-profit use at http://mupit.icm.jhu.edu