Discovering Subclones and Their Driver Genes in Tumors Sequenced at Standard Depths

Publication date: 01/01/2019

Abstract: Understanding intratumor heterogeneity and the driver genes behind it is critical to designing personalized treatments and improving clinical outcomes in cancer. Such investigations require accurate delineation of the subclonal composition of a tumor, which to date can only be reliably inferred from deep-sequencing data (>300x depth). The algorithm presented here incorporates an adaptive error model into the statistical decomposition of mixed populations, which corrects the mean-variance dependency of sequencing data at the subclonal level and enables accurate subclonal discovery in tumors sequenced at standard depths (30-50x). Tested on extensive computer simulations and real-world data, this new method, named model-based adaptive grouping of subclones (MAGOS), consistently outperforms existing methods on minimum sequencing depth, decomposition accuracy, and computational efficiency. MAGOS supports subclone analysis using single nucleotide variants and copy number variants from one or more samples of an individual tumor. The GUST algorithm, on the other hand, is a novel method for detecting cancer-type-specific driver genes. Combining MAGOS and GUST results can provide insights into cancer progression. Applying MAGOS and GUST to whole-exome sequencing data from samples of 33 cancer types revealed a significant association of subclonal diversity and its drivers with patient overall survival.

Dissertation/Thesis. Doctoral Dissertation, Biomedical Informatics, 201
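The abstract above notes that sequencing data show a mean-variance dependency and that subclones are harder to separate at 30-50x than at >300x. A minimal sketch of why depth matters, assuming plain binomial read sampling (this is not the adaptive error model of MAGOS, and the function names are hypothetical):

```python
import math


def vaf_standard_error(true_vaf: float, depth: int) -> float:
    """SD of the observed variant allele frequency (VAF) when the number of
    variant reads is Binomial(depth, true_vaf). The variance depends on the
    mean, which is the mean-variance dependency in its simplest form."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return math.sqrt(true_vaf * (1.0 - true_vaf) / depth)


def separation_in_sd(vaf_a: float, vaf_b: float, depth: int) -> float:
    """Distance between two subclone VAF centres, in units of the larger
    of the two sampling SDs. Small values mean heavily overlapping clusters."""
    sd = max(vaf_standard_error(vaf_a, depth), vaf_standard_error(vaf_b, depth))
    return abs(vaf_a - vaf_b) / sd


# Illustrative VAFs (assumed values): two subclones at 0.25 and 0.15.
for depth in (30, 50, 300):
    print(depth, vaf_standard_error(0.25, depth), separation_in_sd(0.25, 0.15, depth))
```

Under this simplified model the same pair of subclones is several times better separated at 300x than at 30x, which is the gap an error model tuned for standard depths has to close.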

ASU Digital Repository

Bayesian Inference for Genomic Data Analysis

Author: Ogundijo Oyetunji Enoch
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

High-throughput genomic data contain vast amounts of information that are influenced by the complex biological processes in the cell. As such, appropriate mathematical modeling frameworks are required to understand the data and the data-generating processes. This dissertation focuses on the formulation of mathematical models and the description of appropriate computational algorithms to obtain insights from genomic data. Specifically, characterization of intra-tumor heterogeneity is studied. Based on the total number of allele copies at the genomic locations in the tumor subclones, the problem is viewed from two perspectives: with and without the copy-neutrality assumption. Under copy-neutrality, it is assumed that the genome contains mutational variability and that any of the three possible genotypes may be present at each genomic location. As such, the genotypes of all the genomic locations in the tumor subclones are modeled by a ternary matrix. In the second case, in addition to mutational variability, it is assumed that the genomic locations may be affected by structural variability such as copy number variation (CNV). Thus, the genotypes are modeled with a pair of (Q + 1)-ary matrices. Using the categorical Indian buffet process (cIBP), a state-space modeling framework is employed to describe the two processes, and sequential Monte Carlo (SMC) methods for dynamic models are applied to perform inference on important model parameters. Moreover, the problem of estimating a gene regulatory network (GRN) from measurements with missing values is presented. Specifically, gene expression time series data may be missing all expression values at a single time point or at a set of consecutive time points. However, complete data are often needed to make inferences about the underlying GRN. To handle the missing measurements, a dynamic stochastic model is used to describe the evolution of gene expression, and point-based Gaussian approximation (PBGA) filters with one-step or two-step missing measurements are applied for the inference. Finally, the problem of deconvolving gene expression data from complex heterogeneous biological samples is examined, where the observed data are a mixture of different cell types. A statistical description of the problem is used, and the SMC method for static models is applied to estimate the cell-type-specific expressions and the cell-type proportions in the heterogeneous samples.
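The deconvolution problem described at the end of the abstract above treats observed expression as a mixture of cell types. A toy sketch of that mixture model, assuming two cell types with known signatures and an ordinary least-squares estimate of the proportion (the dissertation instead uses an SMC method and also estimates the cell-type expressions; all names and values here are hypothetical):

```python
def mix(sig_a, sig_b, prop_a):
    """Bulk expression per gene: prop_a of cell type A plus (1 - prop_a) of B."""
    return [prop_a * a + (1.0 - prop_a) * b for a, b in zip(sig_a, sig_b)]


def estimate_proportion(observed, sig_a, sig_b):
    """Least-squares estimate of the proportion of cell type A.

    Minimises sum((y - (p * a + (1 - p) * b)) ** 2) over p, which has the
    closed form below, then clips the result to the valid range [0, 1]."""
    num = sum((y - b) * (a - b) for y, a, b in zip(observed, sig_a, sig_b))
    den = sum((a - b) ** 2 for a, b in zip(sig_a, sig_b))
    if den == 0:
        raise ValueError("signatures are identical; proportion is not identifiable")
    return min(1.0, max(0.0, num / den))


# Assumed toy signatures for three genes.
signature_a = [10.0, 2.0, 5.0]
signature_b = [1.0, 8.0, 5.0]
bulk = mix(signature_a, signature_b, 0.3)
print(estimate_proportion(bulk, signature_a, signature_b))
```

With noise-free data the estimate recovers the mixing proportion; genes with identical expression in both cell types (the third gene here) carry no information about it.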

Columbia University Academic Commons

Predicting clone genotypes from tumor bulk sequencing of multiple samples

Author: Alves; Beerenwinkel; Cancer Genome Atlas Research Network et al.; Davis; de Bruin; Deshwar; El-Kebir; Farahani; Fischer; Gawad; Gerlinger; Gerlinger; Gundem; Helleday; Hong; Horne; Hu; Jiang; Jiao; John Hancock; Karen Gomez; Kuhn; Landau; Louise A Huuki; Macintyre; Malikic; McPherson; Navin; Nei; Nik-Zainal; Ojha; Oscar Murillo; Popic; Reiter; Ross-Innes; Sayaka Miura; Schuh; Sengupta; Sottoriva; Stachler; Sudhir Kumar; Sun; Tiffany Buturla; Tracy Vu; Turajlic; Uchi; Vandin; Yang; Zare
Publication venue: 'Oxford University Press (OUP)'

Crossref

Investigating intratumour heterogeneity analysis methods and their application in GBM

Author: Tanner Georgette Nicola
Publication date: 01/10/2020

Glioblastoma (GBM) is an incurable cancer with a median survival of 15 months. Despite debulking surgery, cancer cells are inevitably left behind in the surrounding brain, with a minority able to resist subsequent chemoradiotherapy and eventually form a recurrent tumour. This resistance is likely influenced by the cells' genotypes, which show high variability (intratumour heterogeneity) as a result of tumour evolution. Characterising changes in the genetic architecture of tumours through therapy may allow us to understand the effect that different mutations and pathways have on cell survival, and potentially identify novel targets for counteracting resistance in GBM. Such analyses involve detecting mutations in bulk tumour samples and then delineating them into individual, genetically distinct 'subclones' through subclonal deconvolution. This is a complex process, with no reliable guidelines for the best pipelines to use. I therefore developed methods to allow simulation and in silico sequencing of genomes from realistically complex, artificial tumour samples, so that I could benchmark such pipelines. This revealed that no tested pipeline using single bulk samples showed a high level of accuracy, though mutation calling with Mutect2 and FACETS, followed by subclonal deconvolution with Ccube, showed the best results. I then used alternative approaches with the largest longitudinal GBM dataset investigated to date. I found that evidence of strong subclonal selection is absent in many samples and is not associated with therapy. Nonetheless, this does not negate the possibility of smaller, or less frequent, pockets of altered fitness. Using pathway analysis combined with variants that are informative of tumour progression, I identified processes that may confer increased resistance, or sensitisation, to therapy and which warrant further investigation. Lastly, I applied subclonal deconvolution to investigate mouse-specific evolution in GBM patient-derived orthotopic xenografts and found no clear evidence to suggest these models are unsuitable for investigations relevant to humans.

White Rose E-theses Online