Discovering Subclones and Their Driver Genes in Tumors Sequenced at Standard Depths
abstract: Understanding intratumor heterogeneity and its driver genes is critical to
designing personalized treatments and improving clinical outcomes of cancers. Such
investigations require accurate delineation of the subclonal composition of a tumor, which
to date can only be reliably inferred from deep-sequencing data (>300x depth). The
algorithm resulting from the work presented here incorporates an adaptive error model
into statistical decomposition of mixed populations, which corrects the mean-variance
dependency of sequencing data at the subclonal level and enables accurate subclonal
discovery in tumors sequenced at standard depths (30-50x). Tested on extensive computer
simulations and real-world data, this new method, named model-based adaptive grouping
of subclones (MAGOS), consistently outperforms existing methods on minimum
sequencing depth, decomposition accuracy and computational efficiency. MAGOS supports
subclone analysis using single nucleotide variants and copy number variants from one or
more samples of an individual tumor. The GUST algorithm, on the other hand, is a novel method
for detecting cancer-type-specific driver genes. Combining MAGOS and GUST
results can provide insights into cancer progression. Applying MAGOS and GUST
to whole-exome sequencing data from samples of 33 cancer types revealed a
significant association of subclonal diversity and subclonal drivers with patient overall
survival.
Dissertation/Thesis. Doctoral Dissertation, Biomedical Informatics, 201
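The mean-variance dependency mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a plain binomial read-sampling model, which is a textbook simplification and not the adaptive error model used by MAGOS; the function name `vaf_standard_error` is hypothetical.

```python
import math


def vaf_standard_error(vaf: float, depth: int) -> float:
    """Standard error of an observed variant allele frequency (VAF).

    Assumes variant reads ~ Binomial(depth, vaf), so the variance of the
    observed VAF is vaf * (1 - vaf) / depth. The spread therefore depends
    on the mean (vaf) as well as on the sequencing depth.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must lie in [0, 1]")
    return math.sqrt(vaf * (1.0 - vaf) / depth)


# Compare standard depths (30x, 50x) with deep sequencing (300x).
for depth in (30, 50, 300):
    for vaf in (0.05, 0.25, 0.5):
        se = vaf_standard_error(vaf, depth)
        print(f"depth={depth:>3}x  vaf={vaf:.2f}  standard error={se:.3f}")
```

Under this simplified model, clusters of mutations with nearby VAFs overlap much more at 30-50x than at 300x, which is one way to see why a fixed-variance clustering can struggle at standard depths.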
Bayesian Inference for Genomic Data Analysis
High-throughput genomic data contain a vast amount of information that is influenced by the complex biological processes in the cell. As such, appropriate mathematical modeling frameworks are required to understand the data and the data-generating processes. This dissertation focuses on the formulation of mathematical models and the description of appropriate computational algorithms to obtain insights from genomic data.
Specifically, characterization of intra-tumor heterogeneity is studied. Based on the total number of allele copies at the genomic locations in the tumor subclones, the problem is viewed from two perspectives: with and without the copy-neutrality assumption. Under copy neutrality, it is assumed that the genome contains mutational variability and that any of the three possible genotypes may be present at each genomic location. As such, the genotypes of all the genomic locations in the tumor subclones are modeled by a ternary matrix. In the second case, in addition to mutational variability, it is assumed that the genomic locations may be affected by structural variation such as copy number variation (CNV). Thus, the genotypes are modeled with a pair of (Q + 1)-ary matrices. Using the categorical Indian buffet process (cIBP), a state-space modeling framework is employed to describe the two processes, and sequential Monte Carlo (SMC) methods for dynamic models are applied to perform inference on important model parameters.
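The ternary-matrix representation for the copy-neutral case can be sketched as follows. This is an illustrative forward model only, under assumptions not stated in the abstract: genotypes are coded 0, 1, 2 (number of variant alleles out of two copies), each column is a subclone with a known cellular proportion, and the function name `expected_vaf` and the example values are hypothetical. It does not implement the cIBP prior or the SMC inference.

```python
from typing import List


def expected_vaf(genotypes: List[List[int]], proportions: List[float]) -> List[float]:
    """Expected variant allele frequency per locus under copy neutrality.

    genotypes[l][c] is the genotype of locus l in subclone c, coded as
    0 (homozygous reference), 1 (heterozygous) or 2 (homozygous variant).
    proportions[c] is the fraction of cells belonging to subclone c.
    With two allele copies everywhere, a subclone contributes g / 2 of
    its reads as variant reads, weighted by its proportion.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("subclone proportions must sum to 1")
    vafs = []
    for row in genotypes:
        if len(row) != len(proportions):
            raise ValueError("each row needs one genotype per subclone")
        if any(g not in (0, 1, 2) for g in row):
            raise ValueError("genotypes must be 0, 1 or 2")
        vafs.append(sum(w * g / 2.0 for g, w in zip(row, proportions)))
    return vafs


# Hypothetical example: 3 loci, 3 subclones (the first column plays the
# role of a mutation-free population).
Z = [
    [0, 1, 1],
    [0, 0, 2],
    [0, 1, 0],
]
w = [0.2, 0.5, 0.3]
print(expected_vaf(Z, w))
```

Inference runs in the opposite direction: given observed read counts, the matrix and the proportions are the unknowns to be estimated.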
Moreover, the problem of estimating a gene regulatory network (GRN) from measurements with missing values is presented. Specifically, gene expression time series data may be missing all expression values at a single time point or at a set of consecutive time points. However, complete data are often needed to make inference on the underlying GRN. To handle the missing measurements, a dynamic stochastic model is used to describe the evolution of gene expression, and point-based Gaussian approximation (PBGA) filters with one-step or two-step missing measurements are applied for the inference. Finally, the problem of deconvolving gene expression data from complex heterogeneous biological samples is examined, where the observed data are a mixture of different cell types. A statistical description of the problem is used and the SMC method for static models is applied to estimate the cell-type-specific expressions and the cell type proportions in the heterogeneous samples.
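The mixture structure behind the deconvolution problem can be made concrete with a small sketch. It is deliberately simpler than the approach described above: it assumes only two cell types whose expression signatures are already known, and it recovers the mixing proportion by closed-form least squares rather than by SMC. The function names and the example values are hypothetical.

```python
from typing import List


def mix(sig_a: List[float], sig_b: List[float], prop_a: float) -> List[float]:
    """Bulk expression as a linear mixture of two cell-type signatures."""
    return [prop_a * a + (1.0 - prop_a) * b for a, b in zip(sig_a, sig_b)]


def estimate_proportion(bulk: List[float], sig_a: List[float], sig_b: List[float]) -> float:
    """Least-squares estimate of the proportion of cell type A.

    Model: bulk[g] = p * sig_a[g] + (1 - p) * sig_b[g] + noise.
    Rearranged: bulk[g] - sig_b[g] = p * (sig_a[g] - sig_b[g]), so p is
    the slope of a regression through the origin. The result is clipped
    to [0, 1] because it represents a proportion.
    """
    diffs = [a - b for a, b in zip(sig_a, sig_b)]
    denom = sum(d * d for d in diffs)
    if denom == 0.0:
        raise ValueError("signatures are identical; proportion is not identifiable")
    numer = sum((y - b) * d for y, b, d in zip(bulk, sig_b, diffs))
    return min(1.0, max(0.0, numer / denom))


# Hypothetical signatures over four genes, mixed at 30% type A.
signature_a = [10.0, 2.0, 5.0, 8.0]
signature_b = [1.0, 9.0, 5.0, 3.0]
bulk_sample = mix(signature_a, signature_b, 0.3)
print(estimate_proportion(bulk_sample, signature_a, signature_b))
```

When the signatures themselves are unknown, as in the setting described above, both factors of the mixture must be estimated jointly, which is what motivates a sampling-based method.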
Investigating intratumour heterogeneity analysis methods and their application in GBM
Glioblastoma (GBM) is an incurable cancer with a median survival of 15 months. Despite debulking surgery, cancer cells are inevitably left behind in the surrounding brain, with a minority able to resist subsequent chemoradiotherapy and eventually form a recurrent tumour. This resistance is likely influenced by the cells’ genotypes, which show high variability (intratumour heterogeneity) as a result of tumour evolution. Characterising changes in the genetic architecture of tumours through therapy may allow us to understand the effect that different mutations and pathways have on cell survival, and potentially identify novel targets for counteracting resistance in GBM. Such analyses involve detecting mutations in bulk tumour samples and then delineating them into individual, genetically distinct ‘subclones’ through subclonal deconvolution. This is a complex process, with no reliable guidelines for the best pipelines to use. I therefore developed methods to allow simulation and in silico sequencing of genomes from realistically complex, artificial tumour samples, so that I could benchmark such pipelines. This revealed that none of the tested pipelines, using single bulk samples, showed a high level of accuracy, though mutation calling with Mutect2 and FACETS, followed by subclonal deconvolution with Ccube, showed the best results. I then used alternative approaches with the largest longitudinal GBM dataset investigated to date. I found that evidence of strong subclonal selection is absent in many samples, and not associated with therapy. Nonetheless, this does not negate the possibility of smaller, or less frequent, pockets of altered fitness. Using pathway analysis combined with variants that are informative of tumour progression, I identified processes that may confer increased resistance, or sensitisation to therapy, and which warrant further investigation.
Lastly, I applied subclonal deconvolution to investigate mouse-specific evolution in GBM patient-derived orthotopic xenografts and found no clear evidence to suggest these models are unsuitable for investigations relevant to humans.