2,104 research outputs found

    A decision-theoretic approach for segmental classification

    This paper is concerned with statistical methods for the segmental classification of linear sequence data where the task is to segment and classify the data according to an underlying hidden discrete state sequence. Such analysis is commonplace in the empirical sciences including genomics, finance and speech processing. In particular, we are interested in answering the following question: given data $y$ and a statistical model $\pi(x, y)$ of the hidden states $x$, what should we report as the prediction $\hat{x}$ under the posterior distribution $\pi(x \mid y)$? That is, how should you make a prediction of the underlying states? We demonstrate that traditional approaches such as reporting the most probable state sequence or most probable set of marginal predictions can give undesirable classification artefacts and offer limited control over the properties of the prediction. We propose a decision theoretic approach using a novel class of Markov loss functions and report $\hat{x}$ via the principle of minimum expected loss (maximum expected utility). We demonstrate that the sequence of minimum expected loss under the Markov loss function can be enumerated exactly using dynamic programming methods and that it offers flexibility and performance improvements over existing techniques. The result is generic and applicable to any probabilistic model on a sequence, such as Hidden Markov models, change point or product partition models. Comment: Published at http://dx.doi.org/10.1214/13-AOAS657 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
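
    The two traditional predictions named in this abstract can be compared on a toy model. The sketch below is illustrative only: the two-state hidden Markov model, its probabilities, and the observation sequence `Y` are assumptions made here, and the paper's Markov loss functions are not implemented. It computes the most probable state sequence (Viterbi) and the sequence of most probable marginal states (forward-backward), so the two reports can be set side by side.

```python
from itertools import product

# Toy 2-state HMM. Every number here is an illustrative assumption.
STATES = (0, 1)
INIT = (0.5, 0.5)
TRANS = ((0.9, 0.1), (0.1, 0.9))   # TRANS[i][j] = P(x_t = j | x_{t-1} = i)
EMIT = ((0.7, 0.3), (0.4, 0.6))    # EMIT[i][k]  = P(y_t = k | x_t = i)
Y = (0, 0, 1, 0, 0, 1, 1, 1)       # hypothetical observed sequence


def joint(x, y):
    """Joint probability pi(x, y) of a state path x and observations y."""
    p = INIT[x[0]] * EMIT[x[0]][y[0]]
    for t in range(1, len(y)):
        p *= TRANS[x[t - 1]][x[t]] * EMIT[x[t]][y[t]]
    return p


def viterbi(y):
    """Most probable state sequence under pi(x | y), by dynamic programming."""
    delta = [INIT[s] * EMIT[s][y[0]] for s in STATES]
    back = []
    for t in range(1, len(y)):
        pointers, new_delta = [], []
        for j in STATES:
            best = max(STATES, key=lambda i: delta[i] * TRANS[i][j])
            pointers.append(best)
            new_delta.append(delta[best] * TRANS[best][j] * EMIT[j][y[t]])
        back.append(pointers)
        delta = new_delta
    path = [max(STATES, key=lambda s: delta[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return tuple(reversed(path))


def posterior_marginals(y):
    """Marginals P(x_t = s | y) for every t, via forward-backward."""
    n = len(y)
    fwd = [[INIT[s] * EMIT[s][y[0]] for s in STATES]]
    for t in range(1, n):
        fwd.append([
            sum(fwd[-1][i] * TRANS[i][j] for i in STATES) * EMIT[j][y[t]]
            for j in STATES
        ])
    bwd = [[1.0 for _ in STATES] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = [
            sum(TRANS[i][j] * EMIT[j][y[t + 1]] * bwd[t + 1][j] for j in STATES)
            for i in STATES
        ]
    evidence = sum(fwd[-1])
    return [[fwd[t][s] * bwd[t][s] / evidence for s in STATES] for t in range(n)]


def marginal_decode(y):
    """Sequence made of the most probable state at each position."""
    return tuple(max(STATES, key=lambda s: m[s]) for m in posterior_marginals(y))


map_path = viterbi(Y)
marginal_path = marginal_decode(Y)
print("most probable sequence :", map_path)
print("most probable marginals:", marginal_path)
```

    The marginal prediction maximises the expected number of correctly labelled positions, while the Viterbi prediction maximises the probability of the whole path; neither gives direct control over properties such as the number of segments, which is the gap the abstract says the proposed loss functions address.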

    BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies.

    Cancer has long been understood as a somatic evolutionary process, but many details of tumor progression remain elusive. Here, we present BitPhylogeny, a probabilistic framework to reconstruct intra-tumor evolutionary pathways. Using a full Bayesian approach, we jointly estimate the number and composition of clones in the sample as well as the most likely tree connecting them. We validate our approach in the controlled setting of a simulation study and compare it against several competing methods. In two case studies, we demonstrate how BitPhylogeny reconstructs tumor phylogenies from methylation patterns in colon cancer and from single-cell exomes in myeloproliferative neoplasm. KY and FM would like to acknowledge the support of the University of Cambridge, Cancer Research UK and Hutchison Whampoa Limited. This is the final published version. It first appeared at http://genomebiology.com/2015/16/1/36

    Charting genomic heterogeneity in tumours : from bulk to single cell

    Tumours do not consist of a single homogeneous population but are complex heterogeneous systems that contain billions of ever-evolving cells, with no two tumours being the same. Tumour heterogeneity is present at three levels: 1) inter-patient heterogeneity; 2) intra-patient heterogeneity; and 3) intra-tumour heterogeneity (ITH). Understanding all levels of heterogeneity is crucial for patient prognosis and treatment choice. To this end, we aimed to improve our understanding of all three levels of tumour heterogeneity. In paper I we investigated the prevalence, type, length, and genomic distribution of 853,218 somatic copy number alterations (SCNAs) across 20,249 tumours belonging to 32 cancer types. Based on the 1) number of SCNAs; 2) percentage of the genome altered; and 3) average SCNA size, we found high levels of inter-patient heterogeneity, both between and within cancer types. We found that specific chromosomes were preferentially lost or gained depending on cancer type. Lastly, we detected co-alterations of key oncogenes and tumour suppressor genes (TSGs). Taken together, we provided a comprehensive analysis of SCNAs across many cancer types as a valuable resource for the community. In paper II we sought to elucidate intra-patient heterogeneity in non-small cell lung cancer (NSCLC) and matched brain metastases (BMs). We performed shallow whole-genome sequencing (WGS) on 51 primary NSCLCs and matched BMs, whole-exome sequencing on 40 of the pairs, multi-region sequencing of 15 BMs, and shallow WGS on an additional cohort of 115 BMs. We showed that there is significant intra-patient heterogeneity at the SCNA level, with BM samples showing, on average, more SCNAs than their matched NSCLC. In contrast, multi-region sequencing of 15 BMs did not show significant ITH at the level of SCNAs. Finally, we identified putative metastatic driver SCNAs and single-nucleotide variants in key TSGs and oncogenes.
In paper III we aimed to assess the level of ITH in early localized prostate cancer. We performed organ-wide, multi-region, single-cell DNA sequencing on two prostate midsections. We found transient chromosomal instability (CIN) in both tumour and normal prostate tissue, evidenced by a large number of cells with unique chromosomal (arm) losses and/or gains. Furthermore, we found three distinct groups of cells within the prostate: 1) diploid cells; 2) pseudo-diploid cells; and 3) monster cells. We observed an enrichment of diploid cells in normal regions and pseudo-diploid cells in tumour-rich regions, while monster cells were equally distributed over the entire prostate, again suggesting that there were elevated CIN levels across the prostate. Lastly, we detected highly localized subclones that were exclusive to tumour-rich regions and harboured deletions in TSGs that are known to be frequently deleted in prostate cancer. Taken together, with this thesis, I have contributed to advancing the understanding of inter-patient, intra-patient, and intra-tumour heterogeneity

    Inferring structural variant cancer cell fraction.

    We present SVclone, a computational method for inferring the cancer cell fraction of structural variant (SV) breakpoints from whole-genome sequencing data. SVclone accurately determines the variant allele frequencies of both SV breakends, then simultaneously estimates the cancer cell fraction and SV copy number. We assess performance using in silico mixtures of real samples, at known proportions, created from two clonal metastases from the same patient. We find that SVclone's performance is comparable to single-nucleotide variant-based methods, despite having an order of magnitude fewer data points. As part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) consortium, which aggregated whole-genome sequencing data from 2658 cancers across 38 tumour types, we use SVclone to reveal a subset of liver, ovarian and pancreatic cancers with subclonally enriched copy-number neutral rearrangements that show decreased overall survival. SVclone enables improved characterisation of SV intra-tumour heterogeneity
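
    The link between a variant allele frequency and a cancer cell fraction can be illustrated with the point-estimate relationship commonly used for single-nucleotide variants. The sketch below is not SVclone's model, which the abstract does not describe in detail. The function names, the parameter names (`purity`, `multiplicity`, `tumour_cn`, `normal_cn`) and the example numbers are assumptions introduced here for illustration.

```python
def expected_vaf(ccf, purity, multiplicity, tumour_cn, normal_cn=2):
    """Expected VAF of a variant carried by a fraction `ccf` of cancer cells.

    purity       : fraction of cells in the sample that are cancer cells
    multiplicity : number of variant copies per cancer cell carrying it
    tumour_cn    : total copy number at the locus in cancer cells
    normal_cn    : total copy number at the locus in normal cells
    """
    total_copies = purity * tumour_cn + (1.0 - purity) * normal_cn
    return purity * multiplicity * ccf / total_copies


def ccf_from_vaf(vaf, purity, multiplicity, tumour_cn, normal_cn=2):
    """Invert expected_vaf to get a point estimate of the cancer cell fraction."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity <= 0:
        raise ValueError("multiplicity must be positive")
    total_copies = purity * tumour_cn + (1.0 - purity) * normal_cn
    return vaf * total_copies / (purity * multiplicity)


# Hypothetical example: a heterozygous variant at a diploid locus in a
# sample with 50% purity and an observed VAF of 0.25.
example_ccf = ccf_from_vaf(vaf=0.25, purity=0.5, multiplicity=1, tumour_cn=2)
print(example_ccf)
```

    Because copy number and multiplicity are usually unknown for a given variant, a point estimate like this depends on the values assumed for them; the abstract's description of estimating cancer cell fraction and SV copy number simultaneously reflects that dependence.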

    CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data

    We introduce CellPhy, a maximum likelihood framework for inferring phylogenetic trees from somatic single-cell single-nucleotide variants. CellPhy leverages a finite-site Markov genotype model with 16 diploid states and considers amplification error and allelic dropout. We implement CellPhy into RAxML-NG, a widely used phylogenetic inference package that provides statistical confidence measurements and scales well on large datasets with hundreds or thousands of cells. Comprehensive simulations suggest that CellPhy is more robust to single-cell genomics errors and outperforms state-of-the-art methods under realistic scenarios, both in accuracy and speed. CellPhy is freely available a

    CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data

    We introduce CellPhy, a maximum likelihood framework for inferring phylogenetic trees from somatic single-cell single-nucleotide variants. CellPhy leverages a finite-site Markov genotype model with 16 diploid states and considers amplification error and allelic dropout. We implement CellPhy into RAxML-NG, a widely used phylogenetic inference package that provides statistical confidence measurements and scales well on large datasets with hundreds or thousands of cells. Comprehensive simulations suggest that CellPhy is more robust to single-cell genomics errors and outperforms state-of-the-art methods under realistic scenarios, both in accuracy and speed. Funding: European Research Council | Ref. ERC-617457-PHYLOCANCER; Agencia Estatal de Investigación | Ref. PID2019-106247GB-I00; Fundação para a Ciência e a Tecnologia | Ref. PTDC/BIA-EVL/32030/2017; Xunta de Galici

    Deterministic Evolutionary Trajectories Influence Primary Tumor Growth: TRACERx Renal.

    The evolutionary features of clear-cell renal cell carcinoma (ccRCC) have not been systematically studied to date. We analyzed 1,206 primary tumor regions from 101 patients recruited into the multi-center prospective study, TRACERx Renal. We observe up to 30 driver events per tumor and show that subclonal diversification is associated with known prognostic parameters. By resolving the patterns of driver event ordering, co-occurrence, and mutual exclusivity at clone level, we show the deterministic nature of clonal evolution. ccRCC can be grouped into seven evolutionary subtypes, ranging from tumors characterized by early fixation of multiple mutational and copy number drivers and rapid metastases to highly branched tumors with >10 subclonal drivers and extensive parallel evolution associated with attenuated progression. We identify genetic diversity and chromosomal complexity as determinants of patient outcome. Our insights reconcile the variable clinical behavior of ccRCC and suggest evolutionary potential as a biomarker for both intervention and surveillance

    Extracting information from high-throughput gene expression data with pathway analysis and deconvolution

    Modern technologies allow for the collection of large biological datasets that can be utilised for diverse health-related applications. However, to extract useful information from such data, computational methods are needed. The field that develops and explores methods to analyse biological data is called bioinformatics. In this thesis I evaluate different bioinformatic methods and introduce novel ones related to processing gene expression data. Gene expression data reflect how active different genes are in a set of measured biological samples. These samples can be, for example, blood from human individuals, tissue samples from tumours and the corresponding healthy tissue, or brain samples from mice with different neural diseases. This thesis covers two topics, pathway analysis and deconvolution, related to downstream analysis of gene expression data. Notably, this summary does not repeat in detail the points made in the original publications, but aims to provide a comprehensive overview of the current knowledge of the two wider topics. The original publications focus on comparing and evaluating the available methods as well as presenting new ones that cover some previously untouched features. While the terms 'pathway analysis' and 'deconvolution' have been used with alternative definitions in other fields, in the context of this thesis, pathway analysis refers to estimating the activity of pathways, i.e. the interaction networks the body uses to react to different signals, based on given gene expression data and structural information of the relevant pathways. I focus on different types of analysis methods and their varying goals, requirements, and underlying statistical approaches. In addition, the strengths and weaknesses of the concept of pathway analysis are briefly discussed. The first two original publications (I and II) empirically compare different types of pathway methods and introduce a novel one.
In paper I, the tested methods are evaluated from different perspectives, and in paper II, a novel method is introduced and its performance demonstrated against alternative tools. Many biological samples contain a variety of cell types, and here deconvolution means computationally extracting cell type composition or cell type specific expression from bulk samples. The deconvolution sections of this thesis also focus on a general overview of the topic and the available computational methodology. As deconvolution is challenging, I discuss the factors affecting its accuracy as well as alternative wet lab approaches to obtain cell type specific information. The first original publication about deconvolution (publication III) introduces a novel method and evaluates it against the other available tools. The second (publication IV) focuses on identifying cell type specific differences between sample groups, which is a particularly difficult task