11 research outputs found

    Robust unmixing of tumor states in array comparative genomic hybridization data

    Motivation: Tumorigenesis is an evolutionary process by which tumor cells acquire sequences of mutations leading to increased growth, invasiveness and eventually metastasis. It is hoped that by identifying the common patterns of mutations underlying major cancer sub-types, we can better understand the molecular basis of tumor development and identify new diagnostic and therapeutic targets. This goal has motivated several attempts to apply evolutionary tree reconstruction methods to assays of tumor state. Inference of tumor evolution is in principle aided by the fact that tumors are heterogeneous, retaining remnant populations from different stages of their development along with contaminating healthy cell populations. In practice, though, this heterogeneity complicates interpretation of tumor data because distinct cell types are conflated by common methods for assaying the tumor state. We previously proposed a method to computationally infer cell populations from measures of tumor-wide gene expression through a geometric interpretation of mixture type separation, but this approach deals poorly with noisy data and outliers.
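The geometric mixture model described above can be made concrete with a toy sketch: if the pure cell-population profiles (the vertices of the mixture simplex) were known, the mixing fractions of each bulk sample could be recovered by least squares. All profiles, dimensions, and names below are invented for illustration; the paper's method additionally infers the vertices themselves and is designed to be robust to noise and outliers, which this sketch is not.

```python
import numpy as np

# Hypothetical setup: each bulk tumor sample is modeled as a convex
# combination of a few "pure" cell-population expression profiles.
rng = np.random.default_rng(0)
n_genes, n_components = 50, 3
components = rng.normal(size=(n_components, n_genes))  # rows: pure profiles

# Simulate two bulk samples as convex combinations of the components.
fractions = np.array([[0.6, 0.3, 0.1],
                      [0.2, 0.2, 0.6]])
samples = fractions @ components

# Recover the mixture fractions by least squares. Real unmixing methods
# also enforce nonnegativity and sum-to-one and handle noisy data.
est, *_ = np.linalg.lstsq(components.T, samples.T, rcond=None)
print(np.round(est.T, 3))  # recovered fractions match the simulated ones
```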

    ISOpureR: an R implementation of a computational purification algorithm of mixed tumour profiles

    Background: Tumour samples containing distinct sub-populations of cancer and normal cells present challenges in the development of reproducible biomarkers, as these biomarkers are based on bulk signals from mixed tumour profiles. ISOpure is the only mRNA computational purification method to date that does not require a paired tumour-normal sample, provides a personalized cancer profile for each patient, and has been tested on clinical data. Replacing mixed tumour profiles with ISOpure-preprocessed cancer profiles led to better prognostic gene signatures for lung and prostate cancer. Results: To simplify the integration of ISOpure into standard R-based bioinformatics analysis pipelines, the algorithm has been implemented as an R package. The ISOpureR package performs analogously to the original code in estimating the fraction of cancer cells and the patient cancer mRNA abundance profile from tumour samples in four cancer datasets. Conclusions: The ISOpureR package estimates the fraction of cancer cells and a personalized patient cancer mRNA abundance profile from a mixed tumour profile. This open-source R implementation enables integration into existing computational pipelines, as well as easy testing, modification and extension of the model. Funding: Prostate Cancer Canada; Movember Foundation (Grant RS2014-01).

    Towards Quantifying Vertex Similarity in Networks

    Vertex similarity is a major problem in network science with a wide range of applications. In this work we provide novel perspectives on finding (dis)similar vertices within a network and across two networks with the same number of vertices (graph matching). With respect to the former problem, we propose to optimize a geometric objective which allows us to express each vertex uniquely as a convex combination of a few extreme types of vertices. Our method has the important advantage of efficiently supporting several types of queries, such as "which other vertices are most similar to this vertex?", through the use of appropriate data structures, and of mining interesting patterns in the network. With respect to the latter problem (graph matching), we propose the generalized condition number --a quantity widely used in numerical analysis-- κ(L_G, L_H) of the Laplacian matrix representations of G, H as a measure of graph similarity, where G, H are the graphs of interest. We show that this objective has a solid theoretical basis and propose a deterministic and a randomized graph alignment algorithm. We evaluate our algorithms on both synthetic and real data. We observe that our proposed methods achieve high-quality results and provide us with significant insights into the network structure. Comment: 16 pages, 5 figures, 2 tables
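As a rough illustration of the graph-matching measure, the generalized condition number of the pencil (L_G, L_H) can be computed from its generalized eigenvalues restricted to the complement of the all-ones kernel shared by connected-graph Laplacians. This is one plausible reading of the quantity with made-up helper names, not the authors' code.

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric 0/1 adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def generalized_condition_number(L_G, L_H):
    """kappa(L_G, L_H): ratio of largest to smallest generalized eigenvalue
    of the pencil (L_G, L_H), restricted to the subspace orthogonal to the
    all-ones vector (the common kernel of connected-graph Laplacians)."""
    n = L_G.shape[0]
    # Orthonormal basis of the complement of span{1}, via QR of the
    # centering matrix I - (1/n) * ones.
    Q = np.linalg.qr(np.eye(n) - np.ones((n, n)) / n)[0][:, : n - 1]
    A = Q.T @ L_G @ Q
    B = Q.T @ L_H @ Q
    eigs = np.linalg.eigvals(np.linalg.solve(B, A)).real
    return eigs.max() / eigs.min()

# Sanity check: a graph compared with itself has kappa = 1.
A4 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]], dtype=float)   # 4-cycle
L = laplacian(A4)
print(generalized_condition_number(L, L))
```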

    Medoidshift clustering applied to genomic bulk tumor data.

    Despite the enormous medical impact of cancers and intensive study of their biology, detailed characterization of tumor growth and development remains elusive. This difficulty occurs in large part because of enormous heterogeneity in the molecular mechanisms of cancer progression, both tumor-to-tumor and cell-to-cell in single tumors. Advances in genomic technologies, especially at the single-cell level, are improving the situation, but these approaches are held back by limitations of the biotechnologies for gathering genomic data from heterogeneous cell populations and the computational methods for making sense of those data. One popular way to gain the advantages of whole-genome methods without the cost of single-cell genomics has been the use of computational deconvolution (unmixing) methods to reconstruct clonal heterogeneity from bulk genomic data. These methods, too, are limited by the difficulty of inferring genomic profiles of rare or subtly varying clonal subpopulations from bulk data, a problem that can be computationally reduced to that of reconstructing the geometry of point clouds of tumor samples in a genome space. Here, we present a new method to improve that reconstruction by better identifying subspaces corresponding to tumors produced from mixtures of distinct combinations of clonal subpopulations. We develop a nonparametric clustering method based on medoidshift clustering for identifying subgroups of tumors expected to correspond to distinct trajectories of evolutionary progression. We show on synthetic and real tumor copy-number data that this new method substantially improves our ability to resolve discrete tumor subgroups, a key step in the process of accurately deconvolving tumor genomic data and inferring clonal heterogeneity from bulk data
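Medoid-shift, the nonparametric mode-seeking procedure the abstract builds on, repeatedly moves each point's pointer to the medoid of its neighbourhood; basins of attraction become clusters. The following is a simplified flat-kernel sketch of that idea on synthetic points, not the paper's implementation.

```python
import numpy as np

def medoidshift(points, bandwidth):
    """Toy medoid-shift: point i's pointer moves to the data point that
    minimizes the summed distance to i's neighbourhood; cluster labels
    are the fixed points reached by following pointers."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    w = (d < bandwidth).astype(float)      # flat-kernel neighbourhood weights
    # (d @ w)[j, i] = sum_k d[j, k] * w[k, i]: cost of candidate medoid j
    # for the neighbourhood of point i.
    shift = np.argmin(d @ w, axis=0)
    labels = np.arange(len(points))
    for _ in range(len(points)):           # follow pointers to fixed points
        labels = shift[labels]
    return np.unique(labels, return_inverse=True)[1]

# Two well-separated toy subgroups in a genome-like coordinate space.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = medoidshift(pts, bandwidth=1.0)
```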

    Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction.

    Tumor heterogeneity is a limiting factor in cancer treatment and in the discovery of biomarkers to personalize it. We describe a computational purification tool, ISOpure, to directly address the effects of variable normal tissue contamination in clinical tumor specimens. ISOpure uses a set of tumor expression profiles and a panel of healthy tissue expression profiles to generate a purified cancer profile for each tumor sample and an estimate of the proportion of RNA originating from cancerous cells. Applying ISOpure before identifying gene signatures leads to significant improvements in the prediction of prognosis and other clinical variables in lung and prostate cancer
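The mixing model that ISOpure formalizes can be sketched in a few lines: an observed bulk tumour profile t is a convex combination of an unknown cancer profile c and a healthy-tissue profile n, t = alpha*c + (1 - alpha)*n, where alpha is the cancer-cell fraction. Everything below (profiles, the "cancer-silent gene" trick) is simulated for illustration; ISOpure itself fits a full probabilistic model rather than this back-of-envelope projection.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes = 200
normal = rng.gamma(2.0, 2.0, size=n_genes)   # healthy reference profile
cancer = rng.gamma(2.0, 2.0, size=n_genes)   # unknown cancer profile
cancer[:10] = 0.0                            # hypothetical cancer-silent genes
alpha_true = 0.7
tumour = alpha_true * cancer + (1 - alpha_true) * normal

# On genes silent in cancer cells, t = (1 - alpha) * n, so their ratio to
# the normal reference recovers the normal-cell fraction.
alpha_hat = 1 - np.mean(tumour[:10] / normal[:10])

# With alpha in hand, the purified cancer profile follows by rearrangement.
cancer_hat = (tumour - (1 - alpha_hat) * normal) / alpha_hat
```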

    Inference of Tumor Phylogenies from Genomic Assays on Heterogeneous Samples

    Tumorigenesis can in principle result from many combinations of mutations, but only a few roughly equivalent sequences of mutations, or “progression pathways,” seem to account for most human tumors. Phylogenetics provides a promising way to identify common progression pathways and markers of those pathways. This approach, however, can be confounded by the high heterogeneity within and between tumors, which makes it difficult to identify conserved progression stages or organize them into robust progression pathways. To tackle this problem, we previously developed methods for inferring progression stages from heterogeneous tumor profiles through computational unmixing. In this paper, we develop a novel pipeline for building trees of tumor evolution from the unmixed tumor data. The pipeline implements a statistical approach for identifying robust progression markers from unmixed tumor data and calling those markers in inferred cell states. The result is a set of phylogenetic characters and their assignments in progression states to which we apply maximum parsimony phylogenetic inference to infer tumor progression pathways. We demonstrate the full pipeline on simulated and real comparative genomic hybridization (CGH) data, validating its effectiveness and making novel predictions of major progression pathways and ancestral cell states in breast cancers
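The pipeline's final step applies maximum parsimony to the called progression characters. A minimal sketch of the standard Fitch small-parsimony count on a fixed tree is below; the tree shape, state names, and the single binary character are invented, and the actual pipeline searches over trees rather than scoring one.

```python
def fitch_cost(tree, states):
    """Minimum number of character changes on a rooted binary tree (Fitch).
    tree: nested 2-tuples with leaf-name strings; states: leaf name -> state."""
    def post(node):
        if isinstance(node, str):                 # leaf: singleton state set
            return {states[node]}, 0
        (sl, cl), (sr, cr) = post(node[0]), post(node[1])
        inter = sl & sr
        if inter:                                 # children agree: no change
            return inter, cl + cr
        return sl | sr, cl + cr + 1               # disagreement costs one change

    return post(tree)[1]

# Four hypothetical progression states scored for one binary marker.
tree = (("early1", "early2"), ("late1", "late2"))
chars = {"early1": 0, "early2": 0, "late1": 1, "late2": 1}
print(fitch_cost(tree, chars))  # → 1: a single change explains this tree
```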

    Robustness Evaluation for Phylogenetic Reconstruction Methods and Evolutionary Models Reconstruction of Tumor Progression

    During evolutionary history, genomes evolve through DNA mutation, genome rearrangement, duplication, and gene-loss events, and there has been sustained effort in phylogenetic and ancestral genome inference. Thanks to rapid advances in sequencing technology, the amount of available genome information is growing exponentially, which makes these problems increasingly tractable. The problems have proven so interesting that a great number of algorithms, following different principles, have been rigorously developed over the past decades to tackle them. However, limits in performance and capacity, together with low consistency among methods, still prevent us from confidently stating that the problems are solved. To reconstruct a detailed evolutionary history, we need to infer both the phylogeny itself (the Big Phylogeny Problem) and the states of its internal nodes (the Small Phylogeny Problem). The work presented in this thesis focuses on assessing methods designed to attack the Small Phylogeny Problem, and on designing algorithms and models to infer genome evolution histories for cancer from FISH data. In recent decades, a number of evolutionary models and related algorithms have been designed to infer ancestral genome sequences or gene orders. Because the true ancestral genomes cannot be known, tools are needed to test the robustness of the adjacencies found by the various methods. For Big Phylogeny methods, previous work has evaluated bootstrapping, jackknifing, and isolation as ways to assess the confidence of inferred branches, and found them to be good resampling tools for the corresponding phylogenetic inference methods. Until now, however, no systematic work had been done to tackle this problem for the Small Phylogeny Problem.
We tested the earlier resampling schemes, together with a new method, inversion, on different ancestral genome reconstruction methods, and showed which resampling methods are appropriate for each. Cancer is notorious for its heterogeneity, which develops through an evolutionary process driven by mutations in tumor cells. Rapid, simultaneous linear and branching evolution has been observed and analyzed in earlier research, and such a process can be modeled as a phylogenetic tree using different methods. Previous phylogenetic research has used various kinds of data, such as FISH data, genome sequences, and gene orders. FISH data are comparatively clean because they come from single cells, and have been shown to be sufficient for inferring the evolutionary process of cancer development. RSMT has been shown to be a good model for phylogenetic analysis of FISH cell-count pattern data, but it requires efficient heuristics because the underlying problem is NP-hard. To attack this problem, we proposed an iterative approach that approximates Steiner-tree solutions in the small phylogeny setting; it gives better results than earlier methods on both real and simulated data. In this thesis, we continued this line of work by designing new methods that better approximate the evolutionary process of tumors, and by applying our methods to other kinds of data, such as information obtained with high-throughput technology. The thesis work can be divided into two parts. First, we designed new algorithms that yield the same parsimony tree as an exact method in most situations, and extended them into a general phylogeny-building tool. Second, we applied our methods to different kinds of data, such as copy-number variation information inferred from next-generation sequencing, and predicted key changes during evolution.
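The RSMT model treats FISH cell-count patterns as integer copy-number vectors with rectilinear (L1) edge costs. A standard starting point for Steiner-tree heuristics, which the thesis's iterative approach then improves by inserting unobserved intermediate patterns, is a minimum spanning tree over the observed patterns. The patterns below are made up; this sketches only the MST baseline, not the thesis's algorithm.

```python
import numpy as np
from itertools import combinations

# Hypothetical FISH count patterns (copy numbers of three probes per cell type).
patterns = np.array([[2, 2, 2],     # diploid baseline
                     [2, 3, 2],
                     [2, 3, 3],
                     [4, 3, 3]])

n = len(patterns)
# Rectilinear (L1) distance between every pair of patterns.
dist = {(i, j): int(np.abs(patterns[i] - patterns[j]).sum())
        for i, j in combinations(range(n), 2)}

# Prim's algorithm for the MST over the complete graph of patterns.
in_tree, edges = {0}, []
while len(in_tree) < n:
    i, j = min(((a, b) for (a, b) in dist
                if (a in in_tree) != (b in in_tree)),   # edges crossing the cut
               key=lambda e: dist[e])
    edges.append((i, j))
    in_tree |= {i, j}

print(sorted(edges), sum(dist[e] for e in edges))
```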