Robust unmixing of tumor states in array comparative genomic hybridization data
Motivation: Tumorigenesis is an evolutionary process by which tumor cells acquire sequences of mutations leading to increased growth, invasiveness and eventually metastasis. It is hoped that by identifying the common patterns of mutations underlying major cancer sub-types, we can better understand the molecular basis of tumor development and identify new diagnostics and therapeutic targets. This goal has motivated several attempts to apply evolutionary tree reconstruction methods to assays of tumor state. Inference of tumor evolution is in principle aided by the fact that tumors are heterogeneous, retaining remnant populations of different stages along their development along with contaminating healthy cell populations. In practice, though, this heterogeneity complicates interpretation of tumor data because distinct cell types are conflated by common methods for assaying the tumor state. We previously proposed a method to computationally infer cell populations from measures of tumor-wide gene expression through a geometric interpretation of mixture type separation, but this approach deals poorly with noisy and outlier data
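The mixture view behind unmixing can be illustrated with a small sketch. It assumes the simplest possible case, which is not the method of the paper: a bulk sample is a convex combination of exactly two pure cell-state profiles, and both profiles are already known. The function name `mixing_fraction` and the example profiles are hypothetical. The harder problem the abstract describes is inferring the pure profiles themselves from many noisy mixed samples.

```python
def mixing_fraction(sample, state_a, state_b):
    """Least-squares fraction f such that sample ~ f*state_a + (1-f)*state_b."""
    diff = [a - b for a, b in zip(state_a, state_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two cell-state profiles are identical")
    # Project (sample - state_b) onto the direction (state_a - state_b).
    num = sum((s - b) * d for s, b, d in zip(sample, state_b, diff))
    # Clamp to [0, 1] so the result is a valid mixture proportion.
    return min(1.0, max(0.0, num / denom))


# Hypothetical copy-number-like profiles for two pure cell states.
healthy = [2.0, 2.0, 2.0, 2.0]
tumor = [2.0, 4.0, 1.0, 3.0]
# A bulk sample built from 30% tumor cells and 70% healthy cells.
bulk = [0.3 * t + 0.7 * h for t, h in zip(tumor, healthy)]
fraction = mixing_fraction(bulk, tumor, healthy)
```

Here `fraction` recovers the 0.3 tumor proportion up to floating-point error. With noisy or outlier samples the projection can fall outside the segment between the two states, which is why the sketch clamps the result.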
Inference of Tumor Phylogenies from Genomic Assays on Heterogeneous Samples
Tumorigenesis can in principle result from many combinations of mutations, but only a few roughly
equivalent sequences of mutations, or “progression pathways,” seem to account for most human tumors.
Phylogenetics provides a promising way to identify common progression pathways and markers of those
pathways. This approach, however, can be confounded by the high heterogeneity within and between
tumors, which makes it difficult to identify conserved progression stages or organize them into robust
progression pathways. To tackle this problem, we previously developed methods for inferring progression
stages from heterogeneous tumor profiles through computational unmixing. In this paper, we develop
a novel pipeline for building trees of tumor evolution from the unmixed tumor data. The pipeline
implements a statistical approach for identifying robust progression markers from unmixed tumor data
and calling those markers in inferred cell states. The result is a set of phylogenetic characters and their
assignments in progression states to which we apply maximum parsimony phylogenetic inference to infer
tumor progression pathways. We demonstrate the full pipeline on simulated and real comparative genomic
hybridization (CGH) data, validating its effectiveness and making novel predictions of major progression
pathways and ancestral cell states in breast cancers
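The parsimony criterion mentioned above can be sketched with Fitch's algorithm, which counts the minimum number of state changes a character needs on a fixed rooted binary tree. This is a generic illustration, not the authors' pipeline: the tree, the state names, and the marker names are hypothetical, and a full maximum parsimony search would also compare many candidate trees.

```python
def fitch_score(tree, leaf_states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree: nested 2-tuples whose leaves are name strings.
    leaf_states: dict mapping leaf name -> observed state (e.g. 0 or 1).
    """
    def visit(node):
        if isinstance(node, str):
            return {leaf_states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # Children agree on at least one state: no change needed here.
            return common, left_cost + right_cost
        # Children disagree: one change is required at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# Hypothetical progression states and binary markers (1 = marker called).
tree = (("normal", "early"), ("late_a", "late_b"))
markers = {
    "marker_a": {"normal": 0, "early": 1, "late_a": 1, "late_b": 1},
    "marker_b": {"normal": 0, "early": 0, "late_a": 1, "late_b": 1},
}
total_changes = sum(fitch_score(tree, states) for states in markers.values())
```

Each marker needs one change on this tree, so `total_changes` is 2. A tree search would prefer the topology that minimizes this total over all markers.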
Medoidshift clustering applied to genomic bulk tumor data.
Despite the enormous medical impact of cancers and intensive study of their biology, detailed characterization of tumor growth and development remains elusive. This difficulty occurs in large part because of enormous heterogeneity in the molecular mechanisms of cancer progression, both tumor-to-tumor and cell-to-cell in single tumors. Advances in genomic technologies, especially at the single-cell level, are improving the situation, but these approaches are held back by limitations of the biotechnologies for gathering genomic data from heterogeneous cell populations and the computational methods for making sense of those data. One popular way to gain the advantages of whole-genome methods without the cost of single-cell genomics has been the use of computational deconvolution (unmixing) methods to reconstruct clonal heterogeneity from bulk genomic data. These methods, too, are limited by the difficulty of inferring genomic profiles of rare or subtly varying clonal subpopulations from bulk data, a problem that can be computationally reduced to that of reconstructing the geometry of point clouds of tumor samples in a genome space. Here, we present a new method to improve that reconstruction by better identifying subspaces corresponding to tumors produced from mixtures of distinct combinations of clonal subpopulations. We develop a nonparametric clustering method based on medoidshift clustering for identifying subgroups of tumors expected to correspond to distinct trajectories of evolutionary progression. We show on synthetic and real tumor copy-number data that this new method substantially improves our ability to resolve discrete tumor subgroups, a key step in the process of accurately deconvolving tumor genomic data and inferring clonal heterogeneity from bulk data
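For readers unfamiliar with medoidshift, the following is a minimal one-pass sketch of the generic algorithm, not the clustering method developed in the paper. It assumes a Gaussian kernel and squared Euclidean distances. The function name `medoidshift_labels`, the `bandwidth` parameter, and the toy data are hypothetical. Each point is shifted to the data point that minimizes a kernel-weighted sum of distances, and points that reach the same fixed point share a cluster.

```python
import math


def medoidshift_labels(points, bandwidth):
    """One-pass medoidshift: return, for each point, the index of its mode."""
    n = len(points)
    # Squared Euclidean distances between all pairs of points.
    dist = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]
    # Gaussian kernel weights centred on each point (an assumed kernel choice).
    weight = [[math.exp(-dist[i][j] / bandwidth) for j in range(n)]
              for i in range(n)]

    # For each point j, the next medoid is the data point c that minimises
    # sum_i dist[c][i] * weight[i][j].
    nxt = []
    for j in range(n):
        costs = [sum(dist[c][i] * weight[i][j] for i in range(n))
                 for c in range(n)]
        nxt.append(min(range(n), key=costs.__getitem__))

    # Follow the pointers until a fixed point (a mode) is reached.
    # The visited set guards against cycles caused by ties.
    labels = []
    for start in range(n):
        current, seen = start, set()
        while nxt[current] != current and current not in seen:
            seen.add(current)
            current = nxt[current]
        labels.append(current)
    return labels


# Toy one-dimensional data with two well-separated groups.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
labels = medoidshift_labels(points, bandwidth=1.0)
```

On this toy input the first three points share one label and the last three share another. Because the modes are always data points, the procedure needs only pairwise distances, which is one reason medoid-based methods suit data without a natural mean.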
Spectral imaging in preclinical research and clinical pathology.
Spectral imaging methods are attracting increased interest from researchers and practitioners in basic science, pre-clinical and clinical arenas. A combination of better labeling reagents and better optics creates opportunities to detect and measure multiple parameters at the molecular and cellular level. These tools can provide valuable insights into the basic mechanisms of life, and yield diagnostic and prognostic information for clinical applications. There are many multispectral technologies available, each with its own advantages and limitations. This chapter will present an overview of the rationale for spectral imaging, and discuss the hardware, software and sample labeling strategies that can optimize its usefulness in clinical settings
ISOpureR: an R implementation of a computational purification algorithm of mixed tumour profiles
Background
Tumour samples containing distinct sub-populations of cancer and normal cells present challenges in the development of reproducible biomarkers, as these biomarkers are based on bulk signals from mixed tumour profiles. ISOpure is the only mRNA computational purification method to date that does not require a paired tumour-normal sample, provides a personalized cancer profile for each patient, and has been tested on clinical data. Replacing mixed tumour profiles with ISOpure-preprocessed cancer profiles led to better prognostic gene signatures for lung and prostate cancer.
Results
To simplify the integration of ISOpure into standard R-based bioinformatics analysis pipelines, the algorithm has been implemented as an R package. The ISOpureR package performs analogously to the original code in estimating the fraction of cancer cells and the patient cancer mRNA abundance profile from tumour samples in four cancer datasets.
Conclusions
The ISOpureR package estimates the fraction of cancer cells and personalized patient cancer mRNA abundance profile from a mixed tumour profile. This open-source R implementation enables integration into existing computational pipelines, as well as easy testing, modification and extension of the model.
Funding: Prostate Cancer Canada; Movember Foundation (Grant RS2014-01
Towards Quantifying Vertex Similarity in Networks
Vertex similarity is a major problem in network science with a wide range of
applications. In this work we provide novel perspectives on finding
(dis)similar vertices within a network and across two networks with the same
number of vertices (graph matching). With respect to the former problem, we
propose to optimize a geometric objective which allows us to express each
vertex uniquely as a convex combination of a few extreme types of vertices. Our
method has the important advantage of supporting efficiently several types of
queries such as "which other vertices are most similar to this vertex?" by the
use of the appropriate data structures and of mining interesting patterns in
the network. With respect to the latter problem (graph matching), we propose
the generalized condition number --a quantity widely used in numerical
analysis-- $\kappa(L_G, L_H)$ of the Laplacian matrix representations of $G, H$
as a measure of graph similarity, where $G, H$ are the graphs of interest. We
show that this objective has a solid theoretical basis and propose a
deterministic and a randomized graph alignment algorithm. We evaluate our
algorithms on both synthetic and real data. We observe that our proposed
methods achieve high-quality results and provide us with significant insights
into the network structure.
Comment: 16 pages, 5 figures, 2 tables