12 research outputs found
Getting started in probabilistic graphical models
Probabilistic graphical models (PGMs) have become a popular tool for
computational analysis of biological data in a variety of domains. But, what
exactly are they and how do they work? How can we use PGMs to discover patterns
that are biologically relevant? And to what extent can PGMs help us formulate
new hypotheses that are testable at the bench? This note sketches out some
answers and illustrates the main ideas behind the statistical approach to
biological pattern discovery. Comment: 12 pages, 1 figure
Change-point model on nonhomogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing
We propose a flexible change-point model for inhomogeneous Poisson Processes,
which arise naturally from next-generation DNA sequencing, and derive score and
generalized likelihood statistics for shifts in intensity functions. We
construct a modified Bayesian information criterion (mBIC) to guide model
selection, and point-wise approximate Bayesian confidence intervals for
assessing the confidence in the segmentation. The model is applied to DNA Copy
Number profiling with sequencing data and evaluated on simulated spike-in and
real data sets. Comment: Published at http://dx.doi.org/10.1214/11-AOAS517 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Long signal change-point detection
The detection of change-points in a spatially or time ordered data sequence
is an important problem in many fields such as genetics and finance. We derive
the asymptotic distribution of a statistic recently suggested for detecting
change-points. Simulation of its estimated limit distribution leads to a new
and computationally efficient change-point detection algorithm, which can be
used on very long signals. We assess the algorithm via simulations and on
previously benchmarked real-world data sets.
Combining chromosomal arm status and significantly aberrant genomic locations reveals new cancer subtypes
Many types of tumors exhibit chromosomal losses or gains, as well as local
amplifications and deletions. Within any given tumor type, sample-specific
amplifications and deletions are also observed. Typically, a region that is
aberrant in more tumors, or whose copy number change is stronger, would be
considered as a more promising candidate to be biologically relevant to cancer.
We sought an intuitive method to define such aberrations and prioritize
them. We define V, the volume associated with an aberration, as the product of
three factors: (a) the fraction of patients with the aberration, (b) the
aberration's length, and (c) its amplitude. Our algorithm compares the values of V derived
from real data to a null distribution obtained by permutations, and yields the
statistical significance (p-value) of the measured value of V. We detected
genetic locations that were significantly aberrant and combined them with
chromosomal arm status to create a succinct fingerprint of the tumor genome.
This genomic fingerprint is used to visualize the tumors, highlighting events
that are co-occurring or mutually exclusive. We apply the method to three
different public array CGH datasets of Medulloblastoma and Neuroblastoma, and
demonstrate its ability to detect chromosomal regions that were known to be
altered in the tested cancer types, as well as to suggest new genomic locations
to be tested. We identified a potential new subtype of Medulloblastoma, which
is analogous to Neuroblastoma type 1. Comment: 34 pages, 3 figures; to appear in Cancer Informatics
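The volume statistic and its permutation test described in this abstract can be illustrated with a minimal sketch. Everything beyond the definition of V as fraction × length × amplitude is an assumption made here for illustration: the function names, the `threshold` used to call a sample aberrant, the restriction to gains, and the null model (an independent circular shift of each sample's profile) are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean

def region_volume(profiles, start, end, threshold=0.3):
    """V for probes [start, end): fraction of aberrant samples * length * amplitude.

    profiles is a list of per-sample lists of log-ratios. A sample counts as
    aberrant (gain only, an assumption) if its mean over the region exceeds threshold.
    """
    per_sample = [mean(row[start:end]) for row in profiles]
    aberrant = [m for m in per_sample if m > threshold]
    if not aberrant:
        return 0.0
    fraction = len(aberrant) / len(profiles)   # (a) fraction of patients
    length = end - start                       # (b) aberration length in probes
    amplitude = mean(aberrant)                 # (c) mean amplitude among carriers
    return fraction * length * amplitude

def permutation_p_value(profiles, start, end, n_perm=200, seed=0):
    """One-sided p-value of V against a null built by circular shifts (assumed null)."""
    rng = random.Random(seed)
    observed = region_volume(profiles, start, end)
    n_probes = len(profiles[0])
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for row in profiles:
            s = rng.randrange(n_probes)
            shuffled.append(row[-s:] + row[:-s])  # rotate this sample's profile by s
        if region_volume(shuffled, start, end) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Circular shifts preserve each sample's segment structure while breaking the alignment of aberrations across samples, which is one common way to build such a null; the permutation scheme actually used by the authors may differ.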
The group fused Lasso for multiple change-point detection
We present the group fused Lasso for detection of multiple change-points
shared by a set of co-occurring one-dimensional signals. Change-points are
detected by approximating the original signals with a constraint on the
multidimensional total variation, leading to piecewise-constant approximations.
Fast algorithms are proposed to solve the resulting optimization problems,
either exactly or approximately. Conditions are given for consistency of both
algorithms as the number of signals increases, and empirical evidence is
provided to support the results on simulated and array comparative genomic
hybridization data.
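As a reading aid, the optimization problem behind this abstract can be written in its commonly used penalized form. The notation is an assumption made here, not taken from the abstract: Y is the n × p matrix holding p signals of length n, U is its piecewise-constant approximation, and λ ≥ 0 is a tuning parameter. The constrained form mentioned in the abstract bounds the total-variation term instead of penalizing it.

```latex
\min_{U \in \mathbb{R}^{n \times p}}
  \frac{1}{2} \, \lVert Y - U \rVert_F^2
  + \lambda \sum_{i=1}^{n-1} \lVert U_{i+1,\cdot} - U_{i,\cdot} \rVert_2
```

Because the Euclidean norm is applied to whole rows of increments, a position i is either a change-point for all signals at once (nonzero row increment) or for none, which is what makes the detected change-points shared.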
Fast MCMC sampling for hidden Markov models to determine copy number variations
Background: Hidden Markov Models (HMM) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons the parameters of an HMM are often estimated with maximum likelihood and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches integrating out parameters using Markov Chain Monte Carlo (MCMC) sampling. While the advantages of Bayesian approaches have been clearly demonstrated, the likelihood-based approaches are still preferred in practice for their lower running times; datasets coming from high-density arrays and next generation sequencing amplify these problems.
Results: We propose an approximate sampling technique, inspired by compression of discrete sequences in HMM computations and by kd-trees to leverage spatial relations between data points in typical data sets, to speed up the MCMC sampling.
Conclusions: We test our approximate sampling method on simulated and biological ArrayCGH datasets and high-density SNP arrays, and demonstrate a speed-up of 10 to 60 and 90, respectively, while achieving competitive results with the state-of-the-art Bayesian approaches.
Availability: An implementation of our method will be made available as part of the open source GHMM library from http://ghmm.org.
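For readers unfamiliar with the likelihood-based baseline mentioned in the Background, the following is a minimal sketch of Viterbi segmentation for an HMM with Gaussian emissions. It illustrates only the standard Viterbi step, not the approximate MCMC sampler proposed in the paper and not the GHMM library API. The function names, the shared `sigma`, the uniform initial distribution, and the single `stay_prob` transition parameter are all assumptions made for illustration.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def viterbi_segment(obs, means, sigma, stay_prob):
    """Most likely state path for fixed parameters (needs at least two states).

    means[j] is the emission mean of state j; every state keeps its value with
    probability stay_prob and moves uniformly to another state otherwise.
    """
    k = len(means)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1 - stay_prob) / (k - 1))
    # Uniform initial distribution over states (assumption).
    score = [-math.log(k) + gaussian_logpdf(obs[0], m, sigma) for m in means]
    back = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for j in range(k):
            # Best predecessor of state j at this position.
            best_i = max(range(k),
                         key=lambda i: score[i] + (log_stay if i == j else log_switch))
            best = score[best_i] + (log_stay if best_i == j else log_switch)
            new_score.append(best + gaussian_logpdf(x, means[j], sigma))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(k), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

A call such as `viterbi_segment(log_ratios, [-0.5, 0.0, 0.5], 0.15, 0.9)` would label each probe as loss, neutral, or gain under these assumed parameters. The result is a single best path conditional on fixed parameter estimates, which is the source of the uncertainty the abstract attributes to this approach.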
Functional Copy-Number Alterations in Cancer
Understanding the molecular basis of cancer requires characterization of its genetic defects. DNA microarray technologies can provide detailed raw data about chromosomal aberrations in tumor samples. Computational analysis is needed (1) to deduce from raw array data actual amplification or deletion events for chromosomal fragments and (2) to distinguish causal chromosomal alterations from functionally neutral ones. We present a comprehensive computational approach, RAE, designed to robustly map chromosomal alterations in tumor samples and assess their functional importance in cancer. To demonstrate the methodology, we experimentally profile copy number changes in a clinically aggressive subtype of soft-tissue sarcoma, pleomorphic liposarcoma, and computationally derive a portrait of candidate oncogenic alterations and their target genes. Many affected genes are known to be involved in sarcomagenesis; others are novel, including mediators of adipocyte differentiation, and may include valuable therapeutic targets. Taken together, we present a statistically robust methodology applicable to high-resolution genomic data to assess the extent and function of copy-number alterations in cancer.