12 research outputs found

Getting started in probabilistic graphical models

Author: Airoldi Edoardo M
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2007

Probabilistic graphical models (PGMs) have become a popular tool for computational analysis of biological data in a variety of domains. But what exactly are they, and how do they work? How can we use PGMs to discover patterns that are biologically relevant? And to what extent can PGMs help us formulate new hypotheses that are testable at the bench? This note sketches out some answers and illustrates the main ideas behind the statistical approach to biological pattern discovery.

Comment: 12 pages, 1 figure

arXiv.org e-Print Archive

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Change-point model on nonhomogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing

Author: Shen Jeremy J.
Zhang Nancy R.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2012

We propose a flexible change-point model for inhomogeneous Poisson processes, which arise naturally from next-generation DNA sequencing, and derive score and generalized likelihood statistics for shifts in intensity functions. We construct a modified Bayesian information criterion (mBIC) to guide model selection, and point-wise approximate Bayesian confidence intervals for assessing the confidence in the segmentation. The model is applied to DNA copy number profiling with sequencing data and evaluated on simulated spike-in and real data sets.

Comment: Published at http://dx.doi.org/10.1214/11-AOAS517 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

ScholarlyCommons@Penn

Long signal change-point detection

Author: Biau Gérard
Bleakley Kevin
Mason David
Publication date: 07/04/2015

The detection of change-points in a spatially or time-ordered data sequence is an important problem in many fields such as genetics and finance. We derive the asymptotic distribution of a statistic recently suggested for detecting change-points. Simulation of its estimated limit distribution leads to a new and computationally efficient change-point detection algorithm, which can be used on very long signals. We assess the algorithm via simulations and on previously benchmarked real-world data sets.

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

Combining chromosomal arm status and significantly aberrant genomic locations reveals new cancer subtypes

Author: Domany Eytan
Hegi Monika E.
Lambiv Wanyu L.
Reiner Anat
Shay Tal
Publication date: 09/12/2008

Many types of tumors exhibit chromosomal losses or gains, as well as local amplifications and deletions. Within any given tumor type, sample-specific amplifications and deletions are also observed. Typically, a region that is aberrant in more tumors, or whose copy number change is stronger, would be considered a more promising candidate to be biologically relevant to cancer. We sought an intuitive method to define such aberrations and prioritize them. We define V, the volume associated with an aberration, as the product of three factors: (a) the fraction of patients with the aberration, (b) the aberration's length, and (c) its amplitude. Our algorithm compares the values of V derived from real data to a null distribution obtained by permutations, and yields the statistical significance (p-value) of the measured value of V. We detected genetic locations that were significantly aberrant and combined them with chromosomal arm status to create a succinct fingerprint of the tumor genome. This genomic fingerprint is used to visualize the tumors, highlighting events that are co-occurring or mutually exclusive. We apply the method to three different public array CGH datasets of Medulloblastoma and Neuroblastoma, and demonstrate its ability to detect chromosomal regions that were known to be altered in the tested cancer types, as well as to suggest new genomic locations to be tested. We identified a potential new subtype of Medulloblastoma, which is analogous to Neuroblastoma type 1.

Comment: 34 pages, 3 figures; to appear in Cancer Informatics
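The volume statistic described in this abstract can be illustrated with a minimal sketch. The abstract does not say how aberrations are called or how the permutations are drawn, so everything beyond "V = fraction x length x amplitude, compared to a permutation null" is an assumption here: the function names, the fixed log-ratio threshold, and the choice to shuffle probe positions within each sample are all hypothetical.

```python
import numpy as np

def volume(data, start, end, threshold=0.3):
    """V for the probe region [start, end) of a samples-by-probes log-ratio matrix.

    Assumption: a sample "has" the aberration when its mean log-ratio over the
    region exceeds `threshold` (a hypothetical calling rule).
    """
    region_means = data[:, start:end].mean(axis=1)
    aberrant = region_means > threshold
    if not aberrant.any():
        return 0.0
    fraction = aberrant.mean()                 # (a) fraction of samples with the aberration
    length = end - start                       # (b) aberration length, in probes
    amplitude = region_means[aberrant].mean()  # (c) mean amplitude among aberrant samples
    return float(fraction * length * amplitude)

def permutation_p_value(data, start, end, n_perm=200, threshold=0.3, seed=0):
    """Compare the observed V with a null built by shuffling probes within each sample."""
    rng = np.random.default_rng(seed)
    observed = volume(data, start, end, threshold)
    exceed = 0
    for _ in range(n_perm):
        # Assumed null: each sample's probe order is permuted independently.
        shuffled = np.array([rng.permutation(row) for row in data])
        if volume(shuffled, start, end, threshold) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (1 + exceed) / (1 + n_perm)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(0.0, 0.1, size=(20, 100))
    data[:10, 40:50] += 1.0  # synthetic gain in half of the samples
    print(permutation_p_value(data, 40, 50))
```

With the synthetic gain above, half of the samples carry a ten-probe aberration of amplitude near 1, so V is close to 0.5 x 10 x 1 = 5, far above what shuffled data produce.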

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Serveur académique lausannois

PubMed Central

The group fused Lasso for multiple change-point detection

Author: Bleakley Kevin
Vert Jean-Philippe
Publication date: 01/01/2011

We present the group fused Lasso for detection of multiple change-points shared by a set of co-occurring one-dimensional signals. Change-points are detected by approximating the original signals with a constraint on the multidimensional total variation, leading to piecewise-constant approximations. Fast algorithms are proposed to solve the resulting optimization problems, either exactly or approximately. Conditions are given for consistency of both algorithms as the number of signals increases, and empirical evidence is provided to support the results on simulated and array comparative genomic hybridization data.
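To make "multidimensional total variation" concrete, the sketch below evaluates it for a candidate approximation U of a positions-by-signals matrix Y, and reads the shared change-points off U. It does not implement the fast solvers the abstract mentions. The penalized form of the objective (squared error plus lambda times the total variation, with a factor of 1/2) and all function names are assumptions chosen for illustration; the abstract itself only states that the total variation is constrained.

```python
import numpy as np

def multidim_total_variation(U):
    """Sum over positions of the Euclidean norm of the jump between consecutive rows.

    U has one row per position and one column per signal, so a jump counts
    once for all signals together, which favors shared change-points.
    """
    jumps = np.diff(U, axis=0)
    return float(np.linalg.norm(jumps, axis=1).sum())

def penalized_objective(Y, U, lam):
    """Assumed penalized form: 0.5 * ||Y - U||^2 + lam * TV(U)."""
    return 0.5 * float(((Y - U) ** 2).sum()) + lam * multidim_total_variation(U)

def change_points(U, tol=1e-8):
    """Positions i such that row i differs from row i - 1 in a piecewise-constant U."""
    jump_norms = np.linalg.norm(np.diff(U, axis=0), axis=1)
    return [int(i) + 1 for i in np.flatnonzero(jump_norms > tol)]

if __name__ == "__main__":
    # Two signals that jump together at position 3.
    U = np.array([[0.0, 0.0]] * 3 + [[3.0, 4.0]] * 2)
    print(multidim_total_variation(U))  # one jump of norm 5
    print(change_points(U))
```

Because the norm is taken across signals at each position, a single jump shared by many signals is cheaper than separate jumps at different positions, which is what yields change-points common to the whole set.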

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

HAL-MINES ParisTech

Fast MCMC sampling for hidden Markov models to determine copy number variations

Author: A Krogh
A Schliep
A Viterbi
AB Olshen
Alexander Schliep
AM Snijders
CM Bishop
D Pelleg
D Pinto
F Picard
H Willenbrock
J Fridlyand
J Fritsch
K Wang
L Rabiner
LE Baum
M Bredel
Md Pavel Mahmud
P Wang
PHC Eilers
Q McNemar
R Andersson
R Durbin
R Tibshirani
RJD Leeuw
S Chib
S Geman
S Guha
S Morganella
S Mozes
S Salvador
S Scott
S Srivastava
SP Shah
SR Eddy
T Harada
W Gilks
WR Lai
Y Nannya
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Hidden Markov Models (HMM) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons the parameters of an HMM are often estimated with maximum likelihood and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches integrating out parameters using Markov Chain Monte Carlo (MCMC) sampling. While the advantages of Bayesian approaches have been clearly demonstrated, the likelihood-based approaches are still preferred in practice for their lower running times; datasets coming from high-density arrays and next-generation sequencing amplify these problems.

Results: We propose an approximate sampling technique, inspired by compression of discrete sequences in HMM computations and by kd-trees to leverage spatial relations between data points in typical data sets, to speed up the MCMC sampling.

Conclusions: We test our approximate sampling method on simulated and biological ArrayCGH datasets and high-density SNP arrays, and demonstrate a speed-up of 10 to 60 and of 90, respectively, while achieving competitive results with the state-of-the-art Bayesian approaches.

Availability: An implementation of our method will be made available as part of the open source GHMM library from http://ghmm.org.
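For readers unfamiliar with the likelihood-based baseline that this abstract contrasts with MCMC, here is a minimal Viterbi segmentation sketch for an HMM with Gaussian emissions. It is not the approximate sampling method the paper proposes, and it does not use the GHMM library. The state means, shared standard deviation, and self-transition probability are fixed by hand as assumptions, whereas in practice they would be estimated; the function name is hypothetical.

```python
import numpy as np

def viterbi_gaussian(obs, means, sigma, stay_prob):
    """Most likely state path for a Gaussian-emission HMM with at least two states.

    Assumptions: uniform initial distribution, a shared standard deviation
    `sigma`, and a transition matrix with `stay_prob` on the diagonal and the
    remaining mass spread evenly over the other states.
    """
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(means, dtype=float)
    n, k = len(obs), len(means)

    log_trans = np.full((k, k), np.log((1.0 - stay_prob) / (k - 1)))
    np.fill_diagonal(log_trans, np.log(stay_prob))
    # Gaussian log-density up to an additive constant shared by all states.
    log_emit = -0.5 * ((obs[:, None] - means[None, :]) / sigma) ** 2

    score = np.empty((n, k))
    back = np.zeros((n, k), dtype=int)
    score[0] = -np.log(k) + log_emit[0]
    for t in range(1, n):
        cand = score[t - 1][:, None] + log_trans  # cand[i, j]: arrive in j from i
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]

    # Trace the best path backwards from the best final state.
    path = np.empty(n, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(n - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

if __name__ == "__main__":
    # States: 0 = loss, 1 = neutral, 2 = gain (hypothetical log-ratio means).
    signal = [0.0] * 5 + [1.0] * 5 + [0.0] * 5
    print(viterbi_gaussian(signal, means=[-1.0, 0.0, 1.0], sigma=0.3, stay_prob=0.95))
```

The path is a single point estimate under fixed parameters, which is the source of the segmentation uncertainty the abstract attributes to this approach.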

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Functional Copy-Number Alterations in Cancer

Understanding the molecular basis of cancer requires characterization of its genetic defects. DNA microarray technologies can provide detailed raw data about chromosomal aberrations in tumor samples. Computational analysis is needed (1) to deduce from raw array data actual amplification or deletion events for chromosomal fragments and (2) to distinguish causal chromosomal alterations from functionally neutral ones. We present a comprehensive computational approach, RAE, designed to robustly map chromosomal alterations in tumor samples and assess their functional importance in cancer. To demonstrate the methodology, we experimentally profile copy number changes in a clinically aggressive subtype of soft-tissue sarcoma, pleomorphic liposarcoma, and computationally derive a portrait of candidate oncogenic alterations and their target genes. Many affected genes are known to be involved in sarcomagenesis; others are novel, including mediators of adipocyte differentiation, and may include valuable therapeutic targets. Taken together, we present a statistically robust methodology applicable to high-resolution genomic data to assess the extent and function of copy-number alterations in cancer.

Public Library of Science (PLOS)

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Modeling recurrent DNA copy number alterations in array CGH data

Author: Alarcon-Vargas
Baldwin
Bishop
Coe
de Leeuw
Diskin
Durbin
Fridlyand
Garnis
Gelman
Ghahramani
Ishkanian
Jong
Kawamata
Kevin P. Murphy
Kim
Lipson
Liu
Pinkel
Pollack
Raymond T. Ng
Redon
Rouveirol
Scott
Shah
Sohrab P. Shah
Swinson
Veltman
Wan L. Lam
Weber
Wong
Publication venue: 'Oxford University Press (OUP)'

Crossref