24 research outputs found

Measuring the Influence of Observations in HMMs through the Kullback-Leibler Distance

Author: Nuel Gregory
Perduca Vittorio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

We measure the influence of individual observations on the sequence of hidden states of a Hidden Markov Model (HMM) by means of the Kullback-Leibler distance (KLD). Namely, we consider the KLD between the conditional distribution of the hidden chain given the complete sequence of observations and the conditional distribution of the hidden chain given all the observations but the one under consideration. We introduce a linear complexity algorithm for computing the influence of all the observations. As an illustration, we investigate the application of our algorithm to the problem of detecting outliers in HMM data series.
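The influence measure described in this abstract can be illustrated on a toy model. The sketch below is not the authors' linear-complexity algorithm: it enumerates every hidden path, so it is exponential in the sequence length and only usable for very short sequences. The function names, the direction of the divergence (full posterior relative to leave-one-out posterior), and the example parameters are all assumptions made for illustration.

```python
import itertools
import math


def path_posterior(pi, A, B, obs, skip=None):
    """Posterior over all hidden paths given obs.

    pi: initial state probabilities, A: transition matrix,
    B: emission matrix (B[state][symbol]). If `skip` is an index,
    the observation at that position is ignored (leave-one-out).
    """
    n_states = len(pi)
    weights = {}
    for path in itertools.product(range(n_states), repeat=len(obs)):
        w = pi[path[0]]
        for t in range(len(obs)):
            if t > 0:
                w *= A[path[t - 1]][path[t]]
            if t != skip:
                w *= B[path[t]][obs[t]]
        weights[path] = w
    total = sum(weights.values())
    return {path: w / total for path, w in weights.items()}


def influence(pi, A, B, obs):
    """KL divergence from each leave-one-out posterior to the full posterior.

    Assumes strictly positive model parameters, so the leave-one-out
    posterior is never zero where the full posterior is positive.
    """
    full = path_posterior(pi, A, B, obs)
    scores = []
    for t in range(len(obs)):
        reduced = path_posterior(pi, A, B, obs, skip=t)
        kl = sum(p * math.log(p / reduced[path])
                 for path, p in full.items() if p > 0)
        scores.append(kl)
    return scores


if __name__ == "__main__":
    # Hypothetical two-state model with a single discordant observation.
    pi = [0.5, 0.5]
    A = [[0.9, 0.1], [0.1, 0.9]]
    B = [[0.8, 0.2], [0.2, 0.8]]
    print(influence(pi, A, B, [0, 0, 1, 0, 0]))
```

Observations with a large score change the posterior over hidden paths the most when removed, which is the sense in which the abstract proposes using the measure for outlier detection.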

arXiv.org e-Print Archive

HAL Descartes

S-estimation of hidden Markov models

Author: Farcomeni Alessio
L. Greco
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/08/2014

A method for robust estimation of dynamic mixtures of multivariate distributions is proposed. The EM algorithm is modified by replacing the classical M-step with high breakdown S-estimation of location and scatter, performed by using the bisquare multivariate S-estimator. Estimates are obtained by solving a system of estimating equations that are characterized by component specific sets of weights, based on robust Mahalanobis-type distances. Convergence of the resulting algorithm is proved and its finite sample behavior is investigated by means of a brief simulation study and an application to a multivariate time series of daily returns for seven stock markets.

Crossref

ART

Archivio della ricerca- Università di Roma La Sapienza

A forward-backward fragment assembling algorithm for the identification of genomic amplification and deletion breakpoints using high-density single nucleotide polymorphism (SNP) array

Author: AA Margolin
AB Olshen
D Pinkel
David T Wong
Dione K Bailey
ES Lander
F Picard
H Willenbrock
Hui Ye
J Fridlyand
J Huang
J Huang
J Liu
JC Marioni
K Jong
Ker-Chau Li
M Khojasteh
M Lin
OC Lingjaerde
OM Rueda
P Broet
P Hupe
PH Eilers
RS Daruwala
Sharoni Jacobs
SP Shah
SY Kim
Tianwei Yu
TS Price
Wei Sun
WR Lai
X Zhou
X Zhou
Xiaofeng Zhou
Y Lai
Y Nannya
Zugen Chen
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: DNA copy number aberration (CNA) is one of the key characteristics of cancer cells. Recent studies demonstrated the feasibility of utilizing high density single nucleotide polymorphism (SNP) genotyping arrays to detect CNA. Compared with the two-color array-based comparative genomic hybridization (array-CGH), the SNP arrays offer much higher probe density and lower signal-to-noise ratio at the single SNP level. To accurately identify small segments of CNA from SNP array data, segmentation methods that are sensitive to CNA while resistant to noise are required. Results: We have developed a highly sensitive algorithm for the edge detection of copy number data which is especially suitable for the SNP array-based copy number data. The method consists of an over-sensitive edge-detection step and a test-based forward-backward edge selection step. Conclusion: Using simulations constructed from real experimental data, the method shows high sensitivity and specificity in detecting small copy number changes in focused regions. The method is implemented in an R package FASeg, which includes data processing and visualization utilities, as well as libraries for processing Affymetrix SNP array data.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Flexible and Accurate Detection of Genomic Copy-Number Changes from aCGH

Author: Greg Tucker-Kellogg
Oscar M Rueda
Ramón Díaz-Uriarte
Publication venue: Public Library of Science
Publication date: 01/01/2007

Genomic DNA copy-number alterations (CNAs) are associated with complex diseases, including cancer: CNAs are indeed related to tumoral grade, metastasis, and patient survival. CNAs discovered from array-based comparative genomic hybridization (aCGH) data have been instrumental in identifying disease-related genes and potential therapeutic targets. To be immediately useful in both clinical and basic research scenarios, aCGH data analysis requires accurate methods that do not impose unrealistic biological assumptions and that provide direct answers to the key question, “What is the probability that this gene/region has CNAs?” Current approaches fail, however, to meet these requirements. Here, we introduce reversible jump aCGH (RJaCGH), a new method for identifying CNAs from aCGH; we use a nonhomogeneous hidden Markov model fitted via reversible jump Markov chain Monte Carlo; and we incorporate model uncertainty through Bayesian model averaging. RJaCGH provides an estimate of the probability that a gene/region has CNAs while incorporating interprobe distance and the capability to analyze data on a chromosome or genome-wide basis. RJaCGH outperforms alternative methods, and the performance difference is even larger with noisy data and highly variable interprobe distance, both commonly found features in aCGH data. Furthermore, our probabilistic method allows us to identify minimal common regions of CNAs among samples and can be extended to incorporate expression data. In summary, we provide a rigorous statistical framework for locating genes and chromosomal regions with CNAs with potential applications to cancer and other complex human diseases.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model

Author: AB Olshen
AE Urban
AJ Iafrate
C Erdman
CL Myers
E Ben-Yaacov
E Tuzun
F Forozan
F Picard
J Fridlyand
J Sebat
J Shendure
K Jong
L Hsu
LY Wu
M Bredel
M Fedurco
M Margulies
MA Newton
Mark B Gerstein
N Metropolis
OC Lingjaerde
OM Rueda
P Broet
P Cahan
P Hupe
P Wang
PH Eilers
R Development Core Team
R Pique-Regi
R Redon
S Geman
SP Shah
V Jobanputra
WK Hastings
WR Lai
Zhengdong D Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Copy number variants (CNVs) have been demonstrated to occur at a high frequency and are now widely believed to make a significant contribution to the phenotypic variation in human populations. Array-based comparative genomic hybridization (array-CGH) and newly developed read-depth approach through ultrahigh throughput genomic sequencing both provide rapid, robust, and comprehensive methods to identify CNVs on a whole-genome scale. Results: We developed a Bayesian statistical analysis algorithm for the detection of CNVs from both types of genomic data. The algorithm can analyze such data obtained from PCR-based bacterial artificial chromosome arrays, high-density oligonucleotide arrays, and more recently developed high-throughput DNA sequencing. Treating parameters (e.g., the number of CNVs, the position of each CNV, and the data noise level) that define the underlying data generating process as random variables, our approach derives the posterior distribution of the genomic CNV structure given the observed data. Sampling from the posterior distribution using a Markov chain Monte Carlo method, we get not only best estimates for these unknown parameters but also Bayesian credible intervals for the estimates. We illustrate the characteristics of our algorithm by applying it to both synthetic and experimental data sets in comparison to other segmentation algorithms. Conclusions: In particular, the synthetic data comparison shows that our method is more sensitive than other approaches at low false positive rates. Furthermore, given its Bayesian origin, our method can also be seen as a technique to refine CNVs identified by fast point-estimate methods and also as a framework to integrate array-CGH and sequencing data with other CNV-related biological knowledge, all through informative priors.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Fast MCMC sampling for hidden markov models to determine copy number variations

Author: A Krogh
A Schliep
A Viterbi
AB Olshen
Alexander Schliep
AM Snijders
CM Bishop
D Pelleg
D Pinto
F Picard
H Willenbrock
J Fridlyand
J Fritsch
K Wang
L Rabiner
LE Baum
M Bredel
Md Pavel Mahmud
P Wang
PHC Eilers
Q McNemar
R Andersson
R Durbin
R Tibshirani
RJD Leeuw
S Chib
S Geman
S Guha
S Morganella
S Mozes
S Salvador
S Scott
S Srivastava
SP Shah
SP Shah
SR Eddy
T Harada
W Gilks
WR Lai
Y Nannya
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Hidden Markov Models (HMM) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons the parameters of a HMM are often estimated with maximum likelihood and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches integrating out parameters using Markov Chain Monte Carlo (MCMC) sampling. While the advantages of Bayesian approaches have been clearly demonstrated, the likelihood based approaches are still preferred in practice for their lower running times; datasets coming from high-density arrays and next generation sequencing amplify these problems. Results: We propose an approximate sampling technique, inspired by compression of discrete sequences in HMM computations and by kd-trees to leverage spatial relations between data points in typical data sets, to speed up the MCMC sampling. Conclusions: We test our approximate sampling method on simulated and biological ArrayCGH datasets and high-density SNP arrays, and demonstrate a speed-up of 10 to 60 respectively 90 while achieving competitive results with the state-of-the-art Bayesian approaches. Availability: An implementation of our method will be made available as part of the open source GHMM library from http://ghmm.org.
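For orientation, the likelihood-based baseline that this abstract contrasts with MCMC (fixed parameters plus a Viterbi segmentation) can be sketched as follows. This is not the paper's approximate sampler and does not use the GHMM library. The function name, the Gaussian emissions with a shared standard deviation, the uniform initial distribution, the single "stay" probability, and the example values are all assumptions made for illustration.

```python
import math


def viterbi_segment(values, means, sd, stay):
    """Most likely state path for a Gaussian-emission HMM with fixed parameters.

    values: observed log-ratios, means: one emission mean per state (>= 2 states),
    sd: shared emission standard deviation, stay: self-transition probability.
    """
    k = len(means)
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (k - 1))

    def log_emit(x, s):
        # Constant terms are dropped because sd is shared by all states.
        return -0.5 * ((x - means[s]) / sd) ** 2

    def log_trans(r, s):
        return log_stay if r == s else log_move

    # Initialise with a uniform distribution over states.
    score = [math.log(1.0 / k) + log_emit(values[0], s) for s in range(k)]
    back = []
    for x in values[1:]:
        new_score, pointers = [], []
        for s in range(k):
            best_prev = max(range(k), key=lambda r: score[r] + log_trans(r, s))
            new_score.append(score[best_prev] + log_trans(best_prev, s) + log_emit(x, s))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Hypothetical log-ratios: a short gained segment between neutral regions.
    ratios = [0.0, 0.1, -0.1, 1.0, 0.9, 1.1, 0.0, 0.1]
    print(viterbi_segment(ratios, means=[0.0, 1.0], sd=0.2, stay=0.9))
```

The output is a single path with no measure of uncertainty, which is the limitation the abstract attributes to this kind of approach.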

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human

Author: A Baross
A Gimelbrant
A Siepel
A Viterbi
AM Khalil
AP Dempster
B Ge
Bing Ge
C Li
C Yau
D Serre
DJ Verlaan
Dmitry Pokholok
E Birney
E Venkatraman
H Bengtsson
H Bengtsson
J Marioni
James R. Wagner
K Wang
KA Frazer
KD Pruitt
Kevin L. Gunderson
KPV Pant
KS Pollard
L Carrel
L Rabiner
L Wu
LE Baum
Mathieu Blanchette
MV Rockman
O Rueda
P Fearnhead
S Browning
S Campino
S Colella
SH Lo
SP Shah
SP Shah
SR Eddy
T Mitchell
T Pastinen
T Pastinen
T Pastinen
Tomi Pastinen
VG Cheung
W Cookson
W Kent
WJ Kent
Wyeth W. Wasserman
Y Nannya
Publication venue: Public Library of Science
Publication date: 08/07/2010

Allelic imbalance (AI) is a phenomenon where the two alleles of a given gene are expressed at different levels in a given cell, either because of epigenetic inactivation of one of the two alleles, or because of genetic variation in regulatory regions. Recently, Bing et al. have described the use of genotyping arrays to assay AI at a high resolution (∼750,000 SNPs across the autosomes). In this paper, we investigate computational approaches to analyze this data and identify genomic regions with AI in an unbiased and robust statistical manner. We propose two families of approaches: (i) a statistical approach based on z-score computations, and (ii) a family of machine learning approaches based on Hidden Markov Models. Each method is evaluated using previously published experimental data sets as well as with permutation testing. When applied to whole genome data from 53 HapMap samples, our approaches reveal that allelic imbalance is widespread (most expressed genes show evidence of AI in at least one of our 53 samples) and that most AI regions in a given individual are also found in at least a few other individuals. While many AI regions identified in the genome correspond to known protein-coding transcripts, others overlap with recently discovered long non-coding RNAs. We also observe that genomic regions with AI not only include complete transcripts with consistent differential expression levels, but also more complex patterns of allelic expression such as alternative promoters and alternative 3′ end. The approaches developed not only shed light on the incidence and mechanisms of allelic expression, but will also help towards mapping the genetic causes of allelic expression and identify cases where this variation may be linked to diseases.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central