Parsimonious Higher-Order Hidden Markov Models for Improved Array-CGH Analysis with Applications to Arabidopsis thaliana
Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are based only on first-order HMMs, which have limited ability to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs that enable interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling spatial dependencies. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of the identified polymorphisms, revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations are done on widely considered Array-CGH data of human cell lines, indicating that parsimonious HMMs are also well suited for the analysis of non-plant-specific data. All these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM).
Optimal Filter Approximations in Conditionally Gaussian Pairwise Markov Switching Models
We consider a general triplet Markov Gaussian linear system (X, R, Y), where X is a hidden continuous random sequence, R is a hidden discrete Markov chain, and Y is an observed continuous random sequence. When the triplet (X, R, Y) is a classical "Conditionally Gaussian Linear State-Space Model" (CGLSSM), the mean square error optimal filter is not workable with a reasonable complexity and different approximate methods, e.g. based on particle filters, are used. We propose two contributions. The first one is to extend the CGLSSM to a new, more general model, called the "Conditionally Gaussian Pairwise Markov Switching Model" (CGPMSM), in which X is not necessarily Markov given R. The second contribution is to consider a particular case of CGPMSM in which (R, Y) is Markov and in which an exact filter, optimal in the sense of mean square error, can be performed with linear-time complexity. Some experiments show that the proposed method and the suited particle filter have comparable efficiency, while the second one is much faster. Index Terms: Conditionally Gaussian linear state-space model, conditionally Gaussian pairwise Markov switching model, exact optimal filtering, Gaussian switching system, hidden Markov models.