
    Parsimonious Higher-Order Hidden Markov Models for Improved Array-CGH Analysis with Applications to Arabidopsis thaliana

    Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for detecting DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for analyzing Array-CGH data, but current methods are based only on first-order HMMs, which have a limited ability to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs that interpolate between a mixture model ignoring spatial dependencies and a higher-order HMM modeling spatial dependencies exhaustively. We apply parsimonious higher-order HMMs to Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana, and we compare them against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, a functional analysis of the identified polymorphisms reveals novel details of the genomic differences between C24 and Col-0. Additional evaluations on widely used Array-CGH data of human cell lines indicate that parsimonious HMMs are also well suited to the analysis of non-plant data. Together, these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open-source Java library Jstacs (www.jstacs.de/index.php/PHHMM).
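To make the spatial dependencies mentioned in this abstract concrete, the following Python sketch implements the scaled forward algorithm for a plain second-order HMM with Gaussian emissions, tracking pairs of consecutive hidden states. It is not the authors' parsimonious parametrization, which the abstract does not spell out; the helper names `gauss_pdf` and `forward_loglik_order2`, their arguments, and the Gaussian emission model are assumptions made for this example. If every transition distribution ignores the previous states and equals one fixed weight vector, the model collapses to an independent mixture model, which is one end of the range that parsimonious higher-order HMMs are described as interpolating over.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x (Gaussian emissions are an assumption)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def forward_loglik_order2(obs, init, trans1, trans2, mus, sigmas):
    """Scaled forward algorithm for a second-order HMM (hypothetical helper).

    init[i]         = P(s_1 = i)
    trans1[i][j]    = P(s_2 = j | s_1 = i)
    trans2[i][j][k] = P(s_t = k | s_{t-2} = i, s_{t-1} = j) for t >= 3
    Emissions: obs[t] ~ N(mus[k], sigmas[k]^2) when s_t = k.
    Returns log p(obs).
    """
    K = len(init)

    def emit(t, k):
        return gauss_pdf(obs[t], mus[k], sigmas[k])

    # First position: forward variable over single states, rescaled to sum to 1.
    a1 = [init[i] * emit(0, i) for i in range(K)]
    c = sum(a1)
    loglik = math.log(c)
    a1 = [v / c for v in a1]
    if len(obs) == 1:
        return loglik

    # Second position: forward variable over state pairs (s_1, s_2).
    alpha = [[a1[i] * trans1[i][j] * emit(1, j) for j in range(K)] for i in range(K)]
    c = sum(map(sum, alpha))
    loglik += math.log(c)
    alpha = [[v / c for v in row] for row in alpha]

    # Later positions: sum out the oldest state of the pair, append the new state.
    for t in range(2, len(obs)):
        new = [[sum(alpha[i][j] * trans2[i][j][k] for i in range(K)) * emit(t, k)
                for k in range(K)] for j in range(K)]
        c = sum(map(sum, new))
        loglik += math.log(c)
        alpha = [[v / c for v in row] for row in new]
    return loglik
```

Setting `trans1[i][j] = w[j]` and `trans2[i][j][k] = w[k]` reproduces the mixture-model log-likelihood exactly, while letting `trans2` depend on both `i` and `j` gives the fully parametrized second-order model with K^2 (K - 1) free transition parameters for K states. The parameter-saving scheme used in Jstacs is not reproduced here.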

    Optimal Filter Approximations in Conditionally Gaussian Pairwise Markov Switching Models

    We consider a general triplet Markov Gaussian linear system (X, R, Y), where X is a hidden continuous random sequence, R is a hidden discrete Markov chain, and Y is an observed continuous random sequence. When the triplet (X, R, Y) is a classical "Conditionally Gaussian Linear State-Space Model" (CGLSSM), the filter that is optimal in the mean-square-error sense cannot be computed with reasonable complexity, so various approximate methods, e.g. methods based on particle filters, are used instead. We make two contributions. The first is to extend the CGLSSM to a new, more general model, called the "Conditionally Gaussian Pairwise Markov Switching Model" (CGPMSM), in which X is not necessarily Markov given R. The second is to consider a particular case of the CGPMSM in which (R, Y) is Markov and in which an exact filter, optimal in the mean-square-error sense, can be computed with linear-time complexity. Experiments show that the proposed method and the appropriately adapted particle filter have comparable efficiency, while the second one is much faster. Index Terms: conditionally Gaussian linear state-space model, conditionally Gaussian pairwise Markov switching model, exact optimal filtering, Gaussian switching system, hidden Markov models.
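The property that makes the particular CGPMSM in this abstract tractable is that (R, Y) is Markov, so the posterior of the switches p(r_n | y_1, ..., y_n) can be updated exactly in a single pass over the data. The Python sketch below shows only this discrete part of such a filter, under a hypothetical parametrization p(r_{n+1}, y_{n+1} | r_n, y_n) = P[r_n][r_{n+1}] N(y_{n+1}; a[r_{n+1}] y_n + b[r_{n+1}], s[r_{n+1}]^2) chosen purely for illustration. The function `switching_filter` and all of its parameters are assumptions, and the conditional means of X, which a full filter would also propagate, are omitted.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def switching_filter(obs, init, P, a, b, s):
    """Exact forward recursion for p(r_n | y_1..y_n) when (R, Y) is Markov.

    Hypothetical model, for illustration only:
      p(r_1, y_1)             = init[r_1] * N(y_1; b[r_1], s[r_1]^2)
      p(r', y' | r, y_prev)   = P[r][r'] * N(y'; a[r'] * y_prev + b[r'], s[r']^2)
    Returns one filtered distribution (list of K probabilities) per observation.
    """
    K = len(init)
    # Initial filtered distribution, normalized to sum to 1.
    post = [init[r] * gauss_pdf(obs[0], b[r], s[r]) for r in range(K)]
    z = sum(post)
    post = [p / z for p in post]
    out = [post]
    for n in range(1, len(obs)):
        new = []
        for r2 in range(K):
            # Likelihood of y_n given the new regime and the previous observation.
            lik = gauss_pdf(obs[n], a[r2] * obs[n - 1] + b[r2], s[r2])
            # Predict the regime from the previous posterior, then weight by lik.
            new.append(lik * sum(post[r1] * P[r1][r2] for r1 in range(K)))
        z = sum(new)
        post = [p / z for p in new]
        out.append(post)
    return out
```

Each step costs O(K^2) for K regimes, so the whole pass is linear in the sequence length and involves no sampling, which is the kind of exact, linear-time update the abstract contrasts with particle filtering.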
