1,101 research outputs found

    Parameter estimation for robust HMM analysis of ChIP-chip data

    Background: Tiling arrays are an important tool for the study of transcriptional activity, protein-DNA interactions and chromatin structure on a genome-wide scale at high resolution. Although hidden Markov models have been used successfully to analyse tiling array data, parameter estimation for these models is typically ad hoc. Especially in the context of ChIP-chip experiments, no standard procedures exist to obtain parameter estimates from the data. Common methods for the calculation of maximum likelihood estimates, such as the Baum-Welch algorithm or Viterbi training, are rarely applied in the context of tiling array analysis. Results: Here we develop a hidden Markov model for the analysis of chromatin structure ChIP-chip tiling array data, using t emission distributions to increase robustness towards outliers. Maximum likelihood estimates are used for all model parameters. Two different approaches to parameter estimation are investigated and combined into an efficient procedure. Conclusion: We illustrate an efficient parameter estimation procedure that can be used for HMM-based methods in general and leads to a clear increase in performance when compared to the use of ad hoc estimates. The resulting hidden Markov model outperforms established methods like TileMap in the context of histone modification studies.
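As a rough illustration of the idea in the abstract above (an HMM whose emissions follow a t distribution so that outlying probes have less influence), the following minimal sketch decodes a two-state chain with the Viterbi algorithm. It is not the authors' implementation: the function names, the two-state layout (0 = background, 1 = enriched), and every parameter value are assumptions chosen for illustration, and the parameters are fixed by hand rather than estimated with Baum-Welch or Viterbi training.

```python
import math


def student_t_logpdf(x, loc, scale, df):
    """Log density of a location-scale Student's t distribution."""
    z = (x - loc) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2 * math.log1p(z * z / df))


def viterbi(obs, log_start, log_trans, emit_params):
    """Most probable state path for an HMM with t-distributed emissions.

    emit_params[k] is a (loc, scale, df) tuple for state k (hypothetical layout).
    """
    n_states = len(log_start)
    # delta[k]: best log-probability of any path that ends in state k
    delta = [log_start[k] + student_t_logpdf(obs[0], *emit_params[k])
             for k in range(n_states)]
    backptr = []
    for x in obs[1:]:
        new_delta, ptr = [], []
        for k in range(n_states):
            # Best predecessor of state k at this position
            best_prev = max(range(n_states),
                            key=lambda j: delta[j] + log_trans[j][k])
            ptr.append(best_prev)
            new_delta.append(delta[best_prev] + log_trans[best_prev][k]
                             + student_t_logpdf(x, *emit_params[k]))
        delta = new_delta
        backptr.append(ptr)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda k: delta[k])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Illustrative values only (not taken from the paper)
emit_params = [(0.0, 1.0, 4.0), (3.0, 1.0, 4.0)]  # (loc, scale, df) per state
log_start = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
# Probe-level scores with one outlier (-6.0) inside the enriched stretch
obs = [0.1, -0.3, 0.2, 2.9, 3.4, -6.0, 3.1, 2.8, 0.0, -0.2]
print(viterbi(obs, log_start, log_trans, emit_params))
```

With these hand-picked values the single outlier does not split the enriched segment, because the heavy tails of the t density keep the penalty for an extreme observation below the cost of switching state twice. Gaussian emissions with the same means and unit variance would penalise that probe far more heavily.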

    S-estimation of hidden Markov models

    A method for robust estimation of dynamic mixtures of multivariate distributions is proposed. The EM algorithm is modified by replacing the classical M-step with high-breakdown S-estimation of location and scatter, performed using the bisquare multivariate S-estimator. Estimates are obtained by solving a system of estimating equations characterized by component-specific sets of weights, based on robust Mahalanobis-type distances. Convergence of the resulting algorithm is proved, and its finite-sample behavior is investigated by means of a brief simulation study and an application to a multivariate time series of daily returns for seven stock markets.
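The bisquare weighting mentioned in the abstract above can be illustrated in a much simpler setting. The sketch below is not the multivariate S-estimator of the paper: it is a univariate, iteratively reweighted location estimate with a fixed MAD scale (an M-estimate, swapped in for brevity). The function names and the tuning constant `c = 4.685` (a conventional choice for the bisquare function) are assumptions for illustration.

```python
import statistics


def bisquare_weight(d, c=4.685):
    """Tukey bisquare weight: smooth down-weighting, exactly zero beyond c."""
    if abs(d) >= c:
        return 0.0
    return (1.0 - (d / c) ** 2) ** 2


def robust_location(xs, c=4.685, n_iter=50, tol=1e-10):
    """Iteratively reweighted location estimate with a fixed MAD scale."""
    mu = statistics.median(xs)
    mad = statistics.median([abs(x - mu) for x in xs])
    scale = 1.4826 * mad  # makes the MAD consistent under a normal model
    if scale == 0.0:
        return mu
    for _ in range(n_iter):
        # Weights depend on standardized distances from the current estimate
        weights = [bisquare_weight((x - mu) / scale, c) for x in xs]
        total = sum(weights)
        if total == 0.0:
            break
        new_mu = sum(w * x for w, x in zip(weights, xs)) / total
        if abs(new_mu - mu) < tol:
            mu = new_mu
            break
        mu = new_mu
    return mu


# Illustrative data: five inliers near 10 and one gross outlier
data = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]
print(statistics.mean(data), robust_location(data))
```

The outlier receives weight zero, so the estimate stays close to the bulk of the data while the ordinary mean is pulled towards 55. In the paper's setting the analogous weights are built from robust Mahalanobis-type distances within each mixture component.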

    Bayesian modeling of ChIP-chip data using latent variables

    Background: The ChIP-chip technology has been used in a wide range of biomedical studies, such as identification of human transcription factor binding sites, investigation of DNA methylation, and investigation of histone modifications in animals and plants. Various methods have been proposed in the literature for analyzing ChIP-chip data, such as sliding window methods, hidden Markov model-based methods, and Bayesian methods. Although Bayesian methods can potentially work better than the other two classes of methods, because they jointly account for uncertainty in the models and model parameters, the existing Bayesian methods do not perform satisfactorily. They usually require multiple replicates or some extra experimental information to parametrize the model, and long CPU times because they rely on MCMC simulations. Results: In this paper, we propose a Bayesian latent model for ChIP-chip data. The new model differs from the existing Bayesian models, such as the joint deconvolution model, the hierarchical gamma mixture model, and the Bayesian hierarchical model, mainly in two respects. Firstly, it works on the difference between the averaged treatment and control samples. This enables the use of a simple model for the data, which avoids the probe-specific effect and the sample (control/treatment) effect. As a consequence, this enables an efficient MCMC simulation of the posterior distribution of the model, and also makes the model more robust to outliers. Secondly, it models the neighboring dependence of probes by introducing a latent indicator vector. A truncated Poisson prior distribution is assumed for the latent indicator variable, with the rationale being justified at length. Conclusion: The Bayesian latent method is successfully applied to real and ten simulated datasets, with comparisons with some of the existing Bayesian methods, hidden Markov model methods, and sliding window methods. The numerical results indicate that the Bayesian latent method can outperform other methods, especially when the data contain outliers.

    MODELING DNA METHYLATION TILING ARRAY DATA

    Epigenetics is the study of heritable changes in gene function that occur without a change in DNA sequence. It has quickly emerged as an essential area for understanding inheritance and variation that cannot be explained by the DNA sequence alone. Epigenetic modifications have the potential to regulate gene expression and may play a role in diseases such as cancer. DNA methylation is a type of epigenetic modification that occurs when a methyl group attaches to a cytosine base on the DNA molecule. To better understand this epigenetic mechanism, DNA methylation profiles can be constructed by identifying all locations of DNA methylation in a genomic region (e.g., a chromosome or the whole genome). Large-scale studies of DNA methylation are supported by a microarray technology known as tiling arrays. These arrays provide high-density coverage of genomic regions through the unbiased, systematic selection of probes that are tiled across the regions. Statistical methods are employed to estimate each probe’s DNA methylation status. Previous studies indicate that DNA methylation patterns of some organisms differ by genomic element (e.g., gene, transposon), suggesting that genomic annotation information may be useful in statistical analysis. In this work, a novel statistical model is proposed that takes advantage of genomic annotation information, which to date has not been effectively utilized in statistical analysis. Specifically, a hidden Markov model that incorporates genomic annotation is introduced and investigated through a simulation study and an analysis of an Arabidopsis thaliana DNA methylation tiling array experiment.

    Robust and Sparse Regression via γ-divergence

    Many sparse regression methods have been proposed for high-dimensional data. However, they may not be robust against outliers. Recently, the use of density power weights has been studied for robust parameter estimation, and the corresponding divergences have been discussed. One such divergence is the γ-divergence, and the robust estimator based on the γ-divergence is known for its strong robustness. In this paper, we consider robust and sparse regression based on the γ-divergence. We extend the γ-divergence to the regression problem and show that it has strong robustness under heavy contamination even when outliers are heterogeneous. The loss function is constructed from an empirical estimate of the γ-divergence with sparse regularization, and the parameter estimate is defined as the minimizer of the loss function. To obtain the robust and sparse estimate, we propose an efficient update algorithm under which the loss function decreases monotonically. In particular, we discuss a linear regression problem with L1 regularization in detail. In numerical experiments and real data analyses, we see that the proposed method outperforms past robust and sparse methods.

    Hidden Markov models for radio localization in mixed LOS/NLOS conditions

    Abstract: This paper deals with the problem of radio localization of moving terminals (MTs) for indoor applications with mixed line-of-sight/non-line-of-sight (LOS/NLOS) conditions. To reduce false localizations, a grid-based Bayesian approach is proposed to jointly track the sequence of the positions and the sight conditions of the MT. This method is based on the assumption that both the MT position and the sight condition are Markov chains whose state is hidden in the received signals [hidden Markov model (HMM)]. The observations used for the HMM localization are obtained from the power-delay profile of the received signals. In ultrawideband (UWB) systems, using the whole power-delay profile, rather than the total power only, yields higher localization accuracy, as the power-delay profile is a joint measurement of time of arrival and power. Numerical results show that the proposed HMM method improves the accuracy of localization with respect to conventional ranging methods, especially in mixed LOS/NLOS indoor environments. Index Terms: Bayesian estimation, hidden Markov models (HMM), mobile positioning, source localization, tracking algorithms.