2,591 research outputs found

    A decision-theoretic approach for segmental classification

    Full text link
    This paper is concerned with statistical methods for the segmental classification of linear sequence data where the task is to segment and classify the data according to an underlying hidden discrete state sequence. Such analysis is commonplace in the empirical sciences including genomics, finance and speech processing. In particular, we are interested in answering the following question: given data yy and a statistical model π(x,y)\pi(x,y) of the hidden states xx, what should we report as the prediction x^\hat{x} under the posterior distribution π(xy)\pi (x|y)? That is, how should you make a prediction of the underlying states? We demonstrate that traditional approaches such as reporting the most probable state sequence or most probable set of marginal predictions can give undesirable classification artefacts and offer limited control over the properties of the prediction. We propose a decision theoretic approach using a novel class of Markov loss functions and report x^\hat{x} via the principle of minimum expected loss (maximum expected utility). We demonstrate that the sequence of minimum expected loss under the Markov loss function can be enumerated exactly using dynamic programming methods and that it offers flexibility and performance improvements over existing techniques. The result is generic and applicable to any probabilistic model on a sequence, such as Hidden Markov models, change point or product partition models.Comment: Published in at http://dx.doi.org/10.1214/13-AOAS657 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Dissecting Nucleosome Free Regions by a Segmental Semi-Markov Model

    Get PDF
    BACKGROUND: Nucleosome free regions (NFRs) play important roles in diverse biological processes including gene regulation. A genome-wide quantitative portrait of each individual NFR, with their starting and ending positions, lengths, and degrees of nucleosome depletion is critical for revealing the heterogeneity of gene regulation and chromatin organization. By averaging nucleosome occupancy levels, previous studies have identified the presence of NFRs in the promoter regions across many genes. However, evaluation of the quantitative characteristics of individual NFRs requires an NFR calling method. METHODOLOGY: In this study, we propose a statistical method to identify the patterns of NFRs from a genome-wide measurement of nucleosome occupancy. This method is based on an appropriately designed segmental semi-Markov model, which can capture each NFR pattern and output its quantitative characterizations. Our results show that the majority of the NFRs are located in intergenic regions or promoters with a length of about 400-600bp and varying degrees of nucleosome depletion. Our quantitative NFR mapping allows for an investigation of the relative impacts of transcription machinery and DNA sequence in evicting histones from NFRs. We show that while both factors have significant overall effects, their specific contributions vary across different subtypes of NFRs. CONCLUSION: The emphasis of our approach on the variation rather than the consensus of nucleosome free regions sets the tone for enabling the exploration of many subtler dynamic aspects of chromatin biology

    Phonetic and prosodic analysis of speech

    Get PDF
    In order to cope with the problems of spontaneous speech (including, for example, hesitations and non-words) it is necessary to extract from the speech signal all information it contains. Modeling of words by segmental units should be supported by suprasegmental units since valuable information is represented in the prosody of an utterance. We present an approach to flexible and efficient modeling of speech by segmental units and describe extraction and use of suprasegmental information

    Bayesian adaptive learning of the parameters of hidden Markov model for speech recognition

    Get PDF
    A theoretical framework for Bayesian adaptive training of the parameters of a discrete hidden Markov model (DHMM) and of a semi-continuous HMM (SCHMM) with Gaussian mixture state observation densities is presented. In addition to formulating the forward-backward MAP (maximum a posteriori) and the segmental MAP algorithms for estimating the above HMM parameters, a computationally efficient segmental quasi-Bayes algorithm for estimating the state-specific mixture coefficients in SCHMM is developed. For estimating the parameters of the prior densities, a new empirical Bayes method based on the moment estimates is also proposed. The MAP algorithms and the prior parameter specification are directly applicable to training speaker adaptive HMMs. Practical issues related to the use of the proposed techniques for HMM-based speaker adaptation are studied. The proposed MAP algorithms are shown to be effective especially in the cases in which the training or adaptation data are limited.published_or_final_versio
    corecore