    PREMIER - PRobabilistic Error-correction using Markov Inference in Errored Reads

    In this work we present a flexible, probabilistic and reference-free method of error correction for high-throughput DNA sequencing data. The key is to exploit the high coverage of sequencing data and model short sequence outputs as independent realizations of a Hidden Markov Model (HMM). We pose the problem of error correction of reads as one of maximum likelihood sequence detection over this HMM. While time and memory considerations rule out an implementation of the optimal Baum-Welch algorithm (for parameter estimation) and the optimal Viterbi algorithm (for error correction), we propose low-complexity approximate versions of both. Specifically, we propose an approximate Viterbi and a sequential decoding based algorithm for the error correction. Our results show that when compared with Reptile, a state-of-the-art error correction method, our methods consistently achieve superior performance on both simulated and real data sets. Comment: Submitted to ISIT 201
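To make "maximum likelihood sequence detection over an HMM" concrete, here is a minimal sketch of the exact Viterbi recursion on a toy model. It is not PREMIER's approximate algorithm. The first-order chain over single bases, the cyclic A→C→G→T→A transition preference, and the 0.97/0.01 probabilities are all hypothetical values chosen only so that a single substitution error is visibly corrected.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_path, best_log_prob) for an observation sequence."""
    # score[s] = best log-probability of any state path ending in s
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict of back-pointers per step after the first
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev[p] + log_trans[p][s])
            score[s] = prev[best_prev] + log_trans[best_prev][s] + log_emit[s][o]
            pointers[s] = best_prev
        back.append(pointers)
    # Trace back from the best final state
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]

# Hypothetical toy model (assumption, not from the paper):
# hidden state = true base, observation = sequenced base.
BASES = "ACGT"
NEXT = {"A": "C", "C": "G", "G": "T", "T": "A"}
log_start = {s: math.log(0.25) for s in BASES}
log_trans = {p: {s: math.log(0.97 if NEXT[p] == s else 0.01) for s in BASES}
             for p in BASES}
log_emit = {s: {o: math.log(0.97 if s == o else 0.01) for o in BASES}
            for s in BASES}

read = "ACGAACGT"  # fourth base is an assumed substitution error (T read as A)
corrected, log_prob = viterbi(read, BASES, log_start, log_trans, log_emit)
print("".join(corrected))  # ACGTACGT
```

The exact recursion costs time proportional to the read length times the square of the number of states, which is the cost the abstract says motivates approximate variants once the state space becomes large.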

    Synchronization recovery and state model reduction for soft decoding of variable length codes

    Variable length codes (VLCs) exhibit de-synchronization problems when transmitted over noisy channels. Trellis decoding techniques based on Maximum A Posteriori (MAP) estimators are often used to minimize the error rate on the estimated sequence. If the number of symbols and/or bits transmitted is known by the decoder, termination constraints can be incorporated in the decoding process: all the paths in the trellis which do not lead to a valid sequence length are suppressed. This paper presents an analytic method to assess the expected error resilience of a VLC when trellis decoding with a sequence length constraint is used. The approach is based on the computation, for a given code, of the amount of information brought by the constraint. It is then shown that this quantity, as well as the probability that the VLC decoder does not re-synchronize in a strict sense, is not significantly altered by appropriate aggregation of trellis states. This proves that the performance obtained by running a length-constrained Viterbi decoder on aggregated state models approaches the one obtained with the bit/symbol trellis, with a significantly reduced complexity. It is then shown that the complexity can be further decreased by projecting the state model on two state models of reduced size.

    A generalized risk approach to path inference based on hidden Markov models

    Motivated by the unceasing interest in hidden Markov models (HMMs), this paper re-examines hidden path inference in these models, using primarily a risk-based framework. While the most common maximum a posteriori (MAP), or Viterbi, path estimator and the minimum error, or Posterior Decoder (PD), have long been around, other path estimators, or decoders, have been either only hinted at or applied more recently and in dedicated applications generally unfamiliar to the statistical learning community. Over a decade ago, however, a family of algorithmically defined decoders aiming to hybridize the two standard ones was proposed (Brushe et al., 1998). The present paper gives a careful analysis of this hybridization approach, identifies several problems and issues with it and other previously proposed approaches, and proposes practical resolutions of those. Furthermore, simple modifications of the classical criteria for hidden path recognition are shown to lead to a new class of decoders. Dynamic programming algorithms to compute these decoders in the usual forward-backward manner are presented. A particularly interesting subclass of such estimators can be also viewed as hybrids of the MAP and PD estimators. Similar to previously proposed MAP-PD hybrids, the new class is parameterized by a small number of tunable parameters. Unlike their algorithmic predecessors, the new risk-based decoders are more clearly interpretable, and, most importantly, work "out of the box" in practice, which is demonstrated on some real bioinformatics tasks and data. Some further generalizations and applications are discussed in conclusion. Comment: Section 5: corrected denominators of the scaled beta variables (pp. 27-30), => corrections in claims 1, 3, Prop. 12, bottom of Table 1. Decoder (49), Corol. 14 are generalized to handle 0 probabilities. Notation is more closely aligned with (Bishop, 2006). Details are inserted in eqn-s (43); the positivity assumption in Prop. 11 is explicit. Fixed typing errors in equation (41), Example
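The contrast between the MAP (Viterbi) path and the Posterior Decoder can be seen on a two-step example. The sketch below implements only the standard forward-backward posterior decoder, not the paper's risk-based hybrid decoders. The two-state model and all of its probabilities are hypothetical, and the code assumes the observation sequence has nonzero probability under the model.

```python
def posterior_decode(obs, start, trans, emit):
    """Pick the most probable state at each position (PD / minimum-error path).

    start[s], trans[p][s], emit[s][o] are plain probabilities; states and
    observation symbols are integer indices. Returns (path, posteriors).
    """
    n, k = len(obs), len(start)

    def normalize(v):
        z = sum(v)  # assumed > 0, i.e. the observations are possible
        return [x / z for x in v]

    # Forward pass, rescaled at every step to avoid underflow
    alpha = [normalize([start[s] * emit[s][obs[0]] for s in range(k)])]
    for t in range(1, n):
        prev = alpha[-1]
        alpha.append(normalize([
            emit[s][obs[t]] * sum(prev[p] * trans[p][s] for p in range(k))
            for s in range(k)
        ]))

    # Backward pass, also rescaled (scale factors cancel when normalizing)
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = normalize([
            sum(trans[s][q] * emit[q][obs[t + 1]] * beta[t + 1][q] for q in range(k))
            for s in range(k)
        ])

    # Posterior marginals P(state_t = s | obs) and their pointwise argmax
    post = [normalize([alpha[t][s] * beta[t][s] for s in range(k)]) for t in range(n)]
    path = [max(range(k), key=lambda s: row[s]) for row in post]
    return path, post

# Hypothetical two-state model (assumption for illustration only)
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]

pd_path, posteriors = posterior_decode([0, 1], start, trans, emit)
print(pd_path)  # [0, 1]
```

For this toy model the four joint path probabilities are 0.072, 0.028, 0.006 and 0.084 for the paths (0,0), (0,1), (1,0) and (1,1), so the MAP path is (1,1) while the posterior decoder returns (0,1), a path whose own joint probability is only 0.028. That mismatch between pointwise and pathwise optimality is what hybrid decoders trade off.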

    Coordinated design of coding and modulation systems

    The joint optimization of the coding and modulation systems employed in telemetry systems was investigated. Emphasis was placed on formulating inner and outer coding standards used by the Goddard Space Flight Center. Convolutional codes were found that are nearly optimum for use with Viterbi decoding in the inner coding of concatenated coding systems. A convolutional code, the unit-memory code, was discovered and is ideal for inner system usage because of its byte-oriented structure. Simulations of sequential decoding on the deep-space channel were carried out to compare directly various convolutional codes that are proposed for use in deep-space systems.

    Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean

    A new tightly coupled speech and natural language integration model is presented for a TDNN-based continuous, possibly large-vocabulary, speech recognition system for Korean. Unlike popular n-best techniques developed for integrating mainly HMM-based speech recognition and natural language processing at the word level, which is obviously inadequate for morphologically complex agglutinative languages, our model constructs a spoken language system based on a morpheme-level speech and language integration. With this integration scheme, the spoken Korean processing engine (SKOPE) is designed and implemented using a TDNN-based diphone recognition module integrated with a Viterbi-based lexical decoding and symbolic phonological/morphological co-analysis. Our experimental results show that the speaker-dependent continuous eojeol (Korean word) recognition and integrated morphological analysis can be achieved with over 80.6% success rate directly from speech inputs for the middle-level vocabularies. Comment: latex source with a4 style, 15 pages, to be published in computer processing of oriental language journa

    Convolutional coded dual header pulse interval modulation for line of sight photonic wireless links.

    The analysis and simulation for a convolutional coded dual header pulse interval modulation (CC-DH-PIM) scheme using a rate ½ convolutional code with a constraint length of 3 is presented. Decoding is implemented using the Viterbi algorithm with a hard decision. Mathematical analysis for the slot error rate (SER) upper bounds is presented and results are compared with the simulated data for a number of different modulation techniques. The authors show that the coded DH-PIM outperforms the pulse position modulation (PPM) scheme and offers >4 dB code gain at the SER of 10⁻⁴ compared to the standard DH-PIM. Results presented show that the CC-DH-PIM with a higher constraint length of 7 offers a code gain of 2 dB at SER of 10⁻⁵ compared to the CC-DH-PIM with a constraint length of 3. However, in CC-DH-PIM the improvement in the error performance is achieved at the cost of reduced transmission throughput compared to the standard DH-PIM.
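As a rough illustration of the coding layer only, the sketch below encodes bits with a rate ½, constraint-length-3 convolutional code and decodes them with hard-decision Viterbi using Hamming distance as the branch metric. The abstract does not state the generator polynomials, so the common (7, 5) octal pair is an assumption here. The DH-PIM slot mapping and the optical channel are not modeled.

```python
K = 3                  # constraint length
G = (0b111, 0b101)     # ASSUMED generators (7, 5) octal; not given in the abstract

def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1

def step(state, bit):
    """Shift one input bit in; return (next_state, two output bits)."""
    reg = (bit << (K - 1)) | state          # newest bit in the top position
    return reg >> 1, [parity(reg & g) for g in G]

def encode(bits):
    """Encode bits, appending K-1 zeros so the trellis ends in state 0."""
    state, out = 0, []
    for b in list(bits) + [0] * (K - 1):
        state, pair = step(state, b)
        out.extend(pair)
    return out

def decode(received):
    """Hard-decision Viterbi decoding; returns the message without flush bits."""
    n_states = 1 << (K - 1)
    inf = float("inf")
    metric = [0] + [inf] * (n_states - 1)   # encoder starts in state 0
    paths = [[] for _ in range(n_states)]
    for i in range(0, len(received), 2):
        pair = received[i:i + 2]
        new_metric = [inf] * n_states
        new_paths = [None] * n_states
        for state in range(n_states):
            if metric[state] == inf:
                continue                    # state not reachable yet
            for b in (0, 1):
                nxt, expected = step(state, b)
                # Branch metric: Hamming distance to the received pair
                m = metric[state] + sum(x != y for x, y in zip(expected, pair))
                if m < new_metric[nxt]:
                    new_metric[nxt] = m
                    new_paths[nxt] = paths[state] + [b]
        metric, paths = new_metric, new_paths
    return paths[0][:-(K - 1)]              # survivor ending in state 0

message = [1, 0, 1, 1, 0, 0, 1]
coded = encode(message)
corrupted = list(coded)
corrupted[5] ^= 1                           # flip one channel bit
print(decode(corrupted) == message)         # True
```

Storing a full survivor path per state keeps the sketch short; a practical decoder would normally keep back-pointers and trace back instead. The halved information rate visible here (two channel bits per message bit) corresponds to the throughput cost the abstract mentions.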