PREMIER - PRobabilistic Error-correction using Markov Inference in Errored Reads
In this work we present a flexible, probabilistic and reference-free method
of error correction for high throughput DNA sequencing data. The key is to
exploit the high coverage of sequencing data and model short sequence outputs
as independent realizations of a Hidden Markov Model (HMM). We pose the problem
of error correction of reads as one of maximum likelihood sequence detection
over this HMM. While time and memory considerations rule out an implementation
of the optimal Baum-Welch algorithm (for parameter estimation) and the optimal
Viterbi algorithm (for error correction), we propose low-complexity approximate
versions of both. Specifically, we propose an approximate Viterbi algorithm and a
sequential-decoding-based algorithm for error correction. Our results show
that, compared with Reptile, a state-of-the-art error correction method,
our methods consistently achieve superior performance on both simulated and
real data sets. Comment: Submitted to ISIT 201
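The core operation this approach builds on, maximum likelihood sequence detection over an HMM, can be illustrated with an exact log-space Viterbi decoder. This is a minimal sketch, not PREMIER itself: the paper uses approximate Viterbi and sequential decoding on an HMM estimated from the reads, whereas the toy model here is made up purely for illustration. It assumes hidden states are the true bases, a hypothetical "genome" that mostly repeats A->C->G->T, and a per-base substitution probability of 0.01 for each wrong base. The names `viterbi`, `NEXT`, `START`, `TRANS` and `EMIT` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _log(x: float) -> float:
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi(obs: Sequence[str], states: Sequence[str], start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most likely hidden state path for `obs` (exact Viterbi, log space)."""
    # score[s]: best log-probability of any state path ending in s at the current step
    score = {s: _log(start[s]) + _log(emit[s][obs[0]]) for s in states}
    back: List[Dict[str, str]] = []  # back[t][s]: best predecessor of s at step t + 1
    for o in obs[1:]:
        ptr: Dict[str, str] = {}
        new: Dict[str, float] = {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + _log(trans[r][s]))
            ptr[s] = prev
            new[s] = score[prev] + _log(trans[prev][s]) + _log(emit[s][o])
        back.append(ptr)
        score = new
    # Trace back from the best final state
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical toy model: the hidden sequence leaves the A->C->G->T repeat with
# probability 0.01 per alternative base, and each wrong base is read with probability 0.01.
BASES = "ACGT"
NEXT = {"A": "C", "C": "G", "G": "T", "T": "A"}
START = {b: 0.25 for b in BASES}
TRANS = {r: {s: 0.97 if s == NEXT[r] else 0.01 for s in BASES} for r in BASES}
EMIT = {s: {o: 0.97 if o == s else 0.01 for o in BASES} for s in BASES}

if __name__ == "__main__":
    read = "ACGTACCTACGT"  # seventh base misread as C
    print("".join(viterbi(read, BASES, START, TRANS, EMIT)))
```

Because every departure from the assumed repeat and every mismatch with the read costs a factor of 0.01, the most likely path replaces the misread seventh base with G rather than following the read literally.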
Synchronization recovery and state model reduction for soft decoding of variable length codes
Variable length codes exhibit de-synchronization problems when transmitted
over noisy channels. Trellis decoding techniques based on Maximum A Posteriori
(MAP) estimators are often used to minimize the error rate on the estimated
sequence. If the number of symbols and/or bits transmitted are known by the
decoder, termination constraints can be incorporated in the decoding process.
All the paths in the trellis which do not lead to a valid sequence length are
suppressed. This paper presents an analytic method to assess the expected error
resilience of a VLC when trellis decoding with a sequence length constraint is
used. The approach is based on the computation, for a given code, of the amount
of information brought by the constraint. It is then shown that this quantity,
as well as the probability that the VLC decoder does not resynchronize in the
strict sense, is not significantly altered by appropriate aggregation of trellis
states. This proves that the performance obtained by running a
length-constrained Viterbi decoder on aggregated state models approaches that
obtained with the bit/symbol trellis, at a significantly reduced
complexity. It is then shown that the complexity can be further decreased by
projecting the state model onto two state models of reduced size.
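Length-constrained decoding of a VLC can be sketched as a Viterbi search on the bit/symbol trellis, where a state is the pair (bits consumed, symbols decoded) and only paths that end exactly at (N bits, K symbols) survive. This is a minimal sketch assuming a memoryless source with known symbol priors and hard decisions from a binary symmetric channel with crossover probability `p` (0 < p < 1). It uses the full, unaggregated trellis, so it shows the termination constraint itself rather than the paper's reduced state models. The function name and the toy code below are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

State = Tuple[int, int]  # (bits consumed, symbols decoded)


def length_constrained_vlc_decode(received: str, codebook: Dict[str, str],
                                  priors: Dict[str, float], num_symbols: int,
                                  p: float) -> Optional[List[str]]:
    """MAP symbol sequence that uses exactly len(received) bits and num_symbols symbols."""
    n_bits = len(received)
    log_match, log_flip = math.log(1 - p), math.log(p)
    # best[state] = (log score, predecessor state, symbol on the incoming branch)
    best: Dict[State, Tuple[float, Optional[State], Optional[str]]] = {(0, 0): (0.0, None, None)}
    # Every codeword has at least one bit, so sweeping n upward visits predecessors first.
    for n in range(n_bits + 1):
        for k in range(num_symbols):
            if (n, k) not in best:
                continue
            score = best[(n, k)][0]
            for sym, word in codebook.items():
                end = n + len(word)
                if end > n_bits:
                    continue  # branch would overrun the known bit length
                channel = sum(log_match if received[n + i] == bit else log_flip
                              for i, bit in enumerate(word))
                cand = score + math.log(priors[sym]) + channel
                nxt = (end, k + 1)
                if nxt not in best or cand > best[nxt][0]:
                    best[nxt] = (cand, (n, k), sym)
    final = (n_bits, num_symbols)
    if final not in best:
        return None  # no path satisfies both termination constraints
    symbols: List[str] = []
    state = final
    while state != (0, 0):  # only the start state has no predecessor
        _, prev, sym = best[state]
        symbols.append(sym)
        state = prev
    return symbols[::-1]
```

For example, with the hypothetical code {a: 0, b: 10, c: 11}, sending "abca" gives 010110. If the last bit is flipped, plain prefix decoding leaves a dangling 1, while the constrained decoder still returns a 4-symbol sequence that accounts for all 6 bits with a single bit disagreement.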
A generalized risk approach to path inference based on hidden Markov models
Motivated by the unceasing interest in hidden Markov models (HMMs), this
paper re-examines hidden path inference in these models, using primarily a
risk-based framework. While the most common maximum a posteriori (MAP), or
Viterbi, path estimator and the minimum error, or Posterior Decoder (PD), have
long been around, other path estimators, or decoders, have been either only
hinted at or applied more recently and in dedicated applications generally
unfamiliar to the statistical learning community. Over a decade ago, however, a
family of algorithmically defined decoders aiming to hybridize the two standard
ones was proposed (Brushe et al., 1998). The present paper gives a careful
analysis of this hybridization approach, identifies several problems and issues
with it and other previously proposed approaches, and proposes practical
resolutions of those. Furthermore, simple modifications of the classical
criteria for hidden path recognition are shown to lead to a new class of
decoders. Dynamic programming algorithms to compute these decoders in the usual
forward-backward manner are presented. A particularly interesting subclass of
such estimators can also be viewed as hybrids of the MAP and PD estimators.
Similar to previously proposed MAP-PD hybrids, the new class is parameterized
by a small number of tunable parameters. Unlike their algorithmic predecessors,
the new risk-based decoders are more clearly interpretable, and, most
importantly, work "out of the box" in practice, which is demonstrated on some
real bioinformatics tasks and data. Some further generalizations and
applications are discussed in the conclusion. Comment: Section 5: corrected denominators of the scaled beta variables (pp.
27-30), leading to corrections in claims 1 and 3, Prop. 12, and the bottom of Table 1. Decoder
(49) and Corol. 14 are generalized to handle zero probabilities. Notation is more
closely aligned with (Bishop, 2006). Details are added to eqn. (43); the
positivity assumption in Prop. 11 is made explicit. Fixed typing errors in
equation (41), Example
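For reference, the Posterior Decoder (PD) that the abstract contrasts with the MAP (Viterbi) path picks, at each position, the state with the largest posterior marginal, computed with the forward-backward recursions. The sketch below uses the usual per-step scaling factors, so that the product of the scaled forward and backward variables gives the posteriors directly. It is a generic illustration assuming a discrete HMM supplied as NumPy arrays, not the paper's risk-based hybrid decoders, and `posterior_decode` is a hypothetical name.

```python
import numpy as np


def posterior_decode(obs, start, trans, emit):
    """Posterior Decoder for a discrete HMM via scaled forward-backward.

    obs: sequence of observation indices; start: (S,) initial probabilities;
    trans: (S, S) with trans[i, j] = P(next = j | current = i);
    emit: (S, V) with emit[i, v] = P(observation v | state i).
    Returns (PD path, posterior marginals of shape (T, S), log-likelihood).
    """
    T, S = len(obs), len(start)
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    c = np.zeros(T)  # c[t] = P(x_t | x_1..x_{t-1}), the scaling factors
    alpha[0] = start * emit[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
    post = alpha * beta  # each row sums to 1
    return post.argmax(axis=1), post, float(np.log(c).sum())
```

Unlike the Viterbi path, the PD sequence maximizes the expected number of correctly labeled positions, but it may string together transitions that have zero probability under the model; this is one motivation for the MAP-PD hybrids discussed above.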
Coordinated design of coding and modulation systems
The joint optimization of the coding and modulation systems employed in telemetry systems was investigated. Emphasis was placed on formulating inner and outer coding standards used by the Goddard Space Flight Center. Convolutional codes were found that are nearly optimum for use with Viterbi decoding in the inner coding of concatenated coding systems. A convolutional code, the unit-memory code, was discovered and is ideal for use in the inner coding system because of its byte-oriented structure. Simulations of sequential decoding on the deep-space channel were carried out to compare directly various convolutional codes that are proposed for use in deep-space systems.
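Sequential decoding can be sketched with the stack algorithm: partial paths through the code tree are kept in a priority queue ordered by the Fano metric, and the best one is always extended first. This is a minimal sketch assuming hard decisions on a binary symmetric channel with crossover probability `p` and a hypothetical rate-1/2, memory-2 code with octal generators (7, 5); the report's own codes, including the unit-memory code, are not reproduced, and the stack here has no size limit. All function names are hypothetical.

```python
import heapq
import math
from typing import List

G = (0b111, 0b101)  # hypothetical rate-1/2, memory-2 code (octal generators 7, 5)
MEMORY = 2


def branch_bits(state: int, bit: int) -> List[int]:
    """Encoder output when `bit` enters a shift register holding `state`."""
    reg = (bit << MEMORY) | state
    return [bin(reg & g).count("1") % 2 for g in G]


def encode(bits: List[int]) -> List[int]:
    """Zero-terminated feed-forward convolutional encoder."""
    state, out = 0, []
    for b in bits + [0] * MEMORY:
        out += branch_bits(state, b)
        state = ((b << MEMORY) | state) >> 1
    return out


def stack_decode(received: List[int], n_info: int, p: float) -> List[int]:
    """Stack-algorithm sequential decoding with the Fano metric on a BSC(p)."""
    n_out = len(G)
    rate = 1 / n_out
    agree = math.log2(2 * (1 - p)) - rate  # per-bit Fano metric when the bit matches
    disagree = math.log2(2 * p) - rate     # per-bit Fano metric when the bit differs
    total = n_info + MEMORY
    # Entries are (-metric, depth, state, path); heapq pops the best metric first.
    stack = [(0.0, 0, 0, ())]
    while stack:
        neg_metric, depth, state, path = heapq.heappop(stack)
        if depth == total:
            return list(path[:n_info])  # first full-length path to reach the top
        r = received[n_out * depth: n_out * (depth + 1)]
        for b in ((0, 1) if depth < n_info else (0,)):  # tail bits are forced to 0
            m = sum(agree if o == x else disagree
                    for o, x in zip(branch_bits(state, b), r))
            nxt = ((b << MEMORY) | state) >> 1
            heapq.heappush(stack, (neg_metric - m, depth + 1, nxt, path + (b,)))
    raise ValueError("stack exhausted without reaching full depth")
```

Unlike Viterbi decoding, the amount of work here depends on the noise: on a clean channel only the correct branch is extended at each depth, while clusters of errors force the decoder to back up and explore more of the tree.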
Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean
A new tightly coupled speech and natural language integration model is
presented for a TDNN-based, continuous, possibly large-vocabulary speech
recognition system for Korean. Unlike popular n-best techniques developed for
integrating mainly HMM-based speech recognition and natural language processing
at the word level, which is inadequate for morphologically
complex agglutinative languages, our model constructs a spoken language system
based on morpheme-level speech and language integration. With this
integration scheme, the spoken Korean processing engine (SKOPE) is designed and
implemented using a TDNN-based diphone recognition module integrated with a
Viterbi-based lexical decoding and symbolic phonological/morphological
co-analysis. Our experimental results show that speaker-dependent continuous
eojeol (Korean word) recognition and integrated morphological analysis
can be achieved with a success rate of over 80.6% directly from speech inputs for
middle-level vocabularies. Comment: LaTeX source with A4 style, 15 pages, to be published in computer
processing of oriental language journal
Convolutional coded dual header pulse interval modulation for line of sight photonic wireless links
The analysis and simulation of a convolutional coded dual header pulse interval modulation (CC-DH-PIM) scheme using a rate ½ convolutional code with a constraint length of 3 are presented. Decoding is implemented using the Viterbi algorithm with hard decisions. Mathematical analysis of the slot error rate (SER) upper bounds is presented, and the results are compared with simulated data for a number of different modulation techniques. The authors show that coded DH-PIM outperforms the pulse position modulation (PPM) scheme and offers a code gain of more than 4 dB at an SER of 10^-4 compared with standard DH-PIM. The results also show that CC-DH-PIM with a higher constraint length of 7 offers a code gain of 2 dB at an SER of 10^-5 compared with CC-DH-PIM with a constraint length of 3. However, in CC-DH-PIM the improvement in error performance is achieved at the cost of reduced transmission throughput compared with standard DH-PIM.
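Hard-decision Viterbi decoding of a rate-1/2, constraint-length-3 code can be sketched as follows. The source does not give the generator polynomials, so the common octal pair (7, 5) is assumed; the encoder is zero-terminated, the branch metric is the Hamming distance between received and expected bits, and the mapping onto DH-PIM slots is omitted entirely. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

G = (0b111, 0b101)  # assumed generators (7, 5 octal); the source states only rate 1/2, K = 3
K = 3
MEMORY = K - 1
N_STATES = 1 << MEMORY


def branch(state: int, bit: int) -> Tuple[List[int], int]:
    """Output bits and next state when `bit` enters the shift register in `state`."""
    reg = (bit << MEMORY) | state
    return [bin(reg & g).count("1") % 2 for g in G], reg >> 1


def encode(bits: List[int]) -> List[int]:
    """Zero-terminated encoding: MEMORY tail zeros return the encoder to state 0."""
    state, out = 0, []
    for b in bits + [0] * MEMORY:
        o, state = branch(state, b)
        out += o
    return out


def viterbi_hard(received: List[int], n_info: int) -> List[int]:
    """Hard-decision Viterbi decoding with a Hamming-distance branch metric."""
    inf = float("inf")
    metric = [0.0] + [inf] * (N_STATES - 1)  # encoder starts in the all-zero state
    history: List[List[Optional[Tuple[int, int]]]] = []  # per step: (prev state, input bit)
    for t in range(n_info + MEMORY):
        r = received[len(G) * t: len(G) * (t + 1)]
        new = [inf] * N_STATES
        ptr: List[Optional[Tuple[int, int]]] = [None] * N_STATES
        for s in range(N_STATES):
            if metric[s] == inf:
                continue
            for b in ((0, 1) if t < n_info else (0,)):  # tail inputs are forced to 0
                out, nxt = branch(s, b)
                d = metric[s] + sum(o != x for o, x in zip(out, r))
                if d < new[nxt]:
                    new[nxt], ptr[nxt] = d, (s, b)
        metric = new
        history.append(ptr)
    # Termination guarantees the survivor ending in state 0 is the ML path.
    state, decoded = 0, []
    for ptr in reversed(history):
        prev, b = ptr[state]
        decoded.append(b)
        state = prev
    return decoded[::-1][:n_info]
```

Because the (7, 5) code has free distance 5, any two channel bit errors within a terminated block are corrected. Moving to a constraint length of 7 enlarges the trellis from 4 to 64 states in exchange for the additional gain reported above.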