Minimum Pseudoweight Analysis of 3-Dimensional Turbo Codes
In this work, we consider pseudocodewords of (relaxed) linear programming
(LP) decoding of 3-dimensional turbo codes (3D-TCs). We present a relaxed LP
decoder for 3D-TCs, adapting the relaxed LP decoder for conventional turbo
codes proposed by Feldman in his thesis. We show that the 3D-TC polytope is
proper and $C$-symmetric, and make a connection to finite graph covers of the
3D-TC factor graph. This connection is used to show that the support set of any
pseudocodeword is a stopping set of iterative decoding of 3D-TCs using maximum
a posteriori constituent decoders on the binary erasure channel. Furthermore,
we compute ensemble-average pseudoweight enumerators of 3D-TCs and perform a
finite-length minimum pseudoweight analysis for small cover degrees. Also, an
explicit description of the fundamental cone of the 3D-TC polytope is given.
Finally, we present an extensive numerical study of small-to-medium block
length 3D-TCs, which shows that 1) typically (i.e., in most cases) when the
minimum distance $d_{\min}$ and/or the stopping distance $h_{\min}$ is
high, the minimum pseudoweight (on the additive white Gaussian noise channel)
is strictly smaller than both $d_{\min}$ and $h_{\min}$, and 2)
the minimum pseudoweight grows with the block length, at least for
small-to-medium block lengths.
Comment: To appear in IEEE Transactions on Communications
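As a small illustration of the quantity studied in the abstract above, the sketch below computes the additive white Gaussian noise (AWGN) pseudoweight of a nonnegative vector using the standard definition, the squared sum of the entries divided by the sum of their squares. The function name `awgn_pseudoweight` is hypothetical, and the code is not taken from the paper; it only shows the definition, not the LP decoder or the enumeration procedure.

```python
from typing import Sequence


def awgn_pseudoweight(x: Sequence[float]) -> float:
    """AWGN pseudoweight of a nonnegative, nonzero vector x.

    Standard definition: (sum_i x_i)^2 / (sum_i x_i^2).
    The name and interface are illustrative assumptions.
    """
    if any(v < 0 for v in x):
        raise ValueError("pseudocodeword entries must be nonnegative")
    sum_sq = sum(v * v for v in x)
    if sum_sq == 0:
        raise ValueError("pseudoweight is undefined for the all-zero vector")
    total = sum(x)
    return total * total / sum_sq


# For a 0/1 codeword the pseudoweight equals its Hamming weight.
codeword_weight = awgn_pseudoweight([1, 1, 1, 0])

# A fractional pseudocodeword can have pseudoweight below its support size.
fractional_weight = awgn_pseudoweight([1.0, 0.5, 0.5])
```

The second call shows why the minimum pseudoweight can fall below the minimum distance: the support has three entries, but the pseudoweight is 4 / 1.5, which is less than 3.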
Optimized Cartesian $K$-Means
Product quantization-based approaches are effective to encode
high-dimensional data points for approximate nearest neighbor search. The space
is decomposed into a Cartesian product of low-dimensional subspaces, each of
which generates a sub codebook. Data points are encoded as compact binary codes
using these sub codebooks, and the distance between two data points can be
approximated efficiently from their codes by the precomputed lookup tables.
Traditionally, to encode a subvector of a data point in a subspace, only one
sub codeword in the corresponding sub codebook is selected, which may impose
strict restrictions on the search accuracy. In this paper, we propose a novel
approach, named Optimized Cartesian $K$-Means (OCKM), to better encode the data
points for more accurate approximate nearest neighbor search. In OCKM, multiple
sub codewords are used to encode the subvector of a data point in a subspace.
Each sub codeword stems from a different sub codebook in each subspace; these
sub codebooks are optimally generated with regard to minimizing the distortion
errors. The high-dimensional data point is then encoded as the concatenation of
the indices of multiple sub codewords from all the subspaces. This can provide
more flexibility and lower distortion errors than traditional methods.
Experimental results on the standard real-life datasets demonstrate the
superiority over state-of-the-art approaches for approximate nearest neighbor
search.
Comment: to appear in IEEE TKDE, accepted in Apr. 201
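The traditional product-quantization baseline described in the abstract above (one sub codeword per subspace, distances read from precomputed lookup tables) can be sketched as follows. This is a minimal illustration under stated assumptions, not the OCKM method itself: the codebooks are hand-written rather than learned, the vector is split into equal-length contiguous subvectors, and the tables hold query-to-codeword distances. All function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = List[List[float]]  # one sub codebook: a list of sub codewords


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v: Vector, m: int) -> List[Vector]:
    """Split v into m contiguous subvectors (assumes len(v) is divisible by m)."""
    step = len(v) // m
    return [v[i * step:(i + 1) * step] for i in range(m)]


def encode(v: Vector, codebooks: List[Codebook]) -> List[int]:
    """Encode v as one nearest sub codeword index per subspace."""
    parts = split(v, len(codebooks))
    return [
        min(range(len(cb)), key=lambda k: sq_dist(p, cb[k]))
        for p, cb in zip(parts, codebooks)
    ]


def lookup_tables(query: Vector, codebooks: List[Codebook]) -> List[List[float]]:
    """Precompute, per subspace, the distance from the query to every sub codeword."""
    parts = split(query, len(codebooks))
    return [[sq_dist(p, cw) for cw in cb] for p, cb in zip(parts, codebooks)]


def approx_sq_dist(tables: List[List[float]], code: List[int]) -> float:
    """Approximate squared distance: one table lookup per subspace, then a sum."""
    return sum(t[k] for t, k in zip(tables, code))


# Two 2-dimensional subspaces, two sub codewords each (toy values, not learned).
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 2.0]]]
code = encode([0.9, 1.2, 0.1, -0.1], codebooks)
tables = lookup_tables([0.0, 0.0, 0.0, 0.0], codebooks)
distance = approx_sq_dist(tables, code)
```

Once the tables are built for a query, each database point costs only one lookup per subspace. OCKM, as the abstract describes it, would instead select several sub codewords per subspace, which this sketch does not attempt.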
Health history pattern extraction from textual medical records
Extracting patterns from medical records using temporal data mining techniques
Optimal sampling strategies for multiscale stochastic processes
In this paper, we determine which non-random sampling of fixed size gives the
best linear predictor of the sum of a finite spatial population. We employ
different multiscale superpopulation models and use the minimum mean-squared
error as our optimality criterion. In multiscale superpopulation tree models,
the leaves represent the units of the population, interior nodes represent
partial sums of the population, and the root node represents the total sum of
the population. We prove that the optimal sampling pattern varies dramatically
with the correlation structure of the tree nodes. While uniform sampling is
optimal for trees with ``positive correlation progression'', it provides the
worst possible sampling with ``negative correlation progression.'' As an
analysis tool, we introduce and study a class of independent innovations trees
that are of interest in their own right. We derive a fast water-filling
algorithm to determine the optimal sampling of the leaves to estimate the root
of an independent innovations tree.
Comment: Published at http://dx.doi.org/10.1214/074921706000000509 in the IMS
Lecture Notes--Monograph Series
(http://www.imstat.org/publications/lecnotes.htm) by the Institute of
Mathematical Statistics (http://www.imstat.org