4,973 research outputs found

    Minimum Pseudoweight Analysis of 3-Dimensional Turbo Codes

    Full text link
    In this work, we consider pseudocodewords of (relaxed) linear programming (LP) decoding of 3-dimensional turbo codes (3D-TCs). We present a relaxed LP decoder for 3D-TCs, adapting the relaxed LP decoder for conventional turbo codes proposed by Feldman in his thesis. We show that the 3D-TC polytope is proper and CC-symmetric, and make a connection to finite graph covers of the 3D-TC factor graph. This connection is used to show that the support set of any pseudocodeword is a stopping set of iterative decoding of 3D-TCs using maximum a posteriori constituent decoders on the binary erasure channel. Furthermore, we compute ensemble-average pseudoweight enumerators of 3D-TCs and perform a finite-length minimum pseudoweight analysis for small cover degrees. Also, an explicit description of the fundamental cone of the 3D-TC polytope is given. Finally, we present an extensive numerical study of small-to-medium block length 3D-TCs, which shows that 1) typically (i.e., in most cases) when the minimum distance dmind_{\rm min} and/or the stopping distance hminh_{\rm min} is high, the minimum pseudoweight (on the additive white Gaussian noise channel) is strictly smaller than both the dmind_{\rm min} and the hminh_{\rm min}, and 2) the minimum pseudoweight grows with the block length, at least for small-to-medium block lengths.Comment: To appear in IEEE Transactions on Communication

    Optimized Cartesian KK-Means

    Full text link
    Product quantization-based approaches are effective to encode high-dimensional data points for approximate nearest neighbor search. The space is decomposed into a Cartesian product of low-dimensional subspaces, each of which generates a sub codebook. Data points are encoded as compact binary codes using these sub codebooks, and the distance between two data points can be approximated efficiently from their codes by the precomputed lookup tables. Traditionally, to encode a subvector of a data point in a subspace, only one sub codeword in the corresponding sub codebook is selected, which may impose strict restrictions on the search accuracy. In this paper, we propose a novel approach, named Optimized Cartesian KK-Means (OCKM), to better encode the data points for more accurate approximate nearest neighbor search. In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace. Each sub codeword stems from different sub codebooks in each subspace, which are optimally generated with regards to the minimization of the distortion errors. The high-dimensional data point is then encoded as the concatenation of the indices of multiple sub codewords from all the subspaces. This can provide more flexibility and lower distortion errors than traditional methods. Experimental results on the standard real-life datasets demonstrate the superiority over state-of-the-art approaches for approximate nearest neighbor search.Comment: to appear in IEEE TKDE, accepted in Apr. 201

    Health history pattern extraction from textual medical records

    Get PDF
    Extracting patterns from medical records using temporal data mining techniques

    Optimal sampling strategies for multiscale stochastic processes

    Full text link
    In this paper, we determine which non-random sampling of fixed size gives the best linear predictor of the sum of a finite spatial population. We employ different multiscale superpopulation models and use the minimum mean-squared error as our optimality criterion. In multiscale superpopulation tree models, the leaves represent the units of the population, interior nodes represent partial sums of the population, and the root node represents the total sum of the population. We prove that the optimal sampling pattern varies dramatically with the correlation structure of the tree nodes. While uniform sampling is optimal for trees with ``positive correlation progression'', it provides the worst possible sampling with ``negative correlation progression.'' As an analysis tool, we introduce and study a class of independent innovations trees that are of interest in their own right. We derive a fast water-filling algorithm to determine the optimal sampling of the leaves to estimate the root of an independent innovations tree.Comment: Published at http://dx.doi.org/10.1214/074921706000000509 in the IMS Lecture Notes--Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org
    corecore