27,676 research outputs found

    Asymptotically Tight Bounds for Performing BMMC Permutations on Parallel Disk Systems

    Get PDF
    This paper presents asymptotically equal lower and upper bounds for the number of parallel I/O operations required to perform bit-matrix-multiply/complement (BMMC) permutations on the Parallel Disk Model proposed by Vitter and Shriver. A BMMC permutation maps a source index to a target index by an affine transformation over GF(2), where the source and target indices are treated as bit vectors. The class of BMMC permutations includes many common permutations, such as matrix transposition (when dimensions are powers of 2), bit-reversal permutations, vector-reversal permutations, hypercube permutations, matrix reblocking, Gray-code permutations, and inverse Gray-code permutations. The upper bound improves upon the asymptotic bound in the previous best known BMMC algorithm and upon the constant factor in the previous best known bit-permute/complement (BPC) permutation algorithm. The algorithm achieving the upper bound uses basic linear-algebra techniques to factor the characteristic matrix for the BMMC permutation into a product of factors, each of which characterizes a permutation that can be performed in one pass over the data. The factoring uses new subclasses of BMMC permutations: memoryload-dispersal (MLD) permutations and their inverses. These subclasses extend the catalog of one-pass permutations. Although many BMMC permutations of practical interest fall into subclasses that might be explicitly invoked within the source code, this paper shows how to quickly detect whether a given vector of target addresses specifies a BMMC permutation. Thus, one can determine efficiently at run time whether a permutation to be performed is BMMC and then avoid the general-permutation algorithm and save parallel I/Os by using the BMMC permutation algorithm herein

    Pruned Bit-Reversal Permutations: Mathematical Characterization, Fast Algorithms and Architectures

    Full text link
    A mathematical characterization of serially-pruned permutations (SPPs) employed in variable-length permuters and their associated fast pruning algorithms and architectures are proposed. Permuters are used in many signal processing systems for shuffling data and in communication systems as an adjunct to coding for error correction. Typically only a small set of discrete permuter lengths are supported. Serial pruning is a simple technique to alter the length of a permutation to support a wider range of lengths, but results in a serial processing bottleneck. In this paper, parallelizing SPPs is formulated in terms of recursively computing sums involving integer floor and related functions using integer operations, in a fashion analogous to evaluating Dedekind sums. A mathematical treatment for bit-reversal permutations (BRPs) is presented, and closed-form expressions for BRP statistics are derived. It is shown that BRP sequences have weak correlation properties. A new statistic called permutation inliers that characterizes the pruning gap of pruned interleavers is proposed. Using this statistic, a recursive algorithm that computes the minimum inliers count of a pruned BR interleaver (PBRI) in logarithmic time complexity is presented. This algorithm enables parallelizing a serial PBRI algorithm by any desired parallelism factor by computing the pruning gap in lookahead rather than a serial fashion, resulting in significant reduction in interleaving latency and memory overhead. Extensions to 2-D block and stream interleavers, as well as applications to pruned fast Fourier transforms and LTE turbo interleavers, are also presented. Moreover, hardware-efficient architectures for the proposed algorithms are developed. Simulation results demonstrate 3 to 4 orders of magnitude improvement in interleaving time compared to existing approaches.Comment: 31 page

    Cache-Oblivious Selection in Sorted X+Y Matrices

    Full text link
    Let X[0..n-1] and Y[0..m-1] be two sorted arrays, and define the mxn matrix A by A[j][i]=X[i]+Y[j]. Frederickson and Johnson gave an efficient algorithm for selecting the k-th smallest element from A. We show how to make this algorithm IO-efficient. Our cache-oblivious algorithm performs O((m+n)/B) IOs, where B is the block size of memory transfers

    Low-Complexity Puncturing and Shortening of Polar Codes

    Full text link
    In this work, we address the low-complexity construction of shortened and punctured polar codes from a unified view. While several independent puncturing and shortening designs were attempted in the literature, our goal is a unique, low-complexity construction encompassing both techniques in order to achieve any code length and rate. We observe that our solution significantly reduces the construction complexity as compared to state-of-the-art solutions while providing a block error rate performance comparable to constructions that are highly optimized for specific lengths and rates. This makes the constructed polar codes highly suitable for practical application in future communication systems requiring a large set of polar codes with different lengths and rates.Comment: to appear in WCNC 2017 - "Polar Coding in Wireless Communications: Theory and Implementation" Worksho

    Toward an Energy Efficient Language and Compiler for (Partially) Reversible Algorithms

    Full text link
    We introduce a new programming language for expressing reversibility, Energy-Efficient Language (Eel), geared toward algorithm design and implementation. Eel is the first language to take advantage of a partially reversible computation model, where programs can be composed of both reversible and irreversible operations. In this model, irreversible operations cost energy for every bit of information created or destroyed. To handle programs of varying degrees of reversibility, Eel supports a log stack to automatically trade energy costs for space costs, and introduces many powerful control logic operators including protected conditional, general conditional, protected loops, and general loops. In this paper, we present the design and compiler for the three language levels of Eel along with an interpreter to simulate and annotate incurred energy costs of a program.Comment: 17 pages, 0 additional figures, pre-print to be published in The 8th Conference on Reversible Computing (RC2016
    • …
    corecore