27,676 research outputs found
Asymptotically Tight Bounds for Performing BMMC Permutations on Parallel Disk Systems
This paper presents asymptotically equal lower and upper bounds for the number of parallel I/O operations required to perform bit-matrix-multiply/complement (BMMC) permutations on the Parallel Disk Model proposed by Vitter and Shriver. A BMMC permutation maps a source index to a target index by an affine transformation over GF(2), where the source and target indices are treated as bit vectors. The class of BMMC permutations includes many common permutations, such as matrix transposition (when dimensions are powers of 2), bit-reversal permutations, vector-reversal permutations, hypercube permutations, matrix reblocking, Gray-code permutations, and inverse Gray-code permutations. The upper bound improves upon the asymptotic bound in the previous best known BMMC algorithm and upon the constant factor in the previous best known bit-permute/complement (BPC) permutation algorithm. The algorithm achieving the upper bound uses basic linear-algebra techniques to factor the characteristic matrix for the BMMC permutation into a product of factors, each of which characterizes a permutation that can be performed in one pass over the data.
The factoring uses new subclasses of BMMC permutations: memoryload-dispersal (MLD) permutations and their inverses. These subclasses extend the catalog of one-pass permutations.
Although many BMMC permutations of practical interest fall into subclasses that might be explicitly invoked within the source code, this paper shows how to quickly detect whether a given vector of target addresses specifies a BMMC permutation. Thus, one can determine efficiently at run time whether a permutation to be performed is BMMC and then avoid the general-permutation algorithm and save parallel I/Os by using the BMMC permutation algorithm herein
Pruned Bit-Reversal Permutations: Mathematical Characterization, Fast Algorithms and Architectures
A mathematical characterization of serially-pruned permutations (SPPs)
employed in variable-length permuters and their associated fast pruning
algorithms and architectures are proposed. Permuters are used in many signal
processing systems for shuffling data and in communication systems as an
adjunct to coding for error correction. Typically only a small set of discrete
permuter lengths are supported. Serial pruning is a simple technique to alter
the length of a permutation to support a wider range of lengths, but results in
a serial processing bottleneck. In this paper, parallelizing SPPs is formulated
in terms of recursively computing sums involving integer floor and related
functions using integer operations, in a fashion analogous to evaluating
Dedekind sums. A mathematical treatment for bit-reversal permutations (BRPs) is
presented, and closed-form expressions for BRP statistics are derived. It is
shown that BRP sequences have weak correlation properties. A new statistic
called permutation inliers that characterizes the pruning gap of pruned
interleavers is proposed. Using this statistic, a recursive algorithm that
computes the minimum inliers count of a pruned BR interleaver (PBRI) in
logarithmic time complexity is presented. This algorithm enables parallelizing
a serial PBRI algorithm by any desired parallelism factor by computing the
pruning gap in lookahead rather than a serial fashion, resulting in significant
reduction in interleaving latency and memory overhead. Extensions to 2-D block
and stream interleavers, as well as applications to pruned fast Fourier
transforms and LTE turbo interleavers, are also presented. Moreover,
hardware-efficient architectures for the proposed algorithms are developed.
Simulation results demonstrate 3 to 4 orders of magnitude improvement in
interleaving time compared to existing approaches.Comment: 31 page
Cache-Oblivious Selection in Sorted X+Y Matrices
Let X[0..n-1] and Y[0..m-1] be two sorted arrays, and define the mxn matrix A
by A[j][i]=X[i]+Y[j]. Frederickson and Johnson gave an efficient algorithm for
selecting the k-th smallest element from A. We show how to make this algorithm
IO-efficient. Our cache-oblivious algorithm performs O((m+n)/B) IOs, where B is
the block size of memory transfers
Low-Complexity Puncturing and Shortening of Polar Codes
In this work, we address the low-complexity construction of shortened and
punctured polar codes from a unified view. While several independent puncturing
and shortening designs were attempted in the literature, our goal is a unique,
low-complexity construction encompassing both techniques in order to achieve
any code length and rate. We observe that our solution significantly reduces
the construction complexity as compared to state-of-the-art solutions while
providing a block error rate performance comparable to constructions that are
highly optimized for specific lengths and rates. This makes the constructed
polar codes highly suitable for practical application in future communication
systems requiring a large set of polar codes with different lengths and rates.Comment: to appear in WCNC 2017 - "Polar Coding in Wireless Communications:
Theory and Implementation" Worksho
Toward an Energy Efficient Language and Compiler for (Partially) Reversible Algorithms
We introduce a new programming language for expressing reversibility,
Energy-Efficient Language (Eel), geared toward algorithm design and
implementation. Eel is the first language to take advantage of a partially
reversible computation model, where programs can be composed of both reversible
and irreversible operations. In this model, irreversible operations cost energy
for every bit of information created or destroyed. To handle programs of
varying degrees of reversibility, Eel supports a log stack to automatically
trade energy costs for space costs, and introduces many powerful control logic
operators including protected conditional, general conditional, protected
loops, and general loops. In this paper, we present the design and compiler for
the three language levels of Eel along with an interpreter to simulate and
annotate incurred energy costs of a program.Comment: 17 pages, 0 additional figures, pre-print to be published in The 8th
Conference on Reversible Computing (RC2016
- …