48,797 research outputs found
Error Correction in High Speed Arithmetic
Coordinated Science Laboratory was formerly known as Control Systems LaboratoryJoint Services Electronics Program / DAAB-07-67-C-0199National Science Foundation / GK-233
A NEW APPROACH OF AN ERROR DETECTING AND CORRECTING CIRCUIT BY ARITHMETIC LOGIC BLOCKS
This paper proposes a unique method of an error detection and correction (EDAC) circuit, carried out using arithmetic logic blocks. The modified logic blocks circuit and its auxiliary components are designed with Boolean and block reduction technique, which reduced one logic gate per block. The reduced logic circuits were simulated and designed using MATLAB Simulink, DSCH 2 CAD, and Microwind CAD tools. The modified, 2:1 multiplexer, demultiplexer, comparator, 1-bit adder, ALU, and error correction and detection circuit were simulated using MATLAB and Microwind. The EDAC circuit operates at a speed of 454.676 MHz and a slew rate of -2.00 which indicates excellence in high speed and low-area.
Initial and Boundary Conditions for the Lattice Boltzmann Method
A new approach of implementing initial and boundary conditions for the
lattice Boltzmann method is presented. The new approach is based on an extended
collision operator that uses the gradients of the fluid velocity. The numerical
performance of the lattice Boltzmann method is tested on several problems with
exact solutions and is also compared to an explicit finite difference
projection method. The discretization error of the lattice Boltzmann method
decreases quadratically with finer resolution both in space and in time. The
roundoff error of the lattice Boltzmann method creates problems unless double
precision arithmetic is used.Comment: 42 pages in Postscript, with additional 27 Postscript figures
Physical Review E, Submitted December 92, Revised June 9
Design of reverse converters for the multi-moduli residue number systems with moduli of forms 2a, 2b - 1, 2c + 1
Residue number system (RNS) is a non-weighted integer number representation system that is capable of supporting parallel, carry-free and high speed arithmetic. This system is error-resilient and facilitates error detection, error correction and fault tolerance in digital systems. It finds applications in Digital Signal Processing (DSP) intensive computations like digital filtering, convolution, correlation, Discrete Fourier Transform, Fast Fourier Transform, etc. The basis for an RNS system is a moduli set consisting of relatively prime integers. Proper selection of this moduli set plays a significant role in RNS design because the speed of internal RNS arithmetic circuits as well as the speed and complexity of the residue to binary converter (R/B or Reverse Converter) have a large dependency on the form and number of the selected moduli. Moduli of forms 2a, 2b- 1, 2c + 1 (a, b and c are natural numbers) have the most use in RNS moduli sets as these moduli can be efficiently implemented using usual binary hardware that lead to simple design. Another important consideration for the reverse converter design is the selection of an appropriate conversion algorithm from Chinese Remainder Theorem (CRT), Mixed Radix Conversion (MRC) and the new Chinese Remainder Theorems (New CRT I and New CRT II). This research is focused on designing reverse converters for the multi-moduli RNS sets especially four and five moduli sets with moduli of forms 2a, 2b- 1, 2c + 1 . The residue to binary converters are designed by applying the above conversion algorithms in different possible ways and facilitating the use of modulo (2k) and modulo (2k – 1) adders that lead to simple design of adder based architectures and VLSI efficient implementations (k is a natural number). The area and delay of the proposed converters is analyzed and an efficient reverse converter is suggested from each of the various four and five moduli set converters for a given dynamic range
On the Convergence Speed of Turbo Demodulation with Turbo Decoding
Iterative processing is widely adopted nowadays in modern wireless receivers
for advanced channel codes like turbo and LDPC codes. Extension of this
principle with an additional iterative feedback loop to the demapping function
has proven to provide substantial error performance gain. However, the adoption
of iterative demodulation with turbo decoding is constrained by the additional
implied implementation complexity, heavily impacting latency and power
consumption. In this paper, we analyze the convergence speed of these combined
two iterative processes in order to determine the exact required number of
iterations at each level. Extrinsic information transfer (EXIT) charts are used
for a thorough analysis at different modulation orders and code rates. An
original iteration scheduling is proposed reducing two demapping iterations
with reasonable performance loss of less than 0.15 dB. Analyzing and
normalizing the computational and memory access complexity, which directly
impact latency and power consumption, demonstrates the considerable gains of
the proposed scheduling and the promising contributions of the proposed
analysis.Comment: Submitted to IEEE Transactions on Signal Processing on April 27, 201
GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics
We present the newly developed code, GAMER (GPU-accelerated Adaptive MEsh
Refinement code), which has adopted a novel approach to improve the performance
of adaptive mesh refinement (AMR) astrophysical simulations by a large factor
with the use of the graphic processing unit (GPU). The AMR implementation is
based on a hierarchy of grid patches with an oct-tree data structure. We adopt
a three-dimensional relaxing TVD scheme for the hydrodynamic solver, and a
multi-level relaxation scheme for the Poisson solver. Both solvers have been
implemented in GPU, by which hundreds of patches can be advanced in parallel.
The computational overhead associated with the data transfer between CPU and
GPU is carefully reduced by utilizing the capability of asynchronous memory
copies in GPU, and the computing time of the ghost-zone values for each patch
is made to diminish by overlapping it with the GPU computations. We demonstrate
the accuracy of the code by performing several standard test problems in
astrophysics. GAMER is a parallel code that can be run in a multi-GPU cluster
system. We measure the performance of the code by performing purely-baryonic
cosmological simulations in different hardware implementations, in which
detailed timing analyses provide comparison between the computations with and
without GPU(s) acceleration. Maximum speed-up factors of 12.19 and 10.47 are
demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with
8192^3 effective resolution, respectively.Comment: 60 pages, 22 figures, 3 tables. More accuracy tests are included.
Accepted for publication in ApJ
Hardware/Software Co-design Applied to Reed-Solomon Decoding for the DMB Standard
This paper addresses the implementation of Reed-
Solomon decoding for battery-powered wireless
devices. The scope of this paper is constrained by the
Digital Media Broadcasting (DMB). The most critical
element of the Reed-Solomon algorithm is implemented
on two different reconfigurable hardware
architectures: an FPGA and a coarse-grained
architecture: the Montium, The remaining parts are
executed on an ARM processor. The results of this
research show that a co-design of the ARM together
with an FPGA or a Montium leads to a substantial
decrease in energy consumption. The energy
consumption of syndrome calculation of the Reed-
Solomon decoding algorithm is estimated for an FPGA
and a Montium by means of simulations. The Montium
proves to be more efficient
- …