
    Error Correction in High Speed Arithmetic

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory. Joint Services Electronics Program / DAAB-07-67-C-0199. National Science Foundation / GK-233.

    A NEW APPROACH OF AN ERROR DETECTING AND CORRECTING CIRCUIT BY ARITHMETIC LOGIC BLOCKS

    This paper proposes a new method for building an error detection and correction (EDAC) circuit from arithmetic logic blocks. The modified logic blocks and their auxiliary components are designed with Boolean and block reduction techniques, which remove one logic gate per block. The reduced logic circuits were designed and simulated using MATLAB Simulink, DSCH 2 CAD, and Microwind CAD tools. The modified 2:1 multiplexer, demultiplexer, comparator, 1-bit adder, ALU, and error detection and correction circuit were simulated using MATLAB and Microwind. The EDAC circuit operates at 454.676 MHz with a slew rate of -2.00, which the authors present as evidence of high speed and low area.

    Initial and Boundary Conditions for the Lattice Boltzmann Method

    A new approach to implementing initial and boundary conditions for the lattice Boltzmann method is presented. The new approach is based on an extended collision operator that uses the gradients of the fluid velocity. The numerical performance of the lattice Boltzmann method is tested on several problems with exact solutions and is also compared to an explicit finite difference projection method. The discretization error of the lattice Boltzmann method decreases quadratically with finer resolution both in space and in time. The roundoff error of the lattice Boltzmann method creates problems unless double precision arithmetic is used. Comment: 42 pages in Postscript, with 27 additional Postscript figures. Physical Review E, submitted December 92, revised June 9

    Design of reverse converters for the multi-moduli residue number systems with moduli of forms 2^a, 2^b - 1, 2^c + 1

    A residue number system (RNS) is a non-weighted integer number representation system that supports parallel, carry-free, high-speed arithmetic. The system is error-resilient and facilitates error detection, error correction and fault tolerance in digital systems. It finds applications in Digital Signal Processing (DSP) intensive computations such as digital filtering, convolution, correlation, the Discrete Fourier Transform and the Fast Fourier Transform. The basis of an RNS is a moduli set consisting of relatively prime integers. Proper selection of this moduli set plays a significant role in RNS design, because the speed of the internal RNS arithmetic circuits, as well as the speed and complexity of the residue-to-binary converter (R/B or reverse converter), depend strongly on the form and number of the selected moduli. Moduli of forms 2^a, 2^b - 1, 2^c + 1 (a, b and c are natural numbers) are the most widely used in RNS moduli sets, as they can be implemented efficiently with ordinary binary hardware and lead to simple designs. Another important consideration in reverse converter design is the choice of conversion algorithm among the Chinese Remainder Theorem (CRT), Mixed Radix Conversion (MRC) and the new Chinese Remainder Theorems (New CRT I and New CRT II). This research focuses on designing reverse converters for multi-moduli RNS sets, especially four- and five-moduli sets with moduli of forms 2^a, 2^b - 1, 2^c + 1. The residue-to-binary converters are designed by applying the above conversion algorithms in different possible ways and by using modulo (2^k) and modulo (2^k - 1) adders, which lead to simple adder-based architectures and VLSI-efficient implementations (k is a natural number). The area and delay of the proposed converters are analyzed, and an efficient reverse converter is suggested from each of the various four- and five-moduli set converters for a given dynamic range.
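As a rough illustration of the residue representation and of reverse conversion, here is a minimal software sketch using the textbook CRT. It is not the converter architecture proposed in the work above; the three-moduli set {2^3, 2^3 - 1, 2^2 + 1} = {8, 7, 5} and the function names `to_residues`, `rns_add` and `crt_reverse` are assumptions chosen for the example.

```python
from math import prod

# Hypothetical moduli set of the forms 2^a, 2^b - 1, 2^c + 1 (a=3, b=3, c=2).
# The moduli are pairwise coprime, so the dynamic range is 8 * 7 * 5 = 280.
MODULI = [2**3, 2**3 - 1, 2**2 + 1]


def to_residues(x, moduli=MODULI):
    """Forward conversion: binary integer -> tuple of residues."""
    return [x % m for m in moduli]


def rns_add(xs, ys, moduli=MODULI):
    """Carry-free addition: each residue channel is added independently."""
    return [(x + y) % m for x, y, m in zip(xs, ys, moduli)]


def crt_reverse(residues, moduli=MODULI):
    """Reverse conversion with the classical Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m                 # product of all the other moduli
        inv = pow(Mi, -1, m)        # multiplicative inverse of Mi modulo m
        total += r * Mi * inv
    return total % M


if __name__ == "__main__":
    a, b = 123, 100
    s = rns_add(to_residues(a), to_residues(b))
    print(s, crt_reverse(s))
```

The final reduction modulo M is the costly step in hardware, which is one reason alternatives such as MRC and the New CRT variants are considered for converter design.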

    On the Convergence Speed of Turbo Demodulation with Turbo Decoding

    Iterative processing is widely adopted in modern wireless receivers for advanced channel codes such as turbo and LDPC codes. Extending this principle with an additional iterative feedback loop to the demapping function has been shown to provide a substantial error performance gain. However, the adoption of iterative demodulation with turbo decoding is constrained by the additional implementation complexity, which heavily impacts latency and power consumption. In this paper, we analyze the convergence speed of these two combined iterative processes in order to determine the exact number of iterations required at each level. Extrinsic information transfer (EXIT) charts are used for a thorough analysis at different modulation orders and code rates. An original iteration scheduling is proposed that removes two demapping iterations with a reasonable performance loss of less than 0.15 dB. Analyzing and normalizing the computational and memory access complexity, which directly impact latency and power consumption, demonstrates the considerable gains of the proposed scheduling and the promising contributions of the proposed analysis. Comment: Submitted to IEEE Transactions on Signal Processing on April 27, 201

    GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics

    We present the newly developed code GAMER (GPU-accelerated Adaptive MEsh Refinement code), which adopts a novel approach to improve the performance of adaptive mesh refinement (AMR) astrophysical simulations by a large factor using the graphics processing unit (GPU). The AMR implementation is based on a hierarchy of grid patches with an oct-tree data structure. We adopt a three-dimensional relaxing TVD scheme for the hydrodynamic solver, and a multi-level relaxation scheme for the Poisson solver. Both solvers have been implemented on the GPU, so that hundreds of patches can be advanced in parallel. The computational overhead associated with data transfer between CPU and GPU is carefully reduced by using asynchronous memory copies, and the time spent computing the ghost-zone values for each patch is hidden by overlapping it with the GPU computations. We demonstrate the accuracy of the code by performing several standard test problems in astrophysics. GAMER is a parallel code that can be run on a multi-GPU cluster system. We measure the performance of the code by performing purely baryonic cosmological simulations on different hardware implementations, in which detailed timing analyses compare the computations with and without GPU acceleration. Maximum speed-up factors of 12.19 and 10.47 are demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with 8192^3 effective resolution, respectively. Comment: 60 pages, 22 figures, 3 tables. More accuracy tests are included. Accepted for publication in ApJ

    Hardware/Software Co-design Applied to Reed-Solomon Decoding for the DMB Standard

    This paper addresses the implementation of Reed-Solomon decoding for battery-powered wireless devices. The scope of the paper is constrained by the Digital Media Broadcasting (DMB) standard. The most critical element of the Reed-Solomon algorithm is implemented on two different reconfigurable hardware architectures: an FPGA and a coarse-grained architecture, the Montium. The remaining parts are executed on an ARM processor. The results of this research show that a co-design of the ARM together with an FPGA or a Montium leads to a substantial decrease in energy consumption. The energy consumption of the syndrome calculation of the Reed-Solomon decoding algorithm is estimated for an FPGA and a Montium by means of simulations. The Montium proves to be more efficient.
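For readers unfamiliar with the syndrome calculation mentioned above, the following is a minimal software sketch of that step, not the hardware mapping studied in the paper. Everything specific here is an assumption made for illustration: the field GF(2^8) with primitive polynomial 0x11D, the first root alpha^0, the 204-byte word with 16 parity bytes, and the helper names `build_tables`, `gf_mul` and `syndromes`.

```python
# Assumed field: GF(2^8) with primitive polynomial x^8 + x^4 + x^3 + x^2 + 1.
PRIM_POLY = 0x11D


def build_tables():
    """Build exponent and logarithm tables for GF(2^8)."""
    exp = [0] * 510
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & 0x100:          # reduce modulo the primitive polynomial
            x ^= PRIM_POLY
    for i in range(255, 510):  # duplicate so log sums need no modulo
        exp[i] = exp[i - 255]
    return exp, log


EXP, LOG = build_tables()


def gf_mul(a, b):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(received, n_parity, first_root=0):
    """Evaluate the received polynomial at n_parity consecutive powers of alpha.

    `received` lists the bytes highest-degree coefficient first; each
    syndrome is computed with Horner's rule.
    """
    result = []
    for i in range(n_parity):
        root = EXP[(first_root + i) % 255]
        s = 0
        for byte in received:
            s = gf_mul(s, root) ^ byte
        result.append(s)
    return result


if __name__ == "__main__":
    word = [0] * 204                      # the all-zero word is a valid codeword
    print(any(syndromes(word, 16)))       # no syndrome is nonzero
    word[10] ^= 0x5A                      # inject a single byte error
    print(all(syndromes(word, 16)))       # every syndrome is now nonzero
```

Each syndrome needs one multiply-accumulate per received byte, so the inner loop runs once per byte for every parity symbol, which suggests why this step is a natural candidate for offloading to dedicated hardware.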