596 research outputs found

    Design of a fault tolerant airborne digital computer. Volume 1: Architecture

    This volume is concerned with the architecture of a fault tolerant digital computer for an advanced commercial aircraft. All of the computations of the aircraft, including those presently carried out by analogue techniques, are to be carried out in this digital computer. Among the important qualities of the computer are the following: (1) The capacity is to be matched to the aircraft environment. (2) The reliability is to be selectively matched to the criticality and deadline requirements of each of the computations. (3) The system is to be readily expandable and contractible. (4) The design is to be appropriate to post-1975 technology. Three candidate architectures are discussed and assessed in terms of the above qualities. Of the three candidates, a newly conceived architecture, Software Implemented Fault Tolerance (SIFT), provides the best match to the above qualities. In addition, SIFT is particularly simple and believable. The other candidates, Bus Checker System (BUCS), also newly conceived in this project, and the Hopkins multiprocessor, are potentially more efficient than SIFT in the use of redundancy, but otherwise are not as attractive.

    High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory

    In this dissertation, new computing paradigms, architectures and design philosophies are proposed and evaluated for adopting STT-MRAM technology as a highly reliable, energy-efficient and fast memory. For this purpose, a novel cross-layer framework, from the cell level all the way up to the system and application levels, has been developed. In this framework, the reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rate of the entire memory and its Mean Time To Failure (MTTF), taking temperature and process variation effects into account. Design-time, compile-time and run-time solutions are provided to address the challenges associated with STT-MRAM. The effectiveness of the proposed solutions is demonstrated in extensive experiments that show significant improvements over state-of-the-art solutions, i.e. lower-power, higher-performance and more reliable STT-MRAM design.

    Integer codes correcting sparse byte errors

    In public optical networks, the data are scrambled with x^u + 1 self-synchronous scramblers (SSSs). The reason for this is to avoid long strings of ones or zeros, which might affect the receiver synchronization. Unfortunately, the use of SSSs is always related to the problem of duplication of channel errors. More precisely, each error occurring during the transmission will be duplicated u bits later. In this paper, we present a low-cost solution to this problem based on integer codes capable of correcting sparse byte errors.
    Radonjic, A., Vujicic, V., 2019. Integer codes correcting sparse byte errors. Cryptogr. Commun. 11, 1069–1077. [https://doi.org/10.1007/s12095-019-0350-9]
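The error duplication described in this abstract can be reproduced with a few lines of code. The following is a minimal sketch, assuming a bit-serial scrambler for the polynomial x^u + 1 with an all-zero initial state; the function names, the small delay `u = 5`, and the sample bit pattern are illustrative choices, not taken from the paper.

```python
def scramble(bits, u):
    """Self-synchronous scrambler for x^u + 1: s[n] = d[n] XOR s[n-u]."""
    out = []
    for n, d in enumerate(bits):
        feedback = out[n - u] if n >= u else 0  # assumed all-zero initial state
        out.append(d ^ feedback)
    return out


def descramble(bits, u):
    """Matching descrambler: d[n] = r[n] XOR r[n-u]."""
    return [r ^ (bits[n - u] if n >= u else 0) for n, r in enumerate(bits)]


u = 5  # hypothetical delay, kept small for readability
data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]

tx = scramble(data, u)
rx = list(tx)
rx[3] ^= 1  # a single channel error at bit position 3

decoded = descramble(rx, u)
error_positions = [i for i, (a, b) in enumerate(zip(data, decoded)) if a != b]
print(error_positions)  # the error at position 3 reappears u bits later
```

Because the descrambler XORs each received bit with the bit received u positions earlier, one flipped channel bit corrupts two decoded bits, which is the pattern an error-correcting code placed after the descrambler has to handle.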

    Integer Codes Correcting Single Errors and Detecting Burst Errors Within a Byte

    Correcting single and detecting adjacent errors has become important in memory systems using high-density DRAM chips. The reason is that, in these systems, the strike of a single energetic particle can upset one or more adjacent bits. In this article, we present a simple solution for this problem based on integer codes capable of correcting single errors and detecting l-bit burst errors confined to a b-bit byte (1 < l < b). Unlike the classical approach, the proposed one does not rely on the use of dedicated encoding/decoding hardware. Instead, it uses the processor as both encoder and decoder. The effectiveness of such a solution is demonstrated on a theoretical model of an eight-core processor. The obtained results show that it has the potential to be used in future DDR5 systems.
    This is the peer-reviewed version of the paper: Radonjic, A., 2020. Integer Codes Correcting Single Errors and Detecting Burst Errors Within a Byte. IEEE Transactions on Device and Materials Reliability 20, 748–753. [https://doi.org/10.1109/TDMR.2020.3033511]
    © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    Published version: [https://hdl.handle.net/21.15107/rcub_dais_9998]

    Addressing multiple bit/symbol errors in DRAM subsystem

    As DRAM technology continues to evolve towards smaller feature sizes and increased densities, faults in the DRAM subsystem are becoming more severe. Current servers mostly use CHIPKILL-based schemes to tolerate up to one or two symbol errors per DRAM beat. Such schemes may not detect multiple symbol errors arising from faults in multiple devices, the data bus, or the address bus. In this article, we introduce Single Symbol Correction Multiple Symbol Detection (SSCMSD), a novel error handling scheme to correct single-symbol errors and detect multi-symbol errors. Our scheme makes use of a hash in combination with Error Correcting Code (ECC) to avoid silent data corruptions (SDCs). We develop a novel scheme that deploys a 32-bit CRC along with a Reed-Solomon code to implement SSCMSD for a ×4-based DDR4 system. Simulation-based experiments show that our scheme effectively guards against device, data-bus and address-bus errors, limited only by the aliasing probability of the hash. Our novel design enabled us to achieve this without introducing additional READ latency. We need 19 chips per rank, 76 data bus-lines and additional hash logic at the memory controller.
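To illustrate how a hash stored alongside the data can expose address-bus faults, here is a minimal sketch. It assumes the hash covers both the block address and the data, uses `zlib.crc32` as a stand-in 32-bit CRC, and models memory as a dictionary; the function names and the 64-byte block size are hypothetical, and the Reed-Solomon part of the scheme is omitted.

```python
import zlib

memory = {}  # toy model: address -> (data block, stored 32-bit hash)


def block_hash(address: int, data: bytes) -> int:
    """CRC-32 over the address and the data (assumed hash input)."""
    return zlib.crc32(address.to_bytes(8, "little") + data)


def write(address: int, data: bytes) -> None:
    memory[address] = (data, block_hash(address, data))


def read(requested_address: int, delivered_address=None):
    """Read a block; delivered_address simulates an address-bus fault."""
    actual = requested_address if delivered_address is None else delivered_address
    data, stored_crc = memory[actual]
    ok = block_hash(requested_address, data) == stored_crc
    return data, ok


write(0x1000, b"A" * 64)
write(0x2000, b"B" * 64)

_, clean_ok = read(0x1000)                            # fault-free read
_, fault_ok = read(0x1000, delivered_address=0x2000)  # wrong row returned
print(clean_ok, fault_ok)
```

In the faulty read the data and its stored hash are self-consistent, so a data-only check would pass; the mismatch only appears because the controller recomputes the hash with the address it actually requested.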

    VLSI Implementation of Multi-Bit Error Detection and Correction Codes for Space Communications

    Data transmission in advanced space communications suffers from different types of noise, which cause burst errors in the data. Error correction codes (ECC) therefore play a major role in detecting and correcting these errors. However, conventional Hamming encoders and decoders detect and correct only one-bit errors. This work therefore implements Multi-Bit Error Detection and Correction Codes (MBE-DCC) for detecting and correcting multiple-bit errors. First, the MBE-DCC encoding operation is implemented using a generator matrix, which contains both identity bits and parity bits. The encoded codeword is then transmitted over the space communication channel, where it is corrupted by different types of noise and errors. The MBE-DCC decoding operation is then performed at the receiver side, which corrects all the errors using syndrome detection, error location detection, and error correction modules. The simulations revealed that the proposed MBE-DCC performed better than the conventional ECC method.
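The encode/decode pipeline in this abstract (a generator matrix with identity and parity parts, followed by syndrome detection, error location, and correction) can be sketched with the conventional single-error Hamming(7,4) code. This is not the MBE-DCC itself, whose matrices the abstract does not give; the parity matrix `P` and all function names below are illustrative assumptions.

```python
# Parity part of a systematic generator matrix G = [I | P] for Hamming(7,4).
P = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]


def encode(message):
    """Codeword = 4 message bits (identity part) + 3 parity bits."""
    parity = [sum(message[i] * P[i][j] for i in range(4)) % 2 for j in range(3)]
    return list(message) + parity


def syndrome(received):
    """Syndrome detection: recompute parity and compare with received parity."""
    return [
        (sum(received[i] * P[i][j] for i in range(4)) + received[4 + j]) % 2
        for j in range(3)
    ]


def decode(received):
    """Locate a single-bit error from the syndrome, flip it, return the message."""
    word = list(received)
    s = syndrome(word)
    if any(s):
        # Columns of H = [P^T | I]: rows of P for data bits, unit vectors for parity bits.
        columns = P + [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
        word[columns.index(s)] ^= 1  # error location + correction
    return word[:4]


codeword = encode([1, 0, 1, 1])
corrupted = list(codeword)
corrupted[2] ^= 1  # one bit flipped by the channel
print(decode(corrupted))  # recovers the original message bits
```

A multi-bit code follows the same structure but needs more parity bits so that every correctable error pattern, not just every single-bit error, maps to a distinct syndrome.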

    CEPRAM: Compression for Endurance in PCM RAM

    We deal with the endurance problem of Phase Change Memories (PCM) by proposing Compression for Endurance in PCM RAM (CEPRAM), a technique to extend the lifespan of PCM-based main memory through compression. We introduce a total of three compression schemes based on existing schemes, but targeting compression for PCM-based systems. We perform a two-level evaluation. First, we quantify the performance of the compression in terms of compressed size, bit-flips and how they are affected by errors. Next, we feed these parameters into a statistical simulator to study how they affect the endurance of the system. Our simulation results reveal that our technique, which is built on top of Error Correcting Pointers (ECP) but uses a high-performance cache-oriented compression algorithm modified to better suit our purpose, manages to further extend the lifetime of the memory system. In particular, it guarantees that at least half of the physical pages are in usable condition for 25% longer than ECP, which is slightly more than 5% more than a scheme that can correct 16 failures per block.

    Reducing soft errors through operand width aware policies

    Soft errors are an important challenge in contemporary microprocessors. Particle hits on the components of a processor are expected to create an increasing number of transient errors with each new microprocessor generation. In this paper, we propose simple mechanisms that effectively reduce the vulnerability to soft errors in a processor. Our designs are generally motivated by the fact that many of the produced and consumed values in the processors are narrow and their upper order bits are meaningless. Soft errors caused by any particle strike to these higher order bits can be avoided by simply identifying these narrow values. Alternatively, soft errors can be detected or corrected on the narrow values by replicating the vulnerable portion of the value inside the storage space provided for the upper order bits of these operands. As a faster but less fault tolerant alternative to ECC and parity, we offer a variety of schemes that make use of narrow values and analyze their efficiency in reducing soft error vulnerability of different data-holding components of a processor. On average, techniques that make use of the narrowness of the values can provide 49 percent error detection, 45 percent error correction, or 27 percent error avoidance coverage for single bit upsets in the first level data cache across all Spec2K benchmarks. In other structures such as the immediate field of the issue queue, an average error detection rate of 64 percent is achieved.
    Peer reviewed. Postprint (published version).
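The replication idea in this abstract can be sketched for the detection case. The example below assumes a 32-bit storage word and treats any value that fits in the lower 16 bits as narrow; the widths, the function names, and the way narrowness is tracked are illustrative assumptions rather than the paper's design.

```python
WORD_BITS = 32                 # assumed storage width
HALF_BITS = WORD_BITS // 2
LOW_MASK = (1 << HALF_BITS) - 1


def is_narrow(value: int) -> bool:
    """A value is narrow here if its upper 16 bits would all be zero."""
    return 0 <= value <= LOW_MASK


def protect(value: int) -> int:
    """Store a copy of the low half in the otherwise unused upper half."""
    if not is_narrow(value):
        raise ValueError("only narrow values can be replicated")
    return (value << HALF_BITS) | value


def is_intact(word: int) -> bool:
    """Detection: the two halves of a protected word must match."""
    return (word >> HALF_BITS) == (word & LOW_MASK)


def recover(word: int) -> int:
    """Return the original narrow value from an intact protected word."""
    return word & LOW_MASK


stored = protect(0x1234)
upset = stored ^ (1 << 21)  # a single-bit upset somewhere in the word
print(is_intact(stored), is_intact(upset))
```

Two copies are enough to detect a single-bit upset but not to say which copy is wrong; correcting it would require values narrow enough to hold three copies and a majority vote.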

    Hardware/Software Co-design Applied to Reed-Solomon Decoding for the DMB Standard

    This paper addresses the implementation of Reed-Solomon decoding for battery-powered wireless devices. The scope of this paper is constrained by the Digital Media Broadcasting (DMB) standard. The most critical element of the Reed-Solomon algorithm is implemented on two different reconfigurable hardware architectures: an FPGA and a coarse-grained architecture, the Montium. The remaining parts are executed on an ARM processor. The results of this research show that a co-design of the ARM together with an FPGA or a Montium leads to a substantial decrease in energy consumption. The energy consumption of the syndrome calculation of the Reed-Solomon decoding algorithm is estimated for an FPGA and a Montium by means of simulations. The Montium proves to be more efficient.
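For readers unfamiliar with the syndrome calculation this abstract singles out, the following sketch evaluates a received word at successive powers of the field generator using Horner's rule. The field polynomial `0x11D`, the choice of 16 syndromes, and the roots α^0 to α^15 are assumptions made for illustration; the abstract does not state the code parameters.

```python
PRIM_POLY = 0x11D      # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1
NUM_SYNDROMES = 16     # assumed number of parity bytes

# Exponent and logarithm tables for GF(2^8), generated by alpha = 2.
EXP = [0] * 512
LOG = [0] * 256
element = 1
for power in range(255):
    EXP[power] = element
    LOG[element] = power
    element <<= 1
    if element & 0x100:
        element ^= PRIM_POLY
for power in range(255, 512):
    EXP[power] = EXP[power - 255]  # duplicate so LOG sums need no modulo


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements via the log/exp tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(received, count=NUM_SYNDROMES):
    """S_i = r(alpha^i); received[0] is the highest-degree coefficient."""
    result = []
    for i in range(count):
        alpha_i = EXP[i]
        acc = 0
        for byte in received:
            acc = gf_mul(acc, alpha_i) ^ byte  # Horner's rule
        result.append(acc)
    return result


# The all-zero word is always a codeword, so its syndromes are all zero.
print(any(syndromes([0] * 204)))
```

The inner loop is one multiply and one XOR per received byte per syndrome, with no data-dependent branching, which is the kind of regular workload that maps well onto reconfigurable hardware.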