3,951 research outputs found

    MCU Tolerance in SRAMs through Low Redundancy Triple Adjacent Error Correction

    Full text link
    (c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.[EN] Static random access memories (SRAMs) are key in electronic systems. They are used not only as standalone devices, but also embedded in application specific integrated circuits. One key challenge for memories is their susceptibility to radiation-induced soft errors that change the value of memory cells. Error correction codes (ECCs) are commonly used to ensure correct data despite soft errors effects in semiconductor memories. Single error correction/double error detection (SEC-DED) codes have been traditionally the preferred choice for data protection in SRAMs. During the last decade, the percentage of errors that affect more than one memory cell has increased substantially, mainly due to multiple cell upsets (MCUs) caused by radiation. The bits affected by these errors are physically close. To mitigate their effects, ECCs that correct single errors and double adjacent errors have been proposed. These codes, known as single error correction/double adjacent error correction (SEC-DAEC), require the same number of parity bits as traditional SEC-DED codes and a moderate increase in the decoder complexity. However, MCUs are not limited to double adjacent errors, because they affect more bits as technology scales. In this brief, new codes that can correct triple adjacent errors and 3-bit burst errors are presented. They have been implemented using a 45-nm library and compared with previous proposals, showing that our codes have better error protection with a moderate overhead and low redundancy.This work was supported in part by the Universitat Politecnica de Valencia, Valencia, Spain, through the DesTT Research Project under Grant SP20120806; in part by the Spanish Ministry of Science and Education under Project AYA-2009-13300-C03; in part by the Arenes Research Project under Grant TIN2012-38308-C02-01; and in part by the Research Project entitled Manufacturable and Dependable Multicore Architectures at Nanoscale within the framework of COST ICT Action under Grant 1103.Saiz-Adalid, L.; Reviriego, P.; Gil, P.; Pontarelli, S.; Maestro, JA. (2015). MCU Tolerance in SRAMs through Low Redundancy Triple Adjacent Error Correction. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 23(10):2332-2336. https://doi.org/10.1109/TVLSI.2014.2357476S23322336231

    Fault Secure Encoder and Decoder for NanoMemory Applications

    Get PDF
    Memory cells have been protected from soft errors for more than a decade; due to the increase in soft error rate in logic circuits, the encoder and decoder circuitry around the memory blocks have become susceptible to soft errors as well and must also be protected. We introduce a new approach to design fault-secure encoder and decoder circuitry for memory designs. The key novel contribution of this paper is identifying and defining a new class of error-correcting codes whose redundancy makes the design of fault-secure detectors (FSD) particularly simple. We further quantify the importance of protecting encoder and decoder circuitry against transient errors, illustrating a scenario where the system failure rate (FIT) is dominated by the failure rate of the encoder and decoder. We prove that Euclidean geometry low-density parity-check (EG-LDPC) codes have the fault-secure detector capability. Using some of the smaller EG-LDPC codes, we can tolerate bit or nanowire defect rates of 10% and fault rates of 10^(-18) upsets/device/cycle, achieving a FIT rate at or below one for the entire memory system and a memory density of 10^(11) bit/cm^2 with nanowire pitch of 10 nm for memory blocks of 10 Mb or larger. Larger EG-LDPC codes can achieve even higher reliability and lower area overhead

    Novel fault tolerant Multi-Bit Upset (MBU) Error-Detection and Correction (EDAC) architecture

    Get PDF
    Desde el punto de vista de seguridad, la certificación aeronáutica de aplicaciones críticas de vuelo requiere diferentes técnicas que son usadas para prevenir fallos en los equipos electrónicos. Los fallos de tipo hardware debido a la radiación solar que existe a las alturas standard de vuelo, como SEU (Single Event Upset) y MCU (Multiple Bit Upset), provocan un cambio de estado de los bits que soportan la información almacenada en memoria. Estos fallos se producen, por ejemplo, en la memoria de configuración de una FPGA, que es donde se definen todas las funcionalidades. Las técnicas de protección requieren normalmente de redundancias que incrementan el coste, número de componentes, tamaño de la memoria y peso. En la fase de desarrollo de aplicaciones críticas de vuelo, generalmente se utilizan una serie de estándares o recomendaciones de diseño como ABD100, RTCA DO-160, IEC62395, etc, y diferentes técnicas de protección para evitar fallos del tipo SEU o MCU. Estas técnicas están basadas en procesos tecnológicos específicos como memorias robustas, codificaciones para detección y corrección de errores (EDAC), redundancias software, redundancia modular triple (TMR) o soluciones a nivel sistema. Esta tesis está enfocada a minimizar e incluso suprimir los efectos de los SEUs y MCUs que particularmente ocurren en la electrónica de avión como consecuencia de la exposición a radiación de partículas no cargadas (como son los neutrones) que se encuentra potenciada a las típicas alturas de vuelo. La criticidad en vuelo que tienen determinados sistemas obligan a que dichos sistemas sean tolerantes a fallos, es decir, que garanticen un correcto funcionamiento aún cuando se produzca un fallo en ellos. Es por ello que soluciones como las presentadas en esta tesis tienen interés en el sector industrial. La Tesis incluye una descripción inicial de la física de la radiación incidente sobre aeronaves, y el análisis de sus efectos en los componentes electrónicos aeronaúticos basados en semiconductor, que desembocan en la generación de SEUs y MCUs. Este análisis permite dimensionar adecuadamente y optimizar los procedimientos de corrección que se propongan posteriormente. La Tesis propone un sistema de corrección de fallos SEUs y MCUs que permita cumplir la condición de Sistema Tolerante a Fallos, a la vez que minimiza los niveles de redundancia y de complejidad de los códigos de corrección. El nivel de redundancia es minimizado con la introducción del concepto propuesto HSB (Hardwired Seed Bits), en la que se reduce la información esencial a unos pocos bits semilla, neutros frente a radiación. Los códigos de corrección requeridos se reducen a la corrección de un único error, gracias al uso del concepto de Distancia Virtual entre Bits, a partir del cual será posible corregir múltiples errores simultáneos (MCUs) a partir de códigos simples de corrección. Un ejemplo de aplicación de la Tesis es la implementación de una Protección Tolerante a Fallos sobre la memoria SRAM de una FPGA. Esto significa que queda protegida no sólo la información contenida en la memoria sino que también queda auto-protegida la función de protección misma almacenada en la propia SRAM. De esta forma, el sistema es capaz de auto-regenerarse ante un SEU o incluso un MCU, independientemente de la zona de la SRAM sobre la que impacte la radiación. Adicionalmente, esto se consigue con códigos simples tales como corrección por bit de paridad y Hamming, minimizando la dedicación de recursos de computación hacia tareas de supervisión del sistema.For airborne safety critical applications certification, different techniques are implemented to prevent failures in electronic equipments. The HW failures at flying heights of aircrafts related to solar radiation such as SEU (Single-Event-Upset) and MCU (Multiple Bit Upset), causes bits alterations that corrupt the information at memories. These HW failures cause errors, for example, in the Configuration-Code of an FPGA that defines the functionalities. The protection techniques require classically redundant functionalities that increases the cost, components, memory space and weight. During the development phase for airborne safety critical applications, different aerospace standards are generally recommended as ABD100, RTCA-DO160, IEC62395, etc, and different techniques are classically used to avoid failures such as SEU or MCU. These techniques are based on specific technology processes, Hardened memories, error detection and correction codes (EDAC), SW redundancy, Triple Modular Redundancy (TMR) or System level solutions. This Thesis is focussed to minimize, and even to remove, the effects of SEUs and MCUs, that particularly occurs in the airborne electronics as a consequence of its exposition to solar radiation of non-charged particles (for example the neutrons). These non-charged particles are even powered at flying altitudes due to aircraft volume. The safety categorization of different equipments/functionalities requires a design based on fault-tolerant approach that means, the system will continue its normal operation even if a failure occurs. The solution proposed in this Thesis is relevant for the industrial sector because of its Fault-tolerant capability. Thesis includes an initial description for the physics of the solar radiation that affects into aircrafts, and also the analyses of their effects into the airborne electronics based on semiconductor components that create the SEUs and MCUs. This detailed analysis allows the correct sizing and also the optimization of the procedures used to correct the errors. This Thesis proposes a system that corrects the SEUs and MCUs allowing the fulfilment of the Fault-Tolerant requirement, reducing the redundancy resources and also the complexity of the correction codes. The redundancy resources are minimized thanks to the introduction of the concept of HSB (Hardwired Seed Bits), in which the essential information is reduced to a few seed bits, neutral to radiation. The correction codes required are reduced to the correction of one error thanks to the use of the concept of interleaving distance between adjacent bits, this allows the simultaneous multiple error correction with simple single error correcting codes. An example of the application of this Thesis is the implementation of the Fault-tolerant architecture of an SRAM-based FPGA. That means that the information saved in the memory is protected but also the correction functionality is auto protected as well, also saved into SRAM memory. In this way, the system is able to self-regenerate the information lost in case of SEUs or MCUs. This is independent of the SRAM area affected by the radiation. Furthermore, this performance is achieved by means simple error correcting codes, as parity bits or Hamming, that minimize the use of computational resources to this supervision tasks for system.Programa Oficial de Doctorado en Ingeniería Eléctrica, Electrónica y AutomáticaPresidente: Luis Alfonso Entrena Arrontes.- Secretario: Pedro Reviriego Vasallo.- Vocal: Mª Luisa López Vallej

    Memory Reliability Enhancement against Multiple Cell Upsets Using Decimal Matrix Code for 32-Bit Data

    Get PDF
    An important issue in the reliability of memories exposed to radiation environment is transient multiple cells upsets (MCUs). To protect the memory data from radiations and transients many improved packaging techniques are available. But, a particular packaging provides protection from a limited variation of radiations. Today the devices are exposed to a very wide range of environment radiations due to increasing applications in the field of wireless communication. So some additional data preservation techniques are always preferred for authenticating the data before it is processed. Some of these techniques use encoded data to be stored in memories. These techniques are error correction codes (ECCs). It is always preferred to implement an error correction code that requires a less number of redundant bits to be stored and a minimized delay overhead in data correction. This paper presents an FPGA based implementation of memory data error detection and correction code that involves simple decimal addition algorithm in the encoding of data that is to be stored in memory. The decoding of the data for error detection and correction is based on the Hamming Code. This technique involves divide-symbol concept to represent the linear data in groups to make symbolic code. The length of the symbol is inversely proportional to the delay overhead of the cod

    Improving Error Correction Codes for Multiple-Cell Upsets in Space Applications

    Full text link
    © 2018 IEEE. Personal use of this material is permitted. Permissíon from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertisíng or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.[EN] Currently, faults suffered by SRAM memory systems have increased due to the aggressive CMOS integration density. Thus, the probability of occurrence of single-cell upsets (SCUs) or multiple-cell upsets (MCUs) augments. One of the main causes of MCUs in space applications is cosmic radiation. A common solution is the use of error correction codes (ECCs). Nevertheless, when using ECCs in space applications, they must achieve a good balance between error coverage and redundancy, and their encoding/decoding circuits must be efficient in terms of area, power, and delay. Different codes have been proposed to tolerate MCUs. For instance, Matrix codes use Hamming codes and parity checks in a bi-dimensional layout to correct and detect some patterns of MCUs. Recently presented, column¿line¿code (CLC) has been designed to tolerate MCUs in space applications. CLC is a modified Matrix code, based on extended Hamming codes and parity checks. Nevertheless, a common property of these codes is the high redundancy introduced. In this paper, we present a series of new lowredundant ECCs able to correct MCUs with reduced area, power, and delay overheads. Also, these new codes maintain, or even improve, memory error coverage with respect to Matrix and CLC codes.This work was supported by the Spanish Government under the research Project TIN2016-81075-R.Gracia-Morán, J.; Saiz-Adalid, L.; Gil Tomás, DA.; Gil, P. (2018). Improving Error Correction Codes for Multiple-Cell Upsets in Space Applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 26(10):2132-2142. https://doi.org/10.1109/TVLSI.2018.2837220S21322142261

    Ultrafast Codes for Multiple Adjacent Error Correction and Double Error Detection

    Full text link
    (c) 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.[EN] Reliable computer systems employ error control codes (ECCs) to protect information from errors. For example, memories are frequently protected using single error correction-double error detection (SEC-DED) codes. ECCs are traditionally designed to minimize the number of redundant bits, as they are added to each word in the whole memory. Nevertheless, using an ECC introduces encoding and decoding latencies, silicon area usage and power consumption. In other computer units, these parameters should be optimized, and redundancy would be less important. For example, protecting registers against errors remains a major concern for deep sub-micron systems due to technology scaling. In this case, an important requirement for register protection is to keep encoding and decoding latencies as short as possible. Ultrafast error control codes achieve very low delays, independently of the word length, increasing the redundancy. This paper summarizes previous works on Ultrafast codes (SEC and SEC-DED), and proposes new codes combining double error detection and adjacent error correction. We have implemented, synthesized and compared different Ultrafast codes with other state-of-the-art fast codes. The results show the validity of the approach, achieving low latencies and a good balance with silicon area and power consumption.This work was supported in part by the Spanish Government under Project TIN2016-81075-R, and in part by the Primeros Proyectos de Investigacion, Vicerrectorado de Investigacion, Innovacion y Transferencia de la Universitat Politecnica de Valencia (UPV), Valencia, Spain, under Project PAID-06-18 20190032.Saiz-Adalid, L.; Gracia-Morán, J.; Gil Tomás, DA.; Baraza Calvo, JC.; Gil, P. (2019). Ultrafast Codes for Multiple Adjacent Error Correction and Double Error Detection. IEEE Access. 7:151131-151143. https://doi.org/10.1109/ACCESS.2019.2947315S151131151143

    Cross-layer Soft Error Analysis and Mitigation at Nanoscale Technologies

    Get PDF
    This thesis addresses the challenge of soft error modeling and mitigation in nansoscale technology nodes and pushes the state-of-the-art forward by proposing novel modeling, analyze and mitigation techniques. The proposed soft error sensitivity analysis platform accurately models both error generation and propagation starting from a technology dependent device level simulations all the way to workload dependent application level analysis
    corecore