126 research outputs found

    Memory built-in self-repair and correction for improving yield: a review

    Get PDF
    Nanometer memories are highly prone to defects due to dense structure, necessitating memory built-in self-repair as a must-have feature to improve yield. Today’s system-on-chips contain memories occupying an area as high as 90% of the chip area. Shrinking technology uses stricter design rules for memories, making them more prone to manufacturing defects. Further, using 3D-stacked memories makes the system vulnerable to newer defects such as those coming from through-silicon-vias (TSV) and micro bumps. The increased memory size is also resulting in an increase in soft errors during system operation. Multiple memory repair techniques based on redundancy and correction codes have been presented to recover from such defects and prevent system failures. This paper reviews recently published memory repair methodologies, including various built-in self-repair (BISR) architectures, repair analysis algorithms, in-system repair, and soft repair handling using error correcting codes (ECC). It provides a classification of these techniques based on method and usage. Finally, it reviews evaluation methods used to determine the effectiveness of the repair algorithms. The paper aims to present a survey of these methodologies and prepare a platform for developing repair methods for upcoming-generation memories

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    Get PDF
    This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results

    Sustainable Fault-handling Of Reconfigurable Logic Using Throughput-driven Assessment

    Get PDF
    A sustainable Evolvable Hardware (EH) system is developed for SRAM-based reconfigurable Field Programmable Gate Arrays (FPGAs) using outlier detection and group testing-based assessment principles. The fault diagnosis methods presented herein leverage throughput-driven, relative fitness assessment to maintain resource viability autonomously. Group testing-based techniques are developed for adaptive input-driven fault isolation in FPGAs, without the need for exhaustive testing or coding-based evaluation. The techniques maintain the device operational, and when possible generate validated outputs throughout the repair process. Adaptive fault isolation methods based on discrepancy-enabled pair-wise comparisons are developed. By observing the discrepancy characteristics of multiple Concurrent Error Detection (CED) configurations, a method for robust detection of faults is developed based on pairwise parallel evaluation using Discrepancy Mirror logic. The results from the analytical FPGA model are demonstrated via a self-healing, self-organizing evolvable hardware system. Reconfigurability of the SRAM-based FPGA is leveraged to identify logic resource faults which are successively excluded by group testing using alternate device configurations. This simplifies the system architect\u27s role to definition of functionality using a high-level Hardware Description Language (HDL) and system-level performance versus availability operating point. System availability, throughput, and mean time to isolate faults are monitored and maintained using an Observer-Controller model. Results are demonstrated using a Data Encryption Standard (DES) core that occupies approximately 305 FPGA slices on a Xilinx Virtex-II Pro FPGA. With a single simulated stuck-at-fault, the system identifies a completely validated replacement configuration within three to five positive tests. The approach demonstrates a readily-implemented yet robust organic hardware application framework featuring a high degree of autonomous self-control

    A Multi-layer Fpga Framework Supporting Autonomous Runtime Partial Reconfiguration

    Get PDF
    Partial reconfiguration is a unique capability provided by several Field Programmable Gate Array (FPGA) vendors recently, which involves altering part of the programmed design within an SRAM-based FPGA at run-time. In this dissertation, a Multilayer Runtime Reconfiguration Architecture (MRRA) is developed, evaluated, and refined for Autonomous Runtime Partial Reconfiguration of FPGA devices. Under the proposed MRRA paradigm, FPGA configurations can be manipulated at runtime using on-chip resources. Operations are partitioned into Logic, Translation, and Reconfiguration layers along with a standardized set of Application Programming Interfaces (APIs). At each level, resource details are encapsulated and managed for efficiency and portability during operation. An MRRA mapping theory is developed to link the general logic function and area allocation information to the device related physical configuration level data by using mathematical data structure and physical constraints. In certain scenarios, configuration bit stream data can be read and modified directly for fast operations, relying on the use of similar logic functions and common interconnection resources for communication. A corresponding logic control flow is also developed to make the entire process autonomous. Several prototype MRRA systems are developed on a Xilinx Virtex II Pro platform. The Virtex II Pro on-chip PowerPC core and block RAM are employed to manage control operations while multiple physical interfaces establish and supplement autonomous reconfiguration capabilities. Area, speed and power optimization techniques are developed based on the developed Xilinx prototype. Evaluations and analysis of these prototype and techniques are performed on a number of benchmark and hashing algorithm case studies. The results indicate that based on a variety of test benches, up to 70% reduction in the resource utilization, up to 50% improvement in power consumption, and up to 10 times increase in run-time performance are achieved using the developed architecture and approaches compared with Xilinx baseline reconfiguration flow. Finally, a Genetic Algorithm (GA) for a FPGA fault tolerance case study is evaluated as a ultimate high-level application running on this architecture. It demonstrated that this is a hardware and software infrastructure that enables an FPGA to dynamically reconfigure itself efficiently under the control of a soft microprocessor core that is instantiated within the FPGA fabric. Such a system contributes to the observed benefits of intelligent control, fast reconfiguration, and low overhead

    Low power digital signal processing

    Get PDF

    Parallel Architectures for Many-Core Systems-On-Chip in Deep Sub-Micron Technology

    Get PDF
    Despite the several issues faced in the past, the evolutionary trend of silicon has kept its constant pace. Today an ever increasing number of cores is integrated onto the same die. Unfortunately, the extraordinary performance achievable by the many-core paradigm is limited by several factors. Memory bandwidth limitation, combined with inefficient synchronization mechanisms, can severely overcome the potential computation capabilities. Moreover, the huge HW/SW design space requires accurate and flexible tools to perform architectural explorations and validation of design choices. In this thesis we focus on the aforementioned aspects: a flexible and accurate Virtual Platform has been developed, targeting a reference many-core architecture. Such tool has been used to perform architectural explorations, focusing on instruction caching architecture and hybrid HW/SW synchronization mechanism. Beside architectural implications, another issue of embedded systems is considered: energy efficiency. Near Threshold Computing is a key research area in the Ultra-Low-Power domain, as it promises a tenfold improvement in energy efficiency compared to super-threshold operation and it mitigates thermal bottlenecks. The physical implications of modern deep sub-micron technology are severely limiting performance and reliability of modern designs. Reliability becomes a major obstacle when operating in NTC, especially memory operation becomes unreliable and can compromise system correctness. In the present work a novel hybrid memory architecture is devised to overcome reliability issues and at the same time improve energy efficiency by means of aggressive voltage scaling when allowed by workload requirements. Variability is another great drawback of near-threshold operation. The greatly increased sensitivity to threshold voltage variations in today a major concern for electronic devices. We introduce a variation-tolerant extension of the baseline many-core architecture. By means of micro-architectural knobs and a lightweight runtime control unit, the baseline architecture becomes dynamically tolerant to variations

    Conception et test des circuits et systèmes numériques à haute fiabilité et sécurité

    Get PDF
    Research activities I carried on after my nomination as Chargé de Recherche deal with the definition of methodologies and tools for the design, the test and the reliability of secure digital circuits and trustworthy manufacturing. More recently, we have started a new research activity on the test of 3D stacked Integrated CIrcuits, based on the use of Through Silicon Vias. Moreover, thanks to the relationships I have maintained after my post-doc in Italy, I have kept on cooperating with Politecnico di Torino on the topics related to test and reliability of memories and microprocessors.Secure and Trusted DevicesSecurity is a critical part of information and communication technologies and it is the necessary basis for obtaining confidentiality, authentication, and integrity of data. The importance of security is confirmed by the extremely high growth of the smart-card market in the last 20 years. It is reported in "Le monde Informatique" in the article "Computer Crime and Security Survey" in 2007 that financial losses due to attacks on "secure objects" in the digital world are greater than $11 Billions. Since the race among developers of these secure devices and attackers accelerates, also due to the heterogeneity of new systems and their number, the improvement of the resistance of such components becomes today’s major challenge.Concerning all the possible security threats, the vulnerability of electronic devices that implement cryptography functions (including smart cards, electronic passports) has become the Achille’s heel in the last decade. Indeed, even though recent crypto-algorithms have been proven resistant to cryptanalysis, certain fraudulent manipulations on the hardware implementing such algorithms can allow extracting confidential information. So-called Side-Channel Attacks have been the first type of attacks that target the physical device. They are based on information gathered from the physical implementation of a cryptosystem. For instance, by correlating the power consumed and the data manipulated by the device, it is possible to discover the secret encryption key. Nevertheless, this point is widely addressed and integrated circuit (IC) manufacturers have already developed different kinds of countermeasures.More recently, new threats have menaced secure devices and the security of the manufacturing process. A first issue is the trustworthiness of the manufacturing process. From one side, secure devices must assure a very high production quality in order not to leak confidential information due to a malfunctioning of the device. Therefore, possible defects due to manufacturing imperfections must be detected. This requires high-quality test procedures that rely on the use of test features that increases the controllability and the observability of inner points of the circuit. Unfortunately, this is harmful from a security point of view, and therefore the access to these test features must be protected from unauthorized users. Another harm is related to the possibility for an untrusted manufacturer to do malicious alterations to the design (for instance to bypass or to disable the security fence of the system). Nowadays, many steps of the production cycle of a circuit are outsourced. For economic reasons, the manufacturing process is often carried out by foundries located in foreign countries. The threat brought by so-called Hardware Trojan Horses, which was long considered theoretical, begins to materialize.A second issue is the hazard of faults that can appear during the circuit’s lifetime and that may affect the circuit behavior by way of soft errors or deliberate manipulations, called Fault Attacks. They can be based on the intentional modification of the circuit’s environment (e.g., applying extreme temperature, exposing the IC to radiation, X-rays, ultra-violet or visible light, or tampering with clock frequency) in such a way that the function implemented by the device generates an erroneous result. The attacker can discover secret information by comparing the erroneous result with the correct one. In-the-field detection of any failing behavior is therefore of prime interest for taking further action, such as discontinuing operation or triggering an alarm. In addition, today’s smart cards use 90nm technology and according to the various suppliers of chip, 65nm technology will be effective on the horizon 2013-2014. Since the energy required to force a transistor to switch is reduced for these new technologies, next-generation secure systems will become even more sensitive to various classes of fault attacks.Based on these considerations, within the group I work with, we have proposed new methods, architectures and tools to solve the following problems:• Test of secure devices: unfortunately, classical techniques for digital circuit testing cannot be easily used in this context. Indeed, classical testing solutions are based on the use of Design-For-Testability techniques that add hardware components to the circuit, aiming to provide full controllability and observability of internal states. Because crypto‐ processors and others cores in a secure system must pass through high‐quality test procedures to ensure that data are correctly processed, testing of crypto chips faces a dilemma. In fact design‐for‐testability schemes want to provide high controllability and observability of the device while security wants minimal controllability and observability in order to hide the secret. We have therefore proposed, form one side, the use of enhanced scan-based test techniques that exploit compaction schemes to reduce the observability of internal information while preserving the high level of testability. From the other side, we have proposed the use of Built-In Self-Test for such devices in order to avoid scan chain based test.• Reliability of secure devices: we proposed an on-line self-test architecture for hardware implementation of the Advanced Encryption Standard (AES). The solution exploits the inherent spatial replications of a parallel architecture for implementing functional redundancy at low cost.• Fault Attacks: one of the most powerful types of attack for secure devices is based on the intentional injection of faults (for instance by using a laser beam) into the system while an encryption occurs. By comparing the outputs of the circuits with and without the injection of the fault, it is possible to identify the secret key. To face this problem we have analyzed how to use error detection and correction codes as counter measure against this type of attack, and we have proposed a new code-based architecture. Moreover, we have proposed a bulk built-in current-sensor that allows detecting the presence of undesired current in the substrate of the CMOS device.• Fault simulation: to evaluate the effectiveness of countermeasures against fault attacks, we developed an open source fault simulator able to perform fault simulation for the most classical fault models as well as user-defined electrical level fault models, to accurately model the effect of laser injections on CMOS circuits.• Side-Channel attacks: they exploit physical data-related information leaking from the device (e.g. current consumption or electro-magnetic emission). One of the most intensively studied attacks is the Differential Power Analysis (DPA) that relies on the observation of the chip power fluctuations during data processing. I studied this type of attack in order to evaluate the influence of the countermeasures against fault attack on the power consumption of the device. Indeed, the introduction of countermeasures for one type of attack could lead to the insertion of some circuitry whose power consumption is related to the secret key, thus allowing another type of attack more easily. We have developed a flexible integrated simulation-based environment that allows validating a digital circuit when the device is attacked by means of this attack. All architectures we designed have been validated through this tool. Moreover, we developed a methodology that allows to drastically reduce the time required to validate countermeasures against this type of attack.TSV- based 3D Stacked Integrated Circuits TestThe stacking process of integrated circuits using TSVs (Through Silicon Via) is a promising technology that keeps the development of the integration more than Moore’s law, where TSVs enable to tightly integrate various dies in a 3D fashion. Nevertheless, 3D integrated circuits present many test challenges including the test at different levels of the 3D fabrication process: pre-, mid-, and post- bond tests. Pre-bond test targets the individual dies at wafer level, by testing not only classical logic (digital logic, IOs, RAM, etc) but also unbounded TSVs. Mid-bond test targets the test of partially assembled 3D stacks, whereas finally post-bond test targets the final circuit.The activities carried out within this topic cover 2 main issues:• Pre-bond test of TSVs: the electrical model of a TSV buried within the substrate of a CMOS circuit is a capacitance connected to ground (when the substrate is connected to ground). The main assumption is that a defect may affect the value of that capacitance. By measuring the variation of the capacitance’s value it is possible to check whether the TSV is correctly fabricated or not. We have proposed a method to measure the value of the capacitance based on the charge/ discharge delay of the RC network containing the TSV.• Test infrastructures for 3D stacked Integrated Circuits: testing a die before stacking to another die introduces the problem of a dynamic test infrastructure, where test data must be routed to a specific die based on the reached fabrication step. New solutions are proposed in literature that allow reconfiguring the test paths within the circuit, based on on-the-fly requirements. We have started working on an extension of the IEEE P1687 test standard that makes use of an automatic die-detection based on pull-up resistors.Memory and Microprocessor Test and ReliabilityThanks to device shrinking and miniaturization of fabrication technology, performances of microprocessors and of memories have grown of more than 5 magnitude order in the last 30 years. With this technology trend, it is necessary to face new problems and challenges, such as reliability, transient errors, variability and aging.In the last five years I’ve worked in cooperation with the Testgroup of Politecnico di Torino (Italy) to propose a new method to on-line validate the correctness of the program execution of a microprocessor. The main idea is to monitor a small set of control signals of the processors in order to identify incorrect activation sequences. This approach can detect both permanent and transient errors of the internal logic of the processor.Concerning the test of memories, we have proposed a new approach to automatically generate test programs starting from a functional description of the possible faults in the memory.Moreover, we proposed a new methodology, based on microprocessor error probability profiling, that aims at estimating fault injection results without the need of a typical fault injection setup. The proposed methodology is based on two main ideas: a one-time fault-injection analysis of the microprocessor architecture to characterize the probability of successful execution of each of its instructions in presence of a soft-error, and a static and very fast analysis of the control and data flow of the target software application to compute its probability of success
    corecore