
    CROSS-LAYER DESIGN, OPTIMIZATION AND PROTOTYPING OF NoCs FOR THE NEXT GENERATION OF HOMOGENEOUS MANY-CORE SYSTEMS

    This thesis provides a whole set of design methods to enable and manage the runtime heterogeneity of feature-rich, industry-ready Tile-Based Network-on-Chips at different abstraction layers (Architecture Design, Network Assembling, Testing of NoC, Runtime Operation). The key idea is to maintain the functionalities of the original layers and to improve the performance of architectures by allowing joint optimization and layer coordination. In general-purpose systems, we address the microarchitectural challenges by co-designing and co-optimizing feature-rich architectures. In application-specific NoCs, we emphasize event notification, so that the platform is continuously under control. At the network assembly level, this thesis proposes a Hold Time Robustness technique to tackle the hold-time issue in synchronous NoCs. At the network architectural level, the choice of a suitable synchronization paradigm requires an enhanced synthesis flow as well as coexistence with DVFS. On one hand, this implies the coexistence of mesochronous synchronizers in the network with dual-clock FIFOs at the network boundaries; on the other hand, dual-clock FIFOs may be placed across inter-switch links, hence removing the need for mesochronous synchronizers. This thesis studies the implications of both approaches on the design flow and on the performance and power quality metrics of the network. Once the many-core system is assembled, the issue of testing it arises. This thesis takes on this challenge and engineers various testing infrastructures. At the upper abstraction layer, the thesis addresses the issue of managing the fully operational system and proposes a congestion management technique named HACS. Moreover, some of the ideas of this thesis undergo FPGA prototyping. Finally, we look ahead to emerging technologies by characterizing the power consumption of Optical NoC interfaces

    Autonomous Recovery Of Reconfigurable Logic Devices Using Priority Escalation Of Slack

    Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in the detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Reconfigurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme, which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, the use of adaptive techniques is seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by the hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device. These include a Discrete Cosine Transform (DCT) core, a Motion Estimation (ME) engine, a Finite Impulse Response (FIR) filter, a Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks, in addition to MCNC benchmark circuits. A significant reduction in power consumption is achieved, ranging from 83% for low motion-activity scenes to 12.5% for high motion-activity video scenes, in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32 dB. The diagnosability, reconfiguration latency, and resource overhead of each approach are analyzed. Compared to previous alternatives, PURE maintains a PSNR within 4.02 dB to 6.67 dB of the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria
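    For context on the quality figures above, PSNR is computed from the mean squared error (MSE) between the fault-free and degraded frames; for 8-bit video samples the standard definition (a general formula, not one specific to this dissertation) is

        \mathrm{PSNR} = 10 \log_{10}\!\left(\frac{255^{2}}{\mathrm{MSE}}\right) \ \text{dB}

    so a drop of 4.02 dB to 6.67 dB from the roughly 32 dB fault-free baseline corresponds to an increase in MSE by a factor of about 2.5 to 4.6.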

    Design and Test of Digital Circuits and Systems with High Reliability and Security

    Research activities I carried out after my nomination as Chargé de Recherche deal with the definition of methodologies and tools for the design, test, and reliability of secure digital circuits and trustworthy manufacturing. More recently, we have started a new research activity on the test of 3D stacked Integrated Circuits based on the use of Through Silicon Vias. Moreover, thanks to the relationships I have maintained after my post-doc in Italy, I have kept on cooperating with Politecnico di Torino on topics related to the test and reliability of memories and microprocessors.

    Secure and Trusted Devices
    Security is a critical part of information and communication technologies and is the necessary basis for obtaining confidentiality, authentication, and integrity of data. The importance of security is confirmed by the extremely high growth of the smart-card market in the last 20 years. The article "Computer Crime and Security Survey", reported in "Le Monde Informatique" in 2007, estimates that financial losses due to attacks on "secure objects" in the digital world exceed $11 billion. As the race between the developers of these secure devices and attackers accelerates, also because of the heterogeneity and sheer number of new systems, improving the resistance of such components has become today's major challenge.

    Among all possible security threats, the vulnerability of electronic devices that implement cryptographic functions (including smart cards and electronic passports) has become the Achilles' heel of the last decade. Indeed, even though recent crypto-algorithms have been proven resistant to cryptanalysis, certain fraudulent manipulations of the hardware implementing such algorithms can allow confidential information to be extracted. So-called Side-Channel Attacks were the first type of attack to target the physical device. They are based on information gathered from the physical implementation of a cryptosystem: for instance, by correlating the power consumed with the data manipulated by the device, it is possible to discover the secret encryption key. Nevertheless, this point is widely addressed, and integrated circuit (IC) manufacturers have already developed different kinds of countermeasures.

    More recently, new threats have menaced secure devices and the security of the manufacturing process. A first issue is the trustworthiness of the manufacturing process. On one side, secure devices must ensure a very high production quality so as not to leak confidential information because of a malfunctioning device; possible defects due to manufacturing imperfections must therefore be detected. This requires high-quality test procedures that rely on test features that increase the controllability and observability of inner points of the circuit. Unfortunately, this is harmful from a security point of view, and access to these test features must therefore be protected from unauthorized users. Another threat is the possibility for an untrusted manufacturer to make malicious alterations to the design (for instance, to bypass or disable the security fence of the system). Nowadays, many steps of the production cycle of a circuit are outsourced, and for economic reasons the manufacturing process is often carried out by foundries located in foreign countries. The threat brought by so-called Hardware Trojan Horses, which was long considered theoretical, is beginning to materialize.

    A second issue is the hazard of faults that can appear during the circuit's lifetime and that may affect its behavior by way of soft errors or deliberate manipulations, called Fault Attacks. These can be based on the intentional modification of the circuit's environment (e.g., applying extreme temperatures, exposing the IC to radiation, X-rays, ultra-violet or visible light, or tampering with the clock frequency) in such a way that the function implemented by the device generates an erroneous result. The attacker can then discover secret information by comparing the erroneous result with the correct one. In-the-field detection of any failing behavior is therefore of prime interest for taking further action, such as discontinuing operation or triggering an alarm. In addition, today's smart cards use 90nm technology and, according to the various chip suppliers, 65nm technology will be in use on the 2013-2014 horizon. Since the energy required to force a transistor to switch is reduced in these new technologies, next-generation secure systems will become even more sensitive to various classes of fault attacks.

    Based on these considerations, within the group I work with, we have proposed new methods, architectures, and tools to address the following problems:
    • Test of secure devices: unfortunately, classical techniques for digital circuit testing cannot be easily used in this context. Classical testing solutions are based on Design-for-Testability techniques that add hardware components to the circuit in order to provide full controllability and observability of internal states. Because crypto-processors and other cores in a secure system must pass high-quality test procedures to ensure that data are correctly processed, testing of crypto chips faces a dilemma: design-for-testability schemes aim to provide high controllability and observability of the device, while security demands minimal controllability and observability in order to hide the secret. We have therefore proposed, on one side, enhanced scan-based test techniques that exploit compaction schemes to reduce the observability of internal information while preserving a high level of testability, and, on the other side, the use of Built-In Self-Test for such devices in order to avoid scan-chain-based test.
    • Reliability of secure devices: we proposed an on-line self-test architecture for hardware implementations of the Advanced Encryption Standard (AES). The solution exploits the inherent spatial replication of a parallel architecture to implement functional redundancy at low cost.
    • Fault attacks: one of the most powerful types of attack on secure devices is based on the intentional injection of faults (for instance by using a laser beam) into the system while an encryption is taking place. By comparing the outputs of the circuit with and without the injected fault, it is possible to identify the secret key. To face this problem we have analyzed how to use error detection and correction codes as a countermeasure against this type of attack, and we have proposed a new code-based architecture. Moreover, we have proposed a bulk built-in current sensor that detects the presence of undesired current in the substrate of the CMOS device.
    • Fault simulation: to evaluate the effectiveness of countermeasures against fault attacks, we developed an open-source fault simulator able to perform fault simulation for the most classical fault models as well as for user-defined electrical-level fault models, in order to accurately model the effect of laser injections on CMOS circuits.
    • Side-channel attacks: these exploit physical, data-related information leaking from the device (e.g., current consumption or electro-magnetic emission). One of the most intensively studied attacks is Differential Power Analysis (DPA), which relies on observing the chip's power fluctuations during data processing. I studied this type of attack in order to evaluate the influence of countermeasures against fault attacks on the power consumption of the device: the introduction of countermeasures for one type of attack could lead to the insertion of circuitry whose power consumption is related to the secret key, thus making another type of attack easier. We have developed a flexible, integrated, simulation-based environment that allows validating a digital circuit under this kind of attack; all the architectures we designed have been validated with this tool. Moreover, we developed a methodology that drastically reduces the time required to validate countermeasures against this type of attack.

    TSV-based 3D Stacked Integrated Circuits Test
    The stacking of integrated circuits using TSVs (Through Silicon Vias) is a promising technology that keeps integration progressing beyond Moore's law: TSVs make it possible to tightly integrate various dies in a 3D fashion. Nevertheless, 3D integrated circuits present many test challenges, including testing at the different stages of the 3D fabrication process: pre-, mid-, and post-bond tests. Pre-bond test targets the individual dies at wafer level, testing not only the classical logic (digital logic, I/Os, RAM, etc.) but also the unbonded TSVs. Mid-bond test targets partially assembled 3D stacks, whereas post-bond test targets the final circuit. The activities carried out on this topic cover two main issues:
    • Pre-bond test of TSVs: the electrical model of a TSV buried within the substrate of a CMOS circuit is a capacitance connected to ground (when the substrate is connected to ground). The main assumption is that a defect may affect the value of that capacitance; by measuring the variation of the capacitance value it is possible to check whether the TSV is correctly fabricated or not. We have proposed a method to measure the value of the capacitance based on the charge/discharge delay of the RC network containing the TSV.
    • Test infrastructures for 3D Stacked Integrated Circuits: testing a die before stacking it onto another die introduces the problem of a dynamic test infrastructure, where test data must be routed to a specific die depending on the fabrication step reached. New solutions proposed in the literature allow the test paths within the circuit to be reconfigured on the fly. We have started working on an extension of the IEEE P1687 test standard that makes use of automatic die detection based on pull-up resistors.

    Memory and Microprocessor Test and Reliability
    Thanks to device shrinking and the miniaturization of fabrication technology, the performance of microprocessors and memories has grown by more than five orders of magnitude in the last 30 years. With this technology trend, it is necessary to face new problems and challenges, such as reliability, transient errors, variability, and aging. In the last five years I have worked in cooperation with the Testgroup of Politecnico di Torino (Italy) on a new method to validate on-line the correctness of the program execution of a microprocessor. The main idea is to monitor a small set of control signals of the processor in order to identify incorrect activation sequences. This approach can detect both permanent and transient errors in the internal logic of the processor. Concerning the test of memories, we have proposed a new approach to automatically generate test programs starting from a functional description of the possible faults in the memory. Moreover, we proposed a new methodology, based on microprocessor error-probability profiling, that aims at estimating fault-injection results without the need for a typical fault-injection setup. The proposed methodology is based on two main ideas: a one-time fault-injection analysis of the microprocessor architecture to characterize the probability of successful execution of each of its instructions in the presence of a soft error, and a static, very fast analysis of the control and data flow of the target software application to compute its probability of success
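    To make the last idea concrete (a hedged formalization of mine, not a formula quoted from the work): let p_i be the characterized probability that an instruction of type i still leads to a correct result when a soft error strikes during its execution, and let n_i be the number of times the static control- and data-flow analysis expects the application to execute instructions of type i. If a single soft error is assumed to land uniformly at random over the N = \sum_i n_i executed instructions, the application-level probability of success can be estimated as

        P_{\text{app}} \;\approx\; \sum_{i} \frac{n_i}{N}\, p_i

    i.e., a count-weighted average of the per-instruction characterization, which can be computed without running a fault-injection campaign on the application itself.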

    Improvement of hardware reliability with aging monitors


    Innovative Techniques for Testing and Diagnosing SoCs

    We rely upon the continued functioning of many electronic devices for our everyday welfare, usually embedding integrated circuits that are becoming ever cheaper and smaller with improved features. Nowadays, microelectronics can integrate a working computer with CPU, memories, and even GPUs on a single die, namely a System-on-Chip (SoC). SoCs are also employed in automotive safety-critical applications, but need to be tested thoroughly to comply with reliability standards, in particular the ISO 26262 functional safety standard for road vehicles. The goal of this PhD thesis is to improve SoC reliability by proposing innovative techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals, and GPUs. The proposed approaches, in the sequence in which they appear in this thesis, are described as follows:

    1. Embedded Memory Diagnosis: memories are dense and complex circuits which are susceptible to design and manufacturing errors, hence it is important to understand fault occurrence in the memory array. In practice, the logical and physical array representations differ due to an optimized design which adds enhancements to the device, namely scrambling. This part proposes an accurate memory diagnosis by presenting a software tool able to analyze test results, unscramble the memory array, map failing syndromes to cell locations, perform cumulative analysis, and elaborate a final fault-model hypothesis. Several SRAM failing syndromes were analyzed as case studies, gathered on an industrial automotive 32-bit SoC developed by STMicroelectronics. The tool displayed defects virtually, and results were confirmed by real photos taken with a microscope.

    2. Functional Test Pattern Generation: the key to a successful test is the pattern applied to the device. Patterns can be structural or functional; the former usually benefit from embedded test modules targeting manufacturing errors and are only effective before shipping the component to the client, while the latter can be applied during mission with minimal impact on performance but are penalized by a high generation time. Functional test patterns, however, can serve different goals in functional mission mode. Part III of this PhD thesis proposes three functional test pattern generation methods for CPU cores embedded in SoCs, targeting different test purposes:
        a. Functional Stress Patterns: suitable for optimizing functional stress during Operational-life Tests and Burn-in Screening, for an optimal device reliability characterization;
        b. Functional Power-Hungry Patterns: suitable for determining functional peak power, used to strictly limit the power of structural patterns during manufacturing tests, thus reducing premature device over-kill while delivering high test coverage;
        c. Software-Based Self-Test (SBST) Patterns: combine the potential of structural patterns with functional ones, allowing periodic execution during mission. In addition, external hardware communicating with the devised SBST was proposed; it increases fault coverage by 3% by testing critical Hardly Functionally Testable Faults not covered by conventional SBST patterns.
    An automatic functional test pattern generation flow exploiting an evolutionary algorithm that maximizes metrics related to stress, power, and fault coverage was employed in the above approaches to quickly generate the desired patterns (a generic sketch of such a loop is given after this abstract). The approaches were evaluated on two industrial cases developed by STMicroelectronics: an 8051-based SoC and a 32-bit Power Architecture SoC. Results show that generation time was reduced by up to 75% compared to older methodologies while significantly increasing the desired metrics.

    3. Fault Injection in GPGPUs: fault injection mechanisms in semiconductor devices are suitable for generating structural patterns, testing and activating mitigation techniques, and validating robust hardware and software applications. GPGPUs are known for fast parallel computation used in high-performance computing and advanced driver assistance, where reliability is a key point. Moreover, GPGPU manufacturers do not provide design description code due to content secrecy, so commercial fault injectors using a GPGPU model are unfeasible, leaving radiation tests as the only available resource, which is costly. In the last part of this thesis, we propose a software-implemented fault injector able to inject bit-flips into memory elements of a real GPGPU. It exploits a software debugger tool combined with the C-CUDA grammar to determine fault spots and apply bit-flip operations to program variables. The goal is to validate robust parallel algorithms by studying fault propagation or activating redundancy mechanisms they may embed. The effectiveness of the tool was evaluated on two robust applications: redundant parallel matrix multiplication and floating-point Fast Fourier Transform
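    The evolutionary generation loop referenced in point 2 can be pictured as below. This is a minimal, generic sketch under my own assumptions (the individual encoding, mutation operator, and fitness metric are placeholders); the actual flow evolves CPU test programs and measures stress, power, or fault-coverage metrics on the real device or a simulator.

        # A minimal sketch of an evolutionary pattern-generation loop (illustrative only;
        # the individual encoding, mutation, and fitness below are placeholders, not the
        # thesis' actual generator).
        import random

        def evolve(random_individual, mutate, fitness, pop_size=20, generations=100):
            """Return the best individual found by a simple truncation-selection EA."""
            population = [random_individual() for _ in range(pop_size)]
            for _ in range(generations):
                ranked = sorted(population, key=fitness, reverse=True)  # higher metric is better
                parents = ranked[: pop_size // 2]                       # keep the best half
                children = [mutate(random.choice(parents))
                            for _ in range(pop_size - len(parents))]    # refill with mutants
                population = parents + children
            return max(population, key=fitness)

        # Toy usage: maximize the number of set bits in a 64-bit dummy "pattern".
        if __name__ == "__main__":
            new = lambda: [random.randint(0, 1) for _ in range(64)]
            def mutate(ind):
                child = ind[:]
                child[random.randrange(len(child))] ^= 1                # flip one random bit
                return child
            best = evolve(new, mutate, fitness=sum)
            print(sum(best), "of", len(best), "bits set")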
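    For point 3, the elementary operation the injector applies is a single-bit flip in a program variable. The snippet below is a standalone illustration of that operation on a 32-bit floating-point value, under my own assumptions; the real injector performs the equivalent corruption on GPGPU memory elements through the debugger rather than in host Python.

        # Illustrative only: flip one bit of the IEEE-754 single-precision encoding of a
        # value, the kind of corruption the fault injector applies to program variables.
        import struct

        def flip_bit_float32(value: float, bit: int) -> float:
            """Return `value` with bit `bit` (0..31) of its float32 encoding inverted."""
            (encoded,) = struct.unpack("<I", struct.pack("<f", value))
            corrupted = encoded ^ (1 << bit)
            (result,) = struct.unpack("<f", struct.pack("<I", corrupted))
            return result

        if __name__ == "__main__":
            x = 3.14159
            for bit in (0, 23, 30, 31):  # mantissa LSB, exponent LSB, exponent MSB, sign bit
                print(f"bit {bit:2d}: {x} -> {flip_bit_float32(x, bit)}")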

    Machine learning support for logic diagnosis


    Design and Validation of Network-on-Chip Architectures for the Next Generation of Multi-synchronous, Reliable, and Reconfigurable Embedded Systems

    NETWORK-ON-CHIP (NoC) design is today at a crossroads. On one hand, the design principles to efficiently implement interconnection networks in the resource-constrained on-chip setting have stabilized. On the other hand, the requirements of embedded system design are far from stabilizing. Embedded systems are composed by assembling heterogeneous components featuring differentiated operating speeds, and ad-hoc countermeasures must be adopted to bridge frequency domains. Moreover, an unmistakable trend toward enhanced reconfigurability is underway due to the increasing complexity of applications. At the same time, the effect of technology is manifold: it provides unprecedented levels of system integration, but it also brings severe new constraints to the forefront, namely power budget restrictions, overheating concerns, circuit delay and power variability, permanent faults, and an increased probability of transient faults. Supporting different degrees of reconfigurability and flexibility in the parallel hardware platform cannot, however, be achieved with an incremental evolution of current design techniques; it requires a disruptive approach and a major increase in complexity. In addition, the new reliability challenges cannot be solved by traditional fault-tolerance techniques alone: the reliability approach must also be part of the overall reconfiguration methodology. In this thesis we take on the challenge of engineering NoC architectures for next-generation systems, and we provide design methods able to overcome the conventional way of implementing multi-synchronous, reliable, and reconfigurable NoCs. Our analysis is not limited to researching novel approaches to the specific challenges of the NoC architecture; we also co-design the solutions in a single integrated framework. Interdependencies between different NoC features are detected ahead of time, so we avoid engineering highly optimized solutions to specific problems that would coexist inefficiently in the final NoC architecture. To conclude, a silicon implementation by means of a test-chip tape-out and a prototype on an FPGA board validate the feasibility and effectiveness

    Energy-Efficient and Reliable Computing in Dark Silicon Era

    Dark silicon denotes the phenomenon that, due to thermal and power constraints, the fraction of transistors that can operate at full frequency decreases with each technology generation. Moore's law and Dennard scaling went hand in hand for five decades, bringing commensurate exponential performance growth via single-core and later multi-core designs. However, re-evaluating Dennard scaling at recent small technology nodes shows that the ongoing multi-core growth demands exponentially increasing thermal design power to achieve a linear performance increase. This process hits a power wall that raises the amount of dark or dim silicon on future multi-/many-core chips more and more. Furthermore, the growing number of transistors on a single chip and the susceptibility to internal defects and aging phenomena, which are exacerbated by high chip thermal density, make monitoring and managing chip reliability before and after activation a necessity. The proposed approaches and experimental investigations in this thesis focus on two main tracks: 1) power awareness and 2) reliability awareness in the dark silicon era; the two tracks are later combined. In the first track, the main goal is to increase the returns in terms of the most important features of chip design, such as performance and throughput, while the maximum power limit is honored. In fact, we show that by managing power in the presence of dark silicon, all the traditional benefits of following Moore's law can still be achieved in the dark silicon era, although to a lesser extent. Along the reliability-awareness track, we show that dark silicon can be considered an opportunity to be exploited for several benefits, namely lifetime increase and online testing. We discuss how dark silicon can be exploited to guarantee that the system lifetime stays above a certain target value and, furthermore, how it can be exploited to apply low-cost, non-intrusive online testing to the cores. After demonstrating power and reliability awareness in the presence of dark silicon, two approaches are discussed as case studies in which power and reliability awareness are combined. The first approach demonstrates how chip reliability can be used as a supplementary metric for power-reliability management, while the second provides a trade-off between workload performance and system reliability by simultaneously honoring the given power budget and the target reliability
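    The power-wall argument above follows the standard post-Dennard reasoning (a textbook sketch, not a formula taken from this thesis). Per-transistor dynamic power scales as

        P \propto C\,V^{2}\,f

    Under classical Dennard scaling, a linear shrink by \kappa \approx 1.4 per generation gives C \to C/\kappa, V \to V/\kappa, and f \to \kappa f, so per-transistor power drops by 1/\kappa^{2} while transistor density grows by \kappa^{2}, keeping power density constant. Once V can no longer be lowered, the 1/\kappa gain from capacitance is cancelled by the \kappa increase in frequency, per-transistor power stays roughly constant, and power density grows by about \kappa^{2} \approx 2 per generation. Under a fixed thermal design power, the fraction of the chip that can run at nominal voltage and frequency therefore shrinks by roughly that factor each generation, which is the dark or dim silicon discussed above.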

    Resilience of an embedded architecture using hardware redundancy

    In the last decade, the dominance of general computing systems in the market has been supplanted by embedded systems, with billions of units manufactured every year. Embedded systems appear in contexts where continuous operation is of utmost importance and failure can be profound. Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have been commonly used in space applications. However, these components are expensive, lag behind commercial components in performance, and do not provide 100% fault elimination. Without fault-tolerant mechanisms, many of these faults can become errors at the application or system level, which in turn can result in catastrophic failures. In this work we study the concepts of fault tolerance and dependability and extend them by providing our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles, and the process that leads to computing failures. We provide extensive taxonomies of 1) existing fault-tolerant techniques and 2) the effects of radiation on state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed model of faults and provide a classification of the different types of faults at various levels. We introduce a fault-tolerance algorithm and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance, and power consumption than existing techniques. We propose a new system element, called the syndrome, which is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA's assembler and commercial hardware simulators

    Reliable Design of Three-Dimensional Integrated Circuits
