135 research outputs found

    Robust Design of Variation-Sensitive Digital Circuits

    The nano-age has already begun, with typical feature dimensions smaller than 100nm. According to the International Technology Roadmap for Semiconductors (ITRS), the operating frequency is expected to increase up to 12 GHz and a single chip will contain over 12 billion transistors by 2020. The ITRS also predicts that the scaling of CMOS devices and process technology, as it is known today, will become much more difficult as the industry advances towards the 16nm technology node and beyond. This aggressive scaling has pushed devices to their physical limits. Design goals are now governed by several factors beyond power, performance, and area, such as process variations, radiation-induced soft errors, and aging degradation mechanisms. These new design challenges strongly affect the parametric yield of nanometer digital circuits and also cause functional yield losses in variation-sensitive digital circuits such as Static Random Access Memory (SRAM) and flip-flops. Moreover, sub-threshold SRAM and flip-flop circuits, driven by the strong demand for lower power consumption, are even more sensitive to these challenges, which reduces their robustness and yield. It is therefore not surprising that the ITRS considers variability and reliability the most challenging obstacles to robust nanometer digital circuit design.

    Soft errors are one of the main reliability and robustness concerns for SRAM arrays in sub-100nm technologies due to the low operating voltage, small node capacitance, and high packing density. The soft error immunity of SRAM arrays is also affected by process variations. We develop statistical, design-oriented models of soft error immunity variations for super-threshold and sub-threshold SRAM cells that account for both die-to-die and within-die variations. This work provides new design insights and highlights the important design knobs that can be used to reduce these variations. The developed models are scalable, bias dependent, and require only easily measurable parameters, which makes them useful for early design exploration, circuit optimization, and technology prediction. The derived models are verified using Monte Carlo SPICE simulations with an industrial hardware-calibrated 65nm CMOS technology.

    The demand for higher performance leads to very deep pipelining, which means that hundreds of thousands of flip-flops are required to control the data flow under strict timing constraints. A timing violation at a flip-flop can cause incorrect data to be latched and the overall system to malfunction. In addition, flip-flop power dissipation represents a considerable fraction of the total power dissipation. Sub-threshold flip-flops are considered the most energy-efficient solution for low-power applications in which performance is of secondary importance. Accordingly, statistical gate sizing is applied to different flip-flop topologies to improve the timing yield of super-threshold flip-flops and the power yield of sub-threshold flip-flops. A comparative analysis of these topologies, including the overhead required for yield improvement, is then performed. This analysis provides recommendations that help designers select the flip-flop topology that best satisfies their system specifications while taking the impact of process variations and robustness requirements into account.

    Adaptive Body Bias (ABB) allows the transistor threshold voltage, Vt, to be tuned by controlling the transistor body voltage. A forward body bias reduces Vt, increasing device speed at the expense of increased leakage power; a reverse body bias increases Vt, reducing leakage power but slowing the device. The impact of process variations is therefore mitigated by speeding up slow, less leaky devices and slowing down fast, highly leaky devices. Ideally, ABB would bias each device in a design independently to mitigate within-die variations, but supplying so many separate voltages inside a die results in a large area overhead. On the other hand, using the same body bias for all devices on a die limits its ability to compensate for within-die variations. The granularity of the ABB scheme is thus a trade-off between within-die variation compensation capability and area overhead. This work introduces new ABB circuits whose area overhead is 143X lower than that of previous ABB circuits. In addition, these ABB circuits are resolution free, since their implementation requires no digital-to-analog or analog-to-digital converters. They are applied to high-performance critical paths emulating a real microprocessor architecture to compensate for process variations, and to SRAM arrays to compensate for Negative Bias Temperature Instability (NBTI) aging and process variations. Their effectiveness is verified by post-layout simulation results and test-chip measurements in a triple-well 65nm CMOS technology.

    The highly capacitive nodes of wide fan-in dynamic circuits and SRAM bitlines limit the performance of these circuits, and mitigating process variations through statistical gate sizing increases this capacitance further and fails to achieve the target yield improvement. We propose new negative capacitance circuits that reduce the overall parasitic capacitance of these highly capacitive nodes. These circuits are applied to wide fan-in dynamic circuits, improving timing yield up to 99.87%, and to SRAM arrays, improving read access yield up to 100%. Their area and power overheads are amortized over the large die area of the microprocessor and the SRAM array. Their effectiveness is verified by post-layout simulation results and test-chip measurements in a 65nm CMOS technology.
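    The sketch below is not the thesis's derivation; it is a minimal Python illustration, under assumed numbers, of how die-to-die and within-die threshold-voltage variation can be sampled in a Monte Carlo loop and propagated to a simple critical-charge proxy for SRAM soft-error immunity. The linear sensitivity DQ_DVT and all constants are placeholders, not the bias-dependent models the work derives and verifies against SPICE.

```python
# Illustrative sketch (not the thesis's model): Monte Carlo sampling of
# die-to-die (D2D) and within-die (WID) threshold-voltage variation and its
# effect on a simple critical-charge proxy for SRAM soft-error immunity.
import numpy as np

rng = np.random.default_rng(0)

VT_NOM = 0.45          # nominal threshold voltage (V), assumed
SIGMA_D2D = 0.02       # die-to-die sigma (V), assumed
SIGMA_WID = 0.03       # within-die sigma (V), assumed
QCRIT_NOM = 1.5        # nominal critical charge (fC), assumed
DQ_DVT = 2.0           # assumed linear sensitivity of Qcrit to Vt (fC/V)

def sample_qcrit(n_dies=200, cells_per_die=500):
    """Return Qcrit samples combining D2D and WID Vt variation."""
    d2d = rng.normal(0.0, SIGMA_D2D, size=(n_dies, 1))               # one shift per die
    wid = rng.normal(0.0, SIGMA_WID, size=(n_dies, cells_per_die))   # per-cell shift
    vt = VT_NOM + d2d + wid
    return QCRIT_NOM + DQ_DVT * (vt - VT_NOM)

qcrit = sample_qcrit()
print(f"Qcrit mean = {qcrit.mean():.3f} fC, sigma/mu = {qcrit.std() / qcrit.mean():.3%}")
```

    The sigma/mu ratio printed at the end is the kind of immunity-variation figure of merit that the analytical models aim to predict without resorting to full Monte Carlo SPICE runs.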

    Resilient Design for Process and Runtime Variations

    The main objective of this thesis is to tackle the impact of parameter variations in order to improve chip performance and extend chip lifetime.

    Reliable Software for Unreliable Hardware - A Cross-Layer Approach

    A novel cross-layer reliability analysis, modeling, and optimization approach is proposed in this thesis that leverages multiple layers in the system design abstraction (i.e., hardware, compiler, system software, and application program) to exploit the reliability-enhancing potential available at each system layer and to exchange this information across multiple system layers.

    Runtime Monitoring for Dependable Hardware Design

    With the advance of technology scaling and the globalization of integrated circuit production, a multitude of vulnerabilities concerning the dependability of computer hardware has emerged. Due to manufacturing variations, every microchip is born with a unique character that evolves individually with its operating conditions, workload, and environment. Deterministic models that predict reliability at design time are therefore no longer sufficient to adequately describe integrated circuits in nanometer technologies. The need for runtime analysis of the chip's state grows, and with it the measures required to preserve reliability. Transistors are susceptible to workload-dependent aging, which increases circuit delay and with it the likelihood of erroneous computation; in addition, particular workloads accelerate chip aging and thus shorten its reliable lifetime. Radiation-induced runtime faults (soft errors) can further cause abnormal behavior in critical systems, and both the propagation and the masking of these faults depend on the system's workload. Malicious circuits, so-called hardware Trojans, can also be deliberately inserted into fabricated chips during production, compromising chip security. Since this kind of manipulation is barely observable before its activation, detecting a Trojan on a chip directly after production is extremely difficult. The complexity of these dependability problems renders simple reliability modeling and countermeasures ineffective. It arises from multiple sources, including design parameters (technology, device, circuit, and architecture), manufacturing parameters, runtime workload, and the operating environment. This motivates the exploration of machine learning and runtime methods, which can potentially cope with this complexity.

    In this work, we present solutions that ensure dependable execution of computer hardware under varying runtime behavior and operating conditions. We developed machine learning techniques to model, monitor, and compensate for various reliability effects. Different learning methods are used to find suitable monitoring points for observing the workload. Together with reliability metrics based on resilience and general security attributes, these are used to build prediction models. Furthermore, we present a cost-optimized hardware monitor circuit that evaluates the monitoring points at runtime. In contrast to the state of the art, which exploits microarchitectural monitoring points, we evaluate the potential of workload characteristics at the logic level of the underlying hardware. We identify improved logic-level features that enable fine-grained runtime monitoring, and this logic-level analysis in turn offers several knobs for trading off higher accuracy against lower overhead.

    We pursued the approach of identifying logic-level monitoring points with learning methods and implementing low-cost monitors to provide adaptive protection against static aging, dynamic aging, and radiation-induced soft errors, and additionally to detect the activation of hardware Trojans. To this end, we designed a prediction model that tracks the influence of the workload on aging-related degradation of the chip and can be used to apply preventive techniques such as task mitigation and voltage and frequency scaling dynamically at runtime. This prediction model was implemented in software that ranks different workloads by their aging impact. To ensure resilience against accelerated aging, we present monitoring hardware that supervises a subset of the critical flip-flops, watches for accelerated aging, and warns when a timing-critical path is under heavy aging stress. We provide the implementation of a technique that reduces the stress imposed on timing-critical paths by the execution of specific subroutines. In addition, we propose a technique for online estimation of the soft-error vulnerability of memory arrays and logic cores based on monitoring a small group of flip-flops of the design. Furthermore, we developed an anomaly-detection-based method that recognizes the workload signatures of hardware Trojans during their activation at runtime, forming a last line of defense. Based on these experiments, this work demonstrates the potential of advanced logic-level feature extraction and learning-based prediction from runtime data to improve the dependability of hardware designs.
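    As a rough illustration of the last line of defense described above, the sketch below flags anomalous logic-level workload signatures (here, synthetic per-window toggle counts of monitored flip-flops) by thresholding their deviation from statistics learned on Trojan-free runs. The feature choice, the ToggleAnomalyDetector class, the 4-sigma threshold, and the synthetic data are illustrative assumptions, not the detector developed in the thesis.

```python
# Minimal anomaly-detection sketch: learn per-feature statistics on
# Trojan-free workload signatures, then flag windows whose monitored
# features deviate by more than a fixed number of standard deviations.
import numpy as np

class ToggleAnomalyDetector:
    def __init__(self, threshold_sigma=4.0):
        self.threshold_sigma = threshold_sigma
        self.mu = None
        self.sigma = None

    def fit(self, clean_signatures):
        """Learn per-feature mean/std from Trojan-free signature windows."""
        clean = np.asarray(clean_signatures, dtype=float)
        self.mu = clean.mean(axis=0)
        self.sigma = clean.std(axis=0) + 1e-9   # avoid division by zero
        return self

    def is_anomalous(self, signature):
        """Return True if any monitored feature deviates beyond the threshold."""
        z = np.abs((np.asarray(signature, dtype=float) - self.mu) / self.sigma)
        return bool(np.any(z > self.threshold_sigma))

# Usage with synthetic data: 1000 clean windows of 16 toggle-count features.
rng = np.random.default_rng(1)
clean = rng.poisson(lam=20, size=(1000, 16))
det = ToggleAnomalyDetector().fit(clean)
suspicious = clean[0].copy()
suspicious[3] += 60          # one unusually active node, e.g. a triggered payload
print(det.is_anomalous(clean[1]), det.is_anomalous(suspicious))
```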

    Techniques for Aging, Soft Errors and Temperature to Increase the Reliability of Embedded On-Chip Systems

    This thesis investigates the challenge of providing an abstracted, yet sufficiently accurate, reliability estimation for embedded on-chip systems. It also proposes new techniques to increase the reliability of register files within processors against aging effects and soft errors, and introduces a novel thermal measurement setup that perspicuously captures infrared images of modern multi-core processors.

    Dependable Embedded Systems

    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems that have emerged particularly within the last five years. It presents the most prominent reliability concerns from today's point of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or the system level alone, this book addresses reliability challenges across different levels, from the physical level all the way to the system level (cross-layer approaches). The book aims to demonstrate how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, process variation, temperature effects, and soft errors. It provides readers with the latest insights into novel, cross-layer methods and models with respect to the dependability of embedded systems; describes cross-layer approaches that can leverage reliability through techniques that are proactively designed with respect to techniques at other layers; and explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many-core systems.

    AI/ML Algorithms and Applications in VLSI Design and Technology

    An evident challenge ahead for the integrated circuit (IC) industry in the nanometer regime is the investigation and development of methods that can reduce the design complexity ensuing from growing process variations and curtail the turnaround time of chip manufacturing. Conventional methodologies employed for such tasks are largely manual and therefore time-consuming and resource-intensive. In contrast, the unique learning strategies of artificial intelligence (AI) provide numerous exciting automated approaches for handling complex and data-intensive tasks in very-large-scale integration (VLSI) design and testing. Employing AI and machine learning (ML) algorithms in VLSI design and manufacturing reduces the time and effort needed to understand and process data within and across abstraction levels via automated learning algorithms, which in turn improves IC yield and reduces manufacturing turnaround time. This paper thoroughly reviews the automated AI/ML approaches introduced to date for VLSI design and manufacturing. Moreover, we discuss the future scope of AI/ML applications at various abstraction levels to revolutionize the field of VLSI design, aiming for high-speed, highly intelligent, and efficient implementations.
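    As a toy example of the learning-based automation the survey covers, the sketch below fits a small surrogate model that predicts a circuit metric from design features, the kind of model used to replace repeated slow simulations during design-space exploration. The feature set, the synthetic delay data, and the ridge-regression fit are assumptions for illustration only.

```python
# Toy surrogate model: learn a mapping from design/process features to a
# synthetic "path delay" so that candidate design points can be evaluated
# without rerunning an expensive simulation each time.
import numpy as np

rng = np.random.default_rng(2)

# Synthetic training data: [gate count, avg fanout, Vdd, temperature]
X = np.column_stack([
    rng.integers(10, 200, 500),        # gates on the path
    rng.uniform(1.0, 4.0, 500),        # average fanout
    rng.uniform(0.7, 1.1, 500),        # supply voltage (V)
    rng.uniform(25, 125, 500),         # temperature (C)
])
# Synthetic ground-truth delay with noise (stands in for simulation results).
delay = 5.0 * X[:, 0] * X[:, 1] / X[:, 2] + 0.2 * X[:, 3] + rng.normal(0, 20, 500)

# Ridge-regularized least squares as a minimal learned surrogate.
Xb = np.column_stack([X, np.ones(len(X))])            # add bias term
lam = 1e-3
w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ delay)

query = np.array([120, 2.5, 0.9, 85, 1.0])            # candidate design point
print(f"predicted delay: {query @ w:.1f} (arbitrary units)")
```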

    Reliability in the face of variability in nanometer embedded memories

    In this thesis, we have investigated the impact of parametric variations on the behaviour of one performance-critical processor structure: embedded memories. As variations manifest as a spread in power and performance, as a first step we propose a novel modeling methodology that helps evaluate the impact of circuit-level optimizations on architecture-level design choices. Choices made at the design stage ensure that conflicting requirements from higher levels are decoupled. We then complement such design-time optimizations with a runtime mechanism that takes advantage of adaptive body-biasing to lower power whilst improving performance in the presence of variability. Our proposal uses a novel fully-digital variation tracking hardware based on embedded DRAM (eDRAM) cells to monitor run-time changes in cache latency and leakage. A special fine-grain body-bias generator uses the measurements to generate the optimal body bias needed to meet the required yield targets. A novel variation-tolerant and soft-error-hardened eDRAM cell is also proposed as an alternative candidate for replacing existing SRAM-based designs in latency-critical memory structures. In the ultra-low-power domain, where reliable operation is limited by the minimum voltage of operation (Vddmin), we analyse the impact of failures on cache functional margin and functional yield. Towards this end, we have developed a fully automated tool (INFORMER) capable of estimating memory-wide metrics such as power, performance, and yield accurately and rapidly. Using the developed tool, we then evaluate the effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Having a holistic perspective of memory-wide metrics helps us arrive at design choices optimized simultaneously for the multiple metrics needed to maintain lifetime requirements.
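    A toy sketch in the spirit of the memory-wide yield estimates such a tool produces is given below. The per-bit failure probability P_BIT, the array size, and the single-error-correction model are simplifying assumptions for illustration, not the circuit-level failure models INFORMER actually uses.

```python
# Toy memory functional-yield estimate: given an assumed per-bit failure
# probability, compute the probability that every word survives with and
# without single-error correction (a simple stand-in for hybrid
# prevention/correction techniques).
from math import comb

def word_survival(p_bit, bits_per_word=64, correctable=1):
    """Probability that a word has at most `correctable` failing bits."""
    return sum(
        comb(bits_per_word, k) * p_bit**k * (1.0 - p_bit)**(bits_per_word - k)
        for k in range(correctable + 1)
    )

def array_yield(p_bit, n_words, correctable=1):
    """Probability that every word in the array remains functional."""
    return word_survival(p_bit, correctable=correctable) ** n_words

P_BIT = 1e-7          # assumed per-bit failure probability at low Vdd
N_WORDS = 8 * 1024    # assumed 64 KB array of 64-bit words

print(f"no correction : {array_yield(P_BIT, N_WORDS, correctable=0):.4f}")
print(f"1-bit correct : {array_yield(P_BIT, N_WORDS, correctable=1):.4f}")
```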

    A hardware-embedded, delay-based PUF engine designed for use in cryptographic and authentication applications

    Cryptographic and authentication applications in application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs), as well as codes for the activation of on-chip features, require the use of embedded secret information. The generation of secret bitstrings using physical unclonable functions, or PUFs, provides several distinct advantages over conventional methods, including the elimination of costly non-volatile memory and the potential to increase the number of random bits available to applications. In this dissertation, a Hardware-Embedded Delay PUF (HELP) is proposed that is designed to leverage path delay variations occurring in the core logic macros of a chip to create random bitstrings. A thorough discussion is provided of the operational details of an embedded path timing structure called REBEL, which HELP uses as the timing mechanism and entropy source underlying the cryptographic quality of the bitstrings. Further details of the FPGA-based implementation used to prove the viability of the HELP PUF concept are included, along with a discussion of the evolution of the techniques employed in realizing the final PUF engine design. The bitstrings produced by a set of 30 FPGA boards are evaluated with regard to several statistical quality metrics, including uniqueness, randomness, and stability. The stability characteristics of the bitstrings are evaluated by subjecting the FPGAs to commercial-grade temperature and power supply voltage variations. In particular, this work evaluates the reproducibility of the bitstrings generated at 0°C, 25°C, and 70°C, and at ±10% of the rated supply voltage. A pair of error avoidance schemes is proposed that provides significant improvements to the HELP PUF's resiliency against bit-flip errors in the bitstrings.
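    The sketch below computes the standard bitstring quality metrics referred to above: uniqueness as the average pairwise inter-chip Hamming distance (ideally 50%) and stability as one minus the intra-chip Hamming distance between a reference bitstring and its regenerations under environmental stress. The random bitstrings merely stand in for measured HELP outputs, and the dissertation's exact evaluation procedure may differ.

```python
# Standard PUF bitstring metrics on synthetic data: inter-chip uniqueness
# and intra-chip stability across regenerations with injected bit flips.
from itertools import combinations
import random

def hamming_fraction(a, b):
    """Fraction of differing bit positions between equal-length bitstrings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def uniqueness(bitstrings):
    """Average pairwise inter-chip Hamming distance."""
    pairs = list(combinations(bitstrings, 2))
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)

def stability(reference, regenerations):
    """1 - average intra-chip Hamming distance across environmental corners."""
    return 1.0 - sum(hamming_fraction(reference, r) for r in regenerations) / len(regenerations)

random.seed(3)
chips = [[random.getrandbits(1) for _ in range(256)] for _ in range(30)]      # 30 "chips"
noisy = [[b ^ (random.random() < 0.02) for b in chips[0]] for _ in range(5)]  # ~2% bit flips

print(f"uniqueness: {uniqueness(chips):.3f} (ideal 0.5)")
print(f"stability : {stability(chips[0], noisy):.3f}")
```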

    Pertanika Journal of Science & Technology
