412 research outputs found

    Machine Learning to Tackle the Challenges of Transient and Soft Errors in Complex Circuits

    Full text link
    Functional failure rate analysis of today's complex circuits is a difficult task that requires a significant investment of human effort, processing resources, and tool licenses. De-rating or vulnerability factors are therefore a major instrument of failure analysis efforts. Computationally intensive fault-injection simulation campaigns are usually required to obtain fine-grained reliability metrics at the functional level. This paper therefore investigates the use of machine learning algorithms to assist this procedure and thus optimise and enhance fault injection efforts. Specifically, machine learning models are used to predict accurate per-instance Functional De-Rating data for the full list of circuit instances, an objective that is difficult to reach using classical methods. The described methodology uses a set of per-instance features, extracted through an analysis approach combining static elements (cell properties, circuit structure, synthesis attributes) and dynamic elements (signal activity). Reference data is obtained through first-principles fault simulation approaches. One part of this reference dataset is used to train the machine learning model, and the remainder is used to validate and benchmark the accuracy of the trained tool. The presented methodology is applied to a practical example, and various machine learning models are evaluated and compared.
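    As a rough illustration of the train-and-validate flow this abstract describes, the sketch below fits a regression model on synthetic per-instance features and reports a validation error. The feature set, the model choice (gradient boosting), and all data are assumptions for illustration only; the paper's actual features and models may differ.

```python
# Minimal sketch of the described train/validate flow (hypothetical features;
# the fault-simulation reference data is assumed to be given).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n_instances = 1000

# Per-instance features: static (cell properties, fan-out, logic depth)
# and dynamic (signal activity) -- placeholders for the extracted data.
X = rng.random((n_instances, 4))
# Reference per-instance Functional De-Rating from fault simulation (synthetic).
y = 0.3 * X[:, 0] + 0.5 * X[:, 3] + 0.05 * rng.standard_normal(n_instances)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print("validation MAE:", mean_absolute_error(y_val, model.predict(X_val)))
```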

    Design and modelling of variability tolerant on-chip communication structures for future high performance system on chip designs

    Get PDF
    Incessant technology scaling has enabled the integration of functionally complex System-on-Chip (SoC) designs with a large number of heterogeneous systems on a single chip. The processing elements on these chips are integrated through on-chip communication structures, which provide the infrastructure necessary for the exchange of data and control signals while meeting stringent physical and design constraints. The use of vast amounts of on-chip communication will be central to future designs, where variability is an inherent characteristic. For this reason, this thesis investigates the performance and variability tolerance of typical on-chip communication structures. Understanding the relationship between variability and communication is paramount for designers if they are to devise new methods and techniques for designing performance- and power-efficient communication circuits in the face of the challenges presented by deep sub-micron (DSM) technologies. The initial part of this work investigates the impact of device variability due to Random Dopant Fluctuations (RDF) on the timing characteristics of basic communication elements. The characterization data so obtained can be used to estimate the performance and failure probability of simple links through the methodology proposed in this work. For the Statistical Static Timing Analysis (SSTA) of larger circuits, a method for accurate estimation of the probability density functions of different circuit parameters is proposed, and its significance for pipelined circuits is highlighted. Power and area are among the most important design metrics for any integrated circuit (IC) design. This thesis emphasises the consideration of communication reliability while optimizing for power and area. A methodology has been proposed for the simultaneous optimization of performance, area, power, and delay variability for a repeater-inserted interconnect. Similarly, bandwidth-driven optimizations have been performed for multi-bit parallel links. Power- and area-efficient semi-serial links, which are less vulnerable to delay variations than the corresponding fully parallel links, are introduced. Furthermore, due to technology scaling, the coupling noise between the link lines has become an important issue. With ever-decreasing supply voltages and the corresponding reduction in noise margins, severe challenges are introduced for performing timing verification in the presence of variability. For this reason, an accurate model for crosstalk noise in an interconnection, as a function of time and skew, is introduced in this work. This model can be used to identify the skew condition that gives maximum delay noise, and also for efficient design verification.
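    The link failure-probability estimation this abstract alludes to can be illustrated with a Monte Carlo sketch: sample per-stage delays of a repeater-inserted link under an assumed Gaussian variation model and compare the summed link delay against a timing budget. The delay model is a deliberately simplified first-order one, and every parameter value below is a placeholder, not a figure from the thesis.

```python
# Monte Carlo sketch of estimating a repeater-inserted link's delay
# distribution under random device variation (illustrative model only).
import numpy as np

rng = np.random.default_rng(1)
n_stages = 8            # repeaters along the link (assumed)
t_nom = 25e-12          # nominal per-stage delay in seconds (assumed)
sigma_rel = 0.08        # relative sigma from RDF-induced Vth spread (assumed)

# Each stage delay varies independently; total link delay is their sum.
stage_delays = rng.normal(t_nom, sigma_rel * t_nom, size=(100_000, n_stages))
link_delay = stage_delays.sum(axis=1)

print(f"mean = {link_delay.mean() * 1e12:.2f} ps, "
      f"sigma = {link_delay.std() * 1e12:.2f} ps")
# Failure probability against an assumed timing budget of 220 ps:
print("P(delay > 220 ps) =", (link_delay > 220e-12).mean())
```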

    Within-Die Delay Variation Measurement And Analysis For Emerging Technologies Using An Embedded Test Structure

    Get PDF
    Both random and systematic within-die process variations (PV) are growing more severe with shrinking geometries and increasing die size. Escalating variations in delay and power with reductions in feature size place higher demands on the accuracy of variation models, whose availability can be used to improve yield, and the corresponding profitability and product quality, of the fabricated integrated circuits (ICs). Sources of within-die variations include optical source limitations and layout-based systematic effects (pitch, line-width variability, and microscopic etch loading). Unfortunately, accurate models of within-die PVs are becoming more difficult to derive because of their increasing sensitivity to design context. Embedded test structures (ETS) continue to play an important role in the development of models of PVs and as a mechanism to improve correlations between hardware and models. Variations in path delays are increasing with scaling, and are increasingly affected by neighborhood interactions. In order to fully characterize within-die variations, delays must be measured in the context of actual core-logic macros. Doing so requires the use of an embedded test structure, as opposed to traditional scribe-line test structures such as ring oscillators (RO). Accurate measurements of within-die variations can be used, e.g., to better tune models to actual hardware (model-to-hardware correlation). In this research project, I propose an embedded test structure called REBEL (Regional dELay BEhavior) that is designed to measure path delays in a minimally invasive fashion and whose architecture measures path delays more accurately. Design-for-manufacturability (DFM) analysis is performed on 90 nm ASIC chips and 28 nm Zynq 7000 series FPGA boards. I present ASIC results on within-die path delay variations in a floating-point unit (FPU) with 5 pipeline stages, fabricated in IBM's 90 nm technology and used as a test vehicle in chip experiments carried out at nine different temperature/voltage (TV) corners. Experimental data has also been analyzed for path delay variations in short versus long paths. FPGA results on within-die and die-to-die variations in an Advanced Encryption Standard (AES) implementation using a single pipeline stage are also presented. Other analyses performed on the calibrated path delays include flip-flop propagation delays for both rising and falling edges (tpHL and tpLH), uncertainty analysis, path distribution analysis, short versus long path variations, and mid-length path within-die variation. I also analyze the impact on delay when the chips are subjected to industrial-level temperature and voltage variations. From the experimental results, it has been established that the proposed REBEL provides capabilities similar to an off-chip logic analyzer, i.e., it is able to capture the temporal behavior of the signal over time, including any static and dynamic hazards that may occur on the tested path. The ASIC results further show that path delays are correlated to the launch-capture (LC) interval used to time them; therefore, calibration as proposed in this work must be carried out in order to obtain an accurate analysis of within-die variations. Results on ASIC chips show that short paths can vary up to 35% on average, while long paths vary up to 20% at nominal temperature and voltage. A similar trend occurs for within-die variations of mid-length paths, where the magnitudes reduce to 20% and 5%, respectively. The magnitude of delay variations in both these analyses increases as temperature and voltage are changed to increase performance. High levels of within-die delay variation are undesirable from a design perspective, but they represent a rich source of entropy for applications that make use of 'secrets', such as authentication, hardware metering, and encryption. Physical unclonable functions (PUFs) are a class of primitives that leverage within-die variations as a means of generating random bit strings for these types of applications, including hardware security and trust. The Zynq FPGA die-to-die and within-die variation study shows that there is on average 5% within-die variation, and that die-to-die variation can range up to 3 ns. The die-to-die variations can be explored in further detail to study their spatial dependence. Additionally, I carried out research in the area of data mining for big data, focusing on decision tree classification (DTC) to speed up the classification step in hardware. For this purpose, I devised a pipelined architecture for axis-parallel binary decision tree classification that meets requirements on execution time and minimal resource usage in terms of area. The motivation for this work is that analyzing ever-larger data sets has created abundant opportunities for algorithmic, architectural, and data-mining innovations, and thus great demand for faster execution of these algorithms. Decision trees (DT) have traditionally been implemented in software. Although software implementations of DTC are highly accurate, their execution times and resource utilization still require improvement to meet the computational demands of the ever-growing industry. On the other hand, hardware implementations of DT have not been thoroughly investigated or reported in detail. Therefore, I propose a pipelined hardware architecture that acquires data in parallel, with parallel engines working independently on different partitions of the data. Each engine also processes its data in a pipelined fashion to utilize resources more efficiently and reduce the time needed to process all data records/tuples. Experimental results show that the proposed hardware acceleration of classification algorithms increases throughput, by reducing the number of clock cycles required to process the data and generate the results, while requiring minimal resources and hence being area efficient. This architecture also enables algorithms to scale to increasingly large and complex data sets. We developed the DTC algorithm in detail and explored techniques for successfully adapting it to a hardware implementation. This system is 3.5 times faster than the existing hardware implementation of classification.
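    To make the quoted variation percentages concrete, the sketch below computes one simple within-die variation statistic (peak-to-peak spread relative to the mean path delay) over synthetic calibrated path delays. The statistic and all numbers are assumptions for illustration; the actual analysis uses REBEL measurements and the dissertation's own metrics.

```python
# Sketch of a within-die variation statistic over calibrated path delays
# (synthetic data; the metric definition here is an assumed, simplified form).
import numpy as np

rng = np.random.default_rng(2)
# Calibrated delays in ns for short and long paths on one die (assumed).
short_paths = rng.normal(0.8, 0.8 * 0.12, 500)
long_paths = rng.normal(3.2, 3.2 * 0.06, 500)

def within_die_variation(delays):
    """Peak-to-peak spread relative to the mean path delay, in percent."""
    return (delays.max() - delays.min()) / delays.mean() * 100

for name, d in [("short", short_paths), ("long", long_paths)]:
    print(f"{name} paths: {within_die_variation(d):.1f}% within-die variation")
```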

    Reclaiming Fault Resilience and Energy Efficiency With Enhanced Performance in Low Power Architectures

    Get PDF
    Rapid development in the AI domain has revolutionized the computing industry through the introduction of state-of-the-art AI architectures. This growth is accompanied by a massive increase in power consumption. Near-Threshold Computing (NTC) has emerged as a viable solution, offering significant savings in power consumption and paving the way for an energy-efficient design paradigm. However, these benefits are accompanied by a deterioration in performance due to severe process variation and slower transistor switching at near-threshold operation. These problems severely restrict the use of near-threshold operation in commercial applications. In this work, a novel AI architecture, the Tensor Processing Unit, operating at NTC is thoroughly investigated to tackle the issues hindering system performance. Research problems are demonstrated in a scientific manner, and unique opportunities are explored to propose novel design methodologies.

    Ingress of threshold voltage-triggered hardware trojan in the modern FPGA fabric–detection methodology and mitigation

    Get PDF
    The ageing phenomenon of negative bias temperature instability (NBTI) continues to challenge the dynamic thermal management of modern FPGAs. Increased transistor density leads to thermal accumulation and propagates higher and non-uniform temperature variations across the FPGA. This aggravates the impact of NBTI on key PMOS transistor parameters such as threshold voltage and drain current. Besides ageing the transistors, with a consequent reduction in FPGA lifetime and reliability, NBTI also challenges FPGA security. The ingress of a threshold voltage-triggered hardware Trojan, a stealthy and malicious electronic circuit, into the modern FPGA is one such potential threat that could exploit NBTI and severely affect its performance. The development of an effective and efficient countermeasure against it is, therefore, highly critical. Accordingly, we present a comprehensive FPGA security scheme, comprising novel elements of hardware Trojan infection, detection, and mitigation, to protect FPGA applications against the hardware Trojan. Built around the threat model of a naval warship's integrated self-protection system (ISPS), we propose a threshold voltage-triggered hardware Trojan that operates in a threshold voltage region of 0.45 V to 0.998 V, consumes ultra-low power (10.5 nW), and remains stealthy with an area overhead as low as 1.5% for a 28 nm technology node. The hardware Trojan detection sub-scheme provides a unique lightweight threshold voltage-aware sensor with a detection sensitivity of 0.251 mV/nA. With fixed and dynamic ring oscillator-based sensor segments, the precise measurement of frequency and delay variations in response to shifts in the threshold voltage of a PMOS transistor is also proposed. Finally, the FPGA security scheme is reinforced with online transistor dynamic scaling (OTDS) to mitigate the impact of the hardware Trojan through run-time tolerant circuitry capable of identifying critical gates with worst-case drain current degradation.
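    The ring-oscillator-based sensing idea can be illustrated with a first-order model: as NBTI (or a Trojan's trigger condition) raises the PMOS threshold voltage, the RO frequency drops measurably. The alpha-power-style dependence and every constant below are assumptions for illustration, not the paper's calibrated sensor model.

```python
# First-order sketch: a ring-oscillator sensor exposes a threshold-voltage
# shift because RO frequency falls as |Vth_p| rises (assumed constants).
VDD = 1.0        # supply voltage in volts (assumed)
ALPHA = 1.3      # velocity-saturation exponent (assumed)
F0 = 500e6       # nominal RO frequency in Hz (assumed)
VTH_NOM = 0.45   # nominal PMOS threshold voltage in volts

def ro_frequency(vth):
    """RO frequency for a given PMOS threshold voltage (alpha-power model)."""
    return F0 * ((VDD - vth) / (VDD - VTH_NOM)) ** ALPHA

for vth in (0.45, 0.55, 0.70):
    drop = F0 - ro_frequency(vth)
    print(f"Vth = {vth:.2f} V -> frequency drop {drop / 1e6:.1f} MHz")
```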

    Dependable Embedded Systems

    Get PDF
    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. The book introduces the most prominent reliability concerns from today's point of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or system level alone, this book deals with the different reliability challenges across levels, starting from the physical level all the way to the system level (cross-layer approaches). The book aims to demonstrate how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, process variation, temperature effects, soft errors, etc. It provides readers with the latest insights into novel, cross-layer methods and models with respect to the dependability of embedded systems; describes cross-layer approaches that can leverage reliability through techniques that are proactively designed with respect to techniques at other layers; and explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many-core systems.

    Radiation Tolerant Electronics, Volume II

    Get PDF
    Research on radiation tolerant electronics has increased rapidly over the last few years, resulting in many interesting approaches to modelling radiation effects and designing radiation hardened integrated circuits and embedded systems. This research is strongly driven by the growing need for radiation hardened electronics for space applications, high-energy physics experiments such as those on the Large Hadron Collider at CERN, and many terrestrial nuclear applications, including nuclear energy and safety management. With the progressive scaling of integrated circuit technologies and the growing complexity of electronic systems, their ionizing radiation susceptibility has raised many exciting challenges, which are expected to drive research in the coming decade. After the success of the first Special Issue on Radiation Tolerant Electronics, the current Special Issue features thirteen articles highlighting recent breakthroughs in radiation tolerant integrated circuit design, fault tolerance in FPGAs, radiation effects in semiconductor materials and advanced IC technologies, and modelling of radiation effects.

    Timing Analysis and Optimization in Logic and Physical Synthesis

    Get PDF
    Ph.D. dissertation, Department of Electrical and Information Engineering, College of Engineering, Seoul National University Graduate School, August 2020. Advisor: Taewhan Kim. Timing analysis is one of the necessary steps in the development of a semiconductor circuit, and it is increasingly important in advanced process technologies due to various factors, including the increase of process-voltage-temperature variation. This dissertation addresses three problems related to timing analysis and optimization in logic and physical synthesis. Firstly, most static timing analyses today are based on conventional fixed flip-flop timing models, in which every flip-flop is assumed to have a fixed clock-to-Q delay. In reality, however, setup and hold skews affect the clock-to-Q delay. In this dissertation, I propose a mathematical formulation to solve this problem and apply it, together with a scalable speedup technique, to clock skew scheduling problems as well as to the analysis of a given circuit. Secondly, near-threshold computing is one of the promising concepts for energy-efficient operation of VLSI systems, but wide performance variation and nonlinear sensitivity to process variations block its proliferation. To cope with this, I propose a holistic hardware performance monitoring methodology for accurate timing prediction in the near-threshold voltage regime and advanced process technologies. Lastly, asynchronous circuits are an alternative to the conventional synchronous style, and asynchronous pipeline circuits are especially attractive because of their small design effort. This dissertation addresses the synthesis problem of lightening two-phase bundled-data asynchronous pipeline controllers, in which delay buffers are essential for guaranteeing correct handshaking operation but incur a considerable area increase.
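    The first contribution, the flexible flip-flop timing model, can be illustrated with a small fixed-point computation: if the clock-to-Q delay grows as the setup skew shrinks, the minimum clock period must be found self-consistently rather than from fixed library values. The delay model, its constants, and the iteration scheme below are assumptions for illustration, not the dissertation's formulation.

```python
# Sketch: minimum clock period under a skew-dependent clock-to-Q delay.
import math

T_CQ0 = 0.10     # nominal clock-to-Q delay in ns (assumed)
T_LOGIC = 1.00   # combinational critical-path delay in ns (assumed)
S_MIN = 0.02     # smallest admissible setup skew in ns (assumed)

def clk_to_q(setup_skew):
    """Clock-to-Q delay degrades as the setup skew shrinks (assumed model)."""
    return T_CQ0 * (1.0 + 0.5 * math.exp(-setup_skew / 0.05))

def consistent_skew(period):
    """Fixed point of: skew = period - (clk_to_q(skew) + T_LOGIC)."""
    s = period - (T_CQ0 + T_LOGIC)
    for _ in range(200):
        if s < 0:
            return None
        s = period - (clk_to_q(s) + T_LOGIC)
    return s

# Sweep the period downward until the self-consistent skew violates S_MIN.
period, best = 1.40, 1.40
while (s := consistent_skew(period)) is not None and s >= S_MIN:
    best = period
    period -= 0.001
print(f"minimum clock period ~= {best:.3f} ns; a fixed clk-to-Q model "
      f"would report {T_CQ0 + T_LOGIC + S_MIN:.3f} ns")
```

    In this toy setup the flexible model yields a slightly larger safe minimum period than the fixed model, illustrating why fixed clock-to-Q assumptions can be optimistic near the setup limit.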

    Design and Optimization for Resilient Energy Efficient Computing

    Get PDF
    Modern electronic systems are today an integral part of our everyday lives. This has been enabled, among other things, by the exponential growth in the integration density of integrated circuits, together with the improvement in energy efficiency that has taken place over the last 50 years, also known as Moore's law. In this context, the demand for energy-efficient digital circuits has risen enormously, especially in application fields such as the Internet of Things (IoT). Since the power consumption of circuits is strongly tied to the supply voltage, efficient techniques have been developed that scale the supply voltage into the near-threshold region, collectively known as Near-Threshold Computing (NTC). These techniques can improve the energy efficiency of circuits by a full order of magnitude. Alongside the improved energy balance, however, numerous circuit design challenges arise. For example, reducing the supply voltage into the near-threshold region makes circuits ten times more sensitive to process variation, voltage fluctuations, and temperature changes. The effects of these variations reduce the reliability of NTC circuits and are the greatest obstacle to their widespread adoption. Traditional approaches and methods from the nominal voltage domain for compensating variability cannot be applied efficiently, since the severe performance variations and sensitivities in the near-threshold region exceed their capabilities. For this reason, new design paradigms and design automation concepts are required for NTC. The goal of this work is to tackle the aforementioned problems by providing holistic methods for the design of NTC circuits and their design automation, applied in particular at the circuit and logic levels. This includes in-depth analyses of the reliability of NTC systems, and optimization methods are proposed that improve reliability, performance, and energy efficiency. The contributions of this work are as follows:
Variation-aware circuit synthesis and timing closure: Meeting timing and reliability requirements at NTC is a demanding task. The effects of variability appear as severe performance fluctuations, which lead to expensive timing margins, or manifest as hold-time violations caused by functional failures. Conventional approaches are limited to increasing timing margins alone, which is very inefficient for NTC because of the severity of the variations and the increased leakage currents. This work presents a concept for variation-aware synthesis and timing closure of circuits that reduces the sensitivity to variations, improves energy efficiency, performance, and reliability, and at the same time reduces the overhead of timing closure [1, 2]. Simulation results show that our proposed approach reduces the delay by 87% and improves performance and energy efficiency by 25% and 7.4%, respectively, at the cost of a 4.8% increase in area.
Cross-layer reliability, energy-efficiency, and performance optimization of datapaths: Cross-layer analysis of processor datapaths, spanning all the way from the compiler to circuit design, can reveal potential optimization opportunities. A datapath is a combination of several functional units that can process diverse instructions. Our analysis shows that instruction execution times vary strongly at low supply voltages, so that instructions can be classified as fast or slow. Furthermore, functional instructions can be categorized as frequently or rarely used. This work presents a multi-cycle instruction method that can increase the energy efficiency and resilience of functional units [3]. In addition, we present a partitioning algorithm that enables fine-grained power gating of rarely used units [4] by partitioning individual functional units into several smaller ones. The proposed methods significantly improve circuit timing while considerably limiting leakage currents, using a combination of circuit redesign and code replacement techniques. Simulation results show that the developed methods improve the performance and energy efficiency of arithmetic logic units (ALUs) by 19% and 43%, respectively. Moreover, the performance gain of the optimized circuits can be converted into an improvement in reliability [5, 6].
Post-fabrication and run-time tuning: Process and run-time variations have a strong influence on the minimum energy point (MEP) of NTC circuits, which is associated with the most energy-efficient supply voltage. It is particularly important to calibrate an NTC circuit after fabrication so that it operates at the MEP and achieves the best energy efficiency. In this work, post-fabrication and run-time tuning techniques are proposed that calibrate the circuit to the MEP based on post-fabrication speed and power measurements. The presented techniques determine the MEP on a per-chip basis to capture the influence of process variations, and dynamically adapt the supply voltage and frequency to address time-dependent variations such as workload and temperature. To this end, a regression model is integrated into the chip's firmware that extracts the MEP at run time from workload and temperature measurements. The regression model is unique to each chip and is based solely on post-fabrication measurements. Simulation results show that the developed approach achieves very high prediction accuracy and energy efficiency, similar to hardware-implemented methods, but without the hardware overhead [7, 8].
Selective flip-flop optimization: Ultra-low-voltage circuits must operate in the nominal supply voltage mode to meet the timing requirements of running applications. In this case, the circuit is subject to strong aging processes that degrade the transistors by increasing their threshold voltages. Our in-depth analyses have shown that certain flip-flop architectures are affected by these aging effects when constant values ('0' or '1') are stored for a long time. Compared to other components, flip-flops are more sensitive to aging and can fail, among other things, to capture a new value within the given timing window. In addition, even a minor voltage drop can lead to such timing violations if the affected aged flip-flops lie on the critical path. This work presents a selective flip-flop optimization method that hardens circuits against static aging and voltage drop. First, optimized robust flip-flops are generated and then integrated into the standard cell libraries. Flip-flops in the circuit that belong to the critical path and experience aging and voltage drop are replaced with the optimized robust versions, in order to improve the timing behavior and reliability of the circuit [9, 10]. Simulation results show that the expected lifetime of a processor can be improved by 37% while leakage currents increase by only 0.1%.
While NTC has the potential to deliver great energy efficiency, its deployment in new application fields such as IoT is not yet feasible because of the aforementioned problems of high sensitivity to variations and the resulting lack of reliability. In this dissertation and in as-yet unpublished works [11-17], we present solutions to these problems that enable the integration of NTC into today's systems.
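    The per-chip MEP regression idea from the post-fabrication and run-time tuning contribution can be sketched as follows: fit a small least-squares model on (supply voltage, temperature, workload) versus measured energy, then pick the voltage with minimum predicted energy at run time. The synthetic data, the quadratic-in-Vdd basis, and all constants below are illustrative assumptions; the real flow would fit on silicon measurements.

```python
# Sketch: per-chip MEP prediction from post-fabrication measurements.
import numpy as np

rng = np.random.default_rng(3)
# Post-fabrication sweep: columns = [Vdd (V), temperature (C), activity].
X = np.column_stack([rng.uniform(0.45, 0.75, 300),
                     rng.uniform(0.0, 85.0, 300),
                     rng.uniform(0.1, 1.0, 300)])
# Synthetic energy-per-operation with a minimum near 0.55 V.
energy = ((X[:, 0] - 0.55) ** 2 + 1e-4 * X[:, 1] + 0.02 * X[:, 2]
          + 0.002 * rng.standard_normal(300))

# Least-squares fit on simple basis functions (quadratic in Vdd).
basis = np.column_stack([np.ones(300), X[:, 0], X[:, 0] ** 2, X[:, 1], X[:, 2]])
coef, *_ = np.linalg.lstsq(basis, energy, rcond=None)

def mep_voltage(temp, activity, grid=np.linspace(0.45, 0.75, 61)):
    """Return the grid Vdd with minimum predicted energy at run conditions."""
    b = np.column_stack([np.ones_like(grid), grid, grid ** 2,
                         np.full_like(grid, temp),
                         np.full_like(grid, activity)])
    return grid[np.argmin(b @ coef)]

print("predicted MEP at 25 C, 50% activity:", mep_voltage(25.0, 0.5), "V")
```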