13 research outputs found

    EffiTest: Efficient Delay Test and Statistical Prediction for Configuring Post-silicon Tunable Buffers

    Full text link
    At nanometer manufacturing technology nodes, process variations significantly affect circuit performance. To combat them, post- silicon clock tuning buffers can be deployed to balance timing bud- gets of critical paths for each individual chip after manufacturing. The challenge of this method is that path delays should be mea- sured for each chip to configure the tuning buffers properly. Current methods for this delay measurement rely on path-wise frequency stepping. This strategy, however, requires too much time from ex- pensive testers. In this paper, we propose an efficient delay test framework (EffiTest) to solve the post-silicon testing problem by aligning path delays using the already-existing tuning buffers in the circuit. In addition, we only test representative paths and the delays of other paths are estimated by statistical delay prediction. Exper- imental results demonstrate that the proposed method can reduce the number of frequency stepping iterations by more than 94% with only a slight yield loss.Comment: ACM/IEEE Design Automation Conference (DAC), June 201

    Variation-Aware Variable Latency Design

    Full text link

    Modeling the Impact of Process Variation on Resistive Bridge Defects

    No full text
    Recent research has shown that tests generated without taking process variation into account may lead to loss of test quality. At present there is no efficient device-level modeling technique that models the effect of process variation on resistive bridges. This paper presents a fast and accurate technique to model the effect of process variation on resistive bridge defects. The proposed model is implemented in two stages: firstly, it employs an accurate transistor model (BSIM4) to calculate the critical resistance of a bridge; secondly, the effect of process variation is incorporated in this model by using three transistor parameters: gate length (L), threshold voltage (V) and effective mobility (ueff) where each follow Gaussian distribution. Experiments are conducted on a 65-nm gate library (for illustration purposes), and results show that on average the proposed modeling technique is more than 7 times faster and in the worst case, error in bridge critical resistance is 0.8% when compared with HSPICE

    로직 및 피지컬 합성에서의 타이밍 분석과 최적화

    Get PDF
    학위논문 (박사) -- 서울대학교 대학원 : 공과대학 전기·정보공학부, 2020. 8. 김태환.Timing analysis is one of the necessary steps in the development of a semiconductor circuit. In addition, it is increasingly important in the advanced process technologies due to various factors, including the increase of process–voltage–temperature variation. This dissertation addresses three problems related to timing analysis and optimization in logic and physical synthesis. Firstly, most static timing analysis today are based on conventional fixed flip-flop timing models, in which every flip-flop is assumed to have a fixed clock-to-Q delay. However, setup and hold skews affect the clock-to-Q delay in reality. In this dissertation, I propose a mathematical formulation to solve the problem and apply it to the clock skew scheduling problems as well as to the analysis of a given circuit, with a scalable speedup technique. Secondly, near-threshold computing is one of the promising concepts for energy-efficient operation of VLSI systems, but wide performance variation and nonlinearity to process variations block the proliferation. To cope with this, I propose a holistic hardware performance monitoring methodology for accurate timing prediction in a near-threshold voltage regime and advanced process technology. Lastly, an asynchronous circuit is one of the alternatives to the conventional synchronous style, and asynchronous pipeline circuit especially attractive because of its small design effort. This dissertation addresses the synthesis problem of lightening two-phase bundled-data asynchronous pipeline controllers, in which delay buffers are essential for guaranteeing the correct handshaking operation but incurs considerable area increase.타이밍 분석은 반도체 회로 개발 필수 과정 중 하나로, 최신 공정일수록 공정-전압-온도 변이 증가를 포함한 다양한 요인으로 하여금 그 중요성이 커지고 있다. 본 논문에서는 로직 및 피지컬 합성과 관련하여 세 가지 타이밍 분석 및 최적화 문제에 대해 다룬다. 첫째로, 오늘날 대부분의 정적 타이밍 분석은 모든 플립-플롭의 클럭-출력 딜레이가 고정된 값이라는 가정을 바탕으로 이루어졌다. 하지만 실제 클럭-출력 딜레이는 해당 플립-플롭의 셋업 및 홀드 스큐에 영향을 받는다. 본 논문에서는 이러한 특성을 수학적으로 정리하였으며, 이를 확장 가능한 속도 향상 기법과 더불어 주어진 회로의 타이밍 분석 및 클럭 스큐 스케쥴링 문제에 적용하였다. 둘째로, 유사 문턱 연산은 초고집적 회로 동작의 에너지 효율을 끌어 올릴 수 있다는 점에서 각광받지만, 큰 폭의 성능 변이 및 비선형성 때문에 널리 활용되고 있지 않다. 이를 해결하기 위해 유사 문턱 전압 영역 및 최신 공정 노드에서 보다 정확한 타이밍 예측을 위한 하드웨어 성능 모니터링 방법론 전반을 제안하였다. 마지막으로, 비동기 회로는 기존 동기 회로의 대안 중 하나로, 그 중에서도 비동기 파이프라인 회로는 비교적 적은 설계 노력만으로도 구현 가능하다는 장점이 있다. 본 논문에서는 2위상 묶음 데이터 프로토콜 기반 비동기 파이프라인 컨트롤러 상에서, 정확한 핸드셰이킹 통신을 위해 삽입된 딜레이 버퍼에 의한 면적 증가를 완화할 수 있는 합성 기법을 제시하였다.1 INTRODUCTION 1 1.1 Flexible Flip-Flop Timing Model 1 1.2 Hardware Performance Monitoring Methodology 4 1.3 Asynchronous Pipeline Controller 10 1.4 Contributions of this Dissertation 15 2 ANALYSIS AND OPTIMIZATION CONSIDERING FLEXIBLE FLIP-FLOP TIMING MODEL 17 2.1 Preliminaries 17 2.1.1 Terminologies 17 2.1.2 Timing Analysis 20 2.1.3 Clock-to-Q Delay Surface Modeling 21 2.2 Clock-to-Q Delay Interval Analysis 22 2.2.1 Derivation 23 2.2.2 Additional Constraints 26 2.2.3 Analysis: Finding Minimum Clock Period 28 2.2.4 Optimization: Clock Skew Scheduling 30 2.2.5 Scalable Speedup Technique 33 2.3 Experimental Results 37 2.3.1 Application to Minimum Clock Period Finding 37 2.3.2 Application to Clock Skew Scheduling 39 2.3.3 Efficacy of Scalable Speedup Technique 43 2.4 Summary 44 3 HARDWARE PERFORMANCE MONITORING METHODOLOGY AT NTC AND ADVANCED TECHNOLOGY NODE 45 3.1 Overall Flow of Proposed HPM Methodology 45 3.2 Prerequisites to HPM Methodology 47 3.2.1 BEOL Process Variation Modeling 47 3.2.2 Surrogate Model Preparation 49 3.3 HPM Methodology: Design Phase 52 3.3.1 HPM2PV Model Construction 52 3.3.2 Optimization of Monitoring Circuits Configuration 54 3.3.3 PV2CPT Model Construction 58 3.4 HPM Methodology: Post-Silicon Phase 60 3.4.1 Transfer Learning in Silicon Characterization Step 60 3.4.2 Procedures in Volume Production Phase 61 3.5 Experimental Results 62 3.5.1 Experimental Setup 62 3.5.2 Exploration of Monitoring Circuits Configuration 64 3.5.3 Effectiveness of Monitoring Circuits Optimization 66 3.5.4 Considering BEOL PVs and Uncertainty Learning 68 3.5.5 Comparison among Different Prediction Flows 69 3.5.6 Effectiveness of Prediction Model Calibration 71 3.6 Summary 73 4 LIGHTENING ASYNCHRONOUS PIPELINE CONTROLLER 75 4.1 Preliminaries and State-of-the-Art Work 75 4.1.1 Bundled-data vs. Dual-rail Asynchronous Circuits 75 4.1.2 Two-phase vs. Four-phase Bundled-data Protocol 76 4.1.3 Conventional State-of-the-Art Pipeline Controller Template 77 4.2 Delay Path Sharing for Lightening Pipeline Controller Template 78 4.2.1 Synthesizing Sharable Delay Paths 78 4.2.2 Validating Logical Correctness for Sharable Delay Paths 80 4.2.3 Reformulating Timing Constraints of Controller Template 81 4.2.4 Minimally Allocating Delay Buffers 87 4.3 In-depth Pipeline Controller Template Synthesis with Delay Path Reusing 88 4.3.1 Synthesizing Delay Path Units 88 4.3.2 Validating Logical Correctness of Delay Path Units 89 4.3.3 Updating Timing Constraints for Delay Path Units 91 4.3.4 In-depth Synthesis Flow Utilizing Delay Path Units 95 4.4 Experimental Results 99 4.4.1 Environment Setup 99 4.4.2 Piecewise Linear Modeling of Delay Path Unit Area 99 4.4.3 Comparison of Power, Performance, and Area 102 4.5 Summary 107 5 CONCLUSION 109 5.1 Chapter 2 109 5.2 Chapter 3 110 5.3 Chapter 4 110 Abstract (In Korean) 127Docto

    A Fast and Accurate Process Variation-Aware Modeling Technique for Resistive Bridge Defects

    Full text link

    Sincronização em sistemas integrados a alta velocidade

    Get PDF
    Doutoramento em Engenharia ElectrotécnicaA distribui ção de um sinal relógio, com elevada precisão espacial (baixo skew) e temporal (baixo jitter ), em sistemas sí ncronos de alta velocidade tem-se revelado uma tarefa cada vez mais demorada e complexa devido ao escalonamento da tecnologia. Com a diminuição das dimensões dos dispositivos e a integração crescente de mais funcionalidades nos Circuitos Integrados (CIs), a precisão associada as transições do sinal de relógio tem sido cada vez mais afectada por varia ções de processo, tensão e temperatura. Esta tese aborda o problema da incerteza de rel ogio em CIs de alta velocidade, com o objetivo de determinar os limites do paradigma de desenho sí ncrono. Na prossecu ção deste objectivo principal, esta tese propõe quatro novos modelos de incerteza com âmbitos de aplicação diferentes. O primeiro modelo permite estimar a incerteza introduzida por um inversor est atico CMOS, com base em parâmetros simples e su cientemente gen éricos para que possa ser usado na previsão das limitações temporais de circuitos mais complexos, mesmo na fase inicial do projeto. O segundo modelo, permite estimar a incerteza em repetidores com liga ções RC e assim otimizar o dimensionamento da rede de distribui ção de relógio, com baixo esfor ço computacional. O terceiro modelo permite estimar a acumula ção de incerteza em cascatas de repetidores. Uma vez que este modelo tem em considera ção a correla ção entre fontes de ruí do, e especialmente util para promover t ecnicas de distribui ção de rel ogio e de alimentação que possam minimizar a acumulação de incerteza. O quarto modelo permite estimar a incerteza temporal em sistemas com m ultiplos dom ínios de sincronismo. Este modelo pode ser facilmente incorporado numa ferramenta autom atica para determinar a melhor topologia para uma determinada aplicação ou para avaliar a tolerância do sistema ao ru ído de alimentação. Finalmente, usando os modelos propostos, são discutidas as tendências da precisão de rel ogio. Conclui-se que os limites da precisão do rel ogio são, em ultima an alise, impostos por fontes de varia ção dinâmica que se preveem crescentes na actual l ogica de escalonamento dos dispositivos. Assim sendo, esta tese defende a procura de solu ções em outros ní veis de abstração, que não apenas o ní vel f sico, que possam contribuir para o aumento de desempenho dos CIs e que tenham um menor impacto nos pressupostos do paradigma de desenho sí ncrono.Distributing a the clock simultaneously everywhere (low skew) and periodically everywhere (low jitter) in high-performance Integrated Circuits (ICs) has become an increasingly di cult and time-consuming task, due to technology scaling. As transistor dimensions shrink and more functionality is packed into an IC, clock precision becomes increasingly a ected by Process, Voltage and Temperature (PVT) variations. This thesis addresses the problem of clock uncertainty in high-performance ICs, in order to determine the limits of the synchronous design paradigm. In pursuit of this main goal, this thesis proposes four new uncertainty models, with di erent underlying principles and scopes. The rst model targets uncertainty in static CMOS inverters. The main advantage of this model is that it depends only on parameters that can easily be obtained. Thus, it can provide information on upcoming constraints very early in the design stage. The second model addresses uncertainty in repeaters with RC interconnects, allowing the designer to optimise the repeater's size and spacing, for a given uncertainty budget, with low computational e ort. The third model, can be used to predict jitter accumulation in cascaded repeaters, like clock trees or delay lines. Because it takes into consideration correlations among variability sources, it can also be useful to promote oorplan-based power and clock distribution design in order to minimise jitter accumulation. A fourth model is proposed to analyse uncertainty in systems with multiple synchronous domains. It can be easily incorporated in an automatic tool to determine the best topology for a given application or to evaluate the system's tolerance to power-supply noise. Finally, using the proposed models, this thesis discusses clock precision trends. Results show that limits in clock precision are ultimately imposed by dynamic uncertainty, which is expected to continue increasing with technology scaling. Therefore, it advocates the search for solutions at other abstraction levels, and not only at the physical level, that may increase system performance with a smaller impact on the assumptions behind the synchronous design paradigm

    Reliability in the face of variability in nanometer embedded memories

    Get PDF
    In this thesis, we have investigated the impact of parametric variations on the behaviour of one performance-critical processor structure - embedded memories. As variations manifest as a spread in power and performance, as a first step, we propose a novel modeling methodology that helps evaluate the impact of circuit-level optimizations on architecture-level design choices. Choices made at the design-stage ensure conflicting requirements from higher-levels are decoupled. We then complement such design-time optimizations with a runtime mechanism that takes advantage of adaptive body-biasing to lower power whilst improving performance in the presence of variability. Our proposal uses a novel fully-digital variation tracking hardware using embedded DRAM (eDRAM) cells to monitor run-time changes in cache latency and leakage. A special fine-grain body-bias generator uses the measurements to generate an optimal body-bias that is needed to meet the required yield targets. A novel variation-tolerant and soft-error hardened eDRAM cell is also proposed as an alternate candidate for replacing existing SRAM-based designs in latency critical memory structures. In the ultra low-power domain where reliable operation is limited by the minimum voltage of operation (Vddmin), we analyse the impact of failures on cache functional margin and functional yield. Towards this end, we have developed a fully automated tool (INFORMER) capable of estimating memory-wide metrics such as power, performance and yield accurately and rapidly. Using the developed tool, we then evaluate the #effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Having a holistic perspective of memory-wide metrics helps us arrive at design-choices optimized simultaneously for multiple metrics needed for maintaining lifetime requirements

    Learning Approaches to Analog and Mixed Signal Verification and Analysis

    Get PDF
    The increased integration and interaction of analog and digital components within a system has amplified the need for a fast, automated, combined analog, and digital verification methodology. There are many automated characterization, test, and verification methods used in practice for digital circuits, but analog and mixed signal circuits suffer from long simulation times brought on by transistor-level analysis. Due to the substantial amount of simulations required to properly characterize and verify an analog circuit, many undetected issues manifest themselves in the manufactured chips. Creating behavioral models, a circuit abstraction of analog components assists in reducing simulation time which allows for faster exploration of the design space. Traditionally, creating behavioral models for non-linear circuits is a manual process which relies heavily on design knowledge for proper parameter extraction and circuit abstraction. Manual modeling requires a high level of circuit knowledge and often fails to capture critical effects stemming from block interactions and second order device effects. For this reason, it is of interest to extract the models directly from the SPICE level descriptions so that these effects and interactions can be properly captured. As the devices are scaled, process variations have a more profound effect on the circuit behaviors and performances. Creating behavior models from the SPICE level descriptions, which include input parameters and a large process variation space, is a non-trivial task. In this dissertation, we focus on addressing various problems related to the design automation of analog and mixed signal circuits. Analog circuits are typically highly specialized and fined tuned to fit the desired specifications for any given system reducing the reusability of circuits from design to design. This hinders the advancement of automating various aspects of analog design, test, and layout. At the core of many automation techniques, simulations, or data collection are required. Unfortunately, for some complex analog circuits, a single simulation may take many days. This prohibits performing any type of behavior characterization or verification of the circuit. This leads us to the first fundamental problem with the automation of analog devices. How can we reduce the simulation cost while maintaining the robustness of transistor level simulations? As analog circuits can vary vastly from one design to the next and are hardly ever comprised of standard library based building blocks, the second fundamental question is how to create automated processes that are general enough to be applied to all or most circuit types? Finally, what circuit characteristics can we utilize to enhance the automation procedures? The objective of this dissertation is to explore these questions and provide suitable evidence that they can be answered. We begin by exploring machine learning techniques to model the design space using minimal simulation effort. Circuit partitioning is employed to reduce the complexity of the machine learning algorithms. Using the same partitioning algorithm we further explore the behavior characterization of analog circuits undergoing process variation. The circuit partitioning is general enough to be used by any CMOS based analog circuit. The ideas and learning gained from behavioral modeling during behavior characterization are used to improve the simulation through event propagation, input space search, complexity and information measurements. The reduction of the input space and behavioral modeling of low complexity, low information primitive elements reduces the simulation time of large analog and mixed signal circuits by 50-75%. The method is extended and applied to assist in analyzing analog circuit layout. All of the proposed methods are implemented on analog circuits ranging from small benchmark circuits to large, highly complex and specialized circuits. The proposed dependency based partitioning of large analog circuits in the time domain allows for fast identification of highly sensitive transistors as well as provides a natural division of circuit components. Modeling analog circuits in the time domain with this partitioning technique and SVM learning algorithms allows for very fast transient behavior predictions, three orders of magnitude faster than traditional simulators, while maintaining 95% accuracy. Analog verification can be explored through a reduction of simulation time by utilizing the partitions, information and complexity measures, and input space reduction. Behavioral models are created using supervised learning techniques for detected primitive elements. We will show the effectiveness of the method on four analog circuits where the simulation time is decreased by 55-75%. Utilizing the reduced simulation method, critical nodes can be found quickly and efficiently. The nodes found using this method match those found by an experienced layout engineer, but are detected automatically given the design and input specifications. The technique is further extended to find the tolerance of transistors to both process variation and power supply fluctuation. This information allows for corrections in layout overdesign or guidance in placing noise reducing components such as guard rings or decoupling capacitors. The proposed approaches significantly reduce the simulation time required to perform the tasks traditionally, maintain high accuracy, and can be automated

    Methods for Robust and Energy-Efficient Microprocessor Architectures

    Get PDF
    Σήμερα, η εξέλιξη της τεχνολογίας επιτρέπει τη βελτίωση τριών βασικών στοιχείων της σχεδίασης των επεξεργαστών: αυξημένες επιδόσεις, χαμηλότερη κατανάλωση ισχύος και χαμηλότερο κόστος παραγωγής του τσιπ, ενώ οι σχεδιαστές επεξεργαστών έχουν επικεντρωθεί στην παραγωγή επεξεργαστών με περισσότερες λειτουργίες σε χαμηλότερο κόστος. Οι σημερινοί επεξεργαστές είναι πολύ ταχύτεροι και διαθέτουν εξελιγμένες λειτουργικές μονάδες συγκριτικά με τους προκατόχους τους, ωστόσο, καταναλώνουν αρκετά μεγάλη ενέργεια. Τα ποσά ηλεκτρικής ισχύος που καταναλώνονται, και η επακόλουθη έκλυση θερμότητας, αυξάνονται παρά τη μείωση του μεγέθους των τρανζίστορ. Αναπτύσσοντας όλο και πιο εξελιγμένους μηχανισμούς και λειτουργικές μονάδες για την αύξηση της απόδοσης και βελτίωση της ενέργειας, σε συνδυασμό με τη μείωση του μεγέθους των τρανζίστορ, οι επεξεργαστές έχουν γίνει εξαιρετικά πολύπλοκα συστήματα, καθιστώντας τη διαδικασία της επικύρωσής τους σημαντική πρόκληση για τη βιομηχανία ολοκληρωμένων κυκλωμάτων. Συνεπώς, οι κατασκευαστές επεξεργαστών αφιερώνουν επιπλέον χρόνο, προϋπολογισμό και χώρο στο τσιπ για να διασφαλίσουν ότι οι επεξεργαστές θα λειτουργούν σωστά κατά τη διάθεσή τους στη αγορά. Για τους λόγους αυτούς, η εργασία αυτή παρουσιάζει νέες μεθόδους για την επιτάχυνση και τη βελτίωση της φάσης της επικύρωσης, καθώς και για τη βελτίωση της ενεργειακής απόδοσης των σύγχρονων επεξεργαστών. Στο πρώτο μέρος της διατριβής προτείνονται δύο διαφορετικές μέθοδοι για την επικύρωση του επεξεργαστή, οι οποίες συμβάλλουν στην επιτάχυνση αυτής της διαδικασίας και στην αποκάλυψη σπάνιων σφαλμάτων στους μηχανισμούς μετάφρασης διευθύνσεων των σύγχρονων επεξεργαστών. Και οι δύο μέθοδοι καθιστούν ευκολότερη την ανίχνευση και τη διάγνωση σφαλμάτων, και επιταχύνουν την ανίχνευση του σφάλματος κατά τη φάση της επικύρωσης. Στο δεύτερο μέρος της διατριβής παρουσιάζεται μια λεπτομερής μελέτη χαρακτηρισμού των περιθωρίων τάσης σε επίπεδο συστήματος σε δύο σύγχρονους ARMv8 επεξεργαστές. Η μελέτη του χαρακτηρισμού προσδιορίζει τα αυξημένα περιθώρια τάσης που έχουν προκαθοριστεί κατά τη διάρκεια κατασκευής του κάθε μεμονωμένου πυρήνα του επεξεργαστή και αναλύει τυχόν απρόβλεπτες συμπεριφορές που μπορεί να προκύψουν σε συνθήκες μειωμένης τάσης. Για την μελέτη και καταγραφή της συμπεριφοράς του συστήματος υπό συνθήκες μειωμένης τάσης, παρουσιάζεται επίσης σε αυτή τη διατριβή μια απλή και ενοποιημένη συνάρτηση: η συνάρτηση πυκνότητας-σοβαρότητας. Στη συνέχεια, παρουσιάζεται αναλυτικά η ανάπτυξη ειδικά σχεδιασμένων προγραμμάτων (micro-viruses) τα οποία υποβάλουν της θεμελιώδεις δομές του επεξεργαστή σε μεγάλο φορτίο εργασίας. Αυτά τα προγράμματα στοχεύουν στην γρήγορη αναγνώριση των ασφαλών περιθωρίων τάσης. Τέλος, πραγματοποιείται ο χαρακτηρισμός των περιθωρίων τάσης σε εκτελέσεις πολλαπλών πυρήνων, καθώς επίσης και σε διαφορετικές συχνότητες, και προτείνεται ένα πρόγραμμα το οποίο εκμεταλλεύεται όλες τις διαφορετικές πτυχές του προβλήματος της κατανάλωσης ενέργειας και παρέχει μεγάλη εξοικονόμηση ενέργειας διατηρώντας παράλληλα υψηλά επίπεδα απόδοσης. Αυτή η μελέτη έχει ως στόχο τον εντοπισμό και την ανάλυση της σχέσης μεταξύ ενέργειας και απόδοσης σε διαφορετικούς συνδυασμούς τάσης και συχνότητας, καθώς και σε διαφορετικό αριθμό νημάτων/διεργασιών που εκτελούνται στο σύστημα, αλλά και κατανομής των προγραμμάτων στους διαθέσιμους πυρήνες.Technology scaling has enabled improvements in the three major design optimization objectives: increased performance, lower power consumption, and lower die cost, while system design has focused on bringing more functionality into products at lower cost. While today's microprocessors, are much faster and much more versatile than their predecessors, they also consume much power. As operating frequency and integration density increase, the total chip power dissipation increases. This is evident from the fact that due to the demand for increased functionality on a single chip, more and more transistors are being packed on a single die and hence, the switching frequency increases in every technology generation. However, by developing aggressive and sophisticated mechanisms to boost performance and to enhance the energy efficiency in conjunction with the decrease of the size of transistors, microprocessors have become extremely complex systems, making the microprocessor verification and manufacturing testing a major challenge for the semiconductor industry. Manufacturers, therefore, choose to spend extra effort, time, budget and chip area to ensure that the delivered products are operating correctly. To meet high-dependability requirements, manufacturers apply a sequence of verification tasks throughout the entire life-cycle of the microprocessor to ensure the correct functionality of the microprocessor chips from the various types of errors that may occur after the products are released to the market. To this end, this work presents novel methods for ensuring the correctness of the microprocessor during the post-silicon validation phase and for improving the energy efficiency requirements of modern microprocessors. These methods can be applied during the prototyping phase of the microprocessors or after their release to the market. More specifically, in the first part of the thesis, we present and describe two different ISA-independent software-based post-silicon validation methods, which contribute to formalization and modeling as well as the acceleration of the post-silicon validation process and expose difficult-to-find bugs in the address translation mechanisms (ATM) of modern microprocessors. Both methods improve the detection and diagnosis of a hardware design bug in the ATM structures and significantly accelerate the bug detection during the post-silicon validation phase. In the second part of the thesis we present a detailed system-level voltage scaling characterization study for two state-of-the-art ARMv8-based multicore CPUs. We present an extensive characterization study which identifies the pessimistic voltage guardbands (the increased voltage margins set by the manufacturer) of each individual microprocessor core and analyze any abnormal behavior that may occur in off-nominal voltage conditions. Towards the formalization of the any abnormal behavior we also present a simple consolidated function; the Severity function, which aggregates the effects of reduced voltage operation. We then introduce the development of dedicated programs (diagnostic micro-viruses) that aim to accelerate the time-consuming voltage margins characterization studies by stressing the fundamental hardware components. Finally, we present a comprehensive exploration of how two server-grade systems behave in different frequency and core allocation configurations beyond nominal voltage operation in multicore executions. This analysis aims (1) to identify the best performance per watt operation points, (2) to reveal how and why the different core allocation options affect the energy consumption, and (3) to enhance the default Linux scheduler to take task allocation decisions for balanced performance and energy efficiency
    corecore