
    Robust Design of Variation-Sensitive Digital Circuits

    The nano-age has already begun, with typical feature dimensions smaller than 100nm. According to the International Technology Roadmap for Semiconductors (ITRS), the operating frequency is expected to increase up to 12 GHz, and a single chip will contain over 12 billion transistors by 2020. ITRS also predicts that the scaling of CMOS devices and process technology, as it is known today, will become much more difficult as the industry advances towards the 16nm technology node and beyond. This aggressive scaling of CMOS technology has pushed devices to their physical limits. Design goals are now governed by several factors beyond power, performance, and area, such as process variations, radiation-induced soft errors, and aging degradation mechanisms. These new design challenges strongly impact the parametric yield of nanometer digital circuits and also cause functional yield losses in variation-sensitive digital circuits such as Static Random Access Memory (SRAM) and flip-flops. Moreover, sub-threshold SRAM and flip-flop circuits, driven by the strong demand for lower power consumption, show even larger sensitivity to these challenges, which reduces their robustness and yield. Accordingly, it is not surprising that the ITRS considers variability and reliability the most challenging obstacles to robust nanometer digital circuit design. Soft errors are one of the main reliability and robustness concerns in SRAM arrays in sub-100nm technologies due to low operating voltage, small node capacitance, and high packing density. The soft error immunity of SRAM arrays is also affected by process variations. We develop statistical, design-oriented models of soft error immunity variations for super-threshold and sub-threshold SRAM cells, accounting for die-to-die and within-die variations. This work provides new design insights and highlights the important design knobs that can be used to reduce the variation of SRAM cell soft error immunity. The developed models are scalable, bias dependent, and require only easily measurable parameters, which makes them useful for early design exploration, circuit optimization, and technology prediction. The derived models are verified using Monte Carlo SPICE simulations with an industrial hardware-calibrated 65nm CMOS technology. The demand for higher performance leads to very deep pipelining, which means that hundreds of thousands of flip-flops are required to control the data flow under strict timing constraints. A violation of the timing constraints at a flip-flop can result in latching incorrect data, causing the overall system to malfunction. In addition, flip-flop power dissipation represents a considerable fraction of the total power dissipation. Sub-threshold flip-flops are considered the most energy-efficient solution for low-power applications in which performance is of secondary importance. Accordingly, statistical gate sizing is applied to different flip-flop topologies for timing yield improvement of super-threshold flip-flops and power yield improvement of sub-threshold flip-flops. Following that, a comparative analysis between these flip-flop topologies, considering the overhead required for yield improvement, is performed.
This comparative analysis provides useful recommendations that help flip-flop designers select the topology that best satisfies their system specifications while taking the impact of process variations and robustness requirements into account. Adaptive Body Bias (ABB) allows the transistor threshold voltage, Vt, to be tuned by controlling the transistor body voltage. A forward body bias reduces Vt, increasing the device speed at the expense of increased leakage power; a reverse body bias increases Vt, reducing the leakage power but slowing the device. The impact of process variations can therefore be mitigated by speeding up slow, less leaky devices or slowing down fast, highly leaky ones. Ideally, ABB would bias each device in a design independently to mitigate within-die variations, but supplying so many separate voltages inside a die results in a large area overhead. On the other hand, using the same body bias for all devices on the same die limits the ability to compensate for within-die variations. Thus, the granularity of the ABB scheme is a trade-off between within-die variation compensation capability and the associated area overhead. This work introduces new ABB circuits whose area overhead is 143X lower than that of previous ABB circuits. In addition, these ABB circuits are resolution free, since no digital-to-analog or analog-to-digital converters are required in their implementation. These ABB circuits are applied to high-performance critical paths emulating a real microprocessor architecture, for process variation compensation, and to SRAM arrays, for Negative Bias Temperature Instability (NBTI) aging and process variation compensation. The effectiveness of the new ABB circuits is verified by post-layout simulation results and test-chip measurements in a triple-well 65nm CMOS technology. The highly capacitive nodes of wide fan-in dynamic circuits and SRAM bitlines limit the performance of these circuits. In addition, process variation mitigation by statistical gate sizing increases this capacitance further and fails to achieve the target yield improvement. We propose new negative capacitance circuits that reduce the overall parasitic capacitance of these highly capacitive nodes. These negative capacitance circuits are applied to wide fan-in dynamic circuits for timing yield improvement of up to 99.87% and to SRAM arrays for read access yield improvement of up to 100%. The area and power overheads of these new negative capacitance circuits are amortized over the large die area of the microprocessor and the SRAM array. Their effectiveness is verified by post-layout simulation results and test-chip measurements in 65nm CMOS technology.
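As a rough illustration of the body-bias knob described above, the sketch below evaluates the standard long-channel body-effect relation, Vt = Vt0 + gamma*(sqrt(2*phiF + VSB) - sqrt(2*phiF)), for forward and reverse body bias. The parameter values are illustrative placeholders, not those of the 65nm process used in this work.

```python
import math

def nmos_vt(v_sb, vt0=0.35, gamma=0.25, phi_f=0.45):
    """Threshold voltage with body effect (long-channel approximation).

    v_sb  : source-to-body voltage in volts (negative = forward body bias)
    vt0   : zero-bias threshold voltage (illustrative value)
    gamma : body-effect coefficient in V^0.5 (illustrative value)
    phi_f : Fermi potential in volts (illustrative value)
    """
    return vt0 + gamma * (math.sqrt(2 * phi_f + v_sb) - math.sqrt(2 * phi_f))

# Forward body bias (body 0.3 V above source) lowers Vt -> faster but leakier device.
# Reverse body bias (body 0.3 V below source) raises Vt -> slower but less leaky device.
for v_sb in (-0.3, 0.0, +0.3):
    print(f"VSB = {v_sb:+.1f} V  ->  Vt = {nmos_vt(v_sb):.3f} V")
```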

    Robust low-power digital circuit design in nano-CMOS technologies

    Device scaling has resulted in large-scale integrated, high-performance, low-power, and low-cost systems. However, the move towards sub-100 nm technology nodes has increased variability in device characteristics due to large process variations. Variability has severe implications for digital circuit design: it causes timing uncertainties in combinational circuits, degrades the yield and reliability of memory elements, and increases power density due to the slow scaling of supply voltage. Conventional design methods add large pessimistic safety margins to mitigate the increased variability, but they incur large power and performance loss because the combination of worst cases occurs very rarely. In-situ monitoring of timing failures provides an opportunity to dynamically tune safety margins in proportion to on-chip variability, which can significantly reduce power and performance losses. We demonstrate by simulation two delay sensor designs that detect timing failures in advance and can be coupled with compensation techniques such as voltage scaling, body biasing, or frequency scaling to avoid actual timing failures. Our simulation results using 45 nm and 32 nm BSIM4 technology models indicate a significant reduction in total power consumption under temperature and statistical variations. Future work involves using dual sensing to avoid unnecessary voltage scaling that incurs a speed loss. SRAM cache is the first victim of increased process variations and requires handcrafted design to meet area, power, and performance requirements. We have proposed novel 6-transistor (6T), 7-transistor (7T), and 8-transistor (8T) SRAM cells that enable variability-tolerant and low-power SRAM cache designs. Increased sense-amplifier offset voltage, due to device mismatch arising from high variability, increases the delay and power consumption of SRAM designs. We have proposed two novel design techniques to reduce offset-voltage-dependent delays, providing a high-speed, low-power SRAM design. Increasing leakage currents in nano-CMOS technologies pose a major challenge to low-power, reliable design. We have investigated a novel segmented supply voltage architecture to reduce the leakage power of SRAM caches, since they occupy the bulk of the total chip area and power. Future work involves developing leakage reduction methods for combinational logic, including SRAM peripherals.
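The compensation loop described above can be viewed as a simple control policy: when the in-situ delay sensor flags a shrinking timing margin, the supply voltage (or body bias, or frequency) is adjusted before an actual failure occurs. The sketch below is a hypothetical software model of such an adaptive-voltage-scaling loop, not the sensor hardware from the work; the thresholds and step sizes are made up for illustration.

```python
def tune_supply(sensor_warning: bool, vdd: float,
                v_step: float = 0.01, v_min: float = 0.8, v_max: float = 1.1) -> float:
    """One iteration of a hypothetical adaptive-voltage-scaling policy.

    sensor_warning -- True if the in-situ delay sensor predicts an imminent
                      timing failure at the current voltage/temperature point.
    vdd            -- current supply voltage in volts (limits are made-up values).
    """
    if sensor_warning:
        # Margin is shrinking: raise Vdd one step to restore slack.
        return min(vdd + v_step, v_max)
    # Margin is healthy: lower Vdd one step to save power.
    return max(vdd - v_step, v_min)

# Example: react to a stream of sensor readings.
vdd = 1.0
for warning in [False, False, True, True, False]:
    vdd = tune_supply(warning, vdd)
    print(f"warning={warning}  ->  Vdd = {vdd:.2f} V")
```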

    Design of variation-tolerant synchronizers for multiple clock and voltage domains

    PhD Thesis. Parametric variability increasingly affects the performance of electronic circuits as fabrication technology has reached the level of 32nm and beyond. The parameters involved include transistor Process parameters (such as threshold voltage), supply Voltage and Temperature (PVT), all of which can have a significant impact on the speed and power consumption of a circuit, particularly if the variations exceed the design margins. As systems are designed with more asynchronous protocols, there is a need for highly robust synchronizers and arbiters. These components are often used as interfaces between communication links in different timing domains, as well as sampling devices for asynchronous inputs coming from external components. These applications have created a need for new robust designs of synchronizers and arbiters that can tolerate process, voltage, and temperature variations. The aim of this study was to investigate how synchronizers and arbiters should be designed to tolerate parametric variations. All investigations focused mainly on circuit-level and transistor-level designs, modeled and simulated in the UMC90nm CMOS technology process. Analog simulations were used to measure timing parameters and power consumption, along with Monte Carlo statistical analysis to account for process variations. Two main components of synchronizers and arbiters were investigated: the flip-flop and the mutual-exclusion element (MUTEX). Both components can violate their input timing conditions, the setup and hold window times, which can cause metastability inside their bistable elements and possibly end in failure. The mean time between failures (MTBF) is an important reliability measure of any synchronizer and depends on the resolution delay through the synchronizer. The MUTEX study examined the classical circuit alongside a number of variation-tolerance techniques based on increasing internal gain by adding current sources, reducing the capacitive loading, boosting the transconductance of the latch, compensating the existing Miller capacitance, and adding asymmetry to maneuver the metastable point. The results showed that some circuits had little or almost no improvement, while five techniques showed significant improvements by reducing τ and maintaining high tolerance. Three design approaches are proposed to provide variation-tolerant synchronizers. First, the wagging synchronizer is proposed to significantly increase reliability over that of the conventional two flip-flop synchronizer. The robustness of the wagging technique can be enhanced by using latches with a robust τ or by adding one more cycle of synchronization. The second approach is the Metastability Auto-Detection and Correction (MADAC) latch, which relies on swiftly detecting a metastable event and correcting it by enforcing the previously stored logic value. This technique significantly reduces the resolution time. Third, a synchronization technique is proposed to transfer signals between Multiple-Voltage Multiple-Clock Domains (MVD/MCD) that does not require conventional level-shifters between the domains or multiple power supplies within each domain. This interface circuit uses a synchronous set and feedback reset protocol which provides level-shifting and synchronization of all signals between the domains, across a wide range of voltage supplies and clock frequencies.
Overall, synchronizer circuits can tolerate variations to a much greater extent by employing the wagging technique or a MADAC latch, while adequate MUTEX tolerance can be achieved with small circuit modifications. Communication between MVD/MCD can be achieved by an asynchronous handshake without the need for added level-shifters.
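For context, the reliability figure referred to above is usually computed from the standard synchronizer MTBF relation, MTBF = exp(t_r/τ) / (T_w · f_c · f_d), where t_r is the resolution time allowed, τ the regeneration time constant of the latch, T_w its metastability window, and f_c, f_d the clock and data rates. The sketch below evaluates this relation for illustrative parameter values (not the UMC90nm figures from the thesis) and shows how an extra synchronization cycle multiplies the exponent.

```python
import math

def synchronizer_mtbf(t_res, tau, t_w, f_clk, f_data):
    """Mean time between failures of a flip-flop synchronizer, in seconds.

    t_res  : resolution time available before the output is sampled (s)
    tau    : metastability resolution time constant of the latch (s)
    t_w    : metastability window of the latch (s)
    f_clk  : clock frequency (Hz)
    f_data : average data transition rate (Hz)
    """
    return math.exp(t_res / tau) / (t_w * f_clk * f_data)

# Illustrative numbers only: tau = 20 ps, Tw = 30 ps, 1 GHz clock, 100 MHz data.
tau, t_w, f_clk, f_data = 20e-12, 30e-12, 1e9, 100e6
for cycles in (1, 2, 3):                 # flip-flop stages in the synchronizer chain
    t_res = cycles / f_clk - 100e-12     # leave 100 ps for setup/propagation (assumed)
    print(f"{cycles} cycle(s): MTBF ~ {synchronizer_mtbf(t_res, tau, t_w, f_clk, f_data):.3e} s")
```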

    Within-Die Delay Variation Measurement And Analysis For Emerging Technologies Using An Embedded Test Structure

    Both random and systematic within-die process variations (PV) are growing more severe with shrinking geometries and increasing die size. The escalation of delay and power variations with reduced feature size places higher demands on the accuracy of variation models, whose availability can be used to improve the yield, and thus the profitability and product quality, of fabricated integrated circuits (ICs). Sources of within-die variation include optical source limitations and layout-based systematic effects (pitch, line-width variability, and microscopic etch loading). Unfortunately, accurate models of within-die PV are becoming more difficult to derive because of their increasing sensitivity to design context. Embedded test structures (ETS) continue to play an important role in the development of PV models and as a mechanism to improve correlations between hardware and models. Variations in path delays are increasing with scaling and are increasingly affected by neighborhood interactions. In order to fully characterize within-die variations, delays must be measured in the context of actual core-logic macros. Doing so requires an embedded test structure, as opposed to traditional scribe-line test structures such as ring oscillators (RO). Accurate measurements of within-die variations can be used, for example, to better tune models to actual hardware (model-to-hardware correlation). In this research project, I propose an embedded test structure called REBEL (Regional dELay BEhavior) that is designed to measure path delays in a minimally invasive fashion and whose architecture measures path delays more accurately. Design-for-manufacturability (DFM) analysis is performed on 90 nm ASIC chips and 28nm Zynq 7000 series FPGA boards. I present ASIC results on within-die path delay variations in a five-stage-pipeline floating-point unit (FPU) fabricated in IBM's 90 nm technology, used as a test vehicle in chip experiments carried out at nine different temperature/voltage (TV) corners. Experimental data have also been analyzed for path delay variations in short versus long paths. FPGA results on within-die and die-to-die variations on an Advanced Encryption Standard (AES) design with a single pipelined stage are also presented. Other analyses performed on the calibrated path delays include flip-flop propagation delays for both rising and falling edges (tpHL and tpLH), uncertainty analysis, path distribution analysis, short versus long path variations, and mid-length path within-die variation. I also analyze the impact on delay when the chips are subjected to industrial-level temperature and voltage variations. The experimental results establish that the proposed REBEL provides capabilities similar to an off-chip logic analyzer, i.e., it is able to capture the temporal behavior of the signal, including any static and dynamic hazards that may occur on the tested path. The ASIC results further show that path delays are correlated with the launch-capture (LC) interval used to time them; therefore, calibration as proposed in this work must be carried out in order to obtain an accurate analysis of within-die variations. Results on ASIC chips show that short paths can vary up to 35% on average, while long paths vary up to 20%, at nominal temperature and voltage. A similar trend occurs for within-die variations of mid-length paths, where the magnitudes reduce to 20% and 5%, respectively.
The magnitude of the delay variations in both of these analyses increases as temperature and voltage are changed to increase performance. High levels of within-die delay variation are undesirable from a design perspective, but they represent a rich source of entropy for applications that make use of 'secrets', such as authentication, hardware metering, and encryption. Physical unclonable functions (PUFs) are a class of primitives that leverage within-die variations as a means of generating random bit strings for these types of applications, including hardware security and trust. The Zynq FPGA die-to-die and within-die variation study shows that on average there is 5% within-die variation, and the range of die-to-die variation can go up to 3 ns. The die-to-die variations can be explored in much further detail to study their spatial dependence. Additionally, I also carried out research in the area of data mining for big data, focusing on decision tree classification (DTC) to speed up the classification step through a hardware implementation. For this purpose, I devised a pipelined architecture for axis-parallel binary decision tree classification that meets requirements on execution time and minimal resource usage in terms of area. The motivation for this work is that analyzing ever larger data sets has created abundant opportunities for algorithmic and architectural development and data-mining innovation, creating a great demand for faster execution of these algorithms and driving improvements in execution time and resource utilization. Decision trees (DT) have traditionally been implemented in software. Although software implementations of DTC are highly accurate, their execution times and resource utilization still require improvement to meet the computational demands of an ever-growing industry. On the other hand, hardware implementation of DT has not been thoroughly investigated or reported in detail. Therefore, I propose a hardware-accelerated, pipelined architecture that acquires data in parallel by having parallel engines working independently on different partitions of the data. Each engine processes the data in a pipelined fashion to utilize the resources more efficiently and reduce the time needed to process all data records/tuples. Experimental results show that the proposed hardware acceleration of classification algorithms increases throughput, by reducing the number of clock cycles required to process the data and generate the results, while requiring minimal resources and hence being area efficient. This architecture also enables the algorithms to scale with increasingly large and complex data sets. We developed the DTC algorithm in detail and successfully explored techniques for adapting it to a hardware implementation. The system is 3.5 times faster than the existing hardware implementation of classification.
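To make the accelerated kernel concrete, the sketch below shows, in plain Python, the axis-parallel binary decision tree traversal that the pipelined hardware engines implement; the node layout and data are illustrative only. The hardware version pipelines this walk and runs several such engines on separate partitions of the records in parallel.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    feature: int = -1                 # index of the attribute tested at this node
    threshold: float = 0.0            # axis-parallel split: x[feature] <= threshold
    left: Optional["Node"] = None     # branch taken when the test is true
    right: Optional["Node"] = None    # branch taken when the test is false
    label: Optional[int] = None       # set only at leaves

def classify(root: Node, x: Sequence[float]) -> int:
    """Traverse an axis-parallel binary decision tree for one record."""
    node = root
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

# Tiny illustrative tree: class 1 if x0 <= 2.5 and x1 <= 1.0, else class 0.
tree = Node(feature=0, threshold=2.5,
            left=Node(feature=1, threshold=1.0,
                      left=Node(label=1), right=Node(label=0)),
            right=Node(label=0))
print([classify(tree, rec) for rec in ([2.0, 0.5], [2.0, 3.0], [4.0, 0.5])])
```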

    Design and Optimization for Resilient Energy Efficient Computing

    Modern electronic systems are an integral part of our everyday lives. This has been enabled, among other things, by the exponential growth in the integration density of integrated circuits together with an improvement in energy efficiency over the last 50 years, also known as Moore's law. In this context, the demand for energy-efficient digital circuits has risen enormously, especially in application fields such as the Internet of Things (IoT). Since the power consumption of a circuit is strongly coupled to its supply voltage, efficient techniques have been developed that scale the supply voltage into the near-threshold region, collectively referred to as Near-Threshold Computing (NTC). These techniques can improve the energy efficiency of circuits by a full order of magnitude. Besides the improved energy balance, however, numerous design challenges arise. For example, reducing the supply voltage into the near-threshold region increases the sensitivity of circuits to process variation, voltage fluctuations, and temperature changes tenfold. The effects of these variations reduce the reliability of NTC circuits and are the biggest obstacle to their widespread use. Traditional variability-compensation approaches and methods from the nominal voltage regime cannot be applied efficiently, since the strong performance variations and sensitivities in the near-threshold region exceed their capabilities. For this reason, new design paradigms and design automation concepts are required for NTC. The goal of this work is to tackle these problems by providing holistic methods for the design and design automation of NTC circuits, applied in particular at the circuit and logic levels. In-depth analyses of the reliability of NTC systems are included, and optimization methods are proposed that improve reliability, performance, and energy efficiency. The contributions of this work are as follows:
Variation-aware circuit synthesis and timing closure: Meeting timing and reliability requirements at NTC is a demanding task. The effects of variability appear as strong performance fluctuations, which lead to expensive timing margins, or as hold-time violations that result in functional failures. Conventional approaches are limited solely to increasing timing margins, which is very inefficient for NTC because of the large extent of the variations and the increased leakage currents. This work presents a concept for variation-aware synthesis and timing closure of circuits that reduces the sensitivity to variations, improves energy efficiency, performance, and reliability, and at the same time reduces the timing closure overhead [1, 2]. Simulation results show that the proposed approach reduces the delay by 87% and improves performance and energy efficiency by 25% and 7.4%, respectively, at the cost of a 4.8% increase in area.
Cross-layer reliability, energy-efficiency, and performance optimization of data paths: Cross-layer analysis of processor data paths, spanning all the way from the compiler to the circuit design, can reveal potential optimization opportunities. A data path is a combination of several functional units that can process diverse instructions. Our analysis shows that instruction execution times vary strongly at low supply voltages, so that instructions can be classified as fast or slow. Furthermore, functional instructions can be categorized as frequently or rarely used. This work presents a multi-cycle instruction method that can increase the energy efficiency and resilience of functional units [3]. In addition, we present a partitioning algorithm that enables fine-grained power gating of rarely used units [4] by partitioning individual functional units into several smaller ones. The proposed methods significantly improve circuit timing while considerably limiting leakage currents, using a combination of circuit redesign and code replacement techniques. Simulation results show that the developed methods improve the performance and energy efficiency of arithmetic logic units (ALUs) by 19% and 43%, respectively. Furthermore, the performance gain of the optimized circuits can be converted into improved reliability [5, 6].
Post-fabrication and run-time tuning: Process and run-time variations have a strong influence on the minimum energy point (MEP) of NTC circuits, i.e., the most energy-efficient supply voltage. It is of particular interest to calibrate an NTC circuit after fabrication so that it operates at the MEP and thus achieves the best energy efficiency. In this work, post-fabrication and run-time tuning are proposed that calibrate the circuit to the MEP based on post-fabrication speed and power measurements. The presented techniques determine the MEP on a per-chip basis to account for process variations, and dynamically adapt supply voltage and frequency to address time-dependent variations such as workload and temperature. For this purpose, a regression model is integrated into the chip firmware that extracts the MEP at run time from workload and temperature measurements. The regression model is unique to each chip and is based solely on post-fabrication measurements. Simulation results show that the developed approach achieves very high prediction accuracy and energy efficiency, similar to hardware-implemented methods, but without hardware overhead [7, 8].
Selective flip-flop optimization: Ultra-low-voltage circuits must operate in the nominal supply-voltage mode to meet the timing requirements of running applications. In this case, the circuit is affected by strong aging processes that degrade the transistors by increasing their threshold voltages. Our in-depth analyses have shown that certain flip-flop architectures are affected by these aging effects when constant values ('0' or '1') are stored for a long time. Compared to other components, flip-flops are more sensitive to aging and, among other things, fail to capture a new value within the given timing window. Moreover, even a small voltage drop can lead to such timing violations if the affected aged flip-flops lie on the critical path. This work presents a selective flip-flop optimization method that optimizes circuits for robustness against static aging and voltage drop. First, optimized robust flip-flops are generated and integrated into the standard cell libraries. Flip-flops in the circuit that belong to the critical path and experience aging and voltage drop are then replaced by the optimized robust versions to improve the timing and reliability of the circuit [9, 10]. Simulation results show that the expected lifetime of a processor can be improved by 37%, while leakage currents increase by only 0.1%. While NTC has the potential to enable large energy-efficiency gains, its deployment in new application fields such as IoT is not yet feasible because of the aforementioned high sensitivity to variations and the resulting lack of reliability. In this dissertation and in as yet unpublished work [11–17], we present solutions to these problems that enable the integration of NTC into today's systems.
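As a rough illustration of the run-time tuning step described above, the sketch below fits a per-chip regression model that maps workload activity and temperature measurements to a minimum-energy-point supply voltage. The training data and the linear model form are synthetic placeholders, not measurements or the model from the dissertation.

```python
import numpy as np

# Hypothetical post-fabrication characterization data for one chip:
# columns = (workload activity factor, temperature in deg C), target = MEP Vdd in volts.
features = np.array([[0.2, 25.0], [0.5, 25.0], [0.8, 25.0],
                     [0.2, 85.0], [0.5, 85.0], [0.8, 85.0]])
mep_vdd  = np.array([0.46, 0.50, 0.54, 0.44, 0.48, 0.52])   # made-up values

# Fit a simple linear model  Vmep = a*activity + b*temperature + c  for this chip.
X = np.column_stack([features, np.ones(len(features))])
coeff, *_ = np.linalg.lstsq(X, mep_vdd, rcond=None)

def predict_mep(activity: float, temp_c: float) -> float:
    """Firmware-style MEP prediction from run-time workload/temperature readings."""
    return float(coeff @ np.array([activity, temp_c, 1.0]))

# At run time the firmware would steer Vdd/frequency towards this prediction.
print(f"Predicted MEP Vdd at activity 0.6, 60 C: {predict_mep(0.6, 60.0):.3f} V")
```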

    A survey of scan-capture power reduction techniques

    With the advent of sub-nanometer geometries, integrated circuits (ICs) must be checked for newer defects. While scan-based architectures help detect these defects using newer fault models, test data inflate, increasing test time and test cost. An automatic test pattern generator (ATPG) exercises multiple fault sites simultaneously to reduce test data, which causes elevated switching activity during the capture cycle. The switching activity results in an IR drop exceeding the specification of the device under test (DUT). An increase in IR drop leads to failing patterns and may cause good DUTs to fail the test. The problem is severe during at-speed scan testing, which uses a functional-rated, high-frequency clock for the capture operation. Researchers have proposed several techniques to reduce capture power, using various methods including the reduction of switching activity. This paper reviews the recently proposed techniques. The principle, algorithm, and architecture used in each are discussed, along with their key advantages and limitations. In addition, the paper classifies the techniques based on the method used and its application. The goal is to present a survey of the techniques and prepare a platform for future developments in capture power reduction during scan testing.
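Many of the surveyed techniques score candidate test patterns by their capture-cycle switching activity; a common proxy (assumed here, not taken from the paper) is a weighted switching activity count, where each toggling node is weighted by its fanout. The sketch below computes such a metric for a hypothetical launch/capture state pair with an illustrative netlist.

```python
from typing import Sequence

def weighted_switching_activity(before: Sequence[int], after: Sequence[int],
                                fanout: Sequence[int]) -> int:
    """Weighted switching activity of one capture cycle.

    before/after -- node logic values in the launch and capture states (0/1)
    fanout       -- per-node weight, e.g. 1 + number of fanout gates
    """
    return sum(w for b, a, w in zip(before, after, fanout) if b != a)

# Illustrative 6-node circuit state before/after the capture pulse (made-up values).
launch  = [0, 1, 1, 0, 0, 1]
capture = [1, 1, 0, 0, 1, 1]
weights = [3, 2, 4, 1, 2, 5]
print("WSA =", weighted_switching_activity(launch, capture, weights))
```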

    Timing Analysis and Optimization in Logic and Physical Synthesis

    Doctoral dissertation, Department of Electrical and Computer Engineering, College of Engineering, Seoul National University Graduate School, August 2020 (advisor: Taewhan Kim). Timing analysis is one of the necessary steps in the development of a semiconductor circuit, and it is increasingly important in advanced process technologies due to various factors, including the increase in process-voltage-temperature variation. This dissertation addresses three problems related to timing analysis and optimization in logic and physical synthesis. Firstly, most static timing analysis today is based on the conventional fixed flip-flop timing model, in which every flip-flop is assumed to have a fixed clock-to-Q delay. In reality, however, the setup and hold skews affect the clock-to-Q delay. This dissertation proposes a mathematical formulation to solve this problem and applies it, together with a scalable speedup technique, to clock skew scheduling as well as to the analysis of a given circuit. Secondly, near-threshold computing is one of the promising concepts for energy-efficient operation of VLSI systems, but wide performance variation and nonlinearity with respect to process variations block its proliferation. To cope with this, a holistic hardware performance monitoring methodology is proposed for accurate timing prediction in the near-threshold voltage regime and advanced process technologies. Lastly, asynchronous circuits are an alternative to the conventional synchronous style, and asynchronous pipeline circuits are especially attractive because of their small design effort. This dissertation addresses the synthesis problem of lightening two-phase bundled-data asynchronous pipeline controllers, in which delay buffers are essential for guaranteeing correct handshaking operation but incur considerable area increase.
Contents: 1 Introduction; 2 Analysis and Optimization Considering Flexible Flip-Flop Timing Model; 3 Hardware Performance Monitoring Methodology at NTC and Advanced Technology Node; 4 Lightening Asynchronous Pipeline Controller; 5 Conclusion.
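To illustrate the flexible flip-flop timing model motivated above, the sketch below uses a commonly assumed exponential dependence of clock-to-Q delay on the available setup skew and shows how the effective data-to-Q delay degrades as the data edge approaches the clock edge. The model form and coefficients are illustrative assumptions, not characterization data from the dissertation.

```python
import math

def clk_to_q(setup_skew_ps: float, t_cq_nom_ps: float = 60.0,
             amp_ps: float = 25.0, tau_ps: float = 12.0) -> float:
    """Clock-to-Q delay as a function of setup skew (all values in picoseconds).

    Assumes an exponential degradation model often used in the literature:
        t_CQ(s) = t_CQ_nom + amp * exp(-s / tau)
    with illustrative coefficients rather than fitted library data.
    """
    return t_cq_nom_ps + amp_ps * math.exp(-setup_skew_ps / tau_ps)

# As the data edge moves closer to the clock edge (smaller setup skew),
# clock-to-Q grows, so the flip-flop delay is not a fixed constant.
for skew in (60.0, 40.0, 20.0, 10.0, 5.0):
    t_cq = clk_to_q(skew)
    print(f"setup skew {skew:5.1f} ps -> clk-to-Q {t_cq:6.1f} ps, data-to-Q {skew + t_cq:6.1f} ps")
```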

    Dependable Embedded Systems

    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. It presents the most prominent reliability concerns from today's point of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or the system level alone, this book addresses reliability challenges across different levels, starting from the physical level all the way up to the system level (cross-layer approaches). The book aims to demonstrate how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, process variation, temperature effects, and soft errors. It provides readers with the latest insights into novel, cross-layer methods and models with respect to the dependability of embedded systems; describes cross-layer approaches that can leverage reliability through techniques that are proactively designed with respect to techniques at other layers; and explains run-time adaptation and concepts/means of self-organization in order to achieve error resiliency in complex, future many-core systems.

    High Performance Digital Circuit Techniques

    Achieving high performance is one of the most difficult challenges in designing digital circuits. Flip-flops and adders are key blocks in most digital systems and must therefore be designed for highest performance. In this thesis, a new high-performance serial adder is developed while low power consumption is maintained. Also, a statistical framework for the design of flip-flops is introduced that ensures such sequential circuits meet timing yield under performance criteria. Firstly, a high-performance serial adder is developed. The new adder is based on the idea of having a constant delay for the addition of two operands. While conventional adders exhibit logarithmic delay, the proposed adder operates with constant-order delay. In addition, the new adder's hardware complexity grows linearly with the word length, and it consequently exhibits less area and power consumption compared to conventional high-performance adders. The thesis presents the underlying algorithm of the new adder, followed by simulation results. Secondly, the thesis presents a statistical framework for the design of flip-flops under process variations in order to maximize their timing yield. In nanometer CMOS technologies, process variations significantly impact the timing performance of sequential circuits, which may eventually cause them to malfunction; developing a framework for designing such circuits is therefore essential. Our framework generates the values of the nominal design parameters, i.e., the sizes of the gates and transmission gates of a flip-flop, such that maximum timing yield is achieved. While previous work focused on improving the yield of flip-flops, less research addressed improving their timing yield in the presence of process variations.
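A minimal sketch of the statistical sizing idea, assuming a simple Gaussian delay model: sample process-induced delay variations for a candidate sizing and estimate the timing yield as the fraction of Monte Carlo samples that meet the timing budget. The delay model and numbers are placeholders, not the framework developed in the thesis.

```python
import random

def timing_yield(nominal_delay_ps: float, sigma_ps: float,
                 period_ps: float, samples: int = 100_000, seed: int = 1) -> float:
    """Estimate flip-flop timing yield under Gaussian delay variation.

    nominal_delay_ps -- nominal flip-flop delay for a given sizing (assumed model)
    sigma_ps         -- standard deviation of that delay under process variation
    period_ps        -- timing budget the flip-flop must meet
    """
    rng = random.Random(seed)
    passing = sum(1 for _ in range(samples)
                  if rng.gauss(nominal_delay_ps, sigma_ps) <= period_ps)
    return passing / samples

# Hypothetical sizings: upsizing trades area for lower nominal delay and spread.
for name, nom, sig in [("small sizing", 95.0, 12.0), ("upsized", 85.0, 8.0)]:
    print(f"{name}: yield = {timing_yield(nom, sig, period_ps=100.0):.3f}")
```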

    Reclaiming Fault Resilience and Energy Efficiency With Enhanced Performance in Low Power Architectures

    Rapid development in the AI domain has revolutionized the computing industry through the introduction of state-of-the-art AI architectures. This growth is accompanied by a massive increase in power consumption. Near-Threshold Computing (NTC) has emerged as a viable solution, offering significant savings in power consumption and paving the way for an energy-efficient design paradigm. However, these benefits are accompanied by a deterioration in performance due to severe process variation and slower transistor switching at near-threshold operation. These problems severely restrict the use of near-threshold operation in commercial applications. In this work, a novel AI architecture, the Tensor Processing Unit, operating at NTC is thoroughly investigated to tackle the issues hindering system performance. Research problems are demonstrated in a scientific manner, and unique opportunities are explored to propose novel design methodologies.