153 research outputs found

    Inter-module Interfacing techniques for SoCs with multiple clock domains to address challenges in modern deep sub-micron technologies

    Get PDF
    Miniaturization of integrated circuits (ICs) due to the improvement in lithographic techniques in modem deep sub-micron (DSM) technologies allows several complex processing elements to coexist in one IC, which are called System-on-Chip. As a first contribution, this thesis quantitatively analyzes the severity of timing constraints associated with Clock Distribution Network (CDN) in modem DSM technologies and shows that different processing elements may work in different dock domains to alleviate these constraints. Such systems are known as Globally Asynchronous Locally Synchronous (GALS) systems. It is imperative that different processing elements of a GALS system need to communicate with each other through some interfacing technique, and these interfaces can be asynchronous or synchronous. Conventionally, the asynchronous interfaces are described at the Register Transfer Logic (RTL) or system level. Such designs are susceptible to certain design constraints that cannot be addressed at higher abstraction levels; crosstalk glitch is one such constraint. This thesis initially identifies, using an analytical model, the possibility of asynchronous interface malfunction due to crosstalk glitch propagation. Next, we characterize crosstalk glitch propagation under normal operating conditions for two different classes of asynchronous protocols, namely bundled data protocol based and delay insensitive asynchronous designs. Subsequently, we propose a logic abstraction level modeling technique, which provides a framework to the designer to verify the asynchronous protocols against crosstalk glitches. The utility of this modeling technique is demonstrated experimentally on a Xilinx Virtex-II Pro FPGA. Furthermore, a novel methodology is proposed to quench such crosstalk glitch propagation through gating the asynchronous interface from sending the signal during potential glitch vulnerable instances. This methodology is termed as crosstalk glitch gating. This technique is successfully applied to obtain crosstalk glitch quenching in the representative interfaces. This thesis also addresses the dock skew challenges faced by high-performance synchronous interfacing methodologies in modem DSM technologies. The proposed methodology allows communicating modules to run at a frequency that is independent of the dock skew. Leveraging a novel clock-scheduling algorithm, our technique permits a faster module to communicate safely with a slower module without slowing down. Safe data communications for mesochronous schemes and for the cases when communicating modules have dock frequency ratios of integer or coprime numbers are theoretically explained and experimentally demonstrated. A clock-scheduling technique to dynamically accommodate phase variations is also proposed. These methods are implemented to the Xilinx Virtex II Pro technology. Experiments prove that the proposed interfacing scheme allows modules to communicate data safely, for mesochronous schemes, at 350 MHz, which is the limit of the technology used, under a dock skew of more than twice the time period (i.e. a dock skew of 12 ns

    The UTMOST pulsar timing programme I: overview and first results

    Full text link
    We present an overview and the first results from a large-scale pulsar timing programme that is part of the UTMOST project at the refurbished Molonglo Observatory Synthesis Radio Telescope (MOST) near Canberra, Australia. We currently observe more than 400 mainly bright southern radio pulsars with up to daily cadences. For 205 (8 in binaries, 4 millisecond pulsars) we publish updated timing models, together with their flux densities, flux density variability, and pulse widths at 843 MHz, derived from observations spanning between 1.4 and 3 yr. In comparison with the ATNF pulsar catalogue, we improve the precision of the rotational and astrometric parameters for 123 pulsars, for 47 by at least an order of magnitude. The time spans between our measurements and those in the literature are up to 48 yr, which allows us to investigate their long-term spin-down history and to estimate proper motions for 60 pulsars, of which 24 are newly determined and most are major improvements. The results are consistent with interferometric measurements from the literature. A model with two Gaussian components centred at 139 and 463ย kmโ€…sโˆ’1463~\text{km} \: \text{s}^{-1} fits the transverse velocity distribution best. The pulse duty cycle distributions at 50 and 10 per cent maximum are best described by log-normal distributions with medians of 2.3 and 4.4 per cent, respectively. We discuss two pulsars that exhibit spin-down rate changes and drifting subpulses. Finally, we describe the autonomous observing system and the dynamic scheduler that has increased the observing efficiency by a factor of 2-3 in comparison with static scheduling.Comment: 31 pages, 14 figures, 6 tables, accepted for publication in MNRA

    Network-on-Chip

    Get PDF
    Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoCโ€”its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems

    Toward Reliable, Secure, and Energy-Efficient Multi-Core System Design

    Get PDF
    Computer hardware researchers have perennially focussed on improving the performance of computers while stipulating the energy consumption under a strict budget. While several innovations over the years have led to high performance and energy efficient computers, more challenges have also emerged as a fallout. For example, smaller transistor devices in modern multi-core systems are afflicted with several reliability and security concerns, which were inconceivable even a decade ago. Tackling these bottlenecks happens to negatively impact the power and performance of the computers. This dissertation explores novel techniques to gracefully solve some of the pressing challenges of the modern computer design. Specifically, the proposed techniques improve the reliability of on-chip communication fabric under a high power supply noise, increase the energy-efficiency of low-power graphics processing units, and demonstrate an unprecedented security loophole of the low-power computing paradigm through rigorous hardware-based experiments

    ํŽ„์Šค ๊ธฐ๋ฐ˜ ํ”ผ๋“œ ํฌ์›Œ๋“œ ์ดํ€„๋ผ์ด์ €๋ฅผ ๊ฐ–์ถ˜ ๊ณ ์šฉ๋Ÿ‰ DRAM์„ ์œ„ํ•œ ์ปจํŠธ๋กค๋Ÿฌ PHY ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2020. 8. ๊น€์ˆ˜ํ™˜.A controller PHY for managed DRAM solution, which is a new memory structure to maximize capacity while minimizing refresh power, is presented. Inter-symbol interference is critical in such a high-capacity DRAM interface in which many DRAM chips share a command/address (C/A) channel. A pulse-based feed-forward equalizer (PB-FFE) is introduced to reduce ISI on a C/A channel. The controller PHY supports all the training sequences specified in the DDR4 standard. A glitch-free DCDL is also adopted to perform link training efficiently and to reduce training time. The DQ transmitter adopts quarter-rate architecture to reduce output latency. For the quarter-rate transmitters in DQ, we propose a quadrature error corrector (QEC), in which clock signal phase errors are corrected using two replicas of the 4:1 serializer of the output stage. Pulse shrinking is used to compare and equalize the outputs of these two replica serializers. A controller PHY was fabricated in 55nm CMOS. The PB-FFE increases the timing margin from 0.23UI to 0.29UI at 1067Mbps. At 2133Mbps, the read timing and voltage margins are 0.53UI and 211mV after read training, and the write margins are 0.72UI and 230mV after write training. To validate the QEC effectiveness, a prototype quarter-rate transmitter, including the QEC, was fabricated to another chip in 65nm CMOS. Adopting our QEC, the experimental results show that the output phase errors of the transmitter are reduced to a residual error of 0.8ps, and the output eye width and height are improved by 84% and 61%, respectively, at a data-rate of 12.8Gbps.๋ณธ ์—ฐ๊ตฌ์—์„œ ์šฉ๋Ÿ‰์„ ์ตœ๋Œ€ํ™”ํ•˜๋ฉด์„œ๋„ ๋ฆฌํ”„๋ ˆ์‹œ ์ „๋ ฅ์„ ์ตœ์†Œํ™”ํ•  ์ˆ˜ ์žˆ๋Š” ์ƒˆ๋กœ์šด ๋ฉ”๋ชจ๋ฆฌ ๊ตฌ์กฐ์ธ ๊ด€๋ฆฌํ˜• DRAM ์†”๋ฃจ์…˜์„ ์œ„ํ•œ ์ปจํŠธ๋กค๋Ÿฌ PHY๋ฅผ ์ œ์‹œํ•˜์˜€๋‹ค. ์ด์™€ ๊ฐ™์€ ๊ณ ์šฉ๋Ÿ‰ DRAM ์ธํ„ฐํŽ˜์ด์Šค์—์„œ๋Š” ๋งŽ์€ DRAM ์นฉ์ด ๋ช…๋ น / ์ฃผ์†Œ (C/A) ์ฑ„๋„์„ ๊ณต์œ ํ•˜๊ณ  ์žˆ์–ด์„œ ์‹ฌ๋ณผ ๊ฐ„ ๊ฐ„์„ญ์ด ๋ฐœ์ƒํ•œ๋‹ค. ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ์ด๋Ÿฌํ•œ C/A ์ฑ„๋„์—์„œ์˜ ์‹ฌ๋ณผ ๊ฐ„ ๊ฐ„์„ญ์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ํŽ„์Šค ๊ธฐ๋ฐ˜ ํ”ผ๋“œ ํฌ์›Œ๋“œ ์ดํ€„๋ผ์ด์ € (PB-FFE)๋ฅผ ์ฑ„ํƒํ•˜์˜€๋‹ค. ๋˜ํ•œ ๋ณธ ์—ฐ๊ตฌ์˜ ์ปจํŠธ๋กค๋Ÿฌ PHY๋Š” DDR4 ํ‘œ์ค€์— ์ง€์ •๋œ ๋ชจ๋“  ํŠธ๋ ˆ์ด๋‹ ์‹œํ€€์Šค๋ฅผ ์ง€์›ํ•œ๋‹ค. ๋งํฌ ํŠธ๋ ˆ์ด๋‹์„ ํšจ์œจ์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•˜๊ณ  ํŠธ๋ ˆ์ด๋‹ ์‹œ๊ฐ„์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ๊ธ€๋ฆฌ์น˜๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š๋Š” ๋””์ง€ํ„ธ ์ œ์–ด ์ง€์—ฐ ๋ผ์ธ (DCDL)์„ ์ฑ„ํƒํ•˜์˜€๋‹ค. ์ปจํŠธ๋กค๋Ÿฌ PHY์˜ DQ ์†ก์‹ ๊ธฐ๋Š” ์ถœ๋ ฅ ๋Œ€๊ธฐ ์‹œ๊ฐ„์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ์ฟผํ„ฐ ๋ ˆ์ดํŠธ ๊ตฌ์กฐ๋ฅผ ์ฑ„ํƒํ•˜์˜€๋‹ค. ์ฟผํ„ฐ ๋ ˆ์ดํŠธ ์†ก์‹ ๊ธฐ์˜ ๊ฒฝ์šฐ์—๋Š” ์ง๊ต ํด๋Ÿญ ๊ฐ„ ์œ„์ƒ ์˜ค๋ฅ˜๊ฐ€ ์ถœ๋ ฅ ์‹ ํ˜ธ์˜ ๋ฌด๊ฒฐ์„ฑ์— ์˜ํ–ฅ์„ ์ฃผ๊ฒŒ ๋œ๋‹ค. ์ด๋Ÿฌํ•œ ์˜ํ–ฅ์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ์ถœ๋ ฅ ๋‹จ์˜ 4 : 1 ์ง๋ ฌ ๋ณ€ํ™˜๊ธฐ์˜ ๋‘ ๋ณต์ œ๋ณธ์„ ์‚ฌ์šฉํ•˜์—ฌ ํด๋ก ์‹ ํ˜ธ ์œ„์ƒ ์˜ค๋ฅ˜๋ฅผ ์ˆ˜์ •ํ•˜๋Š” QEC (Quadrature Error Corrector)๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋ณต์ œ๋œ 2๊ฐœ์˜ ์ง๋ ฌ ๋ณ€ํ™˜๊ธฐ์˜ ์ถœ๋ ฅ์„ ๋น„๊ตํ•˜๊ณ  ๊ท ๋“ฑํ™”ํ•˜๊ธฐ ์œ„ํ•ด ํŽ„์Šค ์ˆ˜์ถ• ์ง€์—ฐ ๋ผ์ธ์ด ์‚ฌ์šฉ๋˜์—ˆ๋‹ค. ์ปจํŠธ๋กค๋Ÿฌ PHY๋Š” 55nm CMOS ๊ณต์ •์œผ๋กœ ์ œ์กฐ๋˜์—ˆ๋‹ค. PB-FFE๋Š” 1067Mbps์—์„œ C/A ์ฑ„๋„ ํƒ€์ด๋ฐ ๋งˆ์ง„์„ 0.23UI์—์„œ 0.29UI๋กœ ์ฆ๊ฐ€์‹œํ‚จ๋‹ค. ์ฝ๊ธฐ ํŠธ๋ ˆ์ด๋‹ ํ›„ ์ฝ๊ธฐ ํƒ€์ด๋ฐ ๋ฐ ์ „์•• ๋งˆ์ง„์€ 2133Mbps์—์„œ 0.53UI ๋ฐ 211mV์ด๊ณ , ์“ฐ๊ธฐ ํŠธ๋ ˆ์ด๋‹ ํ›„ ์“ฐ๊ธฐ ๋งˆ์ง„์€ 0.72UI ๋ฐ 230mV์ด๋‹ค. QEC์˜ ํšจ๊ณผ๋ฅผ ๊ฒ€์ฆํ•˜๊ธฐ ์œ„ํ•ด QEC๋ฅผ ํฌํ•จํ•œ ํ”„๋กœํ†  ํƒ€์ž… ์ฟผํ„ฐ ๋ ˆ์ดํŠธ ์†ก์‹ ๊ธฐ๋ฅผ 65nm CMOS์˜ ๋‹ค๋ฅธ ์นฉ์œผ๋กœ ์ œ์ž‘ํ•˜์˜€๋‹ค. QEC๋ฅผ ์ ์šฉํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ, ์†ก์‹ ๊ธฐ์˜ ์ถœ๋ ฅ ์œ„์ƒ ์˜ค๋ฅ˜๊ฐ€ 0.8ps์˜ ์ž”๋ฅ˜ ์˜ค๋ฅ˜๋กœ ๊ฐ์†Œํ•˜๊ณ , ์ถœ๋ ฅ ๋ฐ์ดํ„ฐ ๋ˆˆ์˜ ํญ๊ณผ ๋†’์ด๊ฐ€ 12.8Gbps์˜ ๋ฐ์ดํ„ฐ ์†๋„์—์„œ ๊ฐ๊ฐ 84 %์™€ 61 % ๊ฐœ์„ ๋˜์—ˆ์Œ์„ ๋ณด์—ฌ์ค€๋‹ค.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.1.1 HEAVY LOAD C/A CHANNEL 5 1.1.2 QUARTER-RATE ARCHITECTURE IN DQ TRANSMITTER 7 1.1.3 SUMMARY 8 1.2 THESIS ORGANIZATION 10 CHAPTER 2 ARCHITECTURE 11 2.1 MDS DIMM STRUCTURE 11 2.2 MDS CONTROLLER 15 2.3 MDS CONTROLLER PHY 17 2.3.1 INITIALIZATION SEQUENCE 20 2.3.2 LINK TRAINING FINITE-STATE MACHINE 23 2.3.3 POWER DOWN MODE 28 CHAPTER 3 PULSE-BASED FEED-FORWARD EQUALIZER 29 3.1 COMMAND/ADDRESS CHANNEL 29 3.2 COMMAND/ADDRESS TRANSMITTER 33 3.3 PULSE-BASED FEED-FORWARD EQUALIZER 35 CHAPTER 4 CIRCUIT IMPLEMENTATION 39 4.1 BUILDING BLOCKS 39 4.1.1 ALL-DIGITAL PHASE-LOCKED LOOP (ADPLL) 39 4.1.2 ALL-DIGITAL DELAY-LOCKED LOOP (ADDLL) 44 4.1.3 GLITCH-FREE DCDL CONTROL 47 4.1.4 DUTY-CYCLE CORRECTOR (DCC) 50 4.1.5 DQ/DQS TRANSMITTER 52 4.1.6 DQ/DQS RECEIVER 54 4.1.7 ZQ CALIBRATION 56 4.2 MODELING AND VERIFICATION OF LINK TRAINING 59 4.3 BUILT-IN SELF-TEST CIRCUITS 66 CHAPTER 5 QUADRATURE ERROR CORRECTOR USING REPLICA SERIALIZERS AND PULSE-SHRINKING DELAY LINES 69 5.1 PHASE CORRECTION USING REPLICA SERIALIZERS AND PULSE-SHRINKING UNITS 69 5.2 OVERALL QEC ARCHITECTURE AND ITS OPERATION 71 5.3 FINE DELAY UNIT IN THE PSDL 76 CHAPTER 6 EXPERIMENTAL RESULTS 78 6.1 CONTROLLER PHY 78 6.2 PROTOTYPE QEC 88 CHAPTER 7 CONCLUSION 94 BIBLIOGRAPHY 96Docto

    Wrapper design for multifrequency IP cores

    Full text link

    ๋กœ์ง ๋ฐ ํ”ผ์ง€์ปฌ ํ•ฉ์„ฑ์—์„œ์˜ ํƒ€์ด๋ฐ ๋ถ„์„๊ณผ ์ตœ์ ํ™”

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2020. 8. ๊น€ํƒœํ™˜.Timing analysis is one of the necessary steps in the development of a semiconductor circuit. In addition, it is increasingly important in the advanced process technologies due to various factors, including the increase of processโ€“voltageโ€“temperature variation. This dissertation addresses three problems related to timing analysis and optimization in logic and physical synthesis. Firstly, most static timing analysis today are based on conventional fixed flip-flop timing models, in which every flip-flop is assumed to have a fixed clock-to-Q delay. However, setup and hold skews affect the clock-to-Q delay in reality. In this dissertation, I propose a mathematical formulation to solve the problem and apply it to the clock skew scheduling problems as well as to the analysis of a given circuit, with a scalable speedup technique. Secondly, near-threshold computing is one of the promising concepts for energy-efficient operation of VLSI systems, but wide performance variation and nonlinearity to process variations block the proliferation. To cope with this, I propose a holistic hardware performance monitoring methodology for accurate timing prediction in a near-threshold voltage regime and advanced process technology. Lastly, an asynchronous circuit is one of the alternatives to the conventional synchronous style, and asynchronous pipeline circuit especially attractive because of its small design effort. This dissertation addresses the synthesis problem of lightening two-phase bundled-data asynchronous pipeline controllers, in which delay buffers are essential for guaranteeing the correct handshaking operation but incurs considerable area increase.ํƒ€์ด๋ฐ ๋ถ„์„์€ ๋ฐ˜๋„์ฒด ํšŒ๋กœ ๊ฐœ๋ฐœ ํ•„์ˆ˜ ๊ณผ์ • ์ค‘ ํ•˜๋‚˜๋กœ, ์ตœ์‹  ๊ณต์ •์ผ์ˆ˜๋ก ๊ณต์ •-์ „์••-์˜จ๋„ ๋ณ€์ด ์ฆ๊ฐ€๋ฅผ ํฌํ•จํ•œ ๋‹ค์–‘ํ•œ ์š”์ธ์œผ๋กœ ํ•˜์—ฌ๊ธˆ ๊ทธ ์ค‘์š”์„ฑ์ด ์ปค์ง€๊ณ  ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋กœ์ง ๋ฐ ํ”ผ์ง€์ปฌ ํ•ฉ์„ฑ๊ณผ ๊ด€๋ จํ•˜์—ฌ ์„ธ ๊ฐ€์ง€ ํƒ€์ด๋ฐ ๋ถ„์„ ๋ฐ ์ตœ์ ํ™” ๋ฌธ์ œ์— ๋Œ€ํ•ด ๋‹ค๋ฃฌ๋‹ค. ์ฒซ์งธ๋กœ, ์˜ค๋Š˜๋‚  ๋Œ€๋ถ€๋ถ„์˜ ์ •์  ํƒ€์ด๋ฐ ๋ถ„์„์€ ๋ชจ๋“  ํ”Œ๋ฆฝ-ํ”Œ๋กญ์˜ ํด๋Ÿญ-์ถœ๋ ฅ ๋”œ๋ ˆ์ด๊ฐ€ ๊ณ ์ •๋œ ๊ฐ’์ด๋ผ๋Š” ๊ฐ€์ •์„ ๋ฐ”ํƒ•์œผ๋กœ ์ด๋ฃจ์–ด์กŒ๋‹ค. ํ•˜์ง€๋งŒ ์‹ค์ œ ํด๋Ÿญ-์ถœ๋ ฅ ๋”œ๋ ˆ์ด๋Š” ํ•ด๋‹น ํ”Œ๋ฆฝ-ํ”Œ๋กญ์˜ ์…‹์—… ๋ฐ ํ™€๋“œ ์Šคํ์— ์˜ํ–ฅ์„ ๋ฐ›๋Š”๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด๋Ÿฌํ•œ ํŠน์„ฑ์„ ์ˆ˜ํ•™์ ์œผ๋กœ ์ •๋ฆฌํ•˜์˜€์œผ๋ฉฐ, ์ด๋ฅผ ํ™•์žฅ ๊ฐ€๋Šฅํ•œ ์†๋„ ํ–ฅ์ƒ ๊ธฐ๋ฒ•๊ณผ ๋”๋ถˆ์–ด ์ฃผ์–ด์ง„ ํšŒ๋กœ์˜ ํƒ€์ด๋ฐ ๋ถ„์„ ๋ฐ ํด๋Ÿญ ์Šคํ ์Šค์ผ€์ฅด๋ง ๋ฌธ์ œ์— ์ ์šฉํ•˜์˜€๋‹ค. ๋‘˜์งธ๋กœ, ์œ ์‚ฌ ๋ฌธํ„ฑ ์—ฐ์‚ฐ์€ ์ดˆ๊ณ ์ง‘์  ํšŒ๋กœ ๋™์ž‘์˜ ์—๋„ˆ์ง€ ํšจ์œจ์„ ๋Œ์–ด ์˜ฌ๋ฆด ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ์—์„œ ๊ฐ๊ด‘๋ฐ›์ง€๋งŒ, ํฐ ํญ์˜ ์„ฑ๋Šฅ ๋ณ€์ด ๋ฐ ๋น„์„ ํ˜•์„ฑ ๋•Œ๋ฌธ์— ๋„๋ฆฌ ํ™œ์šฉ๋˜๊ณ  ์žˆ์ง€ ์•Š๋‹ค. ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์œ ์‚ฌ ๋ฌธํ„ฑ ์ „์•• ์˜์—ญ ๋ฐ ์ตœ์‹  ๊ณต์ • ๋…ธ๋“œ์—์„œ ๋ณด๋‹ค ์ •ํ™•ํ•œ ํƒ€์ด๋ฐ ์˜ˆ์ธก์„ ์œ„ํ•œ ํ•˜๋“œ์›จ์–ด ์„ฑ๋Šฅ ๋ชจ๋‹ˆํ„ฐ๋ง ๋ฐฉ๋ฒ•๋ก  ์ „๋ฐ˜์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ, ๋น„๋™๊ธฐ ํšŒ๋กœ๋Š” ๊ธฐ์กด ๋™๊ธฐ ํšŒ๋กœ์˜ ๋Œ€์•ˆ ์ค‘ ํ•˜๋‚˜๋กœ, ๊ทธ ์ค‘์—์„œ๋„ ๋น„๋™๊ธฐ ํŒŒ์ดํ”„๋ผ์ธ ํšŒ๋กœ๋Š” ๋น„๊ต์  ์ ์€ ์„ค๊ณ„ ๋…ธ๋ ฅ๋งŒ์œผ๋กœ๋„ ๊ตฌํ˜„ ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ์žฅ์ ์ด ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” 2์œ„์ƒ ๋ฌถ์Œ ๋ฐ์ดํ„ฐ ํ”„๋กœํ† ์ฝœ ๊ธฐ๋ฐ˜ ๋น„๋™๊ธฐ ํŒŒ์ดํ”„๋ผ์ธ ์ปจํŠธ๋กค๋Ÿฌ ์ƒ์—์„œ, ์ •ํ™•ํ•œ ํ•ธ๋“œ์…ฐ์ดํ‚น ํ†ต์‹ ์„ ์œ„ํ•ด ์‚ฝ์ž…๋œ ๋”œ๋ ˆ์ด ๋ฒ„ํผ์— ์˜ํ•œ ๋ฉด์  ์ฆ๊ฐ€๋ฅผ ์™„ํ™”ํ•  ์ˆ˜ ์žˆ๋Š” ํ•ฉ์„ฑ ๊ธฐ๋ฒ•์„ ์ œ์‹œํ•˜์˜€๋‹ค.1 INTRODUCTION 1 1.1 Flexible Flip-Flop Timing Model 1 1.2 Hardware Performance Monitoring Methodology 4 1.3 Asynchronous Pipeline Controller 10 1.4 Contributions of this Dissertation 15 2 ANALYSIS AND OPTIMIZATION CONSIDERING FLEXIBLE FLIP-FLOP TIMING MODEL 17 2.1 Preliminaries 17 2.1.1 Terminologies 17 2.1.2 Timing Analysis 20 2.1.3 Clock-to-Q Delay Surface Modeling 21 2.2 Clock-to-Q Delay Interval Analysis 22 2.2.1 Derivation 23 2.2.2 Additional Constraints 26 2.2.3 Analysis: Finding Minimum Clock Period 28 2.2.4 Optimization: Clock Skew Scheduling 30 2.2.5 Scalable Speedup Technique 33 2.3 Experimental Results 37 2.3.1 Application to Minimum Clock Period Finding 37 2.3.2 Application to Clock Skew Scheduling 39 2.3.3 Efficacy of Scalable Speedup Technique 43 2.4 Summary 44 3 HARDWARE PERFORMANCE MONITORING METHODOLOGY AT NTC AND ADVANCED TECHNOLOGY NODE 45 3.1 Overall Flow of Proposed HPM Methodology 45 3.2 Prerequisites to HPM Methodology 47 3.2.1 BEOL Process Variation Modeling 47 3.2.2 Surrogate Model Preparation 49 3.3 HPM Methodology: Design Phase 52 3.3.1 HPM2PV Model Construction 52 3.3.2 Optimization of Monitoring Circuits Configuration 54 3.3.3 PV2CPT Model Construction 58 3.4 HPM Methodology: Post-Silicon Phase 60 3.4.1 Transfer Learning in Silicon Characterization Step 60 3.4.2 Procedures in Volume Production Phase 61 3.5 Experimental Results 62 3.5.1 Experimental Setup 62 3.5.2 Exploration of Monitoring Circuits Configuration 64 3.5.3 Effectiveness of Monitoring Circuits Optimization 66 3.5.4 Considering BEOL PVs and Uncertainty Learning 68 3.5.5 Comparison among Different Prediction Flows 69 3.5.6 Effectiveness of Prediction Model Calibration 71 3.6 Summary 73 4 LIGHTENING ASYNCHRONOUS PIPELINE CONTROLLER 75 4.1 Preliminaries and State-of-the-Art Work 75 4.1.1 Bundled-data vs. Dual-rail Asynchronous Circuits 75 4.1.2 Two-phase vs. Four-phase Bundled-data Protocol 76 4.1.3 Conventional State-of-the-Art Pipeline Controller Template 77 4.2 Delay Path Sharing for Lightening Pipeline Controller Template 78 4.2.1 Synthesizing Sharable Delay Paths 78 4.2.2 Validating Logical Correctness for Sharable Delay Paths 80 4.2.3 Reformulating Timing Constraints of Controller Template 81 4.2.4 Minimally Allocating Delay Buffers 87 4.3 In-depth Pipeline Controller Template Synthesis with Delay Path Reusing 88 4.3.1 Synthesizing Delay Path Units 88 4.3.2 Validating Logical Correctness of Delay Path Units 89 4.3.3 Updating Timing Constraints for Delay Path Units 91 4.3.4 In-depth Synthesis Flow Utilizing Delay Path Units 95 4.4 Experimental Results 99 4.4.1 Environment Setup 99 4.4.2 Piecewise Linear Modeling of Delay Path Unit Area 99 4.4.3 Comparison of Power, Performance, and Area 102 4.5 Summary 107 5 CONCLUSION 109 5.1 Chapter 2 109 5.2 Chapter 3 110 5.3 Chapter 4 110 Abstract (In Korean) 127Docto

    Multi-level simulation of nano-electronic digital circuits on GPUs

    Get PDF
    Simulation of circuits and faults is an essential part in design and test validation tasks of contemporary nano-electronic digital integrated CMOS circuits. Shrinking technology processes with smaller feature sizes and strict performance and reliability requirements demand not only detailed validation of the functional properties of a design, but also accurate validation of non-functional aspects including the timing behavior. However, due to the rising complexity of the circuit behavior and the steady growth of the designs with respect to the transistor count, timing-accurate simulation of current designs requires a lot of computational effort which can only be handled by proper abstraction and a high degree of parallelization. This work presents a simulation model for scalable and accurate timing simulation of digital circuits on data-parallel graphics processing unit (GPU) accelerators. By providing compact modeling and data-structures as well as through exploiting multiple dimensions of parallelism, the simulation model enables not only fast and timing-accurate simulation at logic level, but also massively-parallel simulation with switch level accuracy. The model facilitates extensions for fast and efficient fault simulation of small delay faults at logic level, as well as first-order parametric and parasitic faults at switch level. With the parallelization on GPUs, detailed and scalable simulation is enabled that is applicable even to multi-million gate designs. This way, comprehensive analyses of realistic timing-related faults in presence of process- and parameter variations are enabled for the first time. Additional simulation efficiency is achieved by merging the presented methods in a unified simulation model, that allows to combine the unique advantages of the different levels of abstraction in a mixed-abstraction multi-level simulation flow to reach even higher speedups. Experimental results show that the implemented parallel approach achieves unprecedented simulation throughput as well as high speedup compared to conventional timing simulators. The underlying model scales for multi-million gate designs and gives detailed insights into the timing behavior of digital CMOS circuits, thereby enabling large-scale applications to aid even highly complex design and test validation tasks

    Energy-Efficient Digital Circuit Design using Threshold Logic Gates

    Get PDF
    abstract: Improving energy efficiency has always been the prime objective of the custom and automated digital circuit design techniques. As a result, a multitude of methods to reduce power without sacrificing performance have been proposed. However, as the field of design automation has matured over the last few decades, there have been no new automated design techniques, that can provide considerable improvements in circuit power, leakage and area. Although emerging nano-devices are expected to replace the existing MOSFET devices, they are far from being as mature as semiconductor devices and their full potential and promises are many years away from being practical. The research described in this dissertation consists of four main parts. First is a new circuit architecture of a differential threshold logic flipflop called PNAND. The PNAND gate is an edge-triggered multi-input sequential cell whose next state function is a threshold function of its inputs. Second a new approach, called hybridization, that replaces flipflops and parts of their logic cones with PNAND cells is described. The resulting \hybrid circuit, which consists of conventional logic cells and PNANDs, is shown to have significantly less power consumption, smaller area, less standby power and less power variation. Third, a new architecture of a field programmable array, called field programmable threshold logic array (FPTLA), in which the standard lookup table (LUT) is replaced by a PNAND is described. The FPTLA is shown to have as much as 50% lower energy-delay product compared to conventional FPGA using well known FPGA modeling tool called VPR. Fourth, a novel clock skewing technique that makes use of the completion detection feature of the differential mode flipflops is described. This clock skewing method improves the area and power of the ASIC circuits by increasing slack on timing paths. An additional advantage of this method is the elimination of hold time violation on given short paths. Several circuit design methodologies such as retiming and asynchronous circuit design can use the proposed threshold logic gate effectively. Therefore, the use of threshold logic flipflops in conventional design methodologies opens new avenues of research towards more energy-efficient circuits.Dissertation/ThesisDoctoral Dissertation Computer Science 201
    • โ€ฆ
    corecore