25 research outputs found

    Physical design of USB1.1

    Get PDF
    In earlier days, interfacing peripheral devices to host computer has a big problematic. There existed so many different kindsโ€™ ports like serial port, parallel port, PS/2 etc. And their use restricts many situations, Such as no hot-pluggability and involuntary configuration. There are very less number of methods to connect the peripheral devices to host computer. The main reason that Universal Serial Bus was implemented to provide an additional benefits compared to earlier interfacing ports. USB is designed to allow many peripheral be connecting using single standardize interface. It provides an expandable fast, cost effective, hot-pluggable plug and play serial hardware interface that makes life of computer user easier allowing them to plug different devices to into USB port and have them configured automatically. In this thesis demonstrated the USB v1.1 architecture part in briefly and generated gate level net list form RTL code by applying the different constraints like timing, area and power. By applying the various types design constraints so that the performance was improved by 30%. And then it implemented in physically by using SoC encounter EDI system, estimation of chip size, power analysis and routing the clock signal to all flip-flops presented in the design. To reduce the clock switching power implemented register clustering algorithm (DBSCAN). In this design implementation TSMC 180nm technology library is used

    Power and area efficient clock stretching and critical path reshaping for error resilience

    Get PDF
    Process, voltage and temperature variations are on the rise with technology scaling. Nano-scale technology requires huge design margins to ensure reliable operation. Worst case design margining consumes significant amount of circuits and systems resources. In-situ error detection or correction is an alternative method for cost effective variation tolerance. However, existing in-situ error detection and correction circuits are power and area hungry since they use speculative error management, which gives less power savings at higher error rates. This paper proposes an error resilience technique utilizing available slack in the design. The proposed method uses a clock stretching circuit to relax timing margins on selected critical paths that has sufficient consecutive stage slack. We also propose a power optimization method which reshapes the critical path logic proportionate to the consecutive stage slack. Experimental results show that the proposed method achieves the power and area savings of 40% and 8% respectively compared to the worst case design approach. When compared to the TIMBER error resilience approach, the proposed method saves power more than 74% and area more than 13% at design time. Document type: Articl

    Benchmark methodologies for the optimized physical synthesis of RISC-V microprocessors

    Get PDF
    As technology continues to advance and chip sizes shrink, the complexity and design time required for integrated circuits have significantly increased. To address these challenges, Electronic Design Automation (EDA) tools have been introduced to streamline the design flow. These tools offer various methodologies and options to optimize power, performance, and chip area. However, selecting the most suitable methods from these options can be challenging, as they may lead to trade-offs among power, performance, and area. While architectural and Register Transfer Level (RTL) optimizations have been extensively studied in existing literature, the impact of optimization methods available in EDA tools on performance has not been thoroughly researched. This thesis aims to optimize a semiconductor processor through EDA tools within the physical synthesis domain to achieve increased performance while maintaining a balance between power efficiency and area utilization. By leveraging floorplanning tools and carefully selecting technology libraries and optimization options, the CV32E40P open-source processor is subjected to various floorplans to analyze their impact on chip performance. The employed techniques, including multibit components prefer option, multiplexer tree prefer option, identification and exclusion of problematic cells, and placement blockages, lead to significant improvements in cell density, congestion mitigation, and timing. The optimized synthesis results demonstrate a 71\% enhancement in chip design performance without a substantial increase in area, showcasing the effectiveness of these techniques in improving large-scale integrated circuits' performance, efficiency, and manufacturability. By exploring and implementing the available options in EDA tools, this study demonstrates how the processor's performance can be significantly improved while maintaining a balanced and efficient chip design. The findings contribute valuable insights to the field of electronic design automation, offering guidance to designers in selecting suitable methodologies for optimizing processors and other integrated circuits

    Physical Design and Clock Tree Synthesis Methods For A 8-Bit Processor

    Get PDF
    Now days a number of processors are available with a lot kind of feature from different industries. A processor with similar kind of architecture of the current processors only missing the memory stuffs like the RAM and ROM has been designed here with the help of Verilog style of coding. This processor contains architecturally the program counter, instruction register, ALU, ALU latch, General Purpose Registers, control state module, flag registers and the core module containing all the modules. And a test module is designed for testing the processor. After the design of the processor with successful functionality, the processor is synthesized with 180nm technology. The synthesis is performed with the data path optimization like the selection of proper adders and multipliers for timing optimization in the data path while the ALU operations are performed. During synthesis how to take care of the worst negative slack (WNS), how to include the clock gating cells, how to define the cost and path groups etc. have been covered. After the proper synthesis we get the proper net list and the synthesized constraint file for carrying out the physical design. In physical design the steps like floor-planning, partitioning, placement, legalization of the placement, clock tree synthesis, and routing etc. have been performed. At all the stages the static timing analysis is performed for the timing meet of the design for better performance in terms of timing or frequency. Each steps of physical design are discussed with special effort towards the concepts behind the step. Out of all the steps of physical design the clock tree synthesis is performed with some improvement in the performance of the clock tree by creating a symmetrical clock tree and maintaining more common clock paths. A special algorithm has been framed for creating a symmetrical clock tree and thereby making the power consumption of the clock tree low

    Clock Tree and Flip-flop Co-optimization for Reducing Power Consumption and Power/Ground Noise of Integrated Circuits and Systems

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2017. 8. ๊น€ํƒœํ™˜.For very-large-scale integration (VLSI) circuits, the activation of all flip-flops that are used to store data is synchronized by clock signals delivered through clock networks. Due to very high frequency of clock signal switches, the dynamic power consumed on clock networks takes a considerable portion of the total power consumption of the circuits. In addition, the largest amount of power consumption in the clock networks comes from the flip-flops and the buffers that drive the flip-flops at the clock network boundary. In addition, the requirement of simultaneously activating all flip-flops for synchronous circuits induces a high peak power/ground noise (i.e., voltage drop) at the clock boundary. In this regards, this thesis addresses two new problems: the problem of reducing the clock power consumption at the clock network boundary, and the problem of reducing the peak current at the clock network boundary. Unlike the prior works which have considered the optimization of flip-flops and clock buffers separately, our approach takes into account the co-optimization of flip-flops and clock buffers. Precisely, we propose four different types of hardware component that can implement a set of flip-flops and their driving buffer as a single unit. The key idea for the derivation of the four types of clock boundary component is that one of the inverters in the driving buffer and one of the inverters in each flip-flop can be combined and removed without changing the functionality of the flip-flops. Consequently, we have a more freedom to select (i.e., allocate) clock boundary components that is able to reduce the power consumption or peak current under timing constraint. We have implemented our approach of clock boundary optimization under bounded clock skew constraint and tested it with ISCAS 89 benchmark circuits. The experimental results confirm that our approach is able to reduce the clock power consumption by 7.9โˆผ10.2% and power/ground noise by 27.7%โˆผ30.9% on average.Chapter 1 Introduction 1 1.1 Clock Signal 1 1.2 Metrics of Clock Design 2 1.3 Clock Network Topologies 4 1.4 Multibit Flip-flop 5 1.5 Simultaneous Switching Noise 6 1.6 Contributions of This Dissertation 6 Chapter 2 Clock Tree and Flip-flop Co-optimization for Reducing Power Consumption 8 2.1 Introduction 8 2.2 Types of Boundary Optimization 9 2.3 Analysis of Four Types of Flip-flop 12 2.3.1 Internal Power Comparison 12 2.3.2 Characterization of Power Consumption 14 2.4 Problem Formulation 15 2.5 The Proposed Algorithm 17 2.5.1 Independence Assumption 17 2.5.2 BoundaryMin Algorithm 17 2.6 Experimental Results 29 2.6.1 Experimental Setup 29 2.6.2 Clock Tree Boundary Optimization Results 33 2.6.3 Capacitance Analysis on Flip-flops 38 2.6.4 Slew and Skew Analysis 39 2.6.5 Window Width Analysis 39 2.7 Conclusions 41 Chapter 3 Clock Tree and Flip-flop Co-optimization for Reducing Power/Ground Noise 42 3.1 Introduction 42 3.2 Current Characteristic of Four Types of Flip-flop 45 3.3 Motivational Example 47 3.4 Problem Formulation 52 3.5 Proposed Algorithm 54 3.5.1 An Overview 54 3.5.2 Superposition of Current Flows 55 3.5.3 Formulation to Instance of MOSP Problem 57 3.5.4 Selecting Target Power Grid Points 59 3.5.5 Consideration of Reducing Power Consumption 62 3.6 Experimental Results 62 3.7 Summary 65 Chapter 4 Conclusion 68 4.1 Clock Buffer and Flip-flop Co-optimization for Reducing Power Consumption 68 4.2 Clock Buffer and Flip-flop Co-optimization for Reducing Power/Ground Noise 69 ์ดˆ๋ก 78Docto

    ๋น„์šฉ ํšจ์œจ์ ์ธ ํด๋Ÿญ ๋ฐ ํŒŒ์›Œ ๊ฒŒ์ดํŒ… ์„ค๊ณ„ ๋ฐฉ๋ฒ•๋ก 

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ)--์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› :๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€,2020. 2. ๊น€ํƒœํ™˜.์ €์ „๋ ฅ ์„ค๊ณ„๋Š” ์ตœ์‹  ์‹œ์Šคํ…œ-์˜จ-์นฉ (SoCs) ์„ค๊ณ„์—์„œ ๋งค์šฐ ์ค‘์š”ํ•œ ์š”์†Œ ์ค‘์˜ ํ•˜๋‚˜์ด๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋™์  ๋ฐ ์ •์  ์ „๋ ฅ ์†Œ๋น„๋ฅผ ๊ฐ์†Œ์‹œํ‚ค๊ธฐ ์œ„ํ•œ ์ €์ „๋ ฅ ์„ค๊ณ„ ๋ฐฉ๋ฒ•๋ก ์— ๋Œ€ํ•ด ๋…ผํ•œ๋‹ค. ๊ตฌ์ฒด์ ์œผ๋กœ ๋น„์šฉ ํšจ์œจ์ ์ธ ์ €์ „๋ ฅ ์„ค๊ณ„๋ฅผ ์œ„ํ•˜์—ฌ ๋‘ ๊ฐ€์ง€ ์ƒˆ๋กœ์šด ๊ธฐ์ˆ ์„ ์ œ์•ˆํ•œ๋‹ค. ์šฐ์„  ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋™์  ์ „๋ ฅ ์†Œ๋น„๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ๋Š” ์ƒˆ๋กœ์šด ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ๊ธฐ์กด ํ”Œ๋ฆฝ-ํ”Œ๋ž ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ํ† ๊ธ€ ๊ธฐ๋ฐ˜ ํด๋Ÿญ ๊ฒŒ์ดํŒ…์€ ๊ฐ€์žฅ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๊ธฐ๋ฒ• ์ค‘์˜ ํ•˜๋‚˜์ด๋‹ค. ํ•˜์ง€๋งŒ ์ด ๋ฐฉ๋ฒ•์€ ๋” ๋งŽ์€ ํ”Œ๋ฆฝ-ํ”Œ๋ž์— ๋Œ€ํ•ด ์ ์šฉํ• ์ˆ˜๋ก ํด๋Ÿญ ๊ฒŒ์ดํŒ…์— ํ•„์š”ํ•œ ๋ถ€๊ฐ€ ํšŒ๋กœ๊ฐ€ ๊ธ‰๊ฒฉํžˆ ์ฆ๊ฐ€ํ•œ๋‹ค๋Š” ๊ทผ๋ณธ์ ์ธ ํ•œ๊ณ„๋ฅผ ์ง€๋‹ˆ๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ƒˆ๋กœ์šด ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ฒซ ๋ฒˆ์งธ๋กœ ๊ธฐ์กด ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ํ† ๊ธ€ ๊ธฐ๋ฐ˜ ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์— ํ•„์š”ํ•œ ํšŒ๋กœ ์ž์›์„ ๋ถ„์„ํ•˜์—ฌ ํ•ด๋‹น ๋ฐฉ๋ฒ•์˜ ๋น„ํšจ์œจ์„ฑ์„ ๋ณด์ด๊ณ , ๊ธฐ์กด ๋ฐฉ๋ฒ•์—์„œ ์‚ฌ์šฉ๋˜๋Š” ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ํ† ๊ธ€ ๊ฒ€์ถœ์— ํ•„์ˆ˜์ ์ด์ง€๋งŒ ๊ณ ๋น„์šฉ์˜ XOR ๊ฒŒ์ดํŠธ๋ฅผ ์™„๋ฒฝํžˆ ์ œ๊ฑฐํ•œ ํ”Œ๋ฆฝ-ํ”Œ๋ž ์ƒํƒœ ๊ธฐ๋ฐ˜ ํด๋Ÿญ ๊ฒŒ์ดํŒ…'์ด๋ผ๋Š” ์ƒˆ๋กœ์šด ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ๋‘ ๋ฒˆ์งธ๋กœ ์ œ์•ˆ๋œ XOR ๊ฒŒ์ดํŠธ๊ฐ€ ํ•„์š” ์—†๋Š” ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์„ ์œ„ํ•œ ๋ถ€๊ฐ€ ํšŒ๋กœ๋ฅผ ์ œ์‹œํ•˜๋ฉฐ, ๋‹ค์–‘ํ•œ ํƒ€์ด๋ฐ ๋ถ„์„์„ ํ†ตํ•˜์—ฌ ํ•ด๋‹น ํšŒ๋กœ๊ฐ€ ์•ˆ์ •์ ์œผ๋กœ ์ ์šฉ๋  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค. ์„ธ ๋ฒˆ์งธ๋กœ ํšŒ๋กœ์˜ ํ”Œ๋ฆฝ-ํ”Œ๋ž ์ƒํƒœ ํ”„๋กœํŒŒ์ผ์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ, ์ œ์•ˆ๋œ ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๊ธฐ๋ฒ•์„ ๊ธฐ์กด ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๊ธฐ๋ฒ•๊ณผ ์™„๋ฒฝํ•˜๊ฒŒ ํ†ตํ•ฉํ•  ์ˆ˜ ์žˆ๋Š” ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค. ์—ฌ๋Ÿฌ ๋ฒค์น˜๋งˆํฌ ํšŒ๋กœ์— ๋Œ€ํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ๊ธฐ์กด ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ํ† ๊ธ€ ๊ธฐ๋ฐ˜ ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์ด ์ „๋ ฅ ์†Œ๋น„ ์ ˆ๊ฐ ๊ธฐํšŒ๋ฅผ ๋†“์น˜๋Š” ๋ฐ˜๋ฉด ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€ ๋ชจ๋“  ํƒ€์ด๋ฐ ์ œ์•ฝ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋ฉด์„œ ์ „๋ ฅ ์†Œ๋น„ ๊ฐ์†Œ์— ๋งค์šฐ ํšจ๊ณผ์ ์ž„์„ ๋ณด์—ฌ์ค€๋‹ค. ๋‹ค์Œ์œผ๋กœ ์ •์  ์ „๋ ฅ ์†Œ๋น„๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•œ ๋ฐฉ์•ˆ์œผ๋กœ, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ธฐ์กด ํŒŒ์›Œ ๊ฒŒ์ดํŠธ ํšŒ๋กœ์˜ ์ƒํƒœ ๋ณด์กด์šฉ ์ €์žฅ ๊ณต๊ฐ„ ํ• ๋‹น ๋ฐฉ๋ฒ•๋“ค์ด ์ง€๋‹ˆ๊ณ  ์žˆ๋Š” ๋‘ ๊ฐ€์ง€ ์ค‘์š”ํ•œ ํ•œ๊ณ„๋“ค์„ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ค‘์š”ํ•œ ํ•œ๊ณ„๋“ค์ด๋ž€ ์ฒซ ๋ฒˆ์งธ๋กœ ๋‹ค์ค‘-๋น„ํŠธ ์ƒํƒœ ๋ณด์กด ํ”Œ๋ฆฝ-ํ”Œ๋ž์˜ ๋ฌด๋ถ„๋ณ„ํ•œ ์‚ฌ์šฉ์œผ๋กœ ์ธํ•œ ๊ธด ์›จ์ดํฌ์—… ์ง€์—ฐ ์‹œ๊ฐ„์ด๋ฉฐ, ๋‘ ๋ฒˆ์งธ๋กœ ๋ฉ€ํ‹ฐํ”Œ๋ ‰์„œ ๋˜๋จน์ž„ ๋ฃจํ”„๊ฐ€ ์žˆ๋Š” ์ƒํƒœ ๋ณด์กด ํ”Œ๋ฆฝ-ํ”Œ๋ž์˜ ์ตœ์ ํ™” ๋ถˆ๊ฐ€๋Šฅ์„ฑ์ด๋‹ค. ๊ธฐ์กด ๋ฐฉ๋ฒ•๋“ค์—์„œ๋Š” ์ƒํƒœ ๋ณด์กด์„ ์œ„ํ•œ ์ €์žฅ ๊ณต๊ฐ„์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๊ธด ์›จ์ดํฌ์—… ์ง€์—ฐ ์‹œ๊ฐ„์ด ํ•„์ˆ˜์ ์ด์—ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋˜๋จน์ž„ ๋ฃจํ”„๊ฐ€ ์žˆ๋Š” ํ”Œ๋ฆฝ-ํ”Œ๋ž์€ ์ตœ์ ํ™”ํ•  ์ˆ˜ ์—†๋Š” ๋Œ€์ƒ์œผ๋กœ ๋‹ค๋ฃจ์–ด์กŒ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ผ๋ฐ˜์ ์œผ๋กœ ํ•˜๋“œ์›จ์–ด ๊ธฐ์ˆ  ์–ธ์–ด(HDL)๋กœ๋ถ€ํ„ฐ ์ƒ์„ฑ๋˜๋Š” ๋˜๋จน์ž„ ๋ฃจํ”„๋ฅผ ์ง€๋‹Œ ํ”Œ๋ฆฝ-ํ”Œ๋ž์€ ๋ฌด์‹œํ•  ์ˆ˜ ์žˆ์„ ์ •๋„๋กœ ์ ์€ ์–‘์ด ์•„๋‹ˆ๋‹ค. ์ฒซ ๋ฒˆ์งธ ํ•œ๊ณ„๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ตœ๋Œ€ 2 ๋น„ํŠธ์˜ ๋‹ค์ค‘-๋น„ํŠธ ์ƒํƒœ ๋ณด์กด ํ”Œ๋ฆฝ-ํ”Œ๋ž์„ ์‚ฌ์šฉํ•˜์—ฌ ์›จ์ดํฌ์—… ์ง€์—ฐ ์‹œ๊ฐ„์„ ๋‘ ํด๋Ÿญ ์‚ฌ์ดํด๋กœ ์ œํ•œํ•˜๋ฉด์„œ๋„ ์ƒํƒœ ๋ณด์กด์„ ์œ„ํ•œ ์ €์žฅ ๊ณต๊ฐ„์„ ํšจ์œจ์ ์œผ๋กœ ์ ˆ์•ฝํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋‘ ๋ฒˆ์งธ ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด์„œ ๋˜๋จน์ž„ ๋ฃจํ”„๋ฅผ ์ง€๋‹Œ ํ”Œ๋ฆฝ-ํ”Œ๋ž์ด ํฌํ•จ๋œ ๋‘ ํ”Œ๋ฆฝ-ํ”Œ๋ž ์Œ์˜ ์ƒํƒœ๋ฅผ ๋ณต์›ํ•  ์ˆ˜ ์žˆ๋Š” 2๋‹จ ์ƒํƒœ ๋ณด์กด ์ œ์–ด ๋ฐฉ์•ˆ์„ ์ œ์•ˆํ•œ๋‹ค. ๋˜ํ•œ ์ฃผ์–ด์ง„ ํšŒ๋กœ์—์„œ ์ถฉ๋Œ์—†์ด ๋™์‹œ์— ์กด์žฌํ•  ์ˆ˜ ์žˆ๋Š” ํ”Œ๋ฆฝ-ํ”Œ๋ž ์Œ์„ ์ตœ๋Œ€๋กœ ์ถ”์ถœํ•˜๊ธฐ ์œ„ํ•ด ๋…๋ฆฝ ์ง‘ํ•ฉ ๋ฌธ์ œ(independent set problem)๊ธฐ๋ฐ˜์˜ ์—ฐ์‚ฐ๋ฒ•๋„ ์ œ์•ˆํ•œ๋‹ค. ๋ฒค์น˜๋งˆํฌ ํšŒ๋กœ์— ๋Œ€ํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์ด ์›จ์ดํฌ์—… ์ง€์—ฐ ์‹œ๊ฐ„์„ ๋‘ ํด๋Ÿญ ์‚ฌ์ดํด๋กœ ์ œํ•œํ•˜๋ฉด์„œ๋„ ์ƒํƒœ ๋ณด์กด์— ํ•„์š”ํ•œ ์ €์žฅ ๊ณต๊ฐ„๊ณผ ํŒŒ์›Œ๋ฅผ ๊ฐ์†Œ์‹œํ‚ค๋Š”๋ฐ ๋งค์šฐ ํšจ๊ณผ์ ์ž„์„ ๋ณด์—ฌ์ค€๋‹ค.Low power design is of great importance in modern system-on-chips (SoCs). This dissertation studies on low power design methodologies for saving dynamic and static power consumption. Precisely, we unveil two novel techniques of cost effective low power design. Firstly, we propose a novel clock gating method for reducing the dynamic power consumption. Flip-flop's input data toggling based clock gating is one of the most commonly used clock gating methods, in which one critical and inherent limitation is the sharp increase of gating logic as more flip-flops are involved in gating. In this dissertation, we propose a new clock gating method to overcome this limitation. Specifically, (1) we analyze the resources of gating logic in the input data toggling based clock gating, from which an ineffectiveness in resource utilization is observed and we propose a new clock gating technique called flip-flop state driven clock gating which completely eliminates the essential and expensive component of XOR gates for detecting input toggling of flip-flops; (2) we provide the supporting logic circuitry of our proposed XOR-free clock gating, confirming its safe applicability through a comprehensive timing analysis; (3) we propose, based on the flip-flops' state profile, a clock gating methodology that seamlessly combines our flip-flop state based clock gating with the toggling based clock gating. Through experiments with benchmark circuits, it is confirmed that our clock gating method is very effective in reducing power, which otherwise the toggling based clock gating shall miss the power saving opportunity, while meeting all timing constraints. Secondly, for reducing the static power consumption, we solve two critical limitations of the conventional approaches to the allocation of state retention storage for power gated circuits. Those are (1) the long wakeup delay caused by the senseless use of multi-bit retention flip-flops (MBRFFs) and (2) the inability to optimize retention flip-flops for the flip-flops with mux-feedback loop. It should be noted that the conventional approaches have regarded the long wakeup delay as an inevitable consequence of maximizing the reduction of total storage size for state retention while they have treated the flip-flops with mux-feedback loop (called self-loop flip-flop) as nonoptimizable component, but practically, the self-loop flip-flops synthesized from hardware description language (HDL) code are not far from a small amount and thus, can in no way be negligible. More precisely, for solving (1), we show that the use of MBRFFs with up to two bits, consequently, constraining the wakeup delay to no more than two clock cycles, is enough to maintain the high reduction of total retention storage and for solving (2), we devise a 2-phase retention control mechanism for a pair of flip-flops, one of which has self-loop, by which just a single retention bit can be used to restore state of the two flip-flops, and propose an independent set based algorithm for maximally extracting the non-conflict pairs from circuits. Through experiments with benchmark circuits, it is shown that our proposed method is very effective against reducing the state retention storage and the power consumption compared with the existing best MBRFF allocation while the wakeup delay is strictly limited to two clock cycles.1 INTRODUCTION 1 1.1 Clock Gating 1 1.2 Power Gating and State Retention 3 1.3 Multi-bit Retention Registers 4 1.4 Contributions of This Dissertation 6 2 FLIP-FLOP STATE DRIVEN CLOCK GATING: CONCEPT, DESIGN, AND METHODOLOGY 9 2.1 Motivations 9 2.1.1 Toggling based Clock Gating 9 2.1.2 Area and Power by Clock Gating 10 2.2 The Proposed Clock Gating 13 2.2.1 Concept of Flip-flop State Driven Clock Gating 13 2.2.2 Design of Gating Logic Circuitry 17 2.2.3 Integrated Clock Gating Methodology 22 2.2.4 Cost Formulation 23 2.3 Experiments 25 2.3.1 Experimental Setup 25 2.3.2 Experimental Results 26 3 ALGORITHM AND DESIGN OPTIMIZATION OF ALLOCATING MULTI-BIT RETENTION FLIP-FLOPS FOR POWER GATED CIRCUITS 32 3.1 Motivations 32 3.1.1 Flip-flops with Mux-feedback Loop 32 3.1.2 Impact of Wakeup Delay 37 3.2 The Proposed Allocation Algorithm 39 3.3 Design of Multi-Bit Retention Flip-Flop and Multi-Bit Extension 48 3.3.1 Multi-Bit Retention Flip-Flop 48 3.3.2 Multi-Bit Flip-Flop Extension 52 3.4 Experiments 54 3.4.1 Experimental Setup 54 3.4.2 Experimental Results 57 4 CONCLUSIONS 65 4.1 Flip-flop State Driven Clock Gating: Concept, Design, and Methodology 65 4.2 Algorithm and Design Optimization of Allocating Multi-bit Retention Flip-flops for Power Gated Circuits 66 Abstract (In Korean) 71Docto

    ์ •์  ๋žจ ๋ฐ ํŒŒ์›Œ ๊ฒŒ์ดํŠธ ํšŒ๋กœ์— ๋Œ€ํ•œ ์ „์•• ๋ฐ ๋ณด์กด์šฉ ๊ณต๊ฐ„ ํ• ๋‹น ๋ฌธ์ œ

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2021.8. ๊น€ํƒœํ™˜.์นฉ์˜ ์ €์ „๋ ฅ ๋™์ž‘์€ ์ค‘์š”ํ•œ ๋ฌธ์ œ์ด๋ฉฐ, ๊ณต์ •์ด ๋ฐœ์ „ํ•˜๋ฉด์„œ ๊ทธ ์ค‘์š”์„ฑ์€ ์ ์  ์ปค์ง€๊ณ  ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์€ ์นฉ์„ ๊ตฌ์„ฑํ•˜๋Š” ์ •์  ๋žจ(SRAM) ๋ฐ ๋กœ์ง(logic) ๊ฐ๊ฐ์— ๋Œ€ํ•ด์„œ ์ €์ „๋ ฅ์œผ๋กœ ๋™์ž‘์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•๋ก ์„ ๋…ผํ•œ๋‹ค. ์šฐ์„ , ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์นฉ์„ ๋ฌธํ„ฑ ์ „์•• ๊ทผ์ฒ˜์˜ ์ „์••(NTV)์—์„œ ๋™์ž‘์‹œํ‚ค๊ณ ์ž ํ•  ๋•Œ ๋ชจ๋‹ˆํ„ฐ๋ง ํšŒ๋กœ์˜ ์ธก์ •์„ ํ†ตํ•ด ์นฉ ๋‚ด์˜ ๋ชจ๋“  SRAM ๋ธ”๋ก์—์„œ ๋™์ž‘ ์‹คํŒจ๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š๋Š” ์ตœ์†Œ ๋™์ž‘ ์ „์••์„ ์ถ”๋ก ํ•˜๋Š” ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค. ์นฉ์„ NTV ์˜์—ญ์—์„œ ๋™์ž‘์‹œํ‚ค๋Š” ๊ฒƒ์€ ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ์„ ์ฆ๋Œ€์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š” ๋งค์šฐ ํšจ๊ณผ์ ์ธ ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜์ด์ง€๋งŒ SRAM์˜ ๊ฒฝ์šฐ ๋™์ž‘ ์‹คํŒจ ๋•Œ๋ฌธ์— ๋™์ž‘ ์ „์••์„ ๋‚ฎ์ถ”๊ธฐ ์–ด๋ ต๋‹ค. ํ•˜์ง€๋งŒ ์นฉ๋งˆ๋‹ค ์˜ํ–ฅ์„ ๋ฐ›๋Š” ๊ณต์ • ๋ณ€์ด๊ฐ€ ๋‹ค๋ฅด๋ฏ€๋กœ ์ตœ์†Œ ๋™์ž‘ ์ „์••์€ ์นฉ๋งˆ๋‹ค ๋‹ค๋ฅด๋ฉฐ, ๋ชจ๋‹ˆํ„ฐ๋ง์„ ํ†ตํ•ด ์ด๋ฅผ ์ถ”๋ก ํ•ด๋‚ผ ์ˆ˜ ์žˆ๋‹ค๋ฉด ์นฉ๋ณ„๋กœ SRAM์— ์„œ๋กœ ๋‹ค๋ฅธ ์ „์••์„ ์ธ๊ฐ€ํ•ด ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ์„ ๋†’์ผ ์ˆ˜ ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ณผ์ •์„ ํ†ตํ•ด ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•œ๋‹ค: (1) ๋””์ž์ธ ์ธํ”„๋ผ ์„ค๊ณ„ ๋‹จ๊ณ„์—์„œ๋Š” SRAM์˜ ์ตœ์†Œ ๋™์ž‘ ์ „์••์„ ์ถ”๋ก ํ•˜๊ณ  ์นฉ ์ƒ์‚ฐ ๋‹จ๊ณ„์—์„œ๋Š” SRAM ๋ชจ๋‹ˆํ„ฐ์˜ ์ธก์ •์„ ํ†ตํ•ด ์ „์••์„ ์ธ๊ฐ€ํ•˜๋Š” ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค; (2) ์นฉ์˜ SRAM ๋น„ํŠธ์…€(bitcell)๊ณผ ์ฃผ๋ณ€ ํšŒ๋กœ๋ฅผ ํฌํ•จํ•œ SRAM ๋ธ”๋ก๋“ค์˜ ๊ณต์ • ๋ณ€์ด๋ฅผ ๋ชจ๋‹ˆํ„ฐ๋งํ•  ์ˆ˜ ์žˆ๋Š” SRAM ๋ชจ๋‹ˆํ„ฐ์™€ SRAM ๋ชจ๋‹ˆํ„ฐ์—์„œ ๋ชจ๋‹ˆํ„ฐ๋งํ•  ๋Œ€์ƒ์„ ์ •์˜ํ•œ๋‹ค; (3) SRAM ๋ชจ๋‹ˆํ„ฐ์˜ ์ธก์ •๊ฐ’์„ ์ด์šฉํ•ด ๊ฐ™์€ ์นฉ์— ์กด์žฌํ•˜๋Š” ๋ชจ๋“  SRAM ๋ธ”๋ก์—์„œ ๋ชฉํ‘œ ์‹ ๋ขฐ์ˆ˜์ค€ ๋‚ด์—์„œ ์ฝ๊ธฐ, ์“ฐ๊ธฐ, ๋ฐ ์ ‘๊ทผ ๋™์ž‘ ์‹คํŒจ๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š๋Š” ์ตœ์†Œ ๋™์ž‘ ์ „์••์„ ์ถ”๋ก ํ•œ๋‹ค. ๋ฒค์น˜๋งˆํฌ ํšŒ๋กœ์˜ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•์„ ๋”ฐ๋ผ ์นฉ๋ณ„๋กœ SRAM ๋ธ”๋ก๋“ค์˜ ์ตœ์†Œ ๋™์ž‘ ์ „์••์„ ๋‹ค๋ฅด๊ฒŒ ์ธ๊ฐ€ํ•  ๊ฒฝ์šฐ, ๊ธฐ์กด ๋ฐฉ๋ฒ•๋Œ€๋กœ ๋ชจ๋“  ์นฉ์— ๋™์ผํ•œ ์ „์••์„ ์ธ๊ฐ€ํ•˜๋Š” ๊ฒƒ ๋Œ€๋น„ ์ˆ˜์œจ์€ ๊ฐ™์€ ์ˆ˜์ค€์œผ๋กœ ์œ ์ง€ํ•˜๋ฉด์„œ SRAM ๋น„ํŠธ์…€ ๋ฐฐ์—ด์˜ ์ „๋ ฅ ์†Œ๋ชจ๋ฅผ ๊ฐ์†Œ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค. ๋‘ ๋ฒˆ์งธ๋กœ, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ํŒŒ์›Œ ๊ฒŒ์ดํŠธ ํšŒ๋กœ์—์„œ ๊ธฐ์กด์˜ ๋ณด์กด์šฉ ๊ณต๊ฐ„ ํ• ๋‹น ๋ฐฉ๋ฒ•๋“ค์ด ์ง€๋‹ˆ๊ณ  ์žˆ๋Š” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ  ๋ˆ„์„ค ์ „๋ ฅ ์†Œ๋ชจ๋ฅผ ๋” ์ค„์ผ ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค. ๊ธฐ์กด์˜ ๋ณด์กด์šฉ ๊ณต๊ฐ„ ํ• ๋‹น ๋ฐฉ๋ฒ•์€ ๋ฉ€ํ‹ฐํ”Œ๋ ‰์„œ ํ”ผ๋“œ๋ฐฑ ๋ฃจํ”„๊ฐ€ ์žˆ๋Š” ๋ชจ๋“  ํ”Œ๋ฆฝํ”Œ๋กญ์—๋Š” ๋ฌด์กฐ๊ฑด ๋ณด์กด์šฉ ๊ณต๊ฐ„์„ ํ• ๋‹นํ•ด์•ผ ํ•ด์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋‹ค์ค‘ ๋น„ํŠธ ๋ณด์กด์šฉ ๊ณต๊ฐ„์˜ ์žฅ์ ์„ ์ถฉ๋ถ„ํžˆ ์‚ด๋ฆฌ์ง€ ๋ชปํ•˜๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด ๋ณด์กด์šฉ ๊ณต๊ฐ„์„ ์ตœ์†Œํ™”ํ•˜๋Š” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•œ๋‹ค: (1) ๋ณด์กด์šฉ ๊ณต๊ฐ„ ํ• ๋‹น ๊ณผ์ •์—์„œ ๋ฉ€ํ‹ฐํ”Œ๋ ‰์„œ ํ”ผ๋“œ๋ฐฑ ๋ฃจํ”„๋ฅผ ๋ฌด์‹œํ•  ์ˆ˜ ์žˆ๋Š” ์กฐ๊ฑด์„ ์ œ์‹œํ•˜๊ณ , (2) ํ•ด๋‹น ์กฐ๊ฑด์„ ์ด์šฉํ•ด ๋ฉ€ํ‹ฐํ”Œ๋ ‰์„œ ํ”ผ๋“œ๋ฐฑ ๋ฃจํ”„๊ฐ€ ์žˆ๋Š” ํ”Œ๋ฆฝํ”Œ๋กญ์ด ๋งŽ์ด ์กด์žฌํ•˜๋Š” ํšŒ๋กœ์—์„œ ๋ณด์กด์šฉ ๊ณต๊ฐ„์„ ์ตœ์†Œํ™”ํ•œ๋‹ค; (3) ์ถ”๊ฐ€๋กœ, ํ”Œ๋ฆฝํ”Œ๋กญ์— ์ด๋ฏธ ํ• ๋‹น๋œ ๋ณด์กด์šฉ ๊ณต๊ฐ„ ์ค‘ ์ผ๋ถ€๋ฅผ ์ œ๊ฑฐํ•  ์ˆ˜ ์žˆ๋Š” ์กฐ๊ฑด์„ ์ฐพ๊ณ , ์ด๋ฅผ ์ด์šฉํ•ด ๋ณด์กด์šฉ ๊ณต๊ฐ„์„ ๋” ๊ฐ์†Œ์‹œํ‚จ๋‹ค. ๋ฒค์น˜๋งˆํฌ ํšŒ๋กœ์˜ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•๋ก ์ด ๊ธฐ์กด์˜ ๋ณด์กด์šฉ ๊ณต๊ฐ„ ํ• ๋‹น ๋ฐฉ๋ฒ•๋ก ๋ณด๋‹ค ๋” ์ ์€ ๋ณด์กด์šฉ ๊ณต๊ฐ„์„ ํ• ๋‹นํ•˜๋ฉฐ, ๋”ฐ๋ผ์„œ ์นฉ์˜ ๋ฉด์  ๋ฐ ์ „๋ ฅ ์†Œ๋ชจ๋ฅผ ๊ฐ์†Œ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค.Low power operation of a chip is an important issue, and its importance is increasing as the process technology advances. This dissertation addresses the methodology of operating at low power for each of the SRAM and logic constituting the chip. Firstly, we propose a methodology to infer the minimum operating voltage at which SRAM failure does not occur in all SRAM blocks in the chip operating on near threshold voltage (NTV) regime through the measurement of a monitoring circuit. Operating the chip on NTV regime is one of the most effective ways to increase energy efficiency, but in case of SRAM, it is difficult to lower the operating voltage because of SRAM failure. However, since the process variation on each chip is different, the minimum operating voltage is also different for each chip. If it is possible to infer the minimum operating voltage of SRAM blocks of each chip through monitoring, energy efficiency can be increased by applying different voltage. In this dissertation, we propose a new methodology of resolving this problem. Specifically, (1) we propose to infer minimum operation voltage of SRAM in design infra development phase, and assign the voltage using measurement of SRAM monitor in silicon production phase; (2) we define a SRAM monitor and features to be monitored that can monitor process variation on SRAM blocks including SRAM bitcell and peripheral circuits; (3) we propose a new methodology of inferring minimum operating voltage of SRAM blocks in a chip that does not cause read, write, and access failures under a target confidence level. Through experiments with benchmark circuits, it is confirmed that applying different voltage to SRAM blocks in each chip that inferred by our proposed methodology can save overall power consumption of SRAM bitcell array compared to applying same voltage to SRAM blocks in all chips, while meeting the same yield target. Secondly, we propose a methodology to resolve the problem of the conventional retention storage allocation methods and thereby further reduce leakage power consumption of power gated circuit. Conventional retention storage allocation methods have problem of not fully utilizing the advantage of multi-bit retention storage because of the unavoidable allocation of retention storage on flip-flops with mux-feedback loop. In this dissertation, we propose a new methodology of breaking the bottleneck of minimizing the state retention storage. Specifically, (1) we find a condition that mux-feedback loop can be disregarded during the retention storage allocation; (2) utilizing the condition, we minimize the retention storage of circuits that contain many flip-flops with mux-feedback loop; (3) we find a condition to remove some of the retention storage already allocated to each of flip-flops and propose to further reduce the retention storage. Through experiments with benchmark circuits, it is confirmed that our proposed methodology allocates less retention storage compared to the state-of-the-art methods, occupying less cell area and consuming less power.1 Introduction 1 1.1 Low Voltage SRAM Monitoring Methodology 1 1.2 Retention Storage Allocation on Power Gated Circuit 5 1.3 Contributions of this Dissertation 8 2 SRAM On-Chip Monitoring Methodology for High Yield and Energy Efficient Memory Operation at Near Threshold Voltage 13 2.1 SRAM Failures 13 2.1.1 Read Failure 13 2.1.2 Write Failure 15 2.1.3 Access Failure 16 2.1.4 Hold Failure 16 2.2 SRAM On-chip Monitoring Methodology: Bitcell Variation 18 2.2.1 Overall Flow 18 2.2.2 SRAM Monitor and Monitoring Target 18 2.2.3 Vfail to Vddmin Inference 22 2.3 SRAM On-chip Monitoring Methodology: Peripheral Circuit IR Drop and Variation 29 2.3.1 Consideration of IR Drop 29 2.3.2 Consideration of Peripheral Circuit Variation 30 2.3.3 Vddmin Prediction including Access Failure Prohibition 33 2.4 Experimental Results 41 2.4.1 Vddmin Considering Read and Write Failures 42 2.4.2 Vddmin Considering Read/Write and Access Failures 45 2.4.3 Observation for Practical Use 45 3 Allocation of Always-On State Retention Storage for Power Gated Circuits - Steady State Driven Approach 49 3.1 Motivations and Analysis 49 3.1.1 Impact of Self-loop on Power Gating 49 3.1.2 Circuit Behavior Before Sleeping 52 3.1.3 Wakeup Latency vs. Retention Storage 54 3.2 Steady State Driven Retention Storage Allocation 56 3.2.1 Extracting Steady State Self-loop FFs 57 3.2.2 Allocating State Retention Storage 59 3.2.3 Designing and Optimizing Steady State Monitoring Logic 59 3.2.4 Analysis of the Impact of Steady State Monitoring Time on the Standby Power 63 3.3 Retention Storage Refinement Utilizing Steadiness 65 3.3.1 Extracting Flip-flops for Retention Storage Refinement 66 3.3.2 Designing State Monitoring Logic and Control Signals 68 3.4 Experimental Results 73 3.4.1 Comparison of State Retention Storage 75 3.4.2 Comparison of Power Consumption 79 3.4.3 Impact on Circuit Performance 82 3.4.4 Support for Immediate Power Gating 83 4 Conclusions 89 4.1 Chapter 2 89 4.2 Chapter 3 90๋ฐ•

    Methodologies for Synthesizable Programmable Devices based on Multi-Stage Switching Networks

    Get PDF
    Nowadays the rise of non-recurring engineering (NRE) costs associated with complexity is becoming a major factor in SoC design, limiting both scaling opportunities and the flexibility advantages offered by the integration of complex computational units. The introduction of embedded programmable elements can represent an appealing solution, able both to guarantee the desired flexibility and upgradabilty and to widen the SoC market. In particular embedded FPGA (eFPGA) cores can provide bit-level optimization for those applications which benefits from synthesis, paying on the other side in terms of performance penalties and area overhead with respect to standard cell ASIC implementations. In this scenario this thesis proposes a design methodology for a synthesizable programmable device designed to be embedded in a SoC. A soft-core embedded FPGA (eFPGA) is hence presented and analyzed in terms of the opportunities given by a fully synthesizable approach, following an implementation flow based on Standard-Cell methodology. A key point of the proposed eFPGA template is that it adopts a Multi-Stage Switching Network (MSSN) as the foundation of the programmable interconnects, since it can be efficiently synthesized and optimized through a standard cell based implementation flow, ensuring at the same time an intrinsic congestion-free network topology. The evaluation of the flexibility potentialities of the eFPGA has been performed using different technology libraries (STMicroelectronics CMOS 65nm and BCD9s 0.11ฮผm) through a design space exploration in terms of area-speed-leakage tradeoffs, enabled by the full synthesizability of the template. Since the most relevant disadvantage of the adopted soft approach, compared to a hardcore, is represented by a performance overhead increase, the eFPGA analysis has been made targeting small area budgets. The generation of the configuration bitstream has been obtained thanks to the implementation of a custom CAD flow environment, and has allowed functional verification and performance evaluation through an application-aware analysis

    Design and modelling of variability tolerant on-chip communication structures for future high performance system on chip designs

    Get PDF
    The incessant technology scaling has enabled the integration of functionally complex System-on-Chip (SoC) designs with a large number of heterogeneous systems on a single chip. The processing elements on these chips are integrated through on-chip communication structures which provide the infrastructure necessary for the exchange of data and control signals, while meeting the strenuous physical and design constraints. The use of vast amounts of on chip communications will be central to future designs where variability is an inherent characteristic. For this reason, in this thesis we investigate the performance and variability tolerance of typical on-chip communication structures. Understanding of the relationship between variability and communication is paramount for the designers; i.e. to devise new methods and techniques for designing performance and power efficient communication circuits in the forefront of challenges presented by deep sub-micron (DSM) technologies. The initial part of this work investigates the impact of device variability due to Random Dopant Fluctuations (RDF) on the timing characteristics of basic communication elements. The characterization data so obtained can be used to estimate the performance and failure probability of simple links through the methodology proposed in this work. For the Statistical Static Timing Analysis (SSTA) of larger circuits, a method for accurate estimation of the probability density functions of different circuit parameters is proposed. Moreover, its significance on pipelined circuits is highlighted. Power and area are one of the most important design metrics for any integrated circuit (IC) design. This thesis emphasises the consideration of communication reliability while optimizing for power and area. A methodology has been proposed for the simultaneous optimization of performance, area, power and delay variability for a repeater inserted interconnect. Similarly for multi-bit parallel links, bandwidth driven optimizations have also been performed. Power and area efficient semi-serial links, less vulnerable to delay variations than the corresponding fully parallel links are introduced. Furthermore, due to technology scaling, the coupling noise between the link lines has become an important issue. With ever decreasing supply voltages, and the corresponding reduction in noise margins, severe challenges are introduced for performing timing verification in the presence of variability. For this reason an accurate model for crosstalk noise in an interconnection as a function of time and skew is introduced in this work. This model can be used for the identification of skew condition that gives maximum delay noise, and also for efficient design verification
    corecore