    Process Variation-Aware 3D Integrated Circuit Design and Packaging Techniques (공정변이를 고려한 3차원 집적 회로 설계 및 패키징 기법)

    Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2014. Advisor: Taewhan Kim (김태환).
    As CMOS technology scales down, controlling variation in chip performance (i.e., speed and power) becomes critical to improving chip yield. Increased performance variation demands additional design effort, such as wider guard bands or longer design turnaround time (TAT), both of which degrade chip performance and profit. Meanwhile, through-silicon via (TSV)-based 3D technology has been regarded as a promising solution to the problems of long interconnect wires and large die size. Because a 3D IC is manufactured by stacking multiple dies fabricated on different wafers, integrating dies with widely different process characteristics can widen the device-performance gap between dies within a single chip. In this dissertation, we analyze the effect of on-package (within-chip) variation on 3D ICs and present effective methods to mitigate it. First, we present a parametric yield improvement method that resolves mismatches between dies with different process characteristics; comprehensive 3D integration algorithms that account for post-silicon tuning are developed for multi-layered 3D ICs. Then, we show that careful embedding of clock edges in a 3D clock tree can greatly reduce the impact of on-package variation on 3D clock skew, and we propose a two-step solution to the problem of on-package variation-aware layer embedding in 3D clock tree synthesis. In summary, this dissertation presents an effective 3D integration method and a 3D clock tree synthesis algorithm for process-variation-tolerant 3D IC designs.
    Contents:
    Abstract
    List of Figures
    List of Tables
    1 Introduction
      1.1 Process Variation in 3D ICs
      1.2 Contributions of This Dissertation
    2 Post-silicon Tuning Aware Die/Wafer Matching Algorithms for Enhancing Parametric Yield of 3D IC Design
      2.1 Introduction
      2.2 Preliminaries
      2.3 The Die-to-Die Matching Problem and Proposed Algorithm Considering Body Biasing
        2.3.1 Motivation and Problem Definition
        2.3.2 The Proposed Die-to-Die Matching Algorithm
      2.4 The Wafer-to-Wafer Matching Problem and Proposed Algorithm Considering Body Biasing
        2.4.1 Problem Definition and The Proposed Wafer-to-Wafer Matching Algorithm
      2.5 Experimental Results
      2.6 Summary
    3 Edge Layer Embedding Algorithm for Mitigating On-Package Variation in 3D Clock Tree Synthesis
      3.1 Introduction
      3.2 Problem Definitions and Motivation
      3.3 The Proposed Algorithm for On-Package Variation Aware Edge Embedding
        3.3.1 Algorithm for Maximizing Layer Sharing of Edges
        3.3.2 Refinement: Partial Edge Embedding on Layers
        3.3.3 Clock Tree Routing and Buffer Insertion
      3.4 Experimental Results
      3.5 Summary
    4 Conclusion
      4.1 Chapter 2
      4.2 Chapter 3
    Abstract in Korean
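    The die-to-die matching idea in this abstract can be illustrated with a deliberately simplified model. This sketch is not the dissertation's algorithm and ignores wafer-to-wafer matching: it assumes each die is summarized by one scalar delay, that a two-die stack is usable only when its dies differ by at most a hypothetical body-bias tuning range `tuning_range`, and it uses a standard greedy two-pointer matching (sort both lists, then pair dies within the range), which maximizes the number of such stacks under this model. The function name and values are illustrative assumptions.

```python
from typing import List, Tuple


def match_dies(top: List[float], bottom: List[float],
               tuning_range: float) -> List[Tuple[float, float]]:
    """Greedy die-to-die matching under a simplified scalar-delay model.

    A (top, bottom) pair becomes a stack only if |top - bottom| <= tuning_range,
    i.e. if the mismatch is assumed to be correctable by body biasing.
    Sorting both lists and sweeping with two pointers maximizes the number
    of such pairs; unmatched dies are left out.
    """
    a, b = sorted(top), sorted(bottom)
    i = j = 0
    stacks: List[Tuple[float, float]] = []
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tuning_range:
            stacks.append((a[i], b[j]))  # mismatch within the tuning range
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too fast to pair with any remaining bottom die
        else:
            j += 1  # b[j] is too fast to pair with any remaining top die
    return stacks


if __name__ == "__main__":
    # Made-up normalized die delays, for illustration only.
    print(len(match_dies([1.00, 2.00, 3.00], [1.10, 2.50, 3.05], 0.2)))  # 2
```

    Pairing purely by sorted rank would minimize total mismatch, but it can waste pairs that fit the tuning range; the two-pointer sweep targets the count of tunable stacks instead.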

    Variant X-Tree Clock Distribution Network and Its Performance Evaluations

    Throughput-driven floorplanning with wire pipelining

    Future high-performance SoCs will be so large that the time of flight of wires connecting distant pins in the layout can greatly exceed the clock period. To keep the frequency as high as possible, these wires may be pipelined. However, inserting flip-flops may reduce the throughput of the system because of loops in the logic netlist. In this paper, we address the problem of floorplanning a large design whose long interconnects are pipelined, by including throughput in the cost function of a simulated-annealing-based floorplanning tool. The results obtained on a series of benchmarks are then validated with a simple router that breaks long interconnects by placing flip-flops at suitable points along the wires.
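    Why loops matter can be shown with a model common in latency-insensitive design: a loop holding m registers that gains n extra pipeline flip-flops can accept new data only m out of every m + n cycles, so system throughput is the minimum of m / (m + n) over all loops. The sketch below assumes that model, a hypothetical `reach_mm` (the distance a signal travels in one clock period), and a hypothetical weighted cost; it is not the paper's exact cost function.

```python
import math
from typing import Dict, List, Tuple


def flip_flops_needed(length_mm: float, reach_mm: float) -> int:
    """Pipeline flip-flops so that no wire segment is longer than reach_mm."""
    return max(0, math.ceil(length_mm / reach_mm) - 1)


def system_throughput(loops: List[Tuple[int, List[str]]],
                      wire_length_mm: Dict[str, float],
                      reach_mm: float) -> float:
    """Minimum over loops of m / (m + n).

    Each loop is (m, wires): m original registers and the wires it crosses.
    n is the number of flip-flops inserted on those wires.
    """
    throughput = 1.0
    for registers, wires in loops:
        added = sum(flip_flops_needed(wire_length_mm[w], reach_mm) for w in wires)
        throughput = min(throughput, registers / (registers + added))
    return throughput


def annealing_cost(area_mm2: float, total_wire_mm: float, throughput: float,
                   w_area: float = 1.0, w_wire: float = 0.1,
                   w_thr: float = 50.0) -> float:
    """Hypothetical weighted cost: area, wirelength, and throughput loss all penalized."""
    return w_area * area_mm2 + w_wire * total_wire_mm + w_thr * (1.0 - throughput)
```

    Inside an annealing loop, each candidate floorplan would yield new wire lengths, hence a new throughput and cost, and moves would be accepted or rejected on the cost change as usual.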

    Modeling of thermally induced skew variations in clock distribution network

    A clock distribution network is sensitive to large thermal gradients on the die, because temperature affects the performance of both clock buffers and interconnects. A robust clock network design relies on accurate analysis of clock skew under temperature variations. In this work, we address the problem of modeling thermally induced clock skew in nanometer CMOS technologies. The complex thermal behavior of both buffers and interconnects is taken into account. In addition, our characterization of the temperature effect on buffers and interconnects gives designers valuable insight into the potential impact of thermal variations on clock networks. The use of industry-standard data formats in the interface allows our tool to be easily integrated into existing design flows.
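    A first-order way to see how a thermal gradient turns into skew is to let wire resistance grow linearly with temperature and compute an Elmore delay per branch. This sketch assumes an L-model RC ladder, a temperature coefficient of roughly 0.0039 per kelvin (bulk copper; thin nanoscale wires differ), temperature-independent capacitance, and a hypothetical linear buffer model. It is far simpler than the "complex thermal behavior" the paper characterizes.

```python
from typing import List

# Approximate temperature coefficient of resistance of bulk copper (1/K).
COPPER_TCR_PER_K = 0.0039


def wire_delay(r_seg: float, c_seg: float, seg_temps_c: List[float],
               load_c: float, t_ref_c: float = 25.0) -> float:
    """Elmore delay of a wire split into equal RC segments at different temperatures.

    Each segment is a series resistance followed by a shunt capacitance
    (L-model). Only the resistance is assumed to depend on temperature.
    """
    n = len(seg_temps_c)
    delay = 0.0
    for i, t in enumerate(seg_temps_c):
        r_i = r_seg * (1.0 + COPPER_TCR_PER_K * (t - t_ref_c))
        downstream_c = c_seg * (n - i) + load_c  # caps of segments i..n-1 plus sink load
        delay += r_i * downstream_c
    return delay


def buffer_delay(d_ref: float, temp_c: float, k_per_k: float = 0.001,
                 t_ref_c: float = 25.0) -> float:
    """Hypothetical linear buffer-delay model; k_per_k is an assumed coefficient."""
    return d_ref * (1.0 + k_per_k * (temp_c - t_ref_c))


def skew(delay_a: float, delay_b: float) -> float:
    return abs(delay_a - delay_b)


if __name__ == "__main__":
    # Two identical branches, one crossing a hotspot (placeholder values, SI units).
    cool = buffer_delay(20e-12, 45.0) + wire_delay(50.0, 20e-15, [45.0] * 4, 10e-15)
    hot = buffer_delay(20e-12, 95.0) + wire_delay(50.0, 20e-15, [45.0, 95.0, 95.0, 45.0], 10e-15)
    print(f"thermally induced skew: {skew(cool, hot) * 1e12:.3f} ps")
```

    Real buffers can show non-monotonic temperature behavior, so a characterized, table-based buffer model would replace `buffer_delay` in practice.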

    Clock Distribution Network Building Algorithm for Multiple IPs in System on a Chip

    In this project, an algorithm is proposed and developed to build the global clock distribution network (CDN) that distributes clocks to all partitions in the SoC through the channels available between partitions. The conventional method of building the global CDN requires manual intervention, which reduces efficiency and lengthens the overall SoC design cycle. To solve this issue, the proposed algorithm automates the global CDN building process while also producing a balanced overall CDN, which the conventional method does not achieve. Other researchers have proposed different CDN structures to simplify the design process, but these proposals often sacrifice placement resources to do so. As setup, the algorithm first collects the partition clock latency numbers and other required constraints. Once setup is done, the global CDN is built and routed. The algorithm checks for clock skew and scenic routing issues before shielding the global CDN to prevent crosstalk, and it finishes with a final clock skew check. The algorithm is tested on two floorplans with different sizes and available channels, using three different clocks for each floorplan, to verify its accuracy. Finally, the global CDN built by the algorithm is evaluated on the time needed to build it and on the number and area of clock buffers used. The algorithm reduces the global CDN design cycle by 50% and the clock buffer count and area by 5%, a substantial improvement in global CDN design efficiency over the conventional method.
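    The balancing goal above can be sketched as latency padding: each partition's total latency is its internal clock latency plus the delay of its global branch, and faster branches get delay buffers until they approach the slowest one, after which skew is checked against a limit. This is a hypothetical illustration, not the project's algorithm; it assumes every buffer adds a fixed `buffer_delay_ps`, and the function name and parameters are invented for the example.

```python
from typing import Dict, Tuple


def balance_global_cdn(partition_latency_ps: Dict[str, int],
                       global_delay_ps: Dict[str, int],
                       buffer_delay_ps: int,
                       skew_limit_ps: int) -> Tuple[Dict[str, int], int, bool]:
    """Pad faster global branches with delay buffers so that total clock latency
    (global route + partition-internal latency) is as equal as possible.

    Returns (buffers added per partition, residual skew, skew check passed).
    """
    totals = {p: partition_latency_ps[p] + global_delay_ps[p] for p in partition_latency_ps}
    target = max(totals.values())  # the slowest partition sets the target latency
    # Round down so no branch overshoots the target; residual skew < buffer_delay_ps.
    padding = {p: int((target - t) // buffer_delay_ps) for p, t in totals.items()}
    balanced = {p: totals[p] + padding[p] * buffer_delay_ps for p in totals}
    skew = max(balanced.values()) - min(balanced.values())
    return padding, skew, skew <= skew_limit_ps
```

    Rounding down keeps the slowest branch unchanged, which avoids raising overall insertion delay; a real flow would instead choose among buffer sizes and detour lengths.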

    Physical Design and Clock Tree Synthesis Methods for an 8-Bit Processor

    Nowadays, many processors with a wide range of features are available from different vendors. Here, a processor with an architecture similar to current processors, but without memory such as RAM and ROM, has been designed in Verilog. Architecturally, the processor contains a program counter, instruction register, ALU, ALU latch, general-purpose registers, control state module, flag registers, and a core module that instantiates all the other modules. A test module is designed for testing the processor. After its functionality is verified, the processor is synthesized in a 180 nm technology. Synthesis includes datapath optimization, such as selecting appropriate adders and multipliers to meet timing on the ALU datapath. The synthesis discussion covers how to handle worst negative slack (WNS), insert clock-gating cells, and define cost functions and path groups. Synthesis produces the netlist and the synthesized constraint file used for physical design. The physical design steps performed include floorplanning, partitioning, placement, placement legalization, clock tree synthesis, and routing. Static timing analysis is performed at every stage to ensure that the design meets its timing and frequency targets. Each physical design step is discussed with particular attention to the concepts behind it. Of all the physical design steps, clock tree synthesis receives special attention: the performance of the clock tree is improved by making it symmetrical and by maximizing common clock paths. A dedicated algorithm was developed to create a symmetrical clock tree and thereby reduce the clock tree's power consumption.
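    The abstract does not describe its symmetrical-tree algorithm, so this sketch substitutes a well-known technique: recursive bipartitioning of the clock sinks in the spirit of the method of means and medians (MMM), which alternately splits the sinks at the median x or y coordinate and places each tapping point at the centroid of the sinks it drives. All sinks share the upper levels of the tree (the common clock path), and leaf depths differ by at most one (exactly equal for a power-of-two sink count). Names such as `build_symmetric_tree` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Sink = Tuple[str, float, float]  # (flip-flop name, x, y)


@dataclass
class TreeNode:
    x: float
    y: float
    children: List["TreeNode"] = field(default_factory=list)
    sink: Optional[str] = None


def build_symmetric_tree(sinks: List[Sink], split_on_x: bool = True) -> TreeNode:
    """Recursive median bipartition with centroid tapping points (MMM-style)."""
    if not sinks:
        raise ValueError("need at least one sink")
    if len(sinks) == 1:
        name, x, y = sinks[0]
        return TreeNode(x, y, sink=name)
    cx = sum(s[1] for s in sinks) / len(sinks)  # mean position of the subtree
    cy = sum(s[2] for s in sinks) / len(sinks)
    coord = 1 if split_on_x else 2
    ordered = sorted(sinks, key=lambda s: s[coord])
    half = len(ordered) // 2  # median split keeps the two halves balanced
    left = build_symmetric_tree(ordered[:half], not split_on_x)
    right = build_symmetric_tree(ordered[half:], not split_on_x)
    return TreeNode(cx, cy, [left, right])


def leaf_depths(node: TreeNode, depth: int = 0) -> List[int]:
    """Depth (number of tree levels) from the root to every sink."""
    if node.sink is not None:
        return [depth]
    return [d for child in node.children for d in leaf_depths(child, depth + 1)]
```

    For example, eight flip-flops on a 4 x 2 grid produce a three-level binary tree with every flip-flop at depth 3 and the root at the grid's centroid.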

    An Energy and Performance Exploration of Network-on-Chip Architectures

    In this paper, we explore the design of a circuit-switched router, a wormhole router, a quality-of-service (QoS) supporting virtual channel router, and a speculative virtual channel router, and accurately evaluate the energy-performance tradeoffs they offer. Power results from the designs, placed and routed in a 90-nm CMOS process, show that all the architectures dissipate significant idle-state power. The additional energy required to route a packet through the router is then shown to be dominated by the datapath. This leads to the key result that, if this trend continues, more elaborate control can be justified and will not be immediately limited by the energy budget. A performance analysis also shows that dynamic resource allocation leads to the lowest network latencies, while static allocation may be used to meet QoS goals. Combining the power and performance figures then allows an energy-latency product to be calculated to judge the efficiency of each network. The speculative virtual channel router is shown to have efficiency very similar to that of the wormhole router while providing better performance, which supports its use in general-purpose designs. Finally, area metrics are presented to allow a comparison of implementation costs.
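    The energy-latency comparison described above reduces to a small calculation. This sketch assumes each packet is charged the router's idle power for the time it spends in the router plus its own dynamic energy, and it ranks designs by the product of that energy and average latency. The attribution rule, class, and function names are illustrative assumptions rather than the paper's methodology, and no figures from the paper are reproduced.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RouterResult:
    name: str
    energy_per_packet_pj: float  # idle share + dynamic energy attributed to one packet
    avg_latency_cycles: float

    @property
    def energy_latency_product(self) -> float:
        return self.energy_per_packet_pj * self.avg_latency_cycles


def packet_energy_pj(idle_power_mw: float, packet_time_ns: float,
                     dynamic_energy_pj: float) -> float:
    """Idle power over the packet's residence time plus its dynamic energy.

    Units: mW * ns = pJ, so both terms are in picojoules.
    """
    return idle_power_mw * packet_time_ns + dynamic_energy_pj


def rank_by_efficiency(results: List[RouterResult]) -> List[RouterResult]:
    """A lower energy-latency product means a more efficient network."""
    return sorted(results, key=lambda r: r.energy_latency_product)
```

    Because the product weights energy and latency equally, a router that is slightly more energy-hungry but clearly faster can still rank first, which matches the kind of tradeoff the abstract discusses.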

    Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications

    Wireless sensor networks monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques to eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive review of the literature from 2002 to 2013 on machine learning methods used to address common issues in wireless sensor networks (WSNs). The advantages and disadvantages of each proposed algorithm are evaluated against the corresponding problem. We also provide a comparative guide to aid WSN designers in developing suitable machine learning solutions for their specific application challenges.
    Comment: Accepted for publication in IEEE Communications Surveys and Tutorials.