3D Integrated Circuit Design and Packaging Techniques Considering Process Variation
Thesis (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2014. 김태환.
As CMOS technology scales down, controlling variation in chip performance (i.e., speed
and power) becomes increasingly important for improving chip yield. Increased performance
variation demands additional design effort, such as wider guard-bands or a longer
design turnaround time (TAT), both of which degrade chip performance and economic
profit. Meanwhile, through-silicon via (TSV) based 3D technology has been regarded as
a promising solution to the problems of long interconnect wires and large die sizes.
Since a 3D IC is manufactured by stacking multiple dies fabricated on different wafers,
integrating dies with widely differing process characteristics can enlarge the difference
in device performance between dies within a single chip. In this dissertation, we analyze
the effect of on-package (within-chip) variation on 3D ICs and present effective methods
to mitigate it. First, a parametric yield improvement method is presented to resolve
mismatches between dies with different process characteristics. Comprehensive 3D
integration algorithms that consider post-silicon tuning techniques are developed for
multi-layered 3D ICs. Then, we show that careful clock edge embedding in a 3D clock
tree can greatly reduce the impact of on-package variation on 3D clock skew, and we
propose a two-step solution to the problem of on-package variation-aware layer
embedding in 3D clock tree synthesis. In summary, this dissertation presents an effective
3D integration method and a 3D clock tree synthesis algorithm for process-variation-tolerant
3D IC designs.
Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.1 Process Variation in 3D ICs
1.2 Contributions of This Dissertation
2 Post-silicon Tuning Aware Die/Wafer Matching Algorithms for Enhancing Parametric Yield of 3D IC Design
2.1 Introduction
2.2 Preliminaries
2.3 The Die-to-Die Matching Problem and Proposed Algorithm Considering Body Biasing
2.3.1 Motivation and Problem Definition
2.3.2 The Proposed Die-to-Die Matching Algorithm
2.4 The Wafer-to-Wafer Matching Problem and Proposed Algorithm Considering Body Biasing
2.4.1 Problem Definition and The Proposed Wafer-to-Wafer Matching Algorithm
2.5 Experimental Results
2.6 Summary
3 Edge Layer Embedding Algorithm for Mitigating On-Package Variation in 3D Clock Tree Synthesis
3.1 Introduction
3.2 Problem Definitions and Motivation
3.3 The Proposed Algorithm for On-Package Variation Aware Edge Embedding
3.3.1 Algorithm for Maximizing Layer Sharing of Edges
3.3.2 Refinement: Partial Edge Embedding on Layers
3.3.3 Clock Tree Routing and Buffer Insertion
3.4 Experimental Results
3.5 Summary
4 Conclusion
4.1 Chapter 2
4.2 Chapter 3
Abstract in Korean
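The die-to-die matching idea from Chapter 2 can be illustrated with a toy sketch. This is not the dissertation's algorithm: it ignores body biasing and post-silicon tuning entirely, and it assumes a hypothetical pass criterion in which a two-die stack meets timing when the sum of the two die delays stays within a budget. The function name `match_dies` and all numbers are illustrative.

```python
def match_dies(top_delays, bottom_delays, budget):
    """Greedily pair top and bottom dies so that many stacks meet the budget.

    Toy assumption: a stack passes when top_delay + bottom_delay <= budget.
    Returns a list of (top_delay, bottom_delay) pairs that pass.
    """
    tops = sorted(top_delays)        # fastest top die first
    bottoms = sorted(bottom_delays)  # fastest bottom die first
    pairs = []
    i, j = 0, len(bottoms) - 1       # i: fastest unused top, j: slowest unused bottom
    while i < len(tops) and j >= 0:
        if tops[i] + bottoms[j] <= budget:
            # The slowest remaining bottom die is rescued by the fastest top die.
            pairs.append((tops[i], bottoms[j]))
            i += 1
            j -= 1
        else:
            # Even the fastest remaining top die cannot rescue this bottom die.
            j -= 1
    return pairs


# Hypothetical delays in arbitrary units.
tops = [10, 12, 15]
bottoms = [9, 14, 16]
pairs = match_dies(tops, bottoms, budget=26)
```

With these hypothetical numbers, stacking the dies in sorted order (fast with fast, slow with slow) would leave the slowest pair at 15 + 16 = 31, over the budget, whereas pairing fast dies with slow ones lets all three stacks pass.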
Throughput-driven floorplanning with wire pipelining
The size of future high-performance SoCs is such that the time-of-flight of wires connecting distant pins in the layout can be much longer than the clock period. To keep the frequency as high as possible, the wires may be pipelined. However, inserting flip-flops may alter the throughput of the system because of loops in the logic netlist. In this paper, we address the problem of floorplanning a large design in which long interconnects are pipelined, by adding throughput to the cost function of a tool based on simulated annealing. The results obtained on a series of benchmarks are then validated using a simple router that breaks long interconnects by suitably placing flip-flops along the wires.
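Why loops limit throughput can be sketched with a model commonly used in latency-insensitive design, which is an assumption here since the abstract does not give the paper's cost function: a loop with m original registers and n inserted pipeline flip-flops sustains a throughput of m / (m + n), and the system runs at the rate of its slowest loop. The function names are hypothetical.

```python
from fractions import Fraction


def loop_throughput(registers, inserted_flops):
    """Throughput of one netlist loop after wire pipelining (assumed model)."""
    return Fraction(registers, registers + inserted_flops)


def system_throughput(loops):
    """System throughput is bounded by the slowest loop.

    `loops` is a list of (registers, inserted_flops) tuples, one per loop.
    """
    return min(loop_throughput(r, f) for r, f in loops)


# Three hypothetical loops: one untouched, two with inserted flip-flops.
loops = [(2, 0), (3, 1), (1, 1)]
worst = system_throughput(loops)
```

A floorplanner that is aware of this effect would prefer placements that keep the blocks of tight loops close together, so that few flip-flops land on loop wires.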
Modeling of thermally induced skew variations in clock distribution network
A clock distribution network is sensitive to large thermal gradients on the die, because the performance of both clock buffers and interconnects is affected by temperature. Robust clock network design relies on accurate analysis of clock skew under temperature variations. In this work, we address the problem of modeling thermally induced clock skew in nanometer CMOS technologies. The complex thermal behavior of both buffers and interconnects is taken into account. In addition, our characterization of the temperature effect on buffers and interconnects gives designers valuable insight into the potential impact of thermal variations on clock networks. The use of industry-standard data formats in the interface allows our tool to be easily integrated into existing design flows.
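The interconnect part of this effect can be sketched under simplifying assumptions that are not taken from the paper: wire resistance grows linearly with temperature, wire delay follows the Elmore estimate for a distributed RC line (0.5 · r · c · L²), and buffers are ignored. All parameter values and function names below are hypothetical.

```python
def wire_delay(length_m, temp_c, r0=1e5, c=2e-10, beta=0.004, t0=25.0):
    """Elmore delay (seconds) of a distributed RC wire at a given temperature.

    r0   : resistance per metre at the reference temperature t0 (ohm/m), assumed
    c    : capacitance per metre (F/m), assumed
    beta : temperature coefficient of resistance (1/degC), assumed
    """
    r = r0 * (1.0 + beta * (temp_c - t0))  # linear resistance-temperature model
    return 0.5 * r * c * length_m ** 2


def thermal_skew(length_m, temp_a_c, temp_b_c):
    """Skew between two equal-length branches sitting at different temperatures."""
    return wire_delay(length_m, temp_a_c) - wire_delay(length_m, temp_b_c)


# Two 2 mm branches, one over a hot spot (85 degC) and one at 25 degC.
skew = thermal_skew(2e-3, 85.0, 25.0)
```

In this model two branches that are perfectly balanced at a uniform temperature develop skew as soon as the die has a thermal gradient, which is the situation the abstract describes.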
Clock Distribution Network Building Algorithm For Multiple IPs In System On A Chip
In this project, an algorithm is proposed and developed to build the global CDN
that distributes the clocks to all partitions in the SoC using the channels available
between partitions. The conventional method of building the global CDN involves
manual intervention, which lowers the efficiency of global CDN building and lengthens
the overall SoC design cycle. To solve this issue, an algorithm is proposed that automates
the global CDN building process and at the same time obtains a balanced overall CDN,
which the conventional method does not achieve. Other researchers have proposed different
CDN structures to simplify the design process, but those proposals often sacrifice
placement resources to do so. The algorithm first collects the partition clock latency
numbers and other constraints needed as setup. When the setup is done, the global CDN
is built and routed. The algorithm checks for clock skew and scenic routing issues before
proceeding to shield the global CDN to prevent cross-talk issues. The algorithm finishes
after a final check of the clock skew. The algorithm is tested on two different
floorplans of varying size and available channels, using three different clocks for each
floorplan, to verify the accuracy of the algorithm. Finally, the global CDN built using the
algorithm is evaluated based on the time needed to build the global CDN and on the number
and area of the clock buffers used. The algorithm is shown to shorten the global CDN
design cycle by 50% and to reduce clock buffer count and area by 5%. These improvements
show that the algorithm makes global CDN design considerably more efficient than the
conventional method.
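The balancing and skew-check steps described above can be sketched as follows. This is only an illustration under assumed names and numbers, not the project's implementation: each partition reports its internal clock latency, the global CDN is given a delay target per partition so that every clock arrives at the same time, and the resulting skew is compared against a budget.

```python
def global_delay_targets(partition_latency_ps, base_ps=0):
    """Global CDN delay each partition needs so that total latencies are equal.

    The slowest partition gets `base_ps`; faster partitions get extra delay.
    """
    slowest = max(partition_latency_ps.values())
    return {name: base_ps + slowest - latency
            for name, latency in partition_latency_ps.items()}


def clock_skew(partition_latency_ps, global_delay_ps):
    """Skew = latest minus earliest clock arrival over all partitions."""
    arrivals = [global_delay_ps[name] + latency
                for name, latency in partition_latency_ps.items()]
    return max(arrivals) - min(arrivals)


def skew_ok(partition_latency_ps, global_delay_ps, budget_ps):
    """Final check: does the built global CDN meet the skew budget?"""
    return clock_skew(partition_latency_ps, global_delay_ps) <= budget_ps


# Hypothetical partition latencies in picoseconds.
latency = {"cpu": 300, "gpu": 450, "io": 200}
targets = global_delay_targets(latency)
```

In a real flow the routed delays would differ from the targets, so the skew check would run on extracted delays rather than on the targets themselves.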
Physical Design and Clock Tree Synthesis Methods For A 8-Bit Processor
Nowadays, many processors with a wide range of features are available from different vendors. A processor with an architecture similar to that of current processors, lacking only memory components such as RAM and ROM, has been designed here in Verilog. Architecturally, the processor contains a program counter, instruction register, ALU, ALU latch, general-purpose registers, control state module, flag registers, and a core module containing all of these modules. A test module is also designed for testing the processor. Once the design is functionally correct, the processor is synthesized in a 180 nm technology. Synthesis is performed with data path optimization, such as selecting suitable adders and multipliers to optimize timing in the data path during ALU operations. How to handle the worst negative slack (WNS), how to include clock gating cells, and how to define cost and path groups during synthesis are covered. Synthesis yields the netlist and the synthesized constraint file needed for physical design. In physical design, floorplanning, partitioning, placement, placement legalization, clock tree synthesis, and routing are performed. Static timing analysis is performed at every stage to check that the design meets timing. Each step of physical design is discussed with particular attention to the concepts behind it. Of all the physical design steps, clock tree synthesis receives the most attention: the performance of the clock tree is improved by creating a symmetrical clock tree and maintaining more common clock paths. A special algorithm has been devised for creating a symmetrical clock tree, thereby lowering the power consumption of the clock tree.
An Energy and Performance Exploration of Network-on-Chip Architectures
In this paper, we explore the designs of a circuit-switched router, a wormhole router, a quality-of-service (QoS) supporting virtual channel router and a speculative virtual channel router, and accurately evaluate the energy-performance tradeoffs they offer. Power results from the designs placed and routed in a 90-nm CMOS process show that all the architectures dissipate significant idle-state power. The additional energy required to route a packet through the router is then shown to be dominated by the data path. This leads to the key result that, if this trend continues, the use of more elaborate control can be justified and will not be immediately limited by the energy budget. A performance analysis also shows that dynamic resource allocation leads to the lowest network latencies, while static allocation may be used to meet QoS goals. Combining the power and performance figures then allows an energy-latency product to be calculated to judge the efficiency of each of the networks. The speculative virtual channel router is shown to have an efficiency very similar to that of the wormhole router while providing better performance, supporting its use for general-purpose designs. Finally, area metrics are also presented to allow a comparison of implementation costs.
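The energy-latency product mentioned above is simple to compute; a minimal sketch follows. The router names echo the abstract, but the energy and latency values are made up for illustration and are not results from the paper.

```python
def energy_latency_product(energy_pj, latency_ns):
    """Efficiency metric: lower is better."""
    return energy_pj * latency_ns


def rank_routers(metrics):
    """Sort router names from most to least efficient.

    `metrics` maps a router name to an (energy_pj, latency_ns) tuple.
    """
    return sorted(metrics, key=lambda name: energy_latency_product(*metrics[name]))


# Hypothetical per-packet figures, not measurements.
metrics = {
    "circuit_switched": (6, 20),
    "wormhole": (10, 8),
}
ranking = rank_routers(metrics)
```

The product rewards a design that spends slightly more energy per packet if it cuts latency by a larger factor, which is how a more elaborate router can still come out as efficient as a simpler one.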
Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications
Wireless sensor networks monitor dynamic environments that change rapidly
over time. This dynamic behavior is either caused by external factors or
initiated by the system designers themselves. To adapt to such conditions,
sensor networks often adopt machine learning techniques to eliminate the need
for unnecessary redesign. Machine learning also inspires many practical
solutions that maximize resource utilization and prolong the lifespan of the
network. In this paper, we present an extensive literature review over the
period 2002-2013 of machine learning methods that were used to address common
issues in wireless sensor networks (WSNs). The advantages and disadvantages of
each proposed algorithm are evaluated against the corresponding problem. We
also provide a comparative guide to aid WSN designers in developing suitable
machine learning solutions for their specific application challenges.
Comment: Accepted for publication in IEEE Communications Surveys and Tutorials