
    AI/ML Algorithms and Applications in VLSI Design and Technology

    An evident challenge ahead for the integrated circuit (IC) industry in the nanometer regime is developing methods that reduce the design complexity arising from growing process variations and shorten the turnaround time of chip manufacturing. Conventional methodologies for such tasks are largely manual and therefore time-consuming and resource-intensive. In contrast, the learning strategies of artificial intelligence (AI) offer numerous automated approaches for handling complex, data-intensive tasks in very-large-scale integration (VLSI) design and testing. Employing AI and machine learning (ML) algorithms in VLSI design and manufacturing reduces the time and effort needed to understand and process data within and across abstraction levels, which in turn improves IC yield and reduces manufacturing turnaround time. This paper thoroughly reviews the AI/ML-based automated approaches previously introduced for VLSI design and manufacturing. We also discuss the future scope of AI/ML applications at various abstraction levels to transform VLSI design, aiming for high-speed, highly intelligent, and efficient implementations.

    SPRING: A Sparsity-Aware Reduced-Precision Monolithic 3D CNN Accelerator Architecture for Training and Inference

    CNNs outperform traditional machine learning algorithms across a wide range of applications, but their computational complexity makes efficient hardware accelerators necessary. Most CNN accelerators explore dataflow styles that exploit computational parallelism, while the potential speedup from sparsity has not been adequately addressed. The computation and memory footprint of CNNs can be reduced significantly if sparsity is exploited in network evaluation. Some accelerator designs do encode and exploit sparsity, but only in activations or weights, and only during inference. Activations and weights have also been shown to be highly sparse during training, so sparsity-aware computation should be considered for training as well. To further improve performance and energy efficiency, some accelerators evaluate CNNs with limited precision; however, this is restricted to inference, because reduced precision sacrifices network accuracy when used in training. In addition, CNN evaluation is usually memory-intensive, especially in training. In this paper, we propose SPRING, a SParsity-aware Reduced-precision Monolithic 3D CNN accelerator for trainING and inference. SPRING supports both CNN training and inference. It uses a binary mask scheme to encode sparsity in activations and weights, and it uses stochastic rounding to train CNNs with reduced precision without accuracy loss. To alleviate the memory bottleneck in CNN evaluation, especially in training, SPRING uses an efficient monolithic 3D NVM interface to increase memory bandwidth. Compared to a GTX 1080 Ti, SPRING achieves 15.6×, 4.2×, and 66.0× improvements in performance, power reduction, and energy efficiency, respectively, for CNN training, and 15.5×, 4.5×, and 69.1× improvements for inference.
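
    The two techniques named in the SPRING abstract, binary-mask sparsity encoding and stochastic rounding, can be illustrated in software. The NumPy sketch below is not SPRING's hardware datapath; the function names are hypothetical, and it assumes a simple format in which a boolean mask marks nonzero entries and only the nonzero values are stored in order. `masked_dot` shows how ANDing two masks lets a dot product skip every position where either operand is zero, and `stochastic_round` rounds up with probability equal to the discarded fraction, so the rounded value is unbiased in expectation, the property usually cited for why it preserves accuracy in low-precision training.

```python
import numpy as np

def mask_encode(x):
    """Binary-mask encoding (assumed format): a boolean mask marks nonzero
    entries, and only the nonzero values are stored, in row-major order."""
    mask = x != 0
    return mask, x[mask]

def mask_decode(mask, values):
    """Rebuild the dense tensor from its mask and compressed values."""
    out = np.zeros(mask.shape, dtype=values.dtype)
    out[mask] = values
    return out

def masked_dot(mask_a, vals_a, mask_b, vals_b):
    """Dot product of two mask-encoded 1-D vectors. Only positions set in
    both masks are multiplied, so any zero operand is skipped."""
    both = mask_a & mask_b
    # Position of each dense index inside its compressed value array.
    idx_a = np.cumsum(mask_a) - 1
    idx_b = np.cumsum(mask_b) - 1
    return float(np.sum(vals_a[idx_a[both]] * vals_b[idx_b[both]]))

def stochastic_round(x, frac_bits, rng):
    """Round to a fixed-point grid with step 2**-frac_bits, rounding up with
    probability equal to the discarded fraction, so E[result] == x."""
    scaled = np.asarray(x, dtype=float) * 2.0 ** frac_bits
    low = np.floor(scaled)
    round_up = rng.random(scaled.shape) < (scaled - low)
    return (low + round_up) / 2.0 ** frac_bits
```

    On a grid with 2 fractional bits, round-to-nearest maps 0.3 to 0.25 every time, whereas `stochastic_round` returns 0.25 or 0.5 and averages 0.3 over many draws.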

    Algorithms and architectures for the multirate additive synthesis of musical tones

    In classical Additive Synthesis (AS), the output signal is the sum of a large number of independently controllable sinusoidal partials. The advantages of AS for music synthesis are well known, as is its high computational cost. This thesis is concerned with the computational optimisation of AS by multirate DSP techniques. In note-based music synthesis, the expected bounds of the frequency trajectory of each partial in a finite-lifecycle tone determine critical, time-invariant, partial-specific sample rates that are lower than the conventional rate (in excess of 40 kHz), resulting in computational savings. Scheduling and interpolation (to suppress quantisation noise) for many sample rates are then required, leading to the concept of Multirate Additive Synthesis (MAS), in which these overheads are minimised by synthesis filterbanks that quantise the set of available sample rates. Alternative AS optimisations are also appraised. It is shown that a hierarchical interpretation of the QMF filterbank preserves AS generality and permits efficient context-specific adaptation of computation to the required note dynamics. Practical QMF implementation and the modifications necessary for MAS are discussed. QMF transition widths can be logically excluded from the MAS paradigm, at a cost; therefore, a novel filterbank in which transition widths are physically excluded is evaluated. Benchmarking of a hypothetical orchestral synthesis application provides a tentative quantitative analysis of the performance improvement of MAS over AS. The mapping of MAS into VLSI is opened by a review of sine computation techniques. Then the functional specification and high-level design of a conceptual MAS Coprocessor (MASC) are developed; the MASC operates with high autonomy in a loosely coupled master-slave configuration with a host CPU that executes the filterbanks in software. Standard hardware optimisation techniques, such as pipelining, are applied, based on the principle of an application-specific memory hierarchy that maximises MASC throughput.
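
    To see where the multirate savings come from, the sketch below assigns each partial the lowest rate from a dyadic set fs/2^k, the kind of quantised rate set an octave-band QMF tree provides, that still exceeds the Nyquist requirement for the partial's highest expected frequency. The 48 kHz base rate, the six levels, and the 1.25 guard factor standing in for filter transition widths are assumptions for illustration, not values from the thesis, and the sketch ignores the scheduling and interpolation overheads the abstract mentions.

```python
def assign_rates(max_freqs_hz, fs=48000.0, levels=6, guard=1.25):
    """Give each partial the lowest rate fs / 2**k (0 <= k < levels) whose
    Nyquist limit exceeds guard * the partial's highest expected frequency.
    The guard factor is a hypothetical allowance for filter transition bands."""
    rates = []
    for f in max_freqs_hz:
        if fs / 2 <= guard * f:
            raise ValueError(f"partial bound {f} Hz is too high for fs={fs} Hz")
        rate = fs
        # Candidate rates shrink with k, so stop at the first one that fails.
        for k in range(1, levels):
            candidate = fs / 2 ** k
            if candidate / 2 > guard * f:
                rate = candidate
            else:
                break
        rates.append(rate)
    return rates

def update_savings(rates, fs=48000.0):
    """Fraction of per-sample oscillator updates saved versus running every
    partial at the single conventional rate fs."""
    return 1.0 - sum(rates) / (fs * len(rates))
```

    For example, `assign_rates([100, 1000, 9000])` yields rates of 1.5, 3, and 24 kHz, so those three partials need about 20% of the oscillator updates that running all three at 48 kHz would.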

    Stochastic-Based Computing with Emerging Spin-Based Device Technologies

    In this dissertation, analog and emerging device physics are explored to provide a technology platform for designing new bio-inspired systems and novel architectures. As CMOS approaches nanoscale dimensions, it nears the physical limits of feature size, and its physical device characteristics will pose severe challenges to constructing robust digital circuitry. Unlike transistor defects due to fabrication imperfections, quantum-related switching uncertainties will seriously increase susceptibility to noise, rendering traditional thinking and logic design techniques inadequate. The current research trend is therefore to create non-Boolean, high-level computational models and map them directly onto the unique operational properties of new, power-efficient nanoscale devices. This research has two focuses: 1) Investigation of the physical hysteresis switching behavior of domain wall devices. We analyze the domain wall device and identify its hysteresis behavior over a range of currents. We propose a Domain-Wall-Motion-based (DWM) NCL circuit that achieves approximately 30× and 8× improvements in energy efficiency and chip layout area, respectively, over its equivalent CMOS design, while maintaining similar delay performance for a one-bit full adder. 2) Investigation of the physical stochastic switching behavior of Magnetic Tunnel Junction (MTJ) devices. Based on an analysis of the stochastic switching behavior of MTJs, we propose an innovative stochastic-based architecture for implementing artificial neural networks (S-ANN) with both MTJ and DWM devices, which enables efficient computing at ultra-low voltage. For a well-known pattern recognition task, our mixed-model HSPICE simulation results show that a 34-neuron S-ANN implementation, compared with deterministic ANN counterparts implemented with digital and analog CMOS circuits, achieves more than 1.5 to 2 orders of magnitude lower energy consumption and 2 to 2.5 orders of magnitude less hidden-layer chip area.
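
    Stochastic computing replaces deterministic arithmetic with probabilities encoded as random events. The sketch below models an MTJ-like device whose switching probability is a logistic function of its normalized write drive; this device model, the `beta` steepness parameter, and the function names are assumptions for illustration, not the dissertation's HSPICE models. Repeating the write and counting switches estimates a sigmoid activation, and `sc_multiply` shows the classic unipolar stochastic-computing primitive in which an AND of two independent bitstreams multiplies their ones-densities.

```python
import numpy as np

def switch_probability(drive, beta=4.0):
    """Hypothetical device model: probability that one write attempt switches
    the device, as a logistic function of the normalized drive."""
    return 1.0 / (1.0 + np.exp(-beta * drive))

def stochastic_neuron(inputs, weights, n_trials, rng, beta=4.0):
    """Use the weighted input sum as the write drive, attempt n_trials
    independent writes, and return the fraction that switched. The result
    is an unbiased estimate of switch_probability(weighted sum)."""
    drive = float(np.dot(inputs, weights))
    switched = rng.random(n_trials) < switch_probability(drive, beta)
    return float(switched.mean())

def sc_multiply(p_a, p_b, length, rng):
    """Unipolar stochastic computing: AND two independent random bitstreams
    whose ones-densities are p_a and p_b; the AND's density is p_a * p_b."""
    a = rng.random(length) < p_a
    b = rng.random(length) < p_b
    return float((a & b).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x, w = np.array([1.0, 0.5]), np.array([0.2, -0.1])
    print("estimate:", stochastic_neuron(x, w, 10_000, rng),
          "target:", switch_probability(float(np.dot(x, w))))
    print("0.5 * 0.4 ~", sc_multiply(0.5, 0.4, 10_000, rng))
```

    The estimate's accuracy grows with the number of trials (bitstream length), which is the usual precision-versus-latency trade-off in stochastic computing. For scale, 1.5 to 2 orders of magnitude corresponds to roughly 32× to 100× (10^1.5 ≈ 31.6).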

    Timing Analysis of Interconnects and Prediction of Design Rule Violations for Deep Sub-micron Circuit Design

    Ph.D. dissertation, Department of Electrical and Computer Engineering, College of Engineering, Graduate School of Seoul National University, February 2021. Advisor: Taewhan Kim (김태환).
    Timing analysis and clearing design rule violations are essential steps for taping out a chip. However, both keep getting harder in deep sub-micron circuits because transistor and interconnect variations have been increasing and design rules have become more complex. This dissertation addresses two problems, timing analysis and design rule violations, in synthesizing deep sub-micron circuits. First, timing analysis at process corners cannot capture post-silicon performance accurately, because the slowest path at a process corner is not always the slowest one in post-silicon instances. In addition, the proportion of interconnect delay in the critical path on a chip is increasing and exceeds 20% in sub-10nm technologies. This means that, to capture post-silicon performance accurately, a representative critical path circuit should reflect not only FEOL (front-end-of-line) but also BEOL (back-end-of-line) variations. Since there are more than ten BEOL metal layers, and the resistance and capacitance variations of each layer are intermixed with the resistance variations of the vias between layers, a very high-dimensional design space exploration is needed to synthesize a representative critical path circuit that provides accurate performance prediction. To cope with this, I propose a BEOL-aware methodology for synthesizing a representative critical path circuit, which incrementally explores routing patterns (i.e., BEOL reconfiguration) as well as gate resizing, starting from an initial path circuit on the post-silicon target circuit, namely its slowest timing path in the absence of process variation. 
Specifically, the synthesis framework for the critical path circuit integrates a set of novel techniques: (1) extracting and classifying BEOL configurations to reduce design space complexity, (2) formulating BEOL random variables for fast and accurate timing analysis, and (3) exploring alternative (ring oscillator) circuit structures to extend the applicability of this work. Second, design rules keep growing more complex, which results in more design rule violations during routing, and shrinking standard cells make routing harder still. In the conventional P&R flow, the routability of a pre-routed layout is predicted from the routing congestion obtained by global routing (the required tracks, the available tracks, and their difference), and placement is then optimized so as not to cause design rule violations. However, this prediction has turned out to be inaccurate in advanced technology nodes, so routability must be predicted with more features. I propose a methodology that predicts the hotspots of design rule violations (DRVs) using machine learning with placement-related features and conventional routing congestion, and then perturbs placed cells to reduce the number of DRVs. Specifically, hotspots are predicted by a pre-trained binary classification model, and placement perturbation is performed by global optimization methods (metaheuristic or Bayesian optimization) to minimize the number of DRVs predicted by a pre-trained regression model. To do this, the framework is composed of three techniques: (1) dividing the circuit layout into multiple rectangular grids and extracting features such as pin density, cell density, and global routing results (demand, capacity, and overflow) in the placement phase, (2) predicting whether each grid has DRVs using a binary classification model, and (3) perturbing the placed standard cells in the hotspots to minimize the number of DRVs predicted by a regression model.
    Contents:
    1 Introduction
      1.1 Representative Critical Path Circuit
      1.2 Prediction of Design Rule Violations and Placement Perturbation
      1.3 Contributions of This Dissertation
    2 Methodology for Synthesizing Representative Critical Path Circuits reflecting BEOL Timing Variation
      2.1 Motivation
      2.2 Definitions and Overall Flow
      2.3 Techniques for BEOL-Aware RCP Generation
        2.3.1 Clustering BEOL Configurations
        2.3.2 Formulating Statistical BEOL Random Variables
        2.3.3 Delay Modeling
        2.3.4 Exploring Ring Oscillator Circuit Structures
      2.4 Experimental Results
      2.5 Further Study on Variations
    3 Methodology for Reducing Routing Failures through Enhanced Prediction on Design Rule Violations in Placement
      3.1 Motivation
      3.2 Overall Flow
      3.3 Techniques for Reducing Routing Failures
        3.3.1 Binary Classification
        3.3.2 Regression
        3.3.3 Optimization
        3.3.4 Placement Perturbation
      3.4 Experiments
        3.4.1 Experiments Setup
        3.4.2 Hotspot Prediction
        3.4.3 Regression
        3.4.4 Placement Perturbation
    4 Conclusions
      4.1 Synthesis of Representative Critical Path Circuits reflecting BEOL Timing Variation
      4.2 Reduction of Routing Failures through Enhanced Prediction on Design Rule Violations in Placement
    Abstract (In Korean)
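
    The first two steps of the DRV-prediction framework in this dissertation, grid feature extraction and per-grid binary classification, can be sketched in NumPy. The input formats and function names are hypothetical, and plain logistic regression stands in for the pre-trained classifier, whose model type the abstract does not specify; the regression model and the optimization-based placement perturbation are not shown. The demo uses synthetic data with labels derived from overflow purely to exercise the code; real labels would come from DRVs reported after detailed routing.

```python
import numpy as np

def grid_features(cells, pins, demand, capacity, die_w, die_h, n):
    """Split a die_w x die_h layout into an n x n grid and return one feature
    row per grid (row-major, index gy * n + gx):
    [cell density, pin density, routing demand, routing capacity, overflow].
    Hypothetical inputs: cells is an (m, 3) array of (x, y, area); pins is a
    (p, 2) array of (x, y); demand and capacity are (n, n) arrays indexed
    [gy, gx], e.g. taken from a global router."""
    gw, gh = die_w / n, die_h / n
    area = np.zeros((n, n))
    pin_count = np.zeros((n, n))
    cx = np.clip((cells[:, 0] // gw).astype(int), 0, n - 1)
    cy = np.clip((cells[:, 1] // gh).astype(int), 0, n - 1)
    np.add.at(area, (cy, cx), cells[:, 2])       # accumulate cell area per grid
    px = np.clip((pins[:, 0] // gw).astype(int), 0, n - 1)
    py = np.clip((pins[:, 1] // gh).astype(int), 0, n - 1)
    np.add.at(pin_count, (py, px), 1.0)          # count pins per grid
    overflow = np.maximum(demand - capacity, 0.0)
    feats = [area / (gw * gh), pin_count / (gw * gh), demand, capacity, overflow]
    return np.stack([f.reshape(-1) for f in feats], axis=1)

def train_logistic(X, y, lr=0.5, epochs=3000):
    """Batch-gradient-descent logistic regression on standardized features,
    a stand-in for whatever binary classifier is actually used."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # constant features (e.g. uniform capacity) stay at 0
    Z = (X - mu) / sd
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        err = p - y
        w -= lr * Z.T @ err / len(y)
        b -= lr * err.mean()
    return mu, sd, w, b

def predict_hotspots(model, X, threshold=0.5):
    """True for each grid whose predicted DRV probability meets the threshold."""
    mu, sd, w, b = model
    p = 1.0 / (1.0 + np.exp(-(((X - mu) / sd) @ w + b)))
    return p >= threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, size = 16, 160.0
    cells = np.column_stack([rng.uniform(0, size, (2000, 2)),
                             rng.uniform(0.5, 2.0, 2000)])
    pins = rng.uniform(0, size, (6000, 2))
    demand = rng.integers(0, 16, (n, n)).astype(float)
    capacity = np.full((n, n), 10.0)
    X = grid_features(cells, pins, demand, capacity, size, size, n)
    y = (X[:, 4] > 0).astype(float)  # synthetic labels, NOT real DRV data
    model = train_logistic(X, y)
    print("training accuracy:", (predict_hotspots(model, X) == (y > 0)).mean())
```

    In the full flow, grids flagged by the classifier would be the regions whose cells the optimizer perturbs, with the regression model scoring each candidate placement.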

    Energy efficient hardware acceleration of multimedia processing tools

    The world of mobile devices is experiencing an ongoing trend of feature enhancement and convergence toward general-purpose multimedia platforms. This trend poses many grand challenges, the most pressing being limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be accelerated by a set of underpinning hardware blocks. Based on the survey this thesis presents of modern video compression standards and their associated enabling technologies, it concludes that tight energy and throughput constraints can still be tackled effectively at the algorithmic level in order to design reusable, optimised hardware acceleration cores. To prove these conclusions, the work in this thesis focuses on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high-level techniques such as redundant computation elimination, parallelism, and low-switching computation structures. Both architectures compare favourably against the relevant prior art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation: both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is itself amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions. Results show an improvement on state-of-the-art algorithms, with future potential for even greater savings.
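
    Because the SA-DCT/IDCT reduce to constant matrix multiplication, each product with a fixed coefficient can be built from shifts and additions instead of a general multiplier. The sketch below uses canonical signed-digit (CSD) recoding, a standard baseline, to multiply an integer-scaled 4-point DCT-II matrix by an integer input vector using shifts and adds only, and counts the two-input adders used. It does not share common subexpressions; finding cheaper shared adder networks is the kind of search the thesis's genetic-programming algorithm addresses, and the CSD method here is not that algorithm. The 6-bit coefficient scaling and the function names are assumptions.

```python
import math

def csd_digits(c):
    """Canonical signed-digit (non-adjacent form) recoding of integer c:
    digits in {-1, 0, 1}, least significant first, no two adjacent nonzeros."""
    digits = []
    while c != 0:
        if c % 2:                # odd: pick +1 or -1 so the remainder is divisible by 4
            d = 2 - (c % 4)      # c % 4 == 1 -> +1, c % 4 == 3 -> -1
            c -= d
        else:
            d = 0
        digits.append(d)
        c //= 2
    return digits

def dct_int_matrix(n, frac_bits):
    """Orthonormal n-point DCT-II matrix scaled by 2**frac_bits and rounded
    to integers, so every entry is a fixed integer constant."""
    scale = 2 ** frac_bits
    rows = []
    for k in range(n):
        a = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([round(scale * a * math.cos(math.pi * (2 * j + 1) * k / (2 * n)))
                     for j in range(n)])
    return rows

def cmm_shift_add(C, x):
    """Compute y = C @ x for a constant integer matrix C using only shifts and
    additions/subtractions of integer inputs x. Returns y and the number of
    two-input adders used, with no subexpression sharing."""
    y, adders = [], 0
    for row in C:
        acc, terms = 0, 0
        for c, xj in zip(row, x):
            for shift, d in enumerate(csd_digits(c)):
                if d:
                    acc += (xj << shift) if d > 0 else -(xj << shift)
                    terms += 1
        y.append(acc)
        adders += max(terms - 1, 0)   # summing `terms` values needs terms - 1 adders
    return y, adders
```

    With this scaling the first two DCT rows become [32, 32, 32, 32] and [42, 17, -17, -42]; row 1 alone needs 9 adders because 42 and 17 have three and two nonzero CSD digits, respectively.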

    Advanced gate stacks for nano-scale CMOS technology

    Ph.D. (Doctor of Philosophy)