3,513 research outputs found

    The predictor-adaptor paradigm : automation of custom layout by flexible design

    Transformations of High-Level Synthesis Codes for High-Performance Computing

    Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large-scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes. Fast and efficient codes for reconfigurable platforms are thus still challenging to design. To alleviate this, we present a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. Our work provides a toolbox for developers, where we systematically identify classes of transformations, the characteristics of their effect on the HLS code and the resulting hardware (e.g., increased data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolving interface contention or increasing parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures. To quantify the effect of our transformations, we use them to optimize a set of throughput-oriented FPGA kernels, demonstrating that our enhancements are sufficient to scale up parallelism within the hardware constraints. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers to tap into the performance potential offered by specialized hardware architectures using HLS.

    Methodologies for Standard Cell Synthesis and Optimization and for Design Quality Prediction in Physical Design Automation

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2023. Taewhan Kim.

    In the physical design stage of chip implementation, designing high-quality standard cell layouts and accurately predicting post-route design rule violations (DRVs) at an early stage are important problems, especially in advanced technology nodes. This dissertation presents two methodologies that can contribute to improving the design quality and turnaround time of the physical design flow. First, we propose an integrated approach to the two problems of transistor folding and placement in standard cell layout synthesis. Specifically, we propose a globally optimal algorithm based on search-tree design space exploration, together with a set of effective speed-up techniques and fast cost computation based on dynamic programming. In addition, our algorithm incorporates the minimum oxide diffusion jog constraint, which depends closely on both transistor folding and placement. In experiments with transistor netlists and design rules from an advanced node, our method synthesizes fully routable cell layouts of minimal size in a very short time for each netlist, surpassing the quality of manually designed cell layouts. Second, we propose a novel ML-based DRC hotspot prediction technique that accurately captures the combined impact of pin accessibility and routing congestion on DRC hotspots. Specifically, we devise a graph, called the pin proximity graph, that effectively models the spatial information of cell I/O pins and the pin-to-pin disturbance relation. We then propose a new ML model that tightly combines a graph neural network (GNN) and U-net: the GNN embeds pin accessibility information abstracted from the pin proximity graph, while the U-net extracts routing congestion information from grid-based features. In experiments on a set of benchmark designs in an advanced node, our model outperforms existing ML models on all benchmark designs, with inference time that is fast compared with state-of-the-art techniques.

    Contents:
    1 Introduction
      1.1 Standard Cell Layout Synthesis
      1.2 Machine Learning for Electronic Design Automation
      1.3 Prediction of Design Rule Violation
      1.4 Contributions of This Dissertation
    2 Standard Cell Layout Synthesis of Advanced Nodes with Simultaneous Transistor Folding and Placement
      2.1 Motivations
      2.2 Algorithm for Standard Cell Layout Synthesis
        2.2.1 Problem Definition
        2.2.2 Overall Flow
        2.2.3 Step 1: Generation of Folding Shapes
        2.2.4 Step 2: Search-tree Based Design Space Exploration
        2.2.5 Speeding up Techniques
        2.2.6 In-cell Routability Estimation
        2.2.7 Step 3: In-cell Routing
        2.2.8 Step 4: Splitting Folding Shapes
        2.2.9 Step 5: Relaxing Minimum-area Constraints
      2.3 Experimental Results
        2.3.1 Comparison with ASAP 7nm Cell Layouts
        2.3.2 Effectiveness of Dynamic Folding
        2.3.3 Effectiveness of Speeding Up Techniques
        2.3.4 Impact of Splitting Folding Shape
        2.3.5 Runtime Analysis According to Area Relaxation
        2.3.6 Comparison with Previous Works
    3 Pin Accessibility and Routing Congestion Aware DRC Hotspot Prediction using Graph Neural Network and U-Net
      3.1 Preliminary
        3.1.1 Graph Neural Network
        3.1.2 Fully Convolutional Network
      3.2 Proposed Prediction Methodology
        3.2.1 Overall Flow
        3.2.2 Pin Proximity Graph
        3.2.3 Grid-based Features
        3.2.4 Overall Architecture of PGNN
        3.2.5 GNN Architecture in PGNN
        3.2.6 U-net Architecture in PGNN
        3.2.7 Final Prediction in PGNN
      3.3 Experimental Results
        3.3.1 Experimental Setup
        3.3.2 Analysis on PGNN Performance
        3.3.3 Comparison with Previous Works
        3.3.4 Adaptation to Real-world Designs
        3.3.5 Handling Data Imbalance Problem in Regression Model
    4 Conclusions
      4.1 Chapter 2
      4.2 Chapter 3
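
The abstract above describes a pin proximity graph that captures spatial information about cell I/O pins and their pin-to-pin disturbance relation, but it does not say how edges are defined. The following is a minimal sketch of one way such a graph could be built, not the dissertation's actual construction. Everything in it is an assumption made for illustration: the `Pin` type, the `build_pin_proximity_graph` function, the pin names, and the rule that two pins are linked when their Manhattan distance is at most `radius`.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Pin:
    """Hypothetical pin record; not taken from the dissertation."""
    name: str   # unique identifier, e.g. "U1/A"
    x: float    # pin centre, in arbitrary layout units
    y: float


def build_pin_proximity_graph(pins, radius):
    """Return an adjacency map {pin name: set of neighbouring pin names}.

    Assumption: two pins are connected when their Manhattan distance is
    at most `radius`. The real edge criterion may differ.
    """
    graph = {p.name: set() for p in pins}
    for a, b in combinations(pins, 2):
        if abs(a.x - b.x) + abs(a.y - b.y) <= radius:
            # Undirected edge: record it on both endpoints.
            graph[a.name].add(b.name)
            graph[b.name].add(a.name)
    return graph


# Made-up coordinates, purely for demonstration.
pins = [
    Pin("U1/A", 0.0, 0.0),
    Pin("U1/Y", 1.0, 0.0),
    Pin("U2/A", 1.5, 0.5),
    Pin("U3/A", 10.0, 10.0),
]
graph = build_pin_proximity_graph(pins, radius=1.5)
```

In a GNN setting, an adjacency map like `graph` would typically be converted to an edge list and paired with per-pin feature vectors. The pairwise loop is quadratic in the number of pins, so a real design would likely need a spatial index or grid bucketing instead.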

    Transistor-Level Layout of Integrated Circuits

    In this dissertation, we present the toolchain BonnCell and its underlying algorithms. It has been developed in close cooperation with the IBM Corporation and automatically generates the geometry for functional groups of 2 to approximately 50 transistors. Its input consists of a set of transistors, including properties like their sizes and their types, a specification of their connectivity, and parameters to flexibly control the technological framework as well as the algorithms' behavior. Using this data, the tool computes a detailed geometric realization of the circuit as polygonal shapes on 16 layers. To this end, a placement routine configures the transistors and arranges them in the plane, which is the main subject of this thesis. Subsequently, a routing engine determines wires connecting the transistors to ensure the circuit's desired functionality. We propose and analyze a family of algorithms that arrange sets of transistors in the plane such that a multi-criteria target function is optimized. The primary goal is to obtain solutions that are as compact as possible because chip area is a valuable resource in modern technologies. In addition to the core algorithms, we formulate variants that handle particularly structured instances in a suitable way. We will show that for 90% of the instances in a representative test bed provided by IBM, BonnCell succeeds in generating fully functional layouts, including the placement of the transistors and a routing of their interconnections. Moreover, BonnCell is in wide use within IBM's groups that are concerned with transistor-level layout, a task that had been performed manually before our automation was available. Beyond the processing of isolated test cases, two large-scale examples of industrial applications of the tool will be presented: on the one hand, the initial design phase of a large SRAM unit required only half of the expected 3-month period; on the other hand, BonnCell could provide valuable input aiding central decisions in the early concept phase of the new 14 nm technology generation.

    GM : a gate matrix layout generator

    NASA Space Engineering Research Center Symposium on VLSI Design

    The NASA Space Engineering Research Center (SERC) is proud to offer, at its second symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories and the electronics industry. These featured speakers share insights into next-generation advances that will serve as a basis for future VLSI design. The featured speakers address questions of reliability in the space environment along with new directions in CAD and design.

    FIR filter optimization for video processing on FPGAs
