27 research outputs found

    Beyond the arithmetic constraint: depth-optimal mapping of logic chains in reconfigurable fabrics

    Get PDF
    Look-up table based FPGAs have migrated from a niche technology for design prototyping to a valuable end-product component and, in some cases, a replacement for general purpose processors and ASICs alike. One way architects have bridged the performance gap between FPGAs and ASICs is through the inclusion of specialized components such as multipliers, RAM modules, and microcontrollers. Another dedicated structure that has become standard in reconfigurable fabrics is the arithmetic carry chain. Currently, it is only used to map arithmetic operations as identified by HDL macros. For non-arithmetic operations, it is an idle but potentially powerful resource.;Obstacles to using the carry chain for generic logic operations include lack of architectural and computer-aided design support. Current carry-select architectures facilitate carry chain reuse, although they do so only for (K-1)-input operations. Additionally, hardware description language (HDL) macros are the only recourse for a designer wishing to map generic logic chains in a carry-select architecture. A novel architecture that allows the full K-input operational capacity of the carry chain to be harnessed is presented as a solution to current architectural limitations. It is shown to have negligible impact on logic element area and delay. Using only two additional 2:1 pass transistor multiplexers, it enables the transmission of a K-input operation to the carry chain and general routing simultaneously. To successfully identify logic chains in an arbitrary Boolean network, ChainMap is presented as a novel technology mapping algorithm. ChainMap creates delay-optimal generic logic chains in polynomial time without HDL macros. It maps both arithmetic and non-arithmetic logic chains whenever depth increasing nodes, which increase logic depth but not routing depth, are encountered. Use of the chain is not reserved for arithmetic, but rather any set of gates exhibiting similar characteristics. By using the carry chain as a generic, near zero-delay adjacent cell interconnection structure a potential average optimal speedup of 1.4x is revealed. Post place and route experiments indicate that ChainMap solutions perform similarly to HDL chains when cluster resources are abundant and significantly better in cluster-constrained arrays

    Handling the complexity of routing problem in modern VLSI design

    Get PDF
    In VLSI physical design, the routing task consists of using over-the-cell metal wires to connect pins and ports of circuit gates and blocks. Traditionally, VLSI routing is an important design step in the sense that the quality of routing solution has great impact on various design metrics such as circuit timing, power consumption, chip reliability and manufacturability etc. As the advancing VLSI design enters the nanometer era, the routing success (routability issue) has been arising as one of the most critical problems in back-end design. In one aspect, the degree of design complexity is increasing dramatically as more and more modules are integrated into the chip. Much higher chip density leads to higher routing demands and potentially more risks in routing failure. In another aspect, with decreasing design feature size, there are more complex design rules imposed to ensure manufacturability. These design rules are hard to satisfy and they usually create more barriers for achieving routing closure (i.e., generate DRC free routing solution) and thus affect chip time to market (TTM) plan. In general, the behavior and performance of routing are affected by three consecutive phases: placement phase, global routing phase and detailed routing phase in a typical VLSI physical design flow. Traditional CAD tools handle each of the three phases independently and the global picture of the routability issue is neglected. Different from conventional approaches which propose tools and algorithms for one particular design phase, this thesis investigates the routability issue from all three phases and proposes a series of systematic solutions to build a more generic flow and improve quality of results (QoR). For the placement phase, we will introduce a mixed-sized placement refinement tool for alleviating congestion after placement. The tool shifts and relocates modules based on a global routing estimation. For the global routing phase, a very fast and effective global router is developed. Its performance surpasses many peer works as verified by ISPD 2008 global routing contest results. In the detailed routing phase, a tool is proposed to perform detailed routing using regular routing patterns based on a correct-by-construction methodology to improve routability as well as satisfy most design rules. Finally, the tool which integrates global routing and detailed routing is developed to remedy the inconsistency between global routing and detailed routing. To verify the algorithms we proposed, three sets of testcases derived from ISPD98 and ISPD05/06 placement benchmark suites are proposed. The results indicate that our proposed methods construct an integrated and systematic flow for routability improvement which is better than conventional methods

    Lexicographic path searches for FPGA routing

    Full text link
    This dissertation reports on studies of the application of lexicographic graph searches to solve problems in FPGA detailed routing. Our contributions include the derivation of iteration limits for scalar implementations of negotiation congestion for standard floating point types and the identification of pathological cases for path choice. In the study of the routability-driven detailed FPGA routing problem, we show universal detailed routability is NP-complete based on a related proof by Lee and Wong. We describe the design of a lexicographic composition operator of totally-ordered monoids as path cost metrics and show its optimality under an adapted A* search. Our new router, CornNC, based on lexicographic composition of congestion and wirelength, established a new minimum track count for the FPGA Place and Route Challenge. For the problem of long-path timing-driven FPGA detailed routing, we show that long-path budgeted detailed routability is NP-complete by reduction to universal detailed routability. We generalise the lexicographic composition to any finite length and verify its optimality under A* search. The application of the timing budget solution of Ghiasi et al. is used to solve the long-path timing budget problem for FPGA connections. Our delay-clamped spiral lexicographic composition design, SpiralRoute, ensures connection based budgets are always met, thus achieves timing closure when it successfully routes. For 113 test routing instances derived from standard benchmarks, SpiralRoute found 13 routable instances with timing closure that were unroutable by a scalar negotiated congestion router and achieved timing closure in another 27 cases when the scalar router did not, at the expense of increased runtime. We also study techniques to improve SpiralRoute runtimes, including a data structure of a trie augmented by data stacks for minimum element retrieval, and the technique of step tomonoid elimination in reducing the retrieval depth in a trie of stacks structure

    An adaptive method to tolerate soft errors in SRAM-based FPGAs

    Get PDF
    AbstractIn this paper, we present an adaptive method that is a combination of SEU-avoidance in CAD flow and adaptive redundancy to tolerate soft error effects in SRAM-based FPGAs. This method is based on the modification of T-VPack and VPR tools. Three different steps of these tools are modified for SEU-awareness: (1) clustering, (2) placement and (3) routing. Then we use the unused resources as redundancy. We have investigated the effect of this method on several MCNC benchmarks. This investigation has been performed using three experiments: (1) SEU-awareness in clustering with redundancy, (2) SEU-awareness in clustering and placement with redundancy and (3) SEU-awareness in clustering, placement and routing with redundancy. With a confidence level of 95%, the results show that, using each of these three experiments, the system failure rate of ten MCNC circuits has been decreased between 4.52% and 10.42%, between 10.25% and 21.63%, and between 10.48% and 24.39%, respectively

    ๋ฌผ๋ฆฌ์  ์„ค๊ณ„ ์ž๋™ํ™”์—์„œ ํ‘œ์ค€์…€ ํ•ฉ์„ฑ ๋ฐ ์ตœ์ ํ™”์™€ ์„ค๊ณ„ ํ’ˆ์งˆ ์˜ˆ์ธก ๋ฐฉ๋ฒ•๋ก 

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2023. 2. ๊น€ํƒœํ™˜.In the physical design of chip implementation, designing high-quality standard cell layout and accurately predicting post-route DRV (design rule violation) at an early stage is an important problem, especially in advanced technology nodes. This dissertation presents two methodologies that can contribute to improving the design quality and design turnaround time of physical design flow. Firstly, we propose an integrated approach to the two problems of transistor folding and placement in standard cell layout synthesis. Precisely, we propose a globally optimal algorithm of search tree based design space exploration, devising a set of effective speeding up techniques as well as dynamic programming based fast cost computation. In addition, our algorithm incorporates the minimum oxide diffusion jog constraint, which closely relies on both of transistor folding and placement. Through experiments with the transistor netlists and design rules in advanced node, our proposed method is able to synthesize fully routable cell layouts of minimal size within a very fast time for each netlist, outperforming the cell layout quality in the manual design. Secondly, we propose a novel ML based DRC hotspot prediction technique, which is able to accurately capture the combined impact of pin accessibility and routing congestion on DRC hotspots. Precisely, we devise a graph, called pin proximity graph, that effectively models the spatial information on cell I/O pins and the information on pin-to-pin disturbance relation. Then, we propose a new ML model, which tightly combines GNN (graph neural network) and U-net in a way that GNN is used to embed pin accessibility information abstracted from our pin proximity graph while U-net is used to extract routing congestion information from grid-based features. Through experiments with a set of benchmark designs using advanced node, our model outperforms the existing ML models on all benchmark designs within the fast inference time in comparison with that of the state-of-the-art techniques.์นฉ ๊ตฌํ˜„์˜ ๋ฌผ๋ฆฌ์  ์„ค๊ณ„ ๋‹จ๊ณ„์—์„œ, ๋†’์€ ์„ฑ๋Šฅ์˜ ํ‘œ์ค€ ์…€ ์„ค๊ณ„์™€ ๋ฐฐ์„  ์—ฐ๊ฒฐ ์ดํ›„ ์กฐ๊ธฐ์— ์„ค๊ณ„ ๊ทœ์น™ ์œ„๋ฐ˜์„ ์ •ํ™•ํžˆ ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ์€ ์ตœ์‹  ๊ณต์ •์—์„œ ํŠนํžˆ ์ค‘์š”ํ•œ ๋ฌธ์ œ์ด๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ฌผ๋ฆฌ์  ์„ค๊ณ„์—์„œ์˜ ์„ค๊ณ„ ํ’ˆ์งˆ๊ณผ ์ด ์„ค๊ณ„ ์‹œ๊ฐ„ ํ–ฅ์ƒ์„ ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ๋Š” ๋‘ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค. ๋จผ์ €, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ํ‘œ์ค€ ์…€ ๋ ˆ์ด์•„์›ƒ ํ•ฉ์„ฑ์—์„œ ํŠธ๋žœ์ง€์Šคํ„ฐ ํด๋”ฉ๊ณผ ๋ฐฐ์น˜๋ฅผ ์ข…ํ•ฉ์ ์œผ๋กœ ์ง„ํ–‰ํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•๋ก ์„ ๋…ผํ•œ๋‹ค. ๊ตฌ์ฒด์ ์œผ๋กœ ํƒ์ƒ‰ ํŠธ๋ฆฌ ๊ธฐ๋ฐ˜์˜ ์ตœ์ ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ ๋™์  ํ”„๋กœ๊ทธ๋ž˜๋ฐ ๊ธฐ๋ฐ˜ ๋น ๋ฅธ ๋น„์šฉ ๊ณ„์‚ฐ ๋ฐฉ๋ฒ•๊ณผ ์—ฌ๋Ÿฌ ์†๋„ ๊ฐœ์„  ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์—ฌ๊ธฐ์— ๋”ํ•ด, ์ตœ์‹  ๊ณต์ •์—์„œ ํŠธ๋žœ์ง€์Šคํ„ฐ ํด๋”ฉ๊ณผ ๋ฐฐ์น˜๋กœ ์ธํ•ด ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋Š” ์ตœ์†Œ ์‚ฐํ™”๋ฌผ ํ™•์‚ฐ ์˜์—ญ ์„ค๊ณ„ ๊ทœ์น™์„ ๊ณ ๋ คํ•˜์˜€๋‹ค. ์ตœ์‹  ๊ณต์ •์— ๋Œ€ํ•œ ํ‘œ์ค€ ์…€ ํ•ฉ์„ฑ ์‹คํ—˜ ๊ฒฐ๊ณผ, ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•์ด ์„ค๊ณ„ ์ „๋ฌธ๊ฐ€๊ฐ€ ์ˆ˜๋™์œผ๋กœ ์„ค๊ณ„ํ•œ ๊ฒƒ ๋Œ€๋น„ ๋†’์€ ์„ฑ๋Šฅ์„ ๋ณด์ด๊ณ , ์„ค๊ณ„ ์‹œ๊ฐ„๋„ ๋งค์šฐ ์งง์Œ์„ ๋ณด์ธ๋‹ค. ๋‘๋ฒˆ์งธ๋กœ, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์…€ ๋ฐฐ์น˜ ๋‹จ๊ณ„์—์„œ ํ•€ ์ ‘๊ทผ์„ฑ๊ณผ ์—ฐ๊ฒฐ ํ˜ผ์žก์œผ๋กœ ์ธํ•œ ์˜ํ–ฅ์„ ์ข…ํ•ฉ์ ์œผ๋กœ ๊ณ ๋ คํ•  ์ˆ˜ ์žˆ๋Š” ๋จธ์‹  ๋Ÿฌ๋‹ ๊ธฐ๋ฐ˜ ์„ค๊ณ„ ๊ทœ์น™ ์œ„๋ฐ˜ ๊ตฌ์—ญ ์˜ˆ์ธก ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค. ๋จผ์ € ํ‘œ์ค€ ์…€์˜ ์ž…/์ถœ๋ ฅ ํ•€์˜ ๋ฌผ๋ฆฌ์  ์ •๋ณด์™€ ํ•€๊ณผ ํ•€ ์‚ฌ์ด ๋ฐฉํ•ด ๊ด€๊ณ„๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋Š” ํ•€ ๊ทผ์ ‘ ๊ทธ๋ž˜ํ”„๋ฅผ ์ œ์•ˆํ•˜๊ณ , ๊ทธ๋ž˜ํ”„ ์‹ ๊ฒฝ๋ง๊ณผ ์œ ๋„ท ์‹ ๊ฒฝ๋ง์„ ํšจ๊ณผ์ ์œผ๋กœ ๊ฒฐํ•ฉํ•œ ์ƒˆ๋กœ์šด ํ˜•ํƒœ์˜ ๋จธ์‹  ๋Ÿฌ๋‹ ๋ชจ๋ธ์„ ์ œ์•ˆํ•œ๋‹ค. ์ด ๋ชจ๋ธ์—์„œ ๊ทธ๋ž˜ํ”„ ์‹ ๊ฒฝ๋ง์€ ํ•€ ๊ทผ์ ‘ ๊ทธ๋ž˜ํ”„๋กœ๋ถ€ํ„ฐ ํ•€ ์ ‘๊ทผ์„ฑ ์ •๋ณด๋ฅผ ์ถ”์ถœํ•˜๊ณ , ์œ ๋„ท ์‹ ๊ฒฝ๋ง์€ ๊ฒฉ์ž ๊ธฐ๋ฐ˜ ํŠน์ง•์œผ๋กœ๋ถ€ํ„ฐ ์—ฐ๊ฒฐ ํ˜ผ์žก ์ •๋ณด๋ฅผ ์ถ”์ถœํ•œ๋‹ค. ์‹คํ—˜ ๊ฒฐ๊ณผ ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•์€ ์ด์ „ ์—ฐ๊ตฌ๋“ค ๋Œ€๋น„ ๋” ๋น ๋ฅธ ์˜ˆ์ธก ์‹œ๊ฐ„์— ๋” ๋†’์€ ์˜ˆ์ธก ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•จ์„ ๋ณด์ธ๋‹ค.1 Introduction 1 1.1 Standard Cell Layout Synthesis 1 1.2 Machine Learning for Electronic Design Automation 6 1.3 Prediction of Design Rule Violation 8 1.4 Contributions of This Dissertation 11 2 Standard Cell Layout Synthesis of Advanced Nodes with Simultaneous Transistor Folding and Placement 14 2.1 Motivations 14 2.2 Algorithm for Standard Cell Layout Synthesis 16 2.2.1 Problem Definition 16 2.2.2 Overall Flow 18 2.2.3 Step 1: Generation of Folding Shapes 18 2.2.4 Step 2: Search-tree Based Design Space Exploration 20 2.2.5 Speeding up Techniques 23 2.2.6 In-cell Routability Estimation 28 2.2.7 Step 3: In-cell Routing 30 2.2.8 Step 4: Splitting Folding Shapes 35 2.2.9 Step 5: Relaxing Minimum-area Constraints 37 2.3 Experimental Results 38 2.3.1 Comparison with ASAP 7nm Cell Layouts 40 2.3.2 Effectiveness of Dynamic Folding 42 2.3.3 Effectiveness of Speeding Up Techniques 43 2.3.4 Impact of Splitting Folding Shape 48 2.3.5 Runtime Analysis According to Area Relaxation 51 2.3.6 Comparison with Previous Works 52 3 Pin Accessibility and Routing Congestion Aware DRC Hotspot Prediction using Graph Neural Network and U-Net 54 3.1 Preliminary 54 3.1.1 Graph Neural Network 54 3.1.2 Fully Convolutional Network 56 3.2 Proposed Prediction Methodology 57 3.2.1 Overall Flow 57 3.2.2 Pin Proximity Graph 58 3.2.3 Grid-based Features 61 3.2.4 Overall Architecture of PGNN 64 3.2.5 GNN Architecture in PGNN 64 3.2.6 U-net Architecture in PGNN 66 3.2.7 Final Prediction in PGNN 66 3.3 Experimental Results 68 3.3.1 Experimental Setup 68 3.3.2 Analysis on PGNN Performance 71 3.3.3 Comparison with Previous Works 72 3.3.4 Adaptation to Real-world Designs 81 3.3.5 Handling Data Imbalance Problem in Regression Model 86 4 Conclusions 92 4.1 Chapter 2 92 4.2 Chapter 3 93๋ฐ•

    A complete design path for the layout of flexible macros

    Get PDF
    XIV+172hlm.;24c

    Performance Comparison of Static CMOS and Domino Logic Style in VLSI Design: A Review

    Get PDF
    Of late, there is a steep rise in the usage of handheld gadgets and high speed applications. VLSI designers often choose static CMOS logic style for low power applications. This logic style provides low power dissipation and is free from signal noise integrity issues. However, designs based on this logic style often are slow and cannot be used in high performance circuits. On the other hand designs based on Domino logic style yield high performance and occupy less area. Yet, they have more power dissipation compared to their static CMOS counterparts. As a practice, designers during circuit synthesis, mix more than one logic style judiciously to obtain the advantages of each logic style. Carefully designing a mixed static Domino CMOS circuit can tap the advantages of both static and Domino logic styles overcoming their own short comings
    corecore