8 research outputs found

    On-Chip Network Design: Mapping, Management, and Routing

    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2016. Advisor: Kiyoung Choi.
    For decades, advances in semiconductor technology have led us to the era of many-core systems. Today's desktop computers already have multi-core processors, and chips with more than a hundred cores are commercially available. As a communication medium for such a large number of cores, network-on-chip (NoC) has emerged and is now used by many researchers and companies. Adopting NoC for a many-core system raises many problems, and this thesis tries to solve some of them. The second chapter addresses mapping and scheduling of tasks on NoC-based chip multiprocessor (CMP) architectures. Although many papers on NoC mapping have been published, our work shows that selecting the communication type for each transfer, shared memory or message passing, can help improve performance and energy efficiency.
Additionally, our framework supports scheduling applications that contain backward dependencies with the help of modified modulo scheduling. Evolving SoCs through 3D stacking raises a number of new problems, and the thermal problem caused by increased power density is one of them. In the third chapter, we try to mitigate the hotspot problem using dynamic voltage and frequency scaling (DVFS) techniques. Assuming that all routers as well as cores can control voltage and frequency individually, we find voltage-frequency pairs for all cores and routers that yield the best performance within the given thermal constraint. The fourth and fifth chapters address a different aspect. In 3D stacking, inter-layer interconnections are implemented using through-silicon vias (TSVs). TSVs usually take much more area than normal wires, and they consume silicon area as well as metal area. For this reason, designers often want to limit the number of TSVs used in their network. There are two options for limiting the TSV count: the first is to reduce the width of each vertical link, and the other is to use fewer vertical links, which results in a partially connected network. We present one routing methodology for each case. For the network with reduced-bandwidth vertical links, we propose using deflection routing to mitigate the long latency of vertical links. By balancing the vertical traffic properly, the algorithm provides improved latency. In addition, removing the router buffers yields a large reduction in area and energy. For partially connected networks, we introduce a set of routing rules for selecting the vertical links. At the expense of some routing freedom, the proposed algorithm removes the virtual channel requirement for avoiding deadlock.
As a result, either performance can be improved or energy consumption reduced, at the designer's choice.

Contents:
Chapter 1 Introduction
    1.1 Task Mapping and Scheduling
    1.2 Thermal Management
    1.3 Routing for 3D Networks
Chapter 2 Mapping and Scheduling
    2.1 Introduction
    2.2 Motivation
    2.3 Background
    2.4 Related Work
    2.5 Platform Description
        2.5.1 Architecture Description
        2.5.2 Energy Model
        2.5.3 Communication Delay Model
    2.6 Problem Formulation
    2.7 Proposed Solution
        2.7.1 Task and Communication Mapping
        2.7.2 Communication Type Optimization
        2.7.3 Design Space Pruning via Pre-evaluation
        2.7.4 Scheduling
    2.8 Experimental Results
        2.8.1 Experiments with Coarse-grained Iterative Modulo Scheduling
        2.8.2 Comparison with Different Mapping Algorithms
        2.8.3 Experiments with Overall Algorithms
        2.8.4 Experiments with Various Local Memory Sizes
        2.8.5 Experiments with Various Placements of Shared Memory
Chapter 3 Thermal Management
    3.1 Introduction
    3.2 Background
        3.2.1 Thermal Modeling
        3.2.2 Heterogeneity in Thermal Propagation
    3.3 Motivation and Problem Definition
    3.4 Related Work
    3.5 Orchestrated Voltage-Frequency Assignment
        3.5.1 Individual PI Control Method
        3.5.2 PI Controlled Weighted-Power Budgeting
        3.5.3 Performance/Power Estimation
        3.5.4 Frequency Assignment
        3.5.5 Algorithm Overview
        3.5.6 Stability Conditions for PI Controller
    3.6 Experimental Result
        3.6.1 Experimental Setup
        3.6.2 Overall Algorithm Performance
        3.6.3 Accuracy of the Estimation Model
        3.6.4 Performance of the Frequency Assignment Algorithm
Chapter 4 Routing for Limited Bandwidth 3D NoC
    4.1 Introduction
    4.2 Motivation
    4.3 Background
    4.4 Related Work
    4.5 3D Deflection Routing
        4.5.1 Serialized TSV Model
        4.5.2 TSV Link Injection/ejection Scheme
        4.5.3 Deadlock Avoidance
        4.5.4 Livelock Avoidance
        4.5.5 Router Architecture: Putting It All Together
        4.5.6 System Level Consideration
    4.6 Experimental Results
        4.6.1 Experimental Setup
        4.6.2 Results on Synthetic Traffic Patterns
        4.6.3 Results on Realistic Traffic Patterns
        4.6.4 Results on Real Application Benchmarks
        4.6.5 Fairness Issue
        4.6.6 Area Cost Comparison
Chapter 5 Routing for Partially Connected 3D NoC
    5.1 Introduction
    5.2 Background
    5.3 Related Work
    5.4 Proposed Algorithm
        5.4.1 Preliminary
        5.4.2 Routing Algorithm for 3-D Stacked Meshes with Regular Partial Vertical Connections
        5.4.3 Routing Algorithm for 3-D Stacked Meshes with Irregular Partial Vertical Connections
        5.4.4 Extension to Heterogeneous Mesh Layers
    5.5 Experimental Results
        5.5.1 Experimental Setup
        5.5.2 Experiments on Synthetic Traffics
        5.5.3 Experiments on Application Benchmarks
        5.5.4 Comparison with Reduced Bandwidth Mesh
Chapter 6 Conclusion
Bibliography
Abstract (in Korean)
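The abstract above proposes deflection routing for networks with reduced-bandwidth vertical links. As a rough illustration of the general idea only (not the thesis's 3D algorithm), the sketch below assigns output ports in one bufferless router of a 2D mesh: the oldest flit picks first, and a flit whose productive ports are all taken is deflected to any free port instead of being buffered. The `Flit` type, the port names, the oldest-first priority, and the assumption of an interior router with four ports are all hypothetical choices made for this example; ejection at the destination is omitted.

```python
from dataclasses import dataclass

# Hypothetical port names for an interior router of a 2D mesh.
PORTS = ("E", "W", "N", "S")


@dataclass
class Flit:
    dest: tuple  # (x, y) coordinates of the destination router
    age: int     # cycles since injection; older flits get priority


def productive_ports(cur, dest):
    """Ports that move a flit one hop closer to its destination."""
    (x, y), (dx, dy) = cur, dest
    ports = []
    if dx > x:
        ports.append("E")
    elif dx < x:
        ports.append("W")
    if dy > y:
        ports.append("N")
    elif dy < y:
        ports.append("S")
    return ports


def assign_ports(cur, flits):
    """Give every incoming flit an output port in the same cycle.

    Assumes at most len(PORTS) flits arrive and none is at its destination.
    Returns a dict mapping flit index -> output port.
    """
    free = list(PORTS)
    assignment = {}
    # Oldest-first arbitration (an assumed priority scheme).
    for i in sorted(range(len(flits)), key=lambda k: -flits[k].age):
        wanted = [p for p in productive_ports(cur, flits[i].dest) if p in free]
        port = wanted[0] if wanted else free[0]  # deflect if nothing productive is free
        free.remove(port)
        assignment[i] = port
    return assignment


# Two flits at router (1, 1) both want to go east; the younger one is deflected.
result = assign_ports((1, 1), [Flit(dest=(3, 1), age=5), Flit(dest=(2, 1), age=2)])
```

Because every flit always leaves on some port, the router needs no input buffers, which is the source of the area and energy savings the abstract mentions; the price is that deflected flits take longer paths.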

    Software-based and regionally-oriented traffic management in Networks-on-Chip

    Since the introduction of chip-multiprocessor systems, the number of integrated cores has been steadily growing, and workload applications have been adapted to exploit the increasing parallelism. This has significantly raised the importance of efficient on-chip communication, and the infrastructure has to keep pace with these new requirements. The work at hand contributes to the state of the art of the latest generation of such solutions, called Networks-on-Chip, to improve the performance, reliability, and flexible management of these on-chip infrastructures.

    Energy-aware synthesis for networks on chip architectures

    The Network on Chip (NoC) paradigm was introduced as a scalable communication infrastructure for future System-on-Chip applications. Designing application-specific customized communication architectures is critical for obtaining low-power, high-performance solutions. Two significant design automation problems are the creation of an optimized configuration for given application requirements and the implementation of this on-chip network. Automating the design of on-chip networks requires models for estimating area and energy, algorithms to effectively explore the design space, and network component libraries and tools to generate the hardware description. Chip architects are faced with managing a wide range of customization options for individual components, routers, and topology. As energy is of paramount importance, the effectiveness of any custom NoC generation approach depends on the availability of good energy models to explore the design space. This thesis describes a complete NoC synthesis flow, called NoCGEN, for creating energy-efficient custom NoC architectures. Three major automation problems are addressed: custom topology generation, energy modeling, and generation. An iterative algorithm is proposed to generate application-specific point-to-point and packet-switched networks. The algorithm explores the design space for efficient topologies using characterized models and a system-level floorplanner for evaluating placement and wire energy. Prior to our contribution, building an energy model required careful analysis of transistor or gate implementations. To alleviate this burden, an automated linear regression-based methodology is proposed to rapidly extract energy models for many router designs. The resulting models are cycle-accurate with low complexity, are within 10% of gate-level energy simulations, and execute several orders of magnitude faster than gate-level simulations.
A hardware description of the custom topology is generated using a parameterizable library and a custom HDL generator. Fully reusable and scalable network components (switches, crossbars, arbiters, routing algorithms) are described using a template approach and are used to compose arbitrary topologies. A methodology for building and composing routers and topologies using a template engine is described. The entire flow is implemented as several demonstrable, extensible tools with powerful visualization functionality. Several experiments are performed to demonstrate the design space exploration capabilities and to compare the flow against a competing min-cut topology generation algorithm.
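The abstract above mentions a linear regression-based methodology for extracting router energy models. A minimal sketch of what such a fit could look like is given below, assuming the model predicts energy as a constant plus a weighted sum of per-cycle activity counts. The feature choice (for instance buffer writes and crossbar traversals), the function names, and the synthetic numbers are assumptions made for illustration; they are not taken from NoCGEN.

```python
def fit_linear_energy_model(samples, energies):
    """Ordinary least squares: energy ~ w[0] + sum(w[j + 1] * counts[j]).

    samples  -- list of activity-count tuples (hypothetical features)
    energies -- reference energy for each sample (e.g. from gate-level simulation)
    """
    rows = [[1.0] + [float(v) for v in s] for s in samples]
    n = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(rows, energies)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, n))) / a[i][i]
    return w


def estimate_energy(w, counts):
    """Evaluate the fitted model for one set of activity counts."""
    return w[0] + sum(wi * c for wi, c in zip(w[1:], counts))


# Synthetic, made-up training data generated from energy = 1.0 + 2.0*a + 0.5*b.
samples = [(0, 0), (1, 0), (0, 1), (2, 3), (4, 1)]
energies = [1.0 + 2.0 * a_ + 0.5 * b_ for a_, b_ in samples]
weights = fit_linear_energy_model(samples, energies)
predicted = estimate_energy(weights, (3, 2))
```

Once fitted, evaluating the model costs only a handful of multiply-adds per cycle, which is consistent with the abstract's point that such models run far faster than gate-level simulation.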