66 research outputs found

    230104

    Get PDF
    In deflection-based Network-on-Chips (NoC), when several flits entering a router contend for the same output port, one of the flits is routed to the desired output and the others are deflected to alternatives outputs. The approach reduces power consumption and silicon footprint in comparison to virtual channels (VCs) based solutions. However, due to the nondeterministic number of deflections that flits may suffer while traversing the network, flits may be received in an out-of-order fashion at their destinations. In this work, we present IPDeN, a novel deflectionbased NoC that ensures in-order flit delivery. To avoid the use of costly reordering mechanisms at the destination of each communication flow, we propose a solution based on a single small buffer added to each router to prevents flits from over taking other flits belonging to the same communication flow. We also develop a worst-case traversal time (WCTT) analysis for packets transmitted over IPDeN. We implemented IPDeN in Verilog and synthesized it for an FPGA platform. We show that a router of IPDeN requires "483-times less hardware resources than routers that use VCs. Experimental results shown that the worst-case and average packets communication time is reduced in comparison to the state-of-the-artThis work was partially supported by National Funds through FCT/MCTES (Portuguese Foundation for Science and Technology), within the CISTER Research Unit (UIDP/UIDB/04234/2020); by FCT and the ESF (European Social Fund) through the Regional Operational Programme (ROP) Norte 2020, under PhD grant 2020.06898.BD.info:eu-repo/semantics/publishedVersio

    ์˜จ ์นฉ ๋„คํŠธ์›Œํฌ ์„ค๊ณ„: ๋งคํ•‘, ๊ด€๋ฆฌ, ๋ผ์šฐํŒ…

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2016. 2. ์ตœ๊ธฐ์˜.์ง€๋‚œ ์ˆ˜์‹ญ ๋…„๊ฐ„ ์ด์–ด์ง„ ๋ฐ˜๋„์ฒด ๊ธฐ์ˆ ์˜ ํ–ฅ์ƒ์€ ๋งค๋‹ˆ ์ฝ”์–ด์˜ ์‹œ๋Œ€๋ฅผ ๊ฐ€์ ธ๋‹ค ์ฃผ์—ˆ๋‹ค. ์šฐ๋ฆฌ๊ฐ€ ์ผ์ƒ ์ƒํ™œ์— ์“ฐ๋Š” ๋ฐ์Šคํฌํ†ฑ ์ปดํ“จํ„ฐ์กฐ์ฐจ๋„ ์ด๋ฏธ ์ˆ˜ ๊ฐœ์˜ ์ฝ”์–ด๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์œผ๋ฉฐ, ์ˆ˜๋ฐฑ ๊ฐœ์˜ ์ฝ”์–ด๋ฅผ ๊ฐ€์ง„ ์นฉ๋„ ์ƒ์šฉํ™”๋˜์–ด ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋งŽ์€ ์ฝ”์–ด๋“ค ๊ฐ„์˜ ํ†ต์‹  ๊ธฐ๋ฐ˜์œผ๋กœ์„œ, ๋„คํŠธ์›Œํฌ-์˜จ-์นฉ(NoC)์ด ์ƒˆ๋กœ์ด ๋Œ€๋‘๋˜์—ˆ์œผ๋ฉฐ, ์ด๋Š” ํ˜„์žฌ ๋งŽ์€ ์—ฐ๊ตฌ ๋ฐ ์ƒ์šฉ ์ œํ’ˆ์—์„œ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋„คํŠธ์›Œํฌ-์˜จ-์นฉ์„ ๋งค๋‹ˆ ์ฝ”์–ด ์‹œ์Šคํ…œ์— ์‚ฌ์šฉํ•˜๋Š” ๋ฐ์—๋Š” ์—ฌ๋Ÿฌ ๊ฐ€์ง€ ๋ฌธ์ œ๊ฐ€ ๋”ฐ๋ฅด๋ฉฐ, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ทธ ์ค‘ ๋ช‡ ๊ฐ€์ง€๋ฅผ ํ’€์–ด๋‚ด๊ณ ์ž ํ•˜์˜€๋‹ค. ๋ณธ ๋…ผ๋ฌธ์˜ ๋‘ ๋ฒˆ์งธ ์ฑ•ํ„ฐ์—์„œ๋Š” NoC ๊ธฐ๋ฐ˜ ๋งค๋‹ˆ์ฝ”์–ด ๊ตฌ์กฐ์— ์ž‘์—…์„ ํ• ๋‹นํ•˜๊ณ  ์Šค์ผ€์ฅดํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋‹ค๋ฃจ์—ˆ๋‹ค. ๋งค๋‹ˆ์ฝ”์–ด์—์˜ ์ž‘์—… ํ• ๋‹น์„ ๋‹ค๋ฃฌ ๋…ผ๋ฌธ์€ ์ด๋ฏธ ๋งŽ์ด ์ถœํŒ๋˜์—ˆ์ง€๋งŒ, ๋ณธ ์—ฐ๊ตฌ๋Š” ๋ฉ”์‹œ์ง€ ํŒจ์‹ฑ๊ณผ ๊ณต์œ  ๋ฉ”๋ชจ๋ฆฌ, ๋‘ ๊ฐ€์ง€์˜ ํ†ต์‹  ๋ฐฉ์‹์„ ๊ณ ๋ คํ•จ์œผ๋กœ์จ ์„ฑ๋Šฅ๊ณผ ์—๋„ˆ์ง€ ํšจ์œจ์„ ๊ฐœ์„ ํ•˜์˜€๋‹ค. ๋˜ํ•œ, ๋ณธ ์—ฐ๊ตฌ๋Š” ์—ญ๋ฐฉํ–ฅ ์˜์กด์„ฑ์„ ๊ฐ€์ง„ ์ž‘์—… ๊ทธ๋ž˜ํ”„๋ฅผ ์Šค์ผ€์ฅดํ•˜๋Š” ๋ฐฉ๋ฒ• ๋˜ํ•œ ์ œ์‹œํ•˜์˜€๋‹ค. 3์ฐจ์› ์ ์ธต ๊ธฐ์ˆ ์€ ๋†’์•„์ง„ ์ „๋ ฅ ๋ฐ€๋„ ๋•Œ๋ฌธ์— ์—ด ๋ฌธ์ œ๊ฐ€ ์‹ฌ๊ฐํ•ด์ง€๋Š” ๋“ฑ, ์—ฌ๋Ÿฌ ๊ฐ€์ง€ ๋„์ „ ๊ณผ์ œ๋ฅผ ๋‚ดํฌํ•˜๊ณ  ์žˆ๋‹ค. ์„ธ ๋ฒˆ์งธ ์ฑ•ํ„ฐ์—์„œ๋Š” DVFS ๊ธฐ์ˆ ์„ ์ด์šฉํ•˜์—ฌ ์—ด ๋ฌธ์ œ๋ฅผ ์™„ํ™”ํ•˜๊ณ ์ž ํ•˜๋Š” ๊ธฐ์ˆ ์„ ์†Œ๊ฐœํ•œ๋‹ค. ๊ฐ ์ฝ”์–ด์™€ ๋ผ์šฐํ„ฐ๊ฐ€ ์ „์••, ์ž‘๋™ ์†๋„๋ฅผ ์กฐ์ ˆํ•  ์ˆ˜ ์žˆ๋Š” ๊ตฌ์กฐ์—์„œ, ๊ฐ€์žฅ ๋†’์€ ์„ฑ๋Šฅ์„ ์ด๋Œ์–ด ๋‚ด๋ฉด์„œ๋„ ์ตœ๋Œ€ ์˜จ๋„๋ฅผ ๋„˜์–ด์„œ์ง€ ์•Š๋„๋ก ํ•œ๋‹ค. ์„ธ ๋ฒˆ์งธ์™€ ๋„ค ๋ฒˆ์งธ ์ฑ•ํ„ฐ๋Š” ์กฐ๊ธˆ ๋‹ค๋ฅธ ์ธก๋ฉด์„ ๋‹ค๋ฃฌ๋‹ค. 3D ์ ์ธต ๊ธฐ์ˆ ์„ ์‚ฌ์šฉํ•  ๋•Œ, ์ธต๊ฐ„ ํ†ต์‹ ์€ ์ฃผ๋กœ TSV๋ฅผ ์ด์šฉํ•˜์—ฌ ์ด๋ฃจ์–ด์ง„๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ TSV๋Š” ์ผ๋ฐ˜ wire๋ณด๋‹ค ํ›จ์”ฌ ํฐ ๋ฉด์ ์„ ์ฐจ์ง€ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ์ „์ฒด ๋„คํŠธ์›Œํฌ์—์„œ์˜ TSV ๊ฐœ์ˆ˜๋Š” ์ œํ•œ๋˜์–ด์•ผ ํ•  ๊ฒฝ์šฐ๊ฐ€ ๋งŽ๋‹ค. ์ด ๊ฒฝ์šฐ์—๋Š” ๋‘ ๊ฐ€์ง€ ์„ ํƒ์ง€๊ฐ€ ์žˆ๋Š”๋ฐ, ์ฒซ์งธ๋Š” ๊ฐ ์ธต๊ฐ„ ํ†ต์‹  ์ฑ„๋„์˜ ๋Œ€์—ญํญ์„ ์ค„์ด๋Š” ๊ฒƒ์ด๊ณ , ๋‘˜์งธ๋Š” ๊ฐ ์ฑ„๋„์˜ ๋Œ€์—ญํญ์€ ์œ ์ง€ํ•˜๋˜ ์ผ๋ถ€ ๋…ธ๋“œ๋งŒ ์ธต๊ฐ„ ํ†ต์‹ ์ด ๊ฐ€๋Šฅํ•œ ์ฑ„๋„์„ ์ œ๊ณตํ•˜๋Š” ๊ฒƒ์ด๋‹ค. ์šฐ๋ฆฌ๋Š” ๊ฐ๊ฐ์˜ ๊ฒฝ์šฐ์— ๋Œ€ํ•˜์—ฌ ๋ผ์šฐํŒ… ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ํ•˜๋‚˜์”ฉ ์ œ์‹œํ•œ๋‹ค. ์ฒซ ๋ฒˆ์งธ ๊ฒฝ์šฐ์— ์žˆ์–ด์„œ๋Š” deflection ๋ผ์šฐํŒ… ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•˜์—ฌ ์ธต๊ฐ„ ํ†ต์‹ ์˜ ๊ธด ์ง€์—ฐ ์‹œ๊ฐ„์„ ๊ทน๋ณตํ•˜๊ณ ์ž ํ•˜์˜€๋‹ค. ์ธต๊ฐ„ ํ†ต์‹ ์„ ๊ท ๋“ฑํ•˜๊ฒŒ ๋ถ„๋ฐฐํ•จ์œผ๋กœ์จ, ์ œ์‹œ๋œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๊ฐœ์„ ๋œ ์ง€์—ฐ ์‹œ๊ฐ„์„ ๋ณด์ด๋ฉฐ ๋ผ์šฐํ„ฐ ๋ฒ„ํผ์˜ ์ œ๊ฑฐ๋ฅผ ํ†ตํ•œ ๋ฉด์  ๋ฐ ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ ๋˜ํ•œ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค. ๋‘ ๋ฒˆ์งธ ๊ฒฝ์šฐ์—์„œ๋Š” ์ธต๊ฐ„ ํ†ต์‹  ์ฑ„๋„์„ ์„ ํƒํ•˜๊ธฐ ์œ„ํ•œ ๋ช‡ ๊ฐ€์ง€ ๊ทœ์น™์„ ์ œ์‹œํ•œ๋‹ค. ์•ฝ๊ฐ„์˜ ๋ผ์šฐํŒ… ์ž์œ ๋„๋ฅผ ํฌ์ƒํ•จ์œผ๋กœ์จ, ์ œ์‹œ๋œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๊ธฐ์กด ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๊ฐ€์ƒ ์ฑ„๋„ ์š”๊ตฌ ์กฐ๊ฑด์„ ์ œ๊ฑฐํ•˜๊ณ , ๊ฒฐ๊ณผ์ ์œผ๋กœ๋Š” ์„ฑ๋Šฅ ๋˜๋Š” ์—๋„ˆ์ง€ ํšจ์œจ์˜ ์ฆ๊ฐ€๋ฅผ ๊ฐ€์ ธ ์˜จ๋‹ค.For decades, advance in semiconductor technology has led us to the era of many-core systems. Today's desktop computers already have multi-core processors, and chips with more than a hundred cores are commercially available. As a communication medium for such a large number of cores, network-on-chip (NoC) has emerged out, and now is being used by many researchers and companies. Adopting NoC for a many-core system incurs many problems, and this thesis tries to solve some of them. The second chapter of this thesis is on mapping and scheduling of tasks on NoC-based CMP architectures. Although mapping on NoC has a number of papers published, our work reveals that selecting communication types between shared memory and message passing can help improve the performance and energy efficiency. Additionally, our framework supports scheduling applications containing backward dependencies with the help of modified modulo scheduling. Evolving the SoCs through 3D stacking makes us face a number of new problems, and the thermal problem coming from increased power density is one of them. In the third chapter of this thesis, we try to mitigate the hotspot problem using DVFS techniques. Assuming that all the routers as well as cores have capabilities to control voltage and frequency individually, we find voltage-frequency pairs for all cores and routers which yields the best performance within the given thermal constraint. The fourth and the fifth chapters of this thesis are from a different aspect. In 3D stacking, inter-layer interconnections are implemented using through-silicon vias (TSV). TSVs usually take much more area than normal wires. Furthermore, they also consume silicon area as well as metal area. For this reason, designers would want to limit the number of TSVs used in their network. To limit the TSV count, there are two options: the first is to reduce the width of each vertical links, and the other is to use fewer vertical links, which results in a partially connected network. We present two routing methodologies for each case. For the network with reduced bandwidth vertical links, we propose using deflection routing to mitigate the long latency of vertical links. By balancing the vertical traffics properly, the algorithm provides improved latency. Also, a large amount of area and energy reduction can be obtained by the removal of router buffers. For partially connected networks, we introduce a set of routing rules for selecting the vertical links. At the expense of sacrificing some amount of routing freedom, the proposed algorithm removes the virtual channel requirement for avoiding deadlock. As a result, the performance, or energy consumption can be reduced at the designer's choice.Chapter 1 Introduction 1 1.1 Task Mapping and Scheduling 2 1.2 Thermal Management 3 1.3 Routing for 3D Networks 5 Chapter 2 Mapping and Scheduling 9 2.1 Introduction 9 2.2 Motivation 10 2.3 Background 12 2.4 Related Work 16 2.5 Platform Description 17 2.5.1 Architcture Description 17 2.5.2 Energy Model 21 2.5.3 Communication Delay Model 22 2.6 Problem Formulation 23 2.7 Proposed Solution 25 2.7.1 Task and Communication Mapping 27 2.7.2 Communication Type Optimization 31 2.7.3 Design Space Pruning via Pre-evaluation 34 2.7.4 Scheduling 35 2.8 Experimental Results 42 2.8.1 Experiments with Coarse-grained Iterative Modulo Scheduling 42 2.8.2 Comparison with Different Mapping Algorithms 43 2.8.3 Experiments with Overall Algorithms 45 2.8.4 Experiments with Various Local Memory Sizes 47 2.8.5 Experiments with Various Placements of Shared Memory 48 Chapter 3 Thermal Management 50 3.1 Introduction 50 3.2 Background 51 3.2.1 Thermal Modeling 51 3.2.2 Heterogeneity in Thermal Propagation 52 3.3 Motivation and Problem Definition 53 3.4 Related Work 56 3.5 Orchestrated Voltage-Frequency Assignment 56 3.5.1 Individual PI Control Method 56 3.5.2 PI Controlled Weighted-Power Budgeting 57 3.5.3 Performance/Power Estimation 59 3.5.4 Frequency Assignment 62 3.5.5 Algorithm Overview 64 3.5.6 Stability Conditions for PI Controller 65 3.6 Experimental Result 66 3.6.1 Experimental Setup 66 3.6.2 Overall Algorithm Performance 68 3.6.3 Accuracy of the Estimation Model 70 3.6.4 Performance of the Frequency Assignment Algorithm 70 Chapter 4 Routing for Limited Bandwidth 3D NoC 72 4.1 Introduction 72 4.2 Motivation 73 4.3 Background 74 4.4 Related Work 75 4.5 3D Deflection Routing 76 4.5.1 Serialized TSV Model 76 4.5.2 TSV Link Injection/ejection Scheme 78 4.5.3 Deadlock Avoidance 80 4.5.4 Livelock Avoidance 84 4.5.5 Router Architecture: Putting It All Together 86 4.5.6 System Level Consideration 87 4.6 Experimental Results 89 4.6.1 Experimental Setup 89 4.6.2 Results on Synthetic Traffic Patterns 91 4.6.3 Results on Realistic Traffic Patterns 94 4.6.4 Results on Real Application Benchmarks 98 4.6.5 Fairness Issue 103 4.6.6 Area Cost Comparison 104 Chapter 5 Routing for Partially Connected 3D NoC 106 5.1 Introduction 106 5.2 Background 107 5.3 Related Work 109 5.4 Proposed Algorithm 111 5.4.1 Preliminary 112 5.4.2 Routing Algorithm for 3-D Stacked Meshes with Regular Partial Vertical Connections 115 5.4.3 Routing Algorithm for 3-D Stacked Meshes with Irregular Partial Vertical Connections 118 5.4.4 Extension to Heterogeneous Mesh Layers 122 5.5 Experimental Results 126 5.5.1 Experimental Setup 126 5.5.2 Experiments on Synthetic Traffics 128 5.5.3 Experiments on Application Benchmarks 133 5.5.4 Comparison with Reduced Bandwidth Mesh 139 Chapter 6 Conclusion 141 Bibliography 144 ์ดˆ๋ก 163Docto

    On Fault Resilient Network-on-Chip for Many Core Systems

    Get PDF
    Rapid scaling of transistor gate sizes has increased the density of on-chip integration and paved the way for heterogeneous many-core systems-on-chip, significantly improving the speed of on-chip processing. The design of the interconnection network of these complex systems is a challenging one and the network-on-chip (NoC) is now the accepted scalable and bandwidth efficient interconnect for multi-processor systems on-chip (MPSoCs). However, the performance enhancements of technology scaling come at the cost of reliability as on-chip components particularly the network-on-chip become increasingly prone to faults. In this thesis, we focus on approaches to deal with the errors caused by such faults. The results of these approaches are obtained not only via time-consuming cycle-accurate simulations but also by analytical approaches, allowing for faster and accurate evaluations, especially for larger networks. Redundancy is the general approach to deal with faults, the mode of which varies according to the type of fault. For the NoC, there exists a classification of faults into transient, intermittent and permanent faults. Transient faults appear randomly for a few cycles and may be caused by the radiation of particles. Intermittent faults are similar to transient faults, however, differing in the fact that they occur repeatedly at the same location, eventually leading to a permanent fault. Permanent faults by definition are caused by wires and transistors being permanently short or open. Generally, spatial redundancy or the use of redundant components is used for dealing with permanent faults. Temporal redundancy deals with failures by re-execution or by retransmission of data while information redundancy adds redundant information to the data packets allowing for error detection and correction. Temporal and information redundancy methods are useful when dealing with transient and intermittent faults. In this dissertation, we begin with permanent faults in NoC in the form of faulty links and routers. Our approach for spatial redundancy adds redundant links in the diagonal direction to the standard rectangular mesh topology resulting in the hexagonal and octagonal NoCs. In addition to redundant links, adaptive routing must be used to bypass faulty components. We develop novel fault-tolerant deadlock-free adaptive routing algorithms for these topologies based on the turn model without the use of virtual channels. Our results show that the hexagonal and octagonal NoCs can tolerate all 2-router and 3-router faults, respectively, while the mesh has been shown to tolerate all 1-router faults. To simplify the restricted-turn selection process for achieving deadlock freedom, we devised an approach based on the channel dependency matrix instead of the state-of-the-art Duato's method of observing the channel dependency graph for cycles. The approach is general and can be used for the turn selection process for any regular topology. We further use algebraic manipulations of the channel dependency matrix to analytically assess the fault resilience of the adaptive routing algorithms when affected by permanent faults. We present and validate this method for the 2D mesh and hexagonal NoC topologies achieving very high accuracy with a maximum error of 1%. The approach is very general and allows for faster evaluations as compared to the generally used cycle-accurate simulations. In comparison, existing works usually assume a limited number of faults to be able to analytically assess the network reliability. We apply the approach to evaluate the fault resilience of larger NoCs demonstrating the usefulness of the approach especially compared to cycle-accurate simulations. Finally, we concentrate on temporal and information redundancy techniques to deal with transient and intermittent faults in the router resulting in the dropping and hence loss of packets. Temporal redundancy is applied in the form of ARQ and retransmission of lost packets. Information redundancy is applied by the generation and transmission of redundant linear combinations of packets known as random linear network coding. We develop an analytic model for flexible evaluation of these approaches to determine the network performance parameters such as residual error rates and increased network load. The analytic model allows to evaluate larger NoCs and different topologies and to investigate the advantage of network coding compared to uncoded transmissions. We further extend the work with a small insight to the problem of secure communication over the NoC. Assuming large heterogeneous MPSoCs with components from third parties, the communication is subject to active attacks in the form of packet modification and drops in the NoC routers. Devising approaches to resolve these issues, we again formulate analytic models for their flexible and accurate evaluations, with a maximum estimation error of 7%

    Design and implementation of NoC routers and their application to Prdt-based NoC\u27s

    Full text link
    With a communication-centric design style, Networks-on-Chips (NoCs) emerges as a new paradigm of Systems-on-Chips (SoCs) to overcome the limitations of bus-based communication infrastructure. An important problem in the design of NoCs is the router design, which has great impact on the cost and performance of a NoC system. This thesis is focused on the design and implementation of an optimized parameterized router which can be applied in mesh/torus-based and Perfect Recursive Diagonal Torus (PRDT)-based NoCs; In specific, the router design includes the design and implementation of two routing algorithms (vector routing and circular coded vector routing), the wormhole switching scheme, the scheduling scheme, buffering strategy, and flow control scheme. Correspondingly, the following components are designed and implemented: input controller, output controller, crossbar switch, and scheduler. Verilog HDL codes are generated and synthesized on ASIC platforms. Most components are designed in parameterized way. Performance evaluation of each component of the router in terms of timing, area, and power consumption is conducted. The efficiency of the two routing algorithms and tradeoff between computational time (tsetup) and area are analyzed; To reduce the area cost of the router design, the two major components, the crossbar switch and the scheduler, are optimized. Particularly, for crossbar switch, a comparative study of two crossbar designs is performed with the aid of Magic Layout editor, Synopsys CosmosSE and Awaves; Based on the router design, the PRDT network composed of 4x4 routers is designed and synthesized on ASIC platforms

    Exploring Adaptive Implementation of On-Chip Networks

    Get PDF
    As technology geometries have shrunk to the deep submicron regime, the communication delay and power consumption of global interconnections in high performance Multi- Processor Systems-on-Chip (MPSoCs) are becoming a major bottleneck. The Network-on- Chip (NoC) architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues such as performance limitations of long interconnects and integration of large number of Processing Elements (PEs) on a chip. The choice of routing protocol and NoC structure can have a significant impact on performance and power consumption in on-chip networks. In addition, building a high performance, area and energy efficient on-chip network for multicore architectures requires a novel on-chip router allowing a larger network to be integrated on a single die with reduced power consumption. On top of that, network interfaces are employed to decouple computation resources from communication resources, to provide the synchronization between them, and to achieve backward compatibility with existing IP cores. Three adaptive routing algorithms are presented as a part of this thesis. The first presented routing protocol is a congestion-aware adaptive routing algorithm for 2D mesh NoCs which does not support multicast (one-to-many) traffic while the other two protocols are adaptive routing models supporting both unicast (one-to-one) and multicast traffic. A streamlined on-chip router architecture is also presented for avoiding congested areas in 2D mesh NoCs via employing efficient input and output selection. The output selection utilizes an adaptive routing algorithm based on the congestion condition of neighboring routers while the input selection allows packets to be serviced from each input port according to its congestion level. Moreover, in order to increase memory parallelism and bring compatibility with existing IP cores in network-based multiprocessor architectures, adaptive network interface architectures are presented to use multiple SDRAMs which can be accessed simultaneously. In addition, a smart memory controller is integrated in the adaptive network interface to improve the memory utilization and reduce both memory and network latencies. Three Dimensional Integrated Circuits (3D ICs) have been emerging as a viable candidate to achieve better performance and package density as compared to traditional 2D ICs. In addition, combining the benefits of 3D IC and NoC schemes provides a significant performance gain for 3D architectures. In recent years, inter-layer communication across multiple stacked layers (vertical channel) has attracted a lot of interest. In this thesis, a novel adaptive pipeline bus structure is proposed for inter-layer communication to improve the performance by reducing the delay and complexity of traditional bus arbitration. In addition, two mesh-based topologies for 3D architectures are also introduced to mitigate the inter-layer footprint and power dissipation on each layer with a small performance penalty.Siirretty Doriast

    Software-based and regionally-oriented traffic management in Networks-on-Chip

    Get PDF
    Since the introduction of chip-multiprocessor systems, the number of integrated cores has been steady growing and workload applications have been adapted to exploit the increasing parallelism. This changed the importance of efficient on-chip communication significantly and the infrastructure has to keep step with these new requirements. The work at hand makes significant contributions to the state-of-the-art of the latest generation of such solutions, called Networks-on-Chip, to improve the performance, reliability, and flexible management of these on-chip infrastructures

    Network-on-Chip

    Get PDF
    Limitations of bus-based interconnections related to scalability, latency, bandwidth, and power consumption for supporting the related huge number of on-chip resources result in a communication bottleneck. These challenges can be efficiently addressed with the implementation of a network-on-chip (NoC) system. This book gives a detailed analysis of various on-chip communication architectures and covers different areas of NoCs such as potentials, architecture, technical challenges, optimization, design explorations, and research directions. In addition, it discusses current and future trends that could make an impactful and meaningful contribution to the research and design of on-chip communications and NoC systems

    Network-on-Chip

    Get PDF
    Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoCโ€”its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems

    HopliteBuf FPGA Network-on-Chip: Architecture and Analysis

    Get PDF
    We can prove occupancy bounds of stall-free FIFOs used in deflection-free, low-cost, and high-speed FPGA overlay Network-on-chips (NoCs). In our work, we build on top of the HopliteRT livelock-free overlay NoC with an FPGA-friendly 2D unidirectional torus topology to propose the novel HopliteBuf NoC. In our new NoC, we strategically introduce stall-free FIFOs in the network and support these FIFOs with static analysis based on network calculus to compute FIFO occupancy, latency, and bandwidth bounds. The microarchitecture of HopliteBuf combines the performance benefits of conventional buffered NoCs (high throughput, low latency) with the cost advantages of deflection-routed NoCs (low FPGA area, high clock frequencies). Specifically, we look at two design variants of the HopliteBuf NoC: (1) Single corner-turn FIFO (W to S), and (2) Dual corner-turn FIFO (W to S+N). The single corner-turn (W to S) design is simpler and only introduces a buffering requirement for packets changing dimension from X ring to the downhill Y ring (or West to South). The dual corner-turn variant requires two FIFOs for turning packets going downhill (W to S) as well as uphill (W to N). The dual corner-turn design overcomes the mathematical analysis challenges associated with single corner-turn designs for communication workloads with cyclic dependencies between flow traversal paths at the expense of small increase in resource cost. Essentially, we resolve an analysis challenge with extra hardware resources. Across a range of 100 synthetically-generated workloads on a 5 x 5 NoC, HopliteBuf outperforms HopliteRT by 1.2-2x in terms of latency, 10% in terms of injection rate, and 30-60% in terms of flowset feasibiliy. These advantages come at the cost of 3-4x higher FPGA resource requirement for buffers and muxes. Our analysis also deliver latency bounds that are not only better than HopliteRT in absolute terms but also tighter by 2-3x allowing us to provision less hardware to meet our specifications
    • โ€ฆ
    corecore