
    230104

    In deflection-based Networks-on-Chip (NoCs), when several flits entering a router contend for the same output port, one of them is routed to the desired output and the others are deflected to alternative outputs. This approach reduces power consumption and silicon footprint compared with solutions based on virtual channels (VCs). However, because the number of deflections a flit may suffer while traversing the network is nondeterministic, flits may arrive out of order at their destinations. In this work, we present IPDeN, a novel deflection-based NoC that ensures in-order flit delivery. To avoid costly reordering mechanisms at the destination of each communication flow, we propose a solution based on a single small buffer added to each router that prevents flits from overtaking other flits belonging to the same communication flow. We also develop a worst-case traversal time (WCTT) analysis for packets transmitted over IPDeN. We implemented IPDeN in Verilog and synthesized it for an FPGA platform. We show that an IPDeN router requires approximately 483 times fewer hardware resources than routers that use VCs. Experimental results show that both the worst-case and the average packet communication times are reduced in comparison to the state of the art. This work was partially supported by National Funds through FCT/MCTES (Portuguese Foundation for Science and Technology), within the CISTER Research Unit (UIDP/UIDB/04234/2020); and by FCT and the ESF (European Social Fund) through the Regional Operational Programme (ROP) Norte 2020, under PhD grant 2020.06898.BD.
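To make the contention scenario above concrete, here is a minimal Python sketch of one cycle of bufferless deflection arbitration. It assumes a 4-port router, an oldest-first priority rule, and hypothetical names (`Flit`, `allocate`); it is not IPDeN's router and omits its in-order buffer. It only shows how a losing flit is pushed onto a non-productive port, which is what makes hop counts, and therefore arrival order, nondeterministic.

```python
from dataclasses import dataclass

PORTS = ("N", "E", "S", "W")  # assumed 4-port mesh router (local port ignored)

@dataclass
class Flit:
    fid: int
    age: int           # cycles spent in the network; older flits get priority (assumed policy)
    productive: tuple  # output ports that bring the flit closer to its destination

def allocate(flits):
    """One cycle of bufferless deflection arbitration.

    Returns (assignment, deflected): assignment maps flit id -> output port,
    deflected lists the ids of flits that did not get a productive port.
    """
    if len(flits) > len(PORTS):
        raise ValueError("a bufferless router cannot accept more flits than outputs")
    free = list(PORTS)
    assignment, deflected = {}, []
    for f in sorted(flits, key=lambda f: f.age, reverse=True):
        port = next((p for p in f.productive if p in free), None)
        if port is None:       # every productive port is already taken:
            port = free[0]     # deflect to any remaining free output
            deflected.append(f.fid)
        free.remove(port)
        assignment[f.fid] = port
    return assignment, deflected

# Two flits both want East; the older one wins, the younger one is deflected.
print(allocate([Flit(1, age=5, productive=("E",)), Flit(2, age=3, productive=("E",))]))
# ({1: 'E', 2: 'N'}, [2])
```

Because every incoming flit must leave on some port in the same cycle, no buffering is needed; the price is the extra hops taken by deflected flits.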

    In-Channel Misrouting Suppression Technique for Deflection-Routed Networks on Chip

    Deflection routing, in which port contentions in routers are resolved by intentionally misrouting some of the packets along unwanted directions instead of storing them, has been proposed as a promising approach for improving the power and area efficiency of large-scale networks on chip (NoCs). However, at high network load, when packets are misrouted more frequently, the cost and energy benefits of this simple routing scheme are offset by performance degradation. To address this problem, we propose a technique that uses small in-channel buffers to capture some of the deflected packets before they take a misrouting hop. The captured packets are then looped back to the routers where they were deflected and routed again. To improve the efficiency of this in-channel misrouting suppression scheme, we also slightly modify the routing function of the deflection router by restricting the choice of productive directions for misrouted packets. Evaluations on synthetic traffic patterns show that the proposed misrouting suppression mechanism improves network saturation throughput by 36.2% when implemented in a conventional deflection-routed network.

    Poboljลกanje performansi mreลพa na ฤipu zasnovanih na deflekcionom rutiranju

    This doctoral dissertation presents performance enhancement solutions for deflection-routed networks-on-chip, including techniques for deflection minimization and techniques for misrouting suppression. Two deflection-minimization solutions are presented: distributed and global port allocation (SMD and DMD, respectively). Both reduce the deflection rate by replacing the existing algorithm of the deflection router's commutation stage with a novel algorithm that yields better output-port allocation. While SMD minimizes the deflection rate by choosing the configuration that is beneficial for the flits at the level of a single arbiter, DMD introduces global port allocation in order to minimize the number of deflected flits at the output ports. The misrouting-suppression solutions presented in this dissertation are classified into solutions implemented on the inter-router link and solutions implemented in the router. Two solutions are implemented on the link: the reflective link (LB) and the reflective link with buffers (ILB). The essence of LB is the option of returning a deflected flit to the input of the router where it was deflected, which gives the flit a new opportunity to contend for a productive port in the next network cycle. ILB additionally incorporates FIFO buffers on the links, which gives additional flexibility compared to LB and allows deflected flits to be held in a buffer before returning to the router where they were deflected. ILB also allows selection among multiple link configurations in order to reduce misrouting. Both solutions are suitable for hardware implementation and can be applied in any deflection network without modifying the internal router architecture. The dissertation also presents a misrouting-suppression solution for minimally-buffered deflection routers (SB_O), which modifies both the router architecture and the algorithm for SB buffer allocation.
The router architecture is modified by moving the Buffer Inject stage ahead of the PAS stage, giving higher injection priority to flits originating from the IP core and thus improving traffic distribution within the network. SB_O also involves a novel SB buffer allocation algorithm that selects for buffering a flit deflected on a port that is productive for the flit already buffered in SB. Besides solutions for improving the performance of deflection networks, the dissertation presents a livelock detection and resolution mechanism. Unlike existing livelock prevention schemes, the proposed mechanism can be easily adapted to different router architectures and detects livelock with lower latency than existing solutions. To evaluate the proposed solutions, a dedicated cycle-accurate simulator of deflection networks was developed and is presented in the dissertation. The simulator is implemented in SystemC, a language for modeling and verifying digital systems, and allows functional modeling of the deflection router, communication link, network topology, and network traffic. In addition to simulations comparing the performance of the presented solutions with reference routers, a separate set of simulations analyzes the influence of the implemented performance-enhancement mechanisms on the distribution of network traffic.
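The difference between per-flit (local) and global port allocation described above can be illustrated with a small Python sketch. The functions are hypothetical and are not the SMD or DMD hardware algorithms: `greedy_allocation` gives each flit, in arrival order, its first free productive port, while `min_deflection_allocation` exhaustively searches all flit-to-port assignments for one that minimizes the number of deflected flits. Exhaustive search is only practical for a router with a handful of ports; it stands in here for whatever global allocation logic a real router would use.

```python
from itertools import permutations

PORTS = ("N", "E", "S", "W")  # assumed 4 inter-router output ports

def greedy_allocation(productive):
    """Local, per-flit allocation: each flit (in arrival order) takes its first
    free productive port, or is deflected to the first free port."""
    free, out, deflected = list(PORTS), [], 0
    for want in productive:
        port = next((p for p in PORTS if p in want and p in free), None)
        if port is None:
            port, deflected = free[0], deflected + 1
        free.remove(port)
        out.append(port)
    return out, deflected

def min_deflection_allocation(productive):
    """Global allocation: try every injective flit-to-port assignment and keep
    the one with the fewest flits sent to a non-productive port."""
    if len(productive) > len(PORTS):
        raise ValueError("more flits than output ports")
    best, best_cost = None, len(productive) + 1
    for ports in permutations(PORTS, len(productive)):
        cost = sum(p not in want for p, want in zip(ports, productive))
        if cost < best_cost:
            best, best_cost = list(ports), cost
    return best, best_cost

# Flit 0 can use N or E; flit 1 can only use N.
flits = [{"N", "E"}, {"N"}]
print(greedy_allocation(flits))          # (['N', 'E'], 1)
print(min_deflection_allocation(flits))  # (['E', 'N'], 0)
```

In the example, the greedy choice lets flit 0 take North and forces flit 1 into a deflection, whereas the global view routes flit 0 East and deflects nobody.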

    HopliteBuf FPGA Network-on-Chip: Architecture and Analysis

    We can prove occupancy bounds for stall-free FIFOs used in deflection-free, low-cost, high-speed FPGA overlay Networks-on-Chip (NoCs). In our work, we build on the HopliteRT livelock-free overlay NoC, with its FPGA-friendly 2D unidirectional torus topology, to propose the novel HopliteBuf NoC. In our new NoC, we strategically introduce stall-free FIFOs in the network and support them with static analysis based on network calculus to compute FIFO occupancy, latency, and bandwidth bounds. The microarchitecture of HopliteBuf combines the performance benefits of conventional buffered NoCs (high throughput, low latency) with the cost advantages of deflection-routed NoCs (low FPGA area, high clock frequencies). Specifically, we look at two design variants of the HopliteBuf NoC: (1) single corner-turn FIFO (W to S), and (2) dual corner-turn FIFO (W to S+N). The single corner-turn (W to S) design is simpler and only introduces a buffering requirement for packets changing dimension from the X ring to the downhill Y ring (West to South). The dual corner-turn variant requires two FIFOs, for packets turning downhill (W to S) as well as uphill (W to N). The dual corner-turn design overcomes the analysis challenges that single corner-turn designs face for communication workloads with cyclic dependencies between flow traversal paths, at the expense of a small increase in resource cost. Essentially, we resolve an analysis challenge with extra hardware resources. Across a range of 100 synthetically generated workloads on a 5 x 5 NoC, HopliteBuf outperforms HopliteRT by 1.2-2x in latency, 10% in injection rate, and 30-60% in flowset feasibility. These advantages come at the cost of 3-4x higher FPGA resource requirements for buffers and muxes.
Our analysis also delivers latency bounds that are not only better than HopliteRT's in absolute terms but also 2-3x tighter, allowing us to provision less hardware to meet our specifications.
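For readers new to network calculus, the standard single-node bounds for a token-bucket arrival curve $\alpha(t) = \sigma + \rho t$ served by a rate-latency service curve $\beta(t) = R \max(0, t - T)$ are a backlog of at most $\sigma + \rho T$ and a delay of at most $T + \sigma / R$, provided $\rho \le R$. The Python sketch below applies these textbook formulas to size a single FIFO. It is not HopliteBuf's analysis; the class names, the flow parameters, and the service curve are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class TokenBucket:
    sigma: float  # burst size (flits)
    rho: float    # long-term rate (flits/cycle)

@dataclass
class RateLatency:
    R: float      # guaranteed service rate (flits/cycle)
    T: float      # worst-case service latency (cycles)

def aggregate(flows):
    """Sum of token-bucket arrival curves: bursts add and rates add."""
    return TokenBucket(sum(f.sigma for f in flows), sum(f.rho for f in flows))

def bounds(arrival, service):
    """Return (backlog bound, delay bound) for one FIFO node."""
    if arrival.rho > service.R:
        raise ValueError("unbounded: arrival rate exceeds service rate")
    backlog = arrival.sigma + arrival.rho * service.T  # max vertical deviation
    delay = service.T + arrival.sigma / service.R      # max horizontal deviation
    return backlog, delay

# Two hypothetical flows sharing one corner-turn FIFO.
flows = [TokenBucket(2, 0.1), TokenBucket(2, 0.1)]
backlog, delay = bounds(aggregate(flows), RateLatency(R=1.0, T=3))
print(math.ceil(backlog), delay)  # FIFO depth in flits, delay bound in cycles
```

Rounding the backlog bound up gives a FIFO depth that, under these assumptions, can never overflow, which is the sense in which such FIFOs can be "stall-free".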

    Worst Case Latency Analysis for Hoplite FPGA-based NoC

    Overlay NoCs, such as Hoplite, are cheap to implement on an FPGA but provide no bounds on the worst-case routing latency of packets traversing the NoC, due to deflection routing. In this paper, we show how to adapt Hoplite to enable calculation of precise upper bounds on routing latency by modifying the routing function to prioritize deflections, and by regulating the injection of packets to meet certain throughput and burstiness constraints. We provide an analytical model for computing end-to-end latency in the form of (1) in-flight time in the network, $T^f$, and (2) waiting time at the source node, $T^s$. To bound in-flight time in an $m \times m$ NoC, we modify the routing function and switching crossbar richness in the Hoplite router to deliver $T^{f} = \Delta X + \Delta Y + (\Delta Y \times m) + 2$, where $\Delta X$ and $\Delta Y$ are the differences between the source and destination address coordinates of the packet. To bound the waiting time at the source, we add a Token Bucket regulator with rate $\rho_i$ and burstiness $\sigma_i$ for each flow $f_i$ at node $(x,y)$ to deliver $(\lceil \frac{1}{\rho_i} \rceil - 1) + T^s$, where $T^s = \lceil \frac{\sigma(\Gamma^C_f)}{1-\rho(\Gamma^C_f)} \rceil$, which depends on the regulator period $1/\rho_i$, the burstiness $\sigma$, and the rate $\rho$ of all interfering flows $\Gamma^C_f$. A 64b implementation of our HopliteRT router requires $\approx$4% fewer LUTs and a similar number of FFs compared to the original Hoplite router. We also need two small counters at each client port for regulating injection. We evaluate our model and RTL implementation across synthetic traffic patterns and observe behavior that conforms with the analytical bounds.
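The two bounds above can be evaluated directly. The sketch below is a hypothetical helper, not part of the HopliteRT tooling. It assumes that $\Delta X$ and $\Delta Y$ are hop counts on the unidirectional torus (destination minus source, modulo $m$), and that $\sigma(\Gamma^C_f)$ and $\rho(\Gamma^C_f)$ are passed in already aggregated over the interfering flows.

```python
import math

def torus_deltas(src, dst, m):
    """Hop counts (dX, dY) on an m x m unidirectional torus.
    Assumption: packets only move in the increasing X/Y direction, wrapping modulo m."""
    return (dst[0] - src[0]) % m, (dst[1] - src[1]) % m

def inflight_bound(dx, dy, m):
    """T^f = dX + dY + dY*m + 2, as stated in the abstract."""
    return dx + dy + dy * m + 2

def source_wait_bound(rho_i, sigma_int, rho_int):
    """(ceil(1/rho_i) - 1) + T^s, with T^s = ceil(sigma_int / (1 - rho_int))."""
    if not 0 < rho_i <= 1:
        raise ValueError("regulator rate must lie in (0, 1] packets/cycle")
    if not 0 <= rho_int < 1:
        raise ValueError("interfering flows must leave spare link capacity")
    t_s = math.ceil(sigma_int / (1 - rho_int))
    return (math.ceil(1 / rho_i) - 1) + t_s

# Hypothetical example: 4x4 NoC, packet from (3, 0) to (1, 1).
dx, dy = torus_deltas((3, 0), (1, 1), 4)   # wraps around in X: (2, 1)
t_f = inflight_bound(dx, dy, 4)            # 2 + 1 + 4 + 2 = 9
t_wait = source_wait_bound(0.25, 2, 0.5)   # (4 - 1) + ceil(2 / 0.5) = 7
print(dx, dy, t_f, t_wait, t_f + t_wait)   # 2 1 9 7 16
```

The $\Delta Y \times m$ term dominates for long vertical paths, so, under these assumptions, placing communicating nodes in the same row tightens the in-flight bound far more than shortening the X distance.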

    ์˜จ ์นฉ ๋„คํŠธ์›Œํฌ ์„ค๊ณ„: ๋งคํ•‘, ๊ด€๋ฆฌ, ๋ผ์šฐํŒ…

    Thesis (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2016. Advisor: Kiyoung Choi (최기영).
For decades, advances in semiconductor technology have led us to the era of many-core systems. Today's desktop computers already have multi-core processors, and chips with more than a hundred cores are commercially available. Network-on-chip (NoC) has emerged as the communication medium for such large numbers of cores and is now widely used by researchers and companies. Adopting an NoC for a many-core system raises many problems, and this thesis addresses some of them. The second chapter covers mapping and scheduling of tasks on NoC-based CMP architectures. Although many papers on NoC mapping have been published, our work shows that choosing between shared-memory and message-passing communication can improve performance and energy efficiency.
Additionally, our framework supports scheduling applications that contain backward dependencies, using a modified modulo scheduling technique. Evolving SoCs through 3D stacking raises a number of new problems, one of which is the thermal problem caused by increased power density. In the third chapter, we mitigate the hotspot problem using DVFS techniques. Assuming that every router and core can control its voltage and frequency individually, we find voltage-frequency pairs for all cores and routers that yield the best performance within the given thermal constraint. The fourth and fifth chapters address a different aspect. In 3D stacking, inter-layer interconnections are implemented using through-silicon vias (TSVs). TSVs usually take much more area than normal wires, consuming both silicon area and metal area. For this reason, designers want to limit the number of TSVs in their network. There are two options for doing so: reduce the width of each vertical link, or use fewer vertical links, which results in a partially connected network. We present a routing methodology for each case. For the network with reduced-bandwidth vertical links, we propose deflection routing to mitigate the long latency of vertical links. By properly balancing vertical traffic, the algorithm improves latency, and removing router buffers yields large area and energy reductions. For partially connected networks, we introduce a set of routing rules for selecting vertical links. At the cost of some routing freedom, the proposed algorithm removes the virtual-channel requirement for deadlock avoidance.
As a result, either performance can be improved or energy consumption reduced, at the designer's choice.
Contents:
Chapter 1 Introduction
  1.1 Task Mapping and Scheduling
  1.2 Thermal Management
  1.3 Routing for 3D Networks
Chapter 2 Mapping and Scheduling
  2.1 Introduction
  2.2 Motivation
  2.3 Background
  2.4 Related Work
  2.5 Platform Description
    2.5.1 Architecture Description
    2.5.2 Energy Model
    2.5.3 Communication Delay Model
  2.6 Problem Formulation
  2.7 Proposed Solution
    2.7.1 Task and Communication Mapping
    2.7.2 Communication Type Optimization
    2.7.3 Design Space Pruning via Pre-evaluation
    2.7.4 Scheduling
  2.8 Experimental Results
    2.8.1 Experiments with Coarse-grained Iterative Modulo Scheduling
    2.8.2 Comparison with Different Mapping Algorithms
    2.8.3 Experiments with Overall Algorithms
    2.8.4 Experiments with Various Local Memory Sizes
    2.8.5 Experiments with Various Placements of Shared Memory
Chapter 3 Thermal Management
  3.1 Introduction
  3.2 Background
    3.2.1 Thermal Modeling
    3.2.2 Heterogeneity in Thermal Propagation
  3.3 Motivation and Problem Definition
  3.4 Related Work
  3.5 Orchestrated Voltage-Frequency Assignment
    3.5.1 Individual PI Control Method
    3.5.2 PI Controlled Weighted-Power Budgeting
    3.5.3 Performance/Power Estimation
    3.5.4 Frequency Assignment
    3.5.5 Algorithm Overview
    3.5.6 Stability Conditions for PI Controller
  3.6 Experimental Results
    3.6.1 Experimental Setup
    3.6.2 Overall Algorithm Performance
    3.6.3 Accuracy of the Estimation Model
    3.6.4 Performance of the Frequency Assignment Algorithm
Chapter 4 Routing for Limited Bandwidth 3D NoC
  4.1 Introduction
  4.2 Motivation
  4.3 Background
  4.4 Related Work
  4.5 3D Deflection Routing
    4.5.1 Serialized TSV Model
    4.5.2 TSV Link Injection/Ejection Scheme
    4.5.3 Deadlock Avoidance
    4.5.4 Livelock Avoidance
    4.5.5 Router Architecture: Putting It All Together
    4.5.6 System Level Consideration
  4.6 Experimental Results
    4.6.1 Experimental Setup
    4.6.2 Results on Synthetic Traffic Patterns
    4.6.3 Results on Realistic Traffic Patterns
    4.6.4 Results on Real Application Benchmarks
    4.6.5 Fairness Issue
    4.6.6 Area Cost Comparison
Chapter 5 Routing for Partially Connected 3D NoC
  5.1 Introduction
  5.2 Background
  5.3 Related Work
  5.4 Proposed Algorithm
    5.4.1 Preliminary
    5.4.2 Routing Algorithm for 3-D Stacked Meshes with Regular Partial Vertical Connections
    5.4.3 Routing Algorithm for 3-D Stacked Meshes with Irregular Partial Vertical Connections
    5.4.4 Extension to Heterogeneous Mesh Layers
  5.5 Experimental Results
    5.5.1 Experimental Setup
    5.5.2 Experiments on Synthetic Traffic
    5.5.3 Experiments on Application Benchmarks
    5.5.4 Comparison with Reduced Bandwidth Mesh
Chapter 6 Conclusion
Bibliography
Abstract (in Korean)

    Run-time management of many-core SoCs: A communication-centric approach

    Single-core performance hit power and complexity limits at the beginning of this century, moving the industry toward the design of multi- and many-core systems-on-chip (SoCs). On-chip communication between the cores plays a critical role in the performance of these SoCs, with power dissipation, communication latency, scalability to many cores, and reliability against transistor failures as the main design challenges. Accordingly, we dedicate this thesis to the communication-centric management of many-core SoCs, with the goal of advancing the state of the art in addressing these challenges. To this end, we contribute to the on-chip communication of many-core SoCs in three main directions. First, we start with a synthesizable SoC with full-system simulation. We demonstrate the importance of networking overhead in a practical system and propose a sophisticated network interface (NI) that offloads work from software to hardware. Our results show around 5x, and up to 50x, higher network performance compared to previous works. In the second direction, we study the significance of run-time application mapping. We demonstrate that contiguous application mapping not only improves network latency (by 23%) and power dissipation (by 50%), but also improves system throughput (by 3%) and the quality of service (QoS) of soft real-time applications (up to 100x fewer deadline misses). Our hierarchical run-time application mapping also achieves 99.41% successful mapping when up to 8 links are broken. In the final direction, we propose a fault-tolerant routing algorithm, maze-routing. It is the first algorithm in its class to provide guaranteed delivery, a fully distributed solution, low area overhead (16x lower), and instantaneous reconfiguration (vs. 40K cycles of downtime in previous works), all at the same time.
Besides the individual goals of each contribution, where applicable we ensure that our solutions scale to extreme network sizes such as 12x12 and 16x16. This thesis concludes that communication overhead and its optimization play a significant role in the performance of many-core SoCs.

    Optical packet switching using multi-wavelength labels


    MOCAST 2021

    The 10th International Conference on Modern Circuit and System Technologies on Electronics and Communications (MOCAST 2021) will take place in Thessaloniki, Greece, from July 5th to July 7th, 2021. The MOCAST technical program includes all aspects of circuit and system technologies, from modeling to design, verification, implementation, and application. This Special Issue presents extended versions of top-ranking papers in the conference. The topics of MOCAST include: analog/RF and mixed-signal circuits; digital circuits and systems design; nonlinear circuits and systems; device and circuit modeling; high-performance embedded systems; systems and applications; sensors and systems; machine learning and AI applications; communication; network systems; power management; imagers, MEMS, medical, and displays; radiation front ends (nuclear and space applications); and education in circuits, systems, and communications.