60 research outputs found

    SB-Router: A Swapped Buffer Activated Low Latency Network-on-Chip Router

    Get PDF
    Switch Allocation (SA) is a critical stage in Network-on-Chip (NoC) routers, and its performance is adversely affected by Head-of-Line (HoL) blocking. In the traditionally used Input-Queued Router (IQR), packets are stored in arrival order in each Virtual Channel (VC). This implementation is vulnerable to HoL blocking, as the switch allocator can only allocate packets that are at the head of a VC. In this paper, a Swapped Buffer (SB) Router architecture is proposed that schedules packets in the input buffers by using SB registers. The VCs are designed as SBs, which allows the packets stored in the SB registers, along with the head packet of the VC, to participate in SA. The SB register minimizes conflicts in SA and thus reduces HoL blocking, thereby improving the performance of the NoC. The paper also proposes a priority mechanism that prioritizes non-head packets over head packets in case of a conflict between them. Two further methods are proposed to enhance the performance of the NoC router. First, a VC allocation technique is proposed to optimize the order of packets in the input buffer. Next, the SB-Router is combined with the Fill VC allocation technique to further enhance performance. The proposed router is evaluated experimentally, and the results indicate that the design achieves a latency improvement of 68.75% over the Time-Series (TS) Router for uniform traffic at an injection rate of 0.42 flits/cycle on a 64-node mesh network, with moderate power consumption and area usage. The improvement in packet latency for traces from the Princeton Application Repository for Shared-Memory Computers (PARSEC) has also been evaluated. With the achieved reduction in latency, the proposed method has the potential to serve high-speed operations when mapping different applications onto multi-core architectures.
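    To make the Head-of-Line problem and the swapped-buffer idea concrete, the short sketch below simulates an input-queued router in which, optionally, one non-head packet per VC may also bid for the switch. It is an illustrative toy model only, not the SB-Router's actual microarchitecture or priority mechanism; all parameters (4 VCs, 4 outputs, 64 packets per VC) are assumptions.

```python
from collections import deque
import random

N_VCS = 4        # input virtual channels contending for the crossbar
N_OUTPUTS = 4    # router output ports


def allocate(requests):
    """Toy separable switch allocation: at most one grant per output and per VC."""
    grants, used_out, used_vc = [], set(), set()
    for vc_id, out in requests:
        if out not in used_out and vc_id not in used_vc:
            used_out.add(out)
            used_vc.add(vc_id)
            grants.append((vc_id, out))
    return grants


def drain_cycles(swapped_buffer, n_packets=64, seed=0):
    """Cycles needed to deliver all packets; each packet is just its output port."""
    rng = random.Random(seed)   # identical traffic for both configurations
    vcs = [deque(rng.randrange(N_OUTPUTS) for _ in range(n_packets))
           for _ in range(N_VCS)]
    cycles = 0
    while any(vcs):
        cycles += 1
        requests = []
        for vc_id, q in enumerate(vcs):
            if not q:
                continue
            requests.append((vc_id, q[0]))        # the head packet always bids
            if swapped_buffer and len(q) > 1 and q[1] != q[0]:
                requests.append((vc_id, q[1]))    # SB-style: one non-head packet also bids
        for vc_id, out in allocate(requests):
            vcs[vc_id].remove(out)                # serve whichever packet won (head or non-head)
    return cycles


print("baseline IQR:", drain_cycles(swapped_buffer=False), "cycles")
print("SB-style    :", drain_cycles(swapped_buffer=True), "cycles")
```

    Allowing the extra bid typically drains the same traffic in fewer cycles, which is the effect the SB registers are designed to exploit.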

    Multistage Packet-Switching Fabrics for Data Center Networks

    Get PDF
    Recent applications have imposed stringent requirements on Data Center Network (DCN) switches in terms of scalability, throughput and latency. In this thesis, the architectural design of packet switches is tackled in different ways to enable expansion in both the number of connected endpoints and the traffic volume. A cost-effective Clos-network switch with partially buffered units is proposed, and two packet scheduling algorithms are described. The first algorithm adopts many simple and distributed arbiters, while the second relies on a central arbiter to guarantee in-order packet delivery. For improved scalability, the Clos switch is built using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture, made with Input-Queued (IQ) Uni-Directional NoC modules (UDNs), simplifies the input line cards and obviates the need for costly Virtual Output Queues (VOQs). It also avoids complex, synchronized scheduling processes, and offers speedup, load balancing, and good path diversity. Under skewed traffic, reliable micro load-balancing contributes to boosting the overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture operates with a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules and enhances the switch performance under critical packet arrivals. The implementation of small on-chip buffers has become perfectly feasible with current technology. This motivated the implementation of a large switching architecture with an Output-Queued (OQ) NoC fabric. The design merges the assets of output queuing and NoCs to provide high throughput and smooth latency variations. An approximate analytical model of the switch performance is also proposed. To further exploit the potential of NoC fabrics and their modularity, a high-capacity Clos switch with Multi-Directional NoC (MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch and scales better and faster in port count and traffic load. The results achieved in this thesis demonstrate the high performance, expandability and programmability of the proposed packet switches, which makes them promising candidates for next-generation data center networking infrastructure.
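    As a rough illustration of congestion-aware dispatching across a three-stage fabric (a sketch under the assumption of simple per-module occupancy counters, not the thesis's actual routing algorithm), the snippet below sends each cell through the least-loaded central module for its destination output module.

```python
import random

random.seed(1)

N_INPUT_MODULES = 4     # ingress modules (IMs)
N_CENTRAL_MODULES = 4   # central modules (CMs)
N_OUTPUT_MODULES = 4    # egress modules (OMs)

# Assumed congestion state: outstanding cells queued in each CM toward each OM.
cm_occupancy = [[0] * N_OUTPUT_MODULES for _ in range(N_CENTRAL_MODULES)]


def pick_central_module(dst_om):
    """Congestion-aware choice: use the CM with the fewest outstanding
    cells heading to the destination output module."""
    return min(range(N_CENTRAL_MODULES), key=lambda cm: cm_occupancy[cm][dst_om])


def dispatch(cells):
    """cells: list of (src_im, dst_om) pairs. Returns the chosen path per cell."""
    paths = []
    for src_im, dst_om in cells:
        cm = pick_central_module(dst_om)
        cm_occupancy[cm][dst_om] += 1      # account for the load placed on that path
        paths.append((src_im, cm, dst_om))
    return paths


# Skewed traffic: a burst of cells that all target output module 2.
burst = [(random.randrange(N_INPUT_MODULES), 2) for _ in range(8)]
for src, cm, dst in dispatch(burst):
    print(f"IM{src} -> CM{cm} -> OM{dst}")
```

    Under this skewed burst the cells spread evenly over the central modules instead of piling onto one, which is the behaviour that proactive load distribution aims for.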

    Head-of-Line Blocking Reduction in Power-Efficient Networks-on-Chip

    Full text link
    Nowadays, thanks to continuous improvements in integration scale, more and more cores are added to the same chip, leading to higher system performance. To interconnect all these nodes, a network-on-chip (NoC) is used, which is in charge of delivering data between cores. However, increasing the number of cores also significantly increases power consumption, making the NoC one of the most expensive components of the chip in terms of power. Because of this, several mechanisms have been proposed in recent years to address NoC power consumption by means of DVFS (Dynamic Voltage and Frequency Scaling) and power-gating strategies. Nevertheless, the improvements delivered by these mechanisms come, to a greater or lesser extent, at the cost of system performance, potentially increasing the risk of saturating the network by forming congestion points which, in turn, compromise the functionality of the rest of the system. One side effect is "Head-of-Line blocking", where congested packets at the head of a queue prevent other non-blocked packets from advancing. To address this issue, this thesis first proposes novel congestion-control techniques that improve system performance by removing the Head-of-Line blocking effect. Second, it proposes combined solutions adapted to DVFS in order to achieve improvements in both performance and power. In addition, it proposes a path-aware power-gating mechanism that detects the flows sharing buffer resources along data paths and switches those resources off when they are not needed. With all these combined solutions, the power consumption of the NoC can be significantly reduced compared with state-of-the-art proposals.
    Escamilla López, JV. (2017). Head-of-Line Blocking Reduction in Power-Efficient Networks-on-Chip [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90419
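    The path-aware power-gating idea lends itself to a small illustration: track which active flows currently traverse each buffer and gate a buffer off only when no flow is mapped onto it. The class and identifiers below are invented for the sketch and do not reflect the thesis's implementation.

```python
from collections import defaultdict


class GatedBufferPool:
    """Toy model: a buffer stays powered only while at least one flow uses it."""

    def __init__(self):
        self.flows_per_buffer = defaultdict(set)  # buffer id -> ids of active flows
        self.powered = set()                      # buffer ids currently switched on

    def open_flow(self, flow_id, path_buffers):
        """Register a flow along its data path and wake every buffer it needs."""
        for buf in path_buffers:
            self.flows_per_buffer[buf].add(flow_id)
            self.powered.add(buf)

    def close_flow(self, flow_id, path_buffers):
        """Remove the flow; power-gate any buffer left with no active flows."""
        for buf in path_buffers:
            self.flows_per_buffer[buf].discard(flow_id)
            if not self.flows_per_buffer[buf]:
                self.powered.discard(buf)


# Two flows share buffer "r1.east"; it stays on until both have finished.
pool = GatedBufferPool()
pool.open_flow("f0", ["r0.east", "r1.east"])
pool.open_flow("f1", ["r1.east", "r2.north"])
pool.close_flow("f0", ["r0.east", "r1.east"])
print(sorted(pool.powered))   # ['r1.east', 'r2.north'] -- r0.east is gated off
```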

    Spatial parallelism in the routers of asynchronous on-chip networks

    Get PDF
    State-of-the-art multi-processor systems-on-chip use on-chip networks as their communication fabric. Although most on-chip networks are implemented synchronously, asynchronous on-chip networks have several advantages over their synchronous counterparts. Time-division multiplexing (TDM) flow control methods have been used extensively in asynchronous on-chip networks, but the synchronization required by TDM leads to significant speed penalties. Compared with TDM methods, spatial parallelism methods, such as the spatial division multiplexing (SDM) flow control method, achieve better network throughput with less area overhead. This thesis proposes several techniques to increase spatial parallelism in the routers of asynchronous on-chip networks. Channel slicing is a new pipeline structure that alleviates the speed penalty by removing the synchronization among bit-level data pipelines. It is also found that the lookahead pipeline using early evaluated acknowledgement can be used in routers to further improve speed. SDM is a new flow control method proposed for asynchronous on-chip networks. It improves network throughput without introducing synchronization among the buffers of different frames, which is required by TDM methods. It is also found that the area overhead of SDM is smaller than that of the virtual channel (VC) flow control method, the most widely used TDM method. The major design problem of SDM is the area-consuming crossbars. A novel 2-stage Clos switch structure is proposed to replace the crossbar in SDM routers, which significantly reduces the area overhead. This Clos switch is dynamically reconfigured by a new asynchronous Clos scheduler. Several asynchronous SDM routers are implemented using these new techniques, and an asynchronous VC router is also reproduced for comparison. Performance analyses show that the SDM routers outperform the VC router in throughput, area overhead and energy efficiency.
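    The difference between the two flow-control styles can be sketched with some back-of-the-envelope arithmetic (link width, packet size and sub-channel count below are assumed values, not figures from the thesis):

```python
# Assumed, illustrative parameters.
LINK_WIDTH_BITS = 64    # physical wires of one link
PACKET_BITS = 512       # packet size
N_PACKETS = 4           # packets contending for the link


def tdm_vc_cycles():
    """VC/TDM style: packets time-share the full-width channel flit by flit,
    so the total time is the serialized transfer of every packet."""
    flits_per_packet = PACKET_BITS // LINK_WIDTH_BITS
    return N_PACKETS * flits_per_packet


def sdm_cycles(n_subchannels=N_PACKETS):
    """SDM style: the wires are split into narrower sub-channels, one whole
    sub-channel per packet, so packets move in parallel on a narrower datapath."""
    sub_width = LINK_WIDTH_BITS // n_subchannels
    return PACKET_BITS // sub_width


# Both schemes need the same raw cycle count here (32); the throughput and
# area advantages reported for SDM come from avoiding the synchronization
# among buffers and the per-VC storage, not from extra raw link bandwidth.
print("TDM/VC cycles:", tdm_vc_cycles())
print("SDM cycles   :", sdm_cycles())
```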

    Network-on-Chip

    Get PDF
    Addresses the challenges associated with system-on-chip integration. Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-Chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers:
    - The evolution of NoC from SoC, and its research and developmental challenges
    - NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces
    - The router design strategies followed in NoCs
    - The evaluation mechanisms of NoC architectures
    - The application mapping strategies followed in NoCs
    - Low-power design techniques specifically followed in NoCs
    - The signal integrity and reliability issues of NoC
    - The details of NoC testing strategies reported so far
    - The problem of synthesizing application-specific NoCs
    - Reconfigurable NoC design issues
    - The direction of future research and development in the field of NoC
    Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, researchers, and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems.
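    As a concrete taste of the routing mechanisms such a text covers, the classic dimension-ordered (XY) routing function for a 2D mesh fits in a few lines; this is a generic textbook example rather than material taken from the book itself.

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) routing on a 2D mesh: correct the X coordinate
    first, then the Y coordinate. Never turning from Y back to X keeps the
    route deadlock-free on a mesh."""
    x, y = src
    dx, dy = dst
    hops = []
    while x != dx:                  # travel east/west first
        x += 1 if dx > x else -1
        hops.append((x, y))
    while y != dy:                  # then travel north/south
        y += 1 if dy > y else -1
        hops.append((x, y))
    return hops


# Route from node (0, 0) to node (2, 3) on a mesh.
print(xy_route((0, 0), (2, 3)))     # [(1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
```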

    Towards Terabit Carrier Ethernet and Energy Efficient Optical Transport Networks

    Get PDF
    • …