196 research outputs found

    A Scalable Multi-Stage Packet-Switch for Data Center Networks

    Get PDF
    The growing trends of data centers over last decades including social networking, cloud-based applications and storage technologies enabled many advances to take place in the networking area. Recent changes imply continuous demand for bandwidth to manage the large amount of packetized traffic. Cluster switches and routers make the switching fabric in a Data Center Network (DCN) environment and provide interconnectivity between elements of the same DC and inter DCs. To handle the constantly variable loads, switches need deliver outstanding throughput along with resiliency and scalability for DCN requirements. Conventional DCN switches adopt crossbars or/and blocks of memories mounted in a multistage fashion (commonly 2-Tiers or 3-Tiers). However, current multistage switches, with their space-memory variants, are either too complex to implement, have poor performance, or not cost effective. We propose a novel and highly scalable multistage switch based on Networkson- Chip (NoC) fabrics for DCNs. In particular, we describe a three-stage Clos packet-switch with a Round Robin packets dispatching scheme where each central stage module is based on a Unidirectional NoC (UDN), instead of the conventional singlehop crossbar. The design, referred to as Clos-UDN, overcomes shortcomings of traditional multistage architectures as it (i) Obviates the need for a complex and costly input modules, by means of few, yet simple, input FIFO queues. (ii) Avoids the need for a complex and synchronized scheduling process over a high number of input-output modules and/or port pairs. (iii) Provides speedup, load balancing and path-diversity thanks to a dynamic dispatching scheme as well as the NoC based fabric nature. Simulations show that the Clos-UDN outperforms some common multistage switches under a range of input traffics, making it highly appealing for ultra-high capacity DC networks

    A Scalable Packet-Switch Based on Output-Queued NoCs for Data Centre Networks

    Get PDF
    The switch fabric in a Data-Center Network (DCN) handles constantly variable loads. This is stressing the need for high-performance packet switches able to keep pace with climbing throughput while maintaining resiliency and scalability. Conventional multistage switches with their space-memory variants proved to be performance limited as they do not scale well with the proliferating DC requirements. Most proposals are either too complex to implement or not cost effective. In this paper, we present a highly scalable multistage switching architecture for DC switching fabrics. We describe a three-stage Clos packet-switch fabric with Output-Queued Unidirectional NoC (OQ-UDN) modules and Round-Robin packets dispatching scheme. The proposed OQ Clos-UDN architecture avoids the need for complex and costly input modules and simplifies the scheduling process. Thanks to a dynamic packets dispatching and the multi-hop nature of the UDN modules, the switch provides load balancing and path-diversity. We compared our proposed architecture to state-of-the art previous architectures under extensive uniform and non-uniform DC traffic settings. Simulations of various switch settings have shown that the proposed OQ Clos-UDN outperforms previous proposals and maintains high throughput and latency performance

    Fault-tolerant sub-lithographic design with rollback recovery

    Get PDF
    Shrinking feature sizes and energy levels coupled with high clock rates and decreasing node capacitance lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and general-purpose, conservative analysis, we show the area overhead factor of our technique is roughly an order of magnitude lower than a gate level feed-forward redundancy scheme

    High-Capacity Clos-Network Switch for Data Center Networks

    Get PDF
    Scaling-up Data Center Networks (DCNs) should be done at the network level as well as the switching elements level. The glaring reason for this, is that switches/routers deployed in the DCN can bound the network capacity and affect its performance if improperly chosen. Many multistage switching architectures have been proposed to fit for the next-generation networking needs. However all of them are either performance limited or too complex to be implemented. Targeting scalability and performance, we propose the design of a large-capacity switch in which we affiliate a multistage design with a Networks-on- Chip (NoC) design. The proposal falls into the category of buffered multistage switches. Still, it has a different architectural aspect and scheduling process. Dissimilar to common point-to-point crossbars, NoCs used at the heart of the three-stage Clos-network allow multiple packets simultaneously in the modules where they can be adaptively transported using a pipelined scheduling scheme. Our simulations show that the switch scales well with the load and size variation. It outperforms a variety of architectures under a range of traffic arrivals

    Floorplan-Aware High Performance NoC Design

    Full text link
    Las actuales arquitecturas de m�ltiples n�cleos como los chip multiprocesadores (CMP) y soluciones multiprocesador para sistemas dentro del chip (MPSoCs) han adoptado a las redes dentro del chip (NoC) como elemento -ptimo para la inter-conexi-n de los diversos elementos de dichos sistemas. En este sentido, fabricantes de CMPs y MPSoCs han adoptado NoCs sencillas, generalmente con una topolog'a en malla o anillo, ya que son suficientes para satisfacer las necesidades de los sistemas actuales. Sin embargo a medida que los requerimientos del sistema -- baja latencia y alto rendimiento -- se hacen m�s exigentes, estas redes tan simples dejan de ser una soluci-n real. As', la comunidad investigadora ha propuesto y analizado NoCs m�s complejas. No obstante, estas soluciones son m�s dif'ciles de implementar -- especialmente los enlaces largos -- haciendo que este tipo de topolog'as complejas sean demasiado costosas o incluso inviables. En esta tesis, presentamos una metodolog'a de dise-o que minimiza la p�rdida de prestaciones de la red debido a su implementaci-n real. Los principales problemas que se encuentran al implementar una NoC son los conmutadores y los enlaces largos. En esta tesis, el conmutador se ha hecho modular, es decir, formado como uni-n de m-dulos m�s peque-os. En nuestro caso, los m-dulos son id�nticos, donde cada m-dulo es capaz de arbitrar, conmutar, y almacenar los mensajes que le llegan. Posteriormente, flexibilizamos la colocaci-n de estos m-dulos en el chip, permitiendo que m-dulos de un mismo conmutador est�n distribuidos por el chip. Esta metodolog'a de dise-o la hemos aplicado a diferentes escenarios. Primeramente, hemos introducido nuestro conmutador modular en NoCs con topolog'as conocidas como la malla 2D. Los resultados muestran como la modularidad y la distribuci-n del conmutador reducen la latencia y el consumo de potencia de la red. En segundo lugar, hemos utilizado nuestra metodolog'a de dise-o para implementar un crossbar distribuidRoca Pérez, A. (2012). Floorplan-Aware High Performance NoC Design [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17844Palanci

    Ethernet Networks for Real-Time Use in the ATLAS Experiment

    Get PDF
    Ethernet became today's de-facto standard technology for local area networks. Defined by the IEEE 802.3 and 802.1 working groups, the Ethernet standards cover technologies deployed at the first two layers of the OSI protocol stack. The architecture of modern Ethernet networks is based on switches. The switches are devices usually built using a store-and-forward concept. At the highest level, they can be seen as a collection of queues and mathematically modelled by means of queuing theory. However, the traffic profiles on modern Ethernet networks are rather different from those assumed in classical queuing theory. The standard recommendations for evaluating the performance of network devices define the values that should be measured but do not specify a way of reconciling these values with the internal architecture of the switches. The introduction of the 10 Gigabit Ethernet standard provided a direct gateway from the LAN to the WAN by the means of the WAN PHY. Certain aspects related to the actual use of WAN PHY technology were vaguely defined by the standard. The ATLAS experiment at CERN is scheduled to start operation at CERN in 2007. The communication infrastructure of the Trigger and Data Acquisition System will be built using Ethernet networks. The real-time operational needs impose a requirement for predictable performance on the network part. In view of the diversity of the architectures of Ethernet devices, testing and modelling is required in order to make sure the full system will operate predictably. This thesis focuses on the testing part of the problem and addresses issues in determining the performance for both LAN and WAN connections. The problem of reconciling results from measurements to architectural details of the switches will also be tackled. We developed a scalable traffic generator system based on commercial-off-the-shelf Gigabit Ethernet network interface cards. The generator was able to transmit traffic at the nominal Gigabit Ethernet line rate for all frame sizes specified in the Ethernet standard. The calculation of latency was performed with accuracy in the range of +/- 200 ns. We indicate how certain features of switch architectures may be identified through accurate throughput and latency values measured for specific traffic distributions. At this stage, we present a detailed analysis of Ethernet broadcast support in modern switches. We use a similar hands-on approach to address the problem of extending Ethernet networks over long distances. Based on the 1 Gbit/s traffic generator used in the LAN, we develop a methodology to characterise point-to-point connections over long distance networks. At higher speeds, a combination of commercial traffic generators and high-end servers is employed to determine the performance of the connection. We demonstrate that the new 10 Gigabit Ethernet technology can interoperate with the installed base of SONET/SDH equipment through a series of experiments on point-to-point circuits deployed over long-distance network infrastructure in a multi-operator domain. In this process, we provide a holistic view of the end-to-end performance of 10 Gigabit Ethernet WAN PHY connections through a sequence of measurements starting at the physical transmission layer and continuing up to the transport layer of the OSI protocol stack

    Multistage Packet-Switching Fabrics for Data Center Networks

    Get PDF
    Recent applications have imposed stringent requirements within the Data Center Network (DCN) switches in terms of scalability, throughput and latency. In this thesis, the architectural design of the packet-switches is tackled in different ways to enable the expansion in both the number of connected endpoints and traffic volume. A cost-effective Clos-network switch with partially buffered units is proposed and two packet scheduling algorithms are described. The first algorithm adopts many simple and distributed arbiters, while the second approach relies on a central arbiter to guarantee an ordered packet delivery. For an improved scalability, the Clos switch is build using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture made with Input-Queued (IQ) Uni-Directional NoC modules (UDNs) simplifies the input line cards and obviates the need for the costly Virtual Output Queues (VOQs). It also avoids the need for complex, and synchronized scheduling processes, and offers speedup, load balancing, and good path diversity. Under skewed traffic, a reliable micro load-balancing contributes to boosting the overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture operates with a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules, and enhances the switch performance under critical packet arrivals. The implementation of small on-chip buffers has been made perfectly feasible using the current technology. This motivated the implementation of a large switching architecture with an Output-Queued (OQ) NoC fabric. The design merges assets of the output queuing, and NoCs to provide high throughput, and smooth latency variations. An approximate analytical model of the switch performance is also proposed. To further exploit the potential of the NoC fabrics and their modularity features, a high capacity Clos switch with Multi-Directional NoC (MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch. It scales better and faster in port count and traffic load. Results achieved in this thesis demonstrate the high performance, expandability and programmability features of the proposed packet-switches which makes them promising candidates for the next-generation data center networking infrastructure

    Scheduling in switches with small internal buffers

    Full text link
    • …
    corecore