33 research outputs found

    Fabric-on-a-Chip: Toward Consolidating Packet Switching Functions on Silicon

    Get PDF
    The switching capacity of an Internet router is often dictated by the memory bandwidth required to bu¤er arriving packets. With the demand for greater capacity and improved service provisioning, inherent memory bandwidth limitations are encountered rendering input queued (IQ) switches and combined input and output queued (CIOQ) architectures more practical. Output-queued (OQ) switches, on the other hand, offer several highly desirable performance characteristics, including minimal average packet delay, controllable Quality of Service (QoS) provisioning and work-conservation under any admissible traffic conditions. However, the memory bandwidth requirements of such systems is O(NR), where N denotes the number of ports and R the data rate of each port. Clearly, for high port densities and data rates, this constraint dramatically limits the scalability of the switch. In an effort to retain the desirable attributes of output-queued switches, while significantly reducing the memory bandwidth requirements, distributed shared memory architectures, such as the parallel shared memory (PSM) switch/router, have recently received much attention. The principle advantage of the PSM architecture is derived from the use of slow-running memory units operating in parallel to distribute the memory bandwidth requirement. At the core of the PSM architecture is a memory management algorithm that determines, for each arriving packet, the memory unit in which it will be placed. However, to date, the computational complexity of this algorithm is O(N), thereby limiting the scalability of PSM switches. In an effort to overcome the scalability limitations, it is the goal of this dissertation to extend existing shared-memory architecture results while introducing the notion of Fabric on a Chip (FoC). In taking advantage of recent advancements in integrated circuit technologies, FoC aims to facilitate the consolidation of as many packet switching functions as possible on a single chip. Accordingly, this dissertation introduces a novel pipelined memory management algorithm, which plays a key role in the context of on-chip output- queued switch emulation. We discuss in detail the fundamental properties of the proposed scheme, along with hardware-based implementation results that illustrate its scalability and performance attributes. To complement the main effort and further support the notion of FoC, we provide performance analysis of output queued cell switches with heterogeneous traffic. The result is a flexible tool for obtaining bounds on the memory requirements in output queued switches under a wide range of tra¢ c scenarios. Additionally, we present a reconfigurable high-speed hardware architecture for real-time generation of packets for the various traffic scenarios. The work presented in this thesis aims at providing pragmatic foundations for designing next-generation, high-performance Internet switches and routers

    Architecture design and performance analysis of practical buffered-crossbar packet switches

    Get PDF
    Combined input crosspoint buffered (CICB) packet switches were introduced to relax inputoutput arbitration timing and provide high throughput under admissible traffic. However, the amount of memory required in the crossbar of an N x N switch is N2x k x L, where k is the crosspoint buffer size and needs to be of size RTT in cells, L is the packet size. RTT is the round-trip time which is defined by the distance between line cards and switch fabric. When the switch size is large or RTT is not negligible, the memory amount required makes the implementation costly or infeasible for buffered crossbar switches. To reduce the required memory amount, a family of shared memory combined-input crosspoint-buffered (SMCB) packet switches, where the crosspoint buffers are shared among inputs, are introduced in this thesis. One of the proposed switches uses a memory speedup of in and dynamic memory allocation, and the other switch avoids speedup by arbitrating the access of inputs to the crosspoint buffers. These two switches reduce the required memory of the buffered crossbar by 50% or more and achieve equivalent throughput under independent and identical traffic with uniform distributions when using random selections. The proposed mSMCB switch is extended to support differentiated services and long RTT. To support P traffic classes with different priorities, CICB switches have been reported to use N2x k x L x P amount of memory to avoid blocking of high priority cells.The proposed SMCB switch with support for differentiated services requires 1/mP of the memory amount in the buffered crossbar and achieves similar throughput performance to that of a CICB switch with similar priority management, while using no speedup in the shared memory. The throughput performance of SMCB switch with crosspoint buffers shared by inputs (I-SMCB) is studied under multicast traffic. An output-based shared-memory crosspoint buffered (O-SMCB) packet switch is proposed where the crosspoint buffers are shared by two outputs and use no speedup. The proposed O-SMCB switch provides high performance under admissible uniform and nonuniform multicast traffic models while using 50% of the memory used in CICB switches. Furthermore, the O-SMCB switch provides higher throughput than the I-SMCB switch. As SMCB switches can efficiently support an RTT twice as long as that supported by CICB switches and as the performance of SMCB switches is bounded by a matching between inputs and crosspoint buffers, a new family of CICB switches with flexible access to crosspoint buffers are proposed to support longer RTTs than SMCB switches and to provide higher throughput under a wide variety of admissible traffic models. The CICB switches with flexible access allow an input to use any available crosspoint buffer at a given output. The proposed switches reduce the required crosspoint buffer size by a factor of N , keep the service of cells in sequence, and use no speedup. This new class of switches achieve higher throughput performance than CICB switches under a large variety of traffic models, while supporting long RTTs. Crosspoint buffered switches that are implemented in single chips have limited scalability. To support a large number of ports in crosspoint buffered switches, memory-memory-memory (MMM) Clos-network switches are an alternative. The MMM switches that use minimum memory amount at the central module is studied. Although, this switch can provide a moderate throughput, MMM switch may serve cells out of sequence. As keeping cells in sequence in an MMM switch may require buffers be distributed per flow, an MMM with extended memory in the switch modules is studied. To solve the out of sequence problem in MMM switches, a queuing architecture is proposed for an MMM switch. The service of cells in sequence is analyzed

    On-board B-ISDN fast packet switching architectures. Phase 2: Development. Proof-of-concept architecture definition report

    Get PDF
    For the next-generation packet switched communications satellite system with onboard processing and spot-beam operation, a reliable onboard fast packet switch is essential to route packets from different uplink beams to different downlink beams. The rapid emergence of point-to-point services such as video distribution, and the large demand for video conference, distributed data processing, and network management makes the multicast function essential to a fast packet switch (FPS). The satellite's inherent broadcast features gives the satellite network an advantage over the terrestrial network in providing multicast services. This report evaluates alternate multicast FPS architectures for onboard baseband switching applications and selects a candidate for subsequent breadboard development. Architecture evaluation and selection will be based on the study performed in phase 1, 'Onboard B-ISDN Fast Packet Switching Architectures', and other switch architectures which have become commercially available as large scale integration (LSI) devices

    Information Switching Processor (ISP) contention analysis and control

    Get PDF
    Future satellite communications, as a viable means of communications and an alternative to terrestrial networks, demand flexibility and low end-user cost. On-board switching/processing satellites potentially provide these features, allowing flexible interconnection among multiple spot beams, direct to the user communications services using very small aperture terminals (VSAT's), independent uplink and downlink access/transmission system designs optimized to user's traffic requirements, efficient TDM downlink transmission, and better link performance. A flexible switching system on the satellite in conjunction with low-cost user terminals will likely benefit future satellite network users

    On packet switch design

    Get PDF

    Multistage Packet-Switching Fabrics for Data Center Networks

    Get PDF
    Recent applications have imposed stringent requirements within the Data Center Network (DCN) switches in terms of scalability, throughput and latency. In this thesis, the architectural design of the packet-switches is tackled in different ways to enable the expansion in both the number of connected endpoints and traffic volume. A cost-effective Clos-network switch with partially buffered units is proposed and two packet scheduling algorithms are described. The first algorithm adopts many simple and distributed arbiters, while the second approach relies on a central arbiter to guarantee an ordered packet delivery. For an improved scalability, the Clos switch is build using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture made with Input-Queued (IQ) Uni-Directional NoC modules (UDNs) simplifies the input line cards and obviates the need for the costly Virtual Output Queues (VOQs). It also avoids the need for complex, and synchronized scheduling processes, and offers speedup, load balancing, and good path diversity. Under skewed traffic, a reliable micro load-balancing contributes to boosting the overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture operates with a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules, and enhances the switch performance under critical packet arrivals. The implementation of small on-chip buffers has been made perfectly feasible using the current technology. This motivated the implementation of a large switching architecture with an Output-Queued (OQ) NoC fabric. The design merges assets of the output queuing, and NoCs to provide high throughput, and smooth latency variations. An approximate analytical model of the switch performance is also proposed. To further exploit the potential of the NoC fabrics and their modularity features, a high capacity Clos switch with Multi-Directional NoC (MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch. It scales better and faster in port count and traffic load. Results achieved in this thesis demonstrate the high performance, expandability and programmability features of the proposed packet-switches which makes them promising candidates for the next-generation data center networking infrastructure

    Joint buffer management and scheduling for input queued switches

    Get PDF
    Input queued (IQ) switches are highly scalable and they have been the focus of many studies from academia and industry. Many scheduling algorithms have been proposed for IQ switches. However, they do not consider the buffer space requirement inside an IQ switch that may render the scheduling algorithms inefficient in practical applications. In this dissertation, the Queue Length Proportional (QLP) algorithm is proposed for IQ switches. QLP considers both the buffer management and the scheduling mechanism to obtain the optimal allocation region for both bandwidth and buffer space according to real traffic load. In addition, this dissertation introduces the Queue Proportional Fairness (QPF) criterion, which employs the cell loss ratio as the fairness metric. The research in this dissertation will show that the utilization of network resources will be improved significantly with QPF. Furthermore, to support diverse Quality of Service (QoS) requirements of heterogeneous and bursty traffic, the Weighted Minmax algorithm (WMinmax) is proposed to efficiently and dynamically allocate network resources. Lastly, to support traffic with multiple priorities and also to handle the decouple problem in practice, this dissertation introduces the multiple dimension scheduling algorithm which aims to find the optimal scheduling region in the multiple Euclidean space

    Multistage Packet-Switching Fabrics for Data Center Networks

    Get PDF
    Recent applications have imposed stringent requirements within the Data Center Network (DCN) switches in terms of scalability, throughput and latency. In this thesis, the architectural design of the packet-switches is tackled in different ways to enable the expansion in both the number of connected endpoints and traffic volume. A cost-effective Clos-network switch with partially buffered units is proposed and two packet scheduling algorithms are described. The first algorithm adopts many simple and distributed arbiters, while the second approach relies on a central arbiter to guarantee an ordered packet delivery. For an improved scalability, the Clos switch is build using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture made with Input-Queued (IQ) Uni-Directional NoC modules (UDNs) simplifies the input line cards and obviates the need for the costly Virtual Output Queues (VOQs). It also avoids the need for complex, and synchronized scheduling processes, and offers speedup, load balancing, and good path diversity. Under skewed traffic, a reliable micro load-balancing contributes to boosting the overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture operates with a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules, and enhances the switch performance under critical packet arrivals. The implementation of small on-chip buffers has been made perfectly feasible using the current technology. This motivated the implementation of a large switching architecture with an Output-Queued (OQ) NoC fabric. The design merges assets of the output queuing, and NoCs to provide high throughput, and smooth latency variations. An approximate analytical model of the switch performance is also proposed. To further exploit the potential of the NoC fabrics and their modularity features, a high capacity Clos switch with Multi-Directional NoC (MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch. It scales better and faster in port count and traffic load. Results achieved in this thesis demonstrate the high performance, expandability and programmability features of the proposed packet-switches which makes them promising candidates for the next-generation data center networking infrastructure

    Packet switch architecture for efficient unicast and multicast traffic switching

    Get PDF
    У дисертацији је предложена једноставна архитектура свича као и алгоритми за ефикасно распоређивање и комутацију уникаст и мултикаст саобраћаја, што је од великог значаја за савремене телекомуникационе мреже у којима количина саобраћаја константно расте. Први дио доприноса ове дисертације чини приједлог рјешења свича за ефикасно управљање уникаст саобраћајем. Ово рјешење је развијено комбинујући најбоље особине постојећих рјешења, при том избјегавајући одређене њихове недостатке. Циљ је да се омогући што брже прослијеђивање пакета уз прихватљив ниво хардверске комплексности. Свич који је развијен у овој дисертацији представља комбинацију свичева са баферима на улазу и свичева који користе Биркхоф-фон Нојман принцип детерминистичког конфигурисања комутационог модула па се не захтијева прорачун конфигурација комутатора. При томе, за разлику од већине рјешења која користе Биркхоф-фон Нојман принцип конфигурисања, у предложеном рјешењу могуће је користити само један физички комутациони модул који би обављао функције оба логичка комутациона модула. Да би се гарантовало да није дошло до поремећаја редослиједа пакета, предложен је и једноставан алгоритам за одабир пакета за слање. Такође, дат је и приједлог унапријеђења подршке за фер сервис првобитно предложеног рјешења за комутацију уникаст саобраћаја. У другом дијелу дисертације, пажња је посвећена унапријеђењу предложеног рјешења за ефикасно управљање и мултикаст саобраћајем. Потреба за овим се јавила као посљедица развоја нових сервиса (нпр. IPTV, онлајн игре итд.) који генеришу такав тип саобраћаја. Како је удио мултикаст саобраћаја у мрежи постао незанемарљив, перформансе свичева који су развијени примарно за уникаст саобраћај значајно опадају. Рјешење које је предложено у првом дијелу дисертације је унапријеђено додавањем модула који служи за управљање мултикаст саобраћајем. Овдје је идеја да се оптерећење са улазног порта који прима мултикаст пакете распореди на више портова који треба да приме те пакете. Овако је на релативно једноставан начин омогућено ефикасно управљање мултикаст саобраћајем. У оквиру дисертације су урађене софтверске симулације које су показале да ова рјешења постижу врло добре перформансе у односу на постојећа. Такође, урађена је и хардверска имплементација предложеног основног уникаст рјешења која је показала релативно скромне захтјеве у погледу хардверских ресурса.The dissertation proposes a simple switch architecture as well as algorithms for efficient scheduling and switching of unicast and multicast traffic, which is of great importance for modern telecommunication networks because their traffic load is constantly and rapidly increasing. The first part of the dissertation’s contributions comprises a proposed switch which efficiently manages unicast traffic. The proposed switch is developed by using the best characteristics of the existing solutions while avoiding some of their drawbacks. The aim is to enable fast packet forwarding while achieving an acceptable level of hardware complexity. The proposed solution combines architecture with buffers at input ports and Birkhoff-von Neumann architecture based on deterministic switch module configurations. Hence, calculation of switch module configurations is not needed. Also, folded architecture is possible, which means that only one physical switching module is used for both switching stages of Birkhoff-von Neumann architecture. A simple algorithm for packet scheduling has been developed in order to avoid packet out-of-sequence problems. Finally, fair service support improvement is introduced for the originally proposed switch solution. The second part of the dissertation is devoted to the enhancement of the proposed unicast switch for efficient management of multicast traffic. The need for multicast support has emerged as a consequence of the development and introduction of new services (such as IPTV, online gaming, etc.) that generate multicast traffic. As the amount of multicast traffic is not negligible anymore, the performance of packet switches that were primarily developed for the unicast traffic is significantly degraded. The solution proposed in the first part of the diseration is enhanced with the module used for multicast traffic management. Here, the idea is that the multicast load at some input port is distributed over ports that are also destination for the multicast packets. This approach enables relatively simple but efficient management of multicast traffic. In this dissertation, software simulations were conducted, which confirmed that proposed solutions achieve very good performances compared to existing solutons. Furthermore, hardware implementation of the proposed basic unicast switch solution shows modest requirements in terms of needed hardware resources

    Performance Analysis in IP-Based Industrial Communication Networks

    Get PDF
    S rostoucím počtem řídicích systémů a jejich distribuovanosti získávájí komunikační sítě na důležitosti a objevují se nové výzkumné trendy. Hlavní problematikou v této oblasti, narozdíl od dřívějších řídicích systémů využívajících dedikovaných komunikačních obvodů, je časově proměnné zpoždění měřicích a řídicích signálů způsobené paketově orientovanými komunikačními prostředky, jako např. Ethernet. Aspekty komunikace v reálném čase byly v těchto sítích již úspěšně vyřešeny. Nicméně, analýzy trendů trhu předpovídají budoucí využití také IP sítí v průmyslové komunikaci pro časově kritickou procesní vyměnu dat. IP komunikace má ovšem pouze omezenou podporu v instrumentaci pro průmyslovou automatizace. Tato výzva byla nedávno technicky vyřešena v rámci projektu Virtual Automation Networks (virtuální automatizační sítě - VAN) zapojením mechanismů kvality služeb (QoS), které jsou schopny zajistit měkkou úroveň komunikace v reálném čase. Předložená dizertační práce se zaměřuje na aspekty výkonnosti reálného času z analytického hlediska a nabízí prostředek pro hodnocení využitelnosti IP komunikace pro budoucí průmyslové aplikace. Hlavním cílem této dizertační práce je vytvoření vhodného modelovacího rámce založeného na network calculus, který pomůže provést worst-case výkonnostní analýzu časového chování IP komunikačních sítí a jejich prvků určených pro budoucí použití v průmyslové automatizaci. V práci byla použita empirická analýza pro určení dominantních faktorů ovlivňujících časového chování síťových zařízení a identifikaci parametrů modelů těchto zařízení. Empirická analýza využívá nástroj TestQoS vyvinutý pro tyto účely. Byla navržena drobná rozšíření rámce network calculus, která byla nutná pro modelování časového chování používaných zařízení. Bylo vytvořeno několik typových modelů zařízení jako výsledek klasifikace různých architektur síťových zařízení a empiricky zjištěných dominantních faktorů. U modelovaných zařízení byla využita nová metoda identifikace parametrů. Práce je zakončena validací časových modelů dvou síťových zařízení (přepínače a směrovače) oproti empirickým pozorováním.With the growing scale of control systems and their distributed nature, communication networks have been gaining importance and new research challenges have been appearing. The major problem, contrary to previously used control systems with dedicated communication circuits, is time-varying delay of control and measurement signals introduced by packet-switched networks, such as Ethernet. The real-time issues in these networks have been tackled by proper adaptations. Nevertheless, market trend analyses foresee also future adoptions of IP-based communication networks in industrial automation for time-critical run-time data exchange. IP-based communication has only a limited support from the existing instrumentation in industrial automation. This challenge has recently been technically tackled within the Virtual Automation Networks (VAN) project by adopting the quality of service (QoS) architecture delivering soft-real-time communication behaviour. This dissertation focuses on the real-time performance aspects from the analytical point of view and provides means for applicability assessment of IP-based communication for future industrial applications. The main objective of this dissertation is establishment of a relevant modelling framework based on network calculus which will assist worst-case performance analysis of temporal behaviour of IP-based communication networks and networking devices intended for future use in industrial automation. Empirical analysis was used to identify dominant factors influencing the temporal performance of networking devices and for model parameter identification. The empirical analysis makes use of the TestQoS tool developed for this purpose. Minor extensions to the network calculus framework were proposed enabling to model the required temporal behaviour of networking devices. Several exemplary models were inferred as a result of classification of different networking device architectures and empirically identified dominant factors. A novel method for parameter identification was used with the modelled devices. Finally, two temporal models of networking devices (a switch and a router) were validated against empirical observations.
    corecore