42 research outputs found

    Out-of-Sequence Prevention for Multicast Input-Queuing Space-Memory-Memory Clos-Network

    This paper proposes two cell dispatching algorithms for the input-queuing space-memory-memory (IQ-SMM) Clos-network to reduce out-of-sequence (OOS) delivery of multicast traffic. The frequent connection pattern change of DSRR results in a severe OOS problem. Based on the principle of DSRR, MFDSRR is able to reduce OOS but still suffers from it under high traffic load. MFRR maintains the connection pattern separately for each input and can eliminate in-packet OOS, and thus significantly reduces the reassembly buffer size and delay.
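
    To illustrate why OOS arrivals drive up the reassembly buffer size mentioned above, here is a minimal sketch of a generic resequencing buffer. It is not the paper's DSRR, MFDSRR, or MFRR algorithm; the function name `resequence` and the use of integer sequence numbers starting at 0 are assumptions made for illustration.

```python
def resequence(arrivals: list[int]) -> tuple[list[int], int]:
    """Deliver cells in sequence order and report peak buffer occupancy.

    `arrivals` lists cell sequence numbers (hypothetical, starting at 0)
    in the order they reach the output. A cell is held until every
    earlier cell has been delivered.
    """
    expected = 0          # next sequence number that may be delivered
    held = set()          # cells waiting in the reassembly buffer
    delivered = []        # cells released in order
    max_held = 0          # peak number of cells left waiting
    for seq in arrivals:
        held.add(seq)
        # Release every cell that is now in order.
        while expected in held:
            held.remove(expected)
            delivered.append(expected)
            expected += 1
        max_held = max(max_held, len(held))
    return delivered, max_held


if __name__ == "__main__":
    print(resequence([0, 1, 2, 3]))  # in-order arrivals need no buffering
    print(resequence([2, 0, 3, 1]))  # OOS arrivals force cells to wait
```

    In this sketch, in-order arrivals never leave a cell waiting, while reordered arrivals leave cells in the buffer until the missing one shows up, which is the cost an OOS-reducing dispatcher tries to avoid.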

    Traffic Management for Next Generation Transport Networks

    Load balancing and scalable clos-network packet switches

    In this dissertation, three load-balancing Clos-network packet switches that attain 100% throughput and forward cells in sequence are introduced. The configuration schemes and the in-sequence forwarding mechanisms devised for these switches are also introduced. Also proposed is the use of matrix analysis as a tool for throughput analysis. In Chapter 2, a configuration scheme for a load-balancing Clos-network packet switch that has split central modules and buffers in between the split modules is introduced. This switch is called the split-central-buffered Load-Balancing Clos-network (LBC) switch and it is cell based. The switch has four stages, namely input, central-input, central-output, and output stages. The proposed configuration scheme uses a pre-determined and periodic interconnection pattern in the input and split central modules to load-balance and route traffic. The LBC switch has low configuration complexity. The operation of the switch includes a mechanism applied at input and split-central modules to forward cells in sequence. The switch achieves 100% throughput under uniform and nonuniform admissible traffic that is independent and identically distributed (i.i.d.). The high switching performance and low complexity of the switch are achieved while performing in-sequence forwarding and without resorting to memory speedup or central-stage expansion. This discussion includes both throughput analysis, where the operations that the configuration mechanism performs on the traffic traversing the switch are described, and a proof of in-sequence forwarding. Simulation analysis is presented as a practical demonstration of the switch performance on uniform and nonuniform i.i.d. traffic. In Chapter 3, a three-stage load-balancing packet switch and its configuration scheme are introduced. The input- and central-stage switches are bufferless crossbars and the output-stage switches are buffered crossbars. This switch is called ThRee-stage Clos-network swItch and has queues at the middle stage and DEtermiNisTic scheduling (TRIDENT) and it is cell based. The proposed configuration scheme uses a pre-determined and periodic interconnection pattern in the input and central modules to load-balance and route traffic; therefore, it has low configuration complexity. The operation of the switch includes a mechanism applied at input and output modules to forward cells in sequence. In Chapter 4, a highly scalable load-balancing three-stage Clos-network switch with Virtual Input-module output queues at ceNtral stagE (VINE) and crosspoint buffers at output modules, together with its configuration scheme, is introduced. VINE uses space switching in the first stage and buffered crossbars in the second and third stages. The proposed configuration scheme uses pre-determined and periodic interconnection patterns in the input modules for load balancing. The mechanism applied at the inputs, used to forward cells in sequence, is also introduced. VINE achieves 100% throughput under uniform and nonuniform admissible i.i.d. traffic. VINE achieves high switching performance, low configuration complexity, and in-sequence forwarding without resorting to memory speedup. In Chapter 5, matrix analysis is introduced as a tool for modeling, describing the internal operations, and analyzing the throughput of a packet switch.
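
    The "pre-determined and periodic interconnection pattern" described above can be pictured with a minimal sketch. It assumes a simple round-robin rotation, in which input i of a module connects to output (i + t) mod k at time slot t; the dissertation's exact pattern is not given in the abstract, and the names `periodic_config` and `connection_counts` are hypothetical.

```python
def periodic_config(num_ports: int, time_slot: int) -> list[int]:
    """Return, for each input port, the output port it connects to.

    Assumed pattern: a cyclic shift that advances by one every time slot,
    so no arbitration or matching is needed to configure the module.
    """
    return [(i + time_slot) % num_ports for i in range(num_ports)]


def connection_counts(num_ports: int, num_slots: int) -> list[list[int]]:
    """Count how often each input is connected to each output."""
    counts = [[0] * num_ports for _ in range(num_ports)]
    for t in range(num_slots):
        for inp, out in enumerate(periodic_config(num_ports, t)):
            counts[inp][out] += 1
    return counts


if __name__ == "__main__":
    for t in range(4):
        print(t, periodic_config(4, t))
    # Over a whole number of periods, every input reaches every output
    # equally often, which is what spreads (load-balances) the traffic.
    print(connection_counts(4, 8))
```

    Because the pattern depends only on the time slot, the configuration cost in this sketch is constant per slot, which is consistent with the low configuration complexity the abstract claims.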

    Verkkoliikenteen hajauttaminen rinnakkaisprosessoitavaksi ohjelmoitavan piirin avulla

    The expanding diversity and amount of traffic in the Internet requires increasingly higher performing devices for protecting our networks against malicious activities. The computational load of these devices may be divided over multiple processing nodes operating in parallel to reduce the computation load of a single node. However, this requires a dedicated controller that can distribute the traffic to and from the nodes at wire-speed. This thesis concentrates on the system topologies and on the implementation aspects of the controller. A field-programmable gate array (FPGA) device, based on a reconfigurable logic array, is used for implementation because of its integrated-circuit-like performance and high-grain programmability. Two hardware implementations were developed: a straightforward design for 1-gigabit Ethernet, and a modular, highly parameterizable design for 10-gigabit Ethernet. The designs were verified by simulations and synthesizable testbenches. The designs were synthesized on different FPGA devices while varying parameters to analyze the achieved performance. High-end FPGA devices, such as the Altera Stratix family, met the target processing speed of 10-gigabit Ethernet. The measurements show that the controller's latency is comparable to that of a typical switch. The results confirm that reconfigurable hardware is the proper platform for low-level network processing where performance is prioritized over other features. The designed architecture is versatile and adaptable to applications expecting similar characteristics.

    Non-minimal adaptive routing for efficient interconnection networks

    The interconnection network is a key concept of any parallel computing system. The first aspect that defines an interconnection network is its topology. Typically, power- and cost-efficient scalable networks with low diameter rely on topologies that approach the Moore bound, in which there is no minimal path diversity. Once the topology is defined, the performance bounds of the network are determined, so a suitable routing algorithm should be designed to get as close as possible to those limits and, due to the lack of minimal path diversity, it must exploit non-minimal paths when the traffic pattern is adversarial. These routing algorithms usually select between minimal and non-minimal paths based on the network conditions, where the non-minimal paths are built according to the Valiant load-balancing algorithm. This implies that these paths double the length of minimal ones, which increases the latency experienced by packets. Regarding the technology, since its introduction in HPC systems in the early 2000s, Ethernet has been used in a significant fraction of the systems. This dissertation introduces a realistic and competitive implementation of a scalable lossless Ethernet network for HPC environments, built from commodity Ethernet devices and considering low-diameter and low-power topologies. This allows for up to 54% power savings. Furthermore, it proposes a routing on the cited architecture, hereafter QCN-Switch, which selects between minimal and non-minimal paths per packet based on explicit congestion notifications instead of credits. Once the misrouting decision is implemented, it introduces two mechanisms for selecting the intermediate switch in order to develop a source adaptive routing algorithm capable of adapting the number of hops in the non-minimal paths. This routing, hereafter ACOR, is topology-agnostic and improves average latency in all cases by up to 28%. Finally, a topology-dependent routing, hereafter LIAN, is introduced to optimize the number of hops in the non-minimal paths based on the live network conditions. Evaluations show that LIAN obtains almost-optimal latency and outperforms state-of-the-art adaptive routing algorithms, reducing latency by up to 30.0% and providing stable throughput and fairness. This work has been supported by the Spanish Ministry of Education, Culture and Sports under grant FPU14/02253, the Spanish Ministry of Economy, Industry and Competitiveness under contracts TIN2010-21291-C02-02, TIN2013-46957-C2-2-P, and TIN2013-46957-C2-2-P (AEI/FEDER, UE), the Spanish Research Agency under contract PID2019-105660RBC22/AEI/10.13039/501100011033, the European Union under agreements FP7-ICT-2011-7-288777 (Mont-Blanc 1) and FP7-ICT-2013-10-610402 (Mont-Blanc 2), the University of Cantabria under project PAR.30.P072.64004, and by the European HiPEAC Network of Excellence through an internship grant supported by the European Union's Horizon 2020 research and innovation program under grant agreement No. H2020-ICT-2015-687689.
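
    The claim that Valiant-style non-minimal paths roughly double the minimal path length can be checked with a small sketch. It shows plain Valiant routing through a random intermediate switch, not the thesis's QCN-Switch, ACOR, or LIAN mechanisms; the adjacency-list topology and the names `shortest_path` and `valiant_path` are assumptions for illustration, and the network is assumed to be connected.

```python
import random
from collections import deque


def shortest_path(adj: dict, src, dst) -> list:
    """Breadth-first search returning one minimal path from src to dst."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if dst not in parent:
        raise ValueError("destination unreachable")
    path = []
    node = dst
    while node is not None:      # walk back from dst to src
        path.append(node)
        node = parent[node]
    return path[::-1]


def valiant_path(adj: dict, src, dst, rng: random.Random) -> list:
    """Route minimally to a random intermediate switch, then to dst."""
    candidates = [n for n in adj if n not in (src, dst)]
    mid = rng.choice(candidates)
    first = shortest_path(adj, src, mid)
    second = shortest_path(adj, mid, dst)
    return first + second[1:]    # drop the duplicated intermediate node


if __name__ == "__main__":
    # Hypothetical 4-switch full mesh: every minimal path is one hop.
    mesh = {n: [m for m in range(4) if m != n] for n in range(4)}
    rng = random.Random(0)
    print("minimal hops:", len(shortest_path(mesh, 0, 3)) - 1)
    print("Valiant hops:", len(valiant_path(mesh, 0, 3, rng)) - 1)
```

    In the diameter-1 mesh used here, every Valiant path takes exactly two hops against one for the minimal path, which is the doubling that adaptive schemes try to avoid when the network is not congested.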

    Design and Architecture of a Hardware Platform to Support the Development of an Avionic Network Prototype

    Résumé (translated from French): The recent evolution of avionic system architectures has enabled integrated modular avionics (IMA) networks and increased the number of digital embedded systems in each aircraft. This transition toward a new generation of more-electric aircraft reduces aircraft weight and energy consumption as well as production and maintenance costs. To reduce weight further and to improve network bandwidth, innovative technologies have recently been adopted: ARINC 825 and AFDX, which reduce the cabling needed to build the on-board network. Within the AVIO 402 project, which includes several research topics that also concern sensors and their interface with the IMA system, a new architecture has been proposed for the network used by the flight control system. This architecture is based on local ARINC 825 buses interconnected by an AFDX network that offers higher bandwidth; the bridges between the two protocols and the modules that connect nodes to the network have a generic structure in order to support different protocols and several types of sensors and actuators. To evaluate performance and analyze implementation challenges, the project requires a prototype of the proposed network. This thesis covers the development of a hardware platform to support that prototype; three fundamental modules of the prototype were designed as IP cores to be subsequently integrated into the network architecture, which will be implemented on FPGAs. The three systems are the CAN bus controller, used as the basis for implementing the ARINC 825 protocol, the AFDX End System, and the switch needed to build an AFDX network. The first part of the thesis presents the objectives and analyzes the specifications of the protocols considered, which identifies the functionality each system must include and determines whether implementation solutions have already been published and can be reused. The development of each system is then presented and the design choices are explained, to show how the functionality required by the two protocol specifications can be implemented to best meet the needs of the AVIO 402 project.
    Abstract: The objective of the present project is to design three modules for a hardware platform that will support the implementation of an avionic network prototype based on FPGA technology. The considered network has been conceived to reduce cabling weight and to improve the available bandwidth, and it exploits the recently introduced ARINC 825 and AFDX protocols. In order to support the implementation of both these protocols, a CAN bus controller, an AFDX End System, and an AFDX Switch have been designed. After an extensive review of the existing literature about the two related avionic protocols, a study of the existing solutions for the CAN and Ethernet protocols, on which they are based, has been done as well to identify what knowledge and technology could be reused. Because the two are very similar, a flexible CAN controller has been implemented in hardware instead of an ARINC 825 one, in order to support both technologies and to reduce the IP core size. A combined HW/SW approach has been preferred for the AFDX End System architecture to leverage an existing UDP/IP protocol stack, and the Ethernet layer included in the Linux kernel has been modified to create a portable and configurable implementation of AFDX. Since various problems were encountered in reproducing an ARINC 653 compliant environment on the embedded system, the suggested design has been ported to a PC. Finally, an original solution for the implementation of the AFDX switch fabric is presented; a space-division switching architecture has been chosen and tailored to meet the AFDX specification. Hardware parallelism is exploited to reduce the latency introduced on each frame by filtering frames concurrently. Input buffers have been duplicated to separate high-priority from low-priority traffic, further reducing the latency of critical frames and creating a redundancy that reduces the possibility of packet loss. Packet scheduling and double queuing guarantee that all critical frames are forwarded before low-priority ones.
    Keywords: Avionic Full-Duplex Switched Ethernet, AFDX, ARINC 664, ARINC 825, CAN, Avionic Data Networks, Ethernet Switch, FPGA
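
    The double-queuing behaviour described in the abstract, where critical frames are always forwarded before low-priority ones, can be sketched as a strict-priority scheduler over two queues. This is a software illustration under assumptions, not the thesis's FPGA design; the class name `DualQueuePort` and its methods are hypothetical.

```python
from collections import deque
from typing import Optional


class DualQueuePort:
    """Two FIFO queues per port, served in strict priority order."""

    def __init__(self) -> None:
        self.high = deque()   # critical frames
        self.low = deque()    # low-priority frames

    def enqueue(self, frame: str, critical: bool) -> None:
        """Store a frame in the queue matching its priority."""
        (self.high if critical else self.low).append(frame)

    def dequeue(self) -> Optional[str]:
        """Return the next frame to forward, or None if both are empty.

        The low-priority queue is served only when no critical frame
        is waiting.
        """
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None


if __name__ == "__main__":
    port = DualQueuePort()
    port.enqueue("L1", critical=False)
    port.enqueue("H1", critical=True)
    port.enqueue("L2", critical=False)
    port.enqueue("H2", critical=True)
    frame = port.dequeue()
    while frame is not None:
        print(frame)
        frame = port.dequeue()
```

    Each queue keeps its own arrival order, so frames of the same priority are never reordered; only the interleaving between the two classes changes.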

    Topology Agnostic Methods for Routing, Reconfiguration and Virtualization of Interconnection Networks

    Modern computing systems, such as supercomputers, data centers and multicore chips, generally require efficient communication between their different system units; tolerance towards component faults; flexibility to expand or merge; and a high utilization of their resources. Interconnection networks are used in a variety of such computing systems in order to enable communication between their diverse system units. Investigating and proposing new or improved solutions to topology agnostic routing and reconfiguration of interconnection networks are the main objectives of this thesis. In addition, topology agnostic routing and reconfiguration algorithms are utilized in the development of new and flexible approaches to processor allocation. The thesis aims to present versatile solutions that can be used for the interconnection networks of a number of different computing systems. No particular routing algorithm was specified for an interconnection network technology which is now incorporated in Dolphin Express. The thesis states a set of criteria for a suitable routing algorithm, evaluates a number of existing routing algorithms, and recommends that one of the algorithms, which fulfils all of the criteria, be used. Further investigations demonstrate how this routing algorithm inherently supports fault-tolerance, and how it can be optimized for some network topologies. These considerations are also relevant for the InfiniBand interconnection network technology. Reconfiguration of interconnection networks (change of routing function) is a deadlock-prone process. Some existing reconfiguration strategies include deadlock avoidance mechanisms that significantly reduce the network service offered to running applications. The thesis expands the area of application for one of the most versatile and efficient reconfiguration algorithms available in the literature, and proposes an optimization of this algorithm that improves the network service offered to running applications. Moreover, a new reconfiguration algorithm is presented that supports a replacement of the routing function without causing performance penalties. Processor allocation strategies that guarantee traffic-containment commonly pose strict requirements on the shape of partitions, and thus achieve only a limited utilization of a system's computing resources. The thesis introduces two new approaches that are more flexible. Both approaches utilize the properties of a topology agnostic routing algorithm in order to enforce traffic-containment within arbitrarily shaped partitions. Consequently, high resource utilization as well as isolation of traffic between different partitions is achieved.