
    Steroid OpenFlow Service Scalability Analysis

    Modern cloud applications are hosted in data centers spanning vast geographical scopes and continuously exchange large amounts of data. The Transmission Control Protocol (TCP) is the most popular protocol for reliable data transfer; however, due to TCP's congestion control mechanism, the maximum achievable throughput across a large bandwidth-delay product (BDP) network is limited. Various solutions exist to enhance data-transfer throughput, but they usually require non-trivial, explicit installation and tuning of specialized software on both sides, which limits deployment. Steroid OpenFlow Service (SOS), a software-defined networking (SDN)-based solution, was developed to transparently enhance network performance across large BDP networks using multiple parallel TCP connections. OpenFlow transparently redirects user traffic to nearby service machines called SOS agents, and these agents use multiple TCP connections to transfer data quickly across the large BDP network. While SOS has shown significant improvements in data-transfer throughput, several factors affect its performance. This study focuses on SOS scalability analysis, targeting four critical factors: CPU utilization of SOS agents, the sockets used for parallel TCP connections, how OpenFlow is used, and network configuration. As part of this study, the SOS agent code was revamped for performance. Experiments were conducted on the National Science Foundation's CloudLab platform to assess the effect of these factors on SOS performance. Results show that removing the CPU bottleneck on a 25 Gbps network improved throughput per SOS session from 10.96 Gbps to 12.82 Gbps. SOS deployment over an InfiniBand network showed a linear increase in throughput to 23.22 Gbps with optimal network configuration. Using OpenFlow to support multiple client connections to the same server increased throughput from 12.17 Gbps to 17.20 Gbps.
The study showed that, with code-level improvements and optimal network configuration, SOS performance can be improved substantially.
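Why parallel TCP connections help on a large-BDP path can be illustrated with a back-of-the-envelope model. This is a sketch with assumed numbers (window size, RTT, connection count), not the SOS implementation: each TCP connection is capped at roughly window/RTT, so many connections are needed to fill a fat long-distance pipe.

```python
# Hypothetical model (not the SOS code): per-connection throughput is
# limited to window/RTT; the link caps the aggregate of all connections.
def aggregate_throughput_gbps(link_gbps, rtt_s, window_bytes, n_conns):
    per_conn_gbps = (window_bytes * 8) / rtt_s / 1e9
    return min(link_gbps, n_conns * per_conn_gbps)

# Assumed scenario: 25 Gbps link, 50 ms RTT, 4 MiB window per connection.
one = aggregate_throughput_gbps(25, 0.05, 4 * 2**20, 1)    # single flow
many = aggregate_throughput_gbps(25, 0.05, 4 * 2**20, 40)  # parallel flows
```

Under these assumptions a single flow reaches well under 1 Gbps, while 40 parallel flows saturate the 25 Gbps link, which is the effect SOS agents exploit transparently on the user's behalf.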

    InfiniBand-Based Mechanism to Enhance Multipath QoS in MANETs

    In Mobile Ad-hoc Networks (MANETs), continuous topology changes and the large amounts of data exchanged across the network make it difficult for a single routing algorithm to route data efficiently between nodes. MANETs usually suffer from high packet-loss and link-failure rates, which also make it difficult to exchange data in an effective and reliable fashion. These challenges tend to concentrate congestion on some links while other links remain almost free. In this thesis, we propose a novel mechanism to enhance QoS in multipath routing protocols in MANETs based on the InfiniBand (IB) QoS architecture. The basic idea of our approach is to improve path balancing in order to reduce congestion on overloaded links. This mechanism gives critical applications higher priority when routing their packets across the network, effectively manages frequent connections and disconnections (helping reduce link-failure and packet-loss rates), and, as a consequence of these gains, reduces overall power consumption. We tested the scheme on the IBMGTSim simulator and achieved significant improvements in QoS parameters compared to two well-known routing protocols: AODV and AOMDV.
    Arabic abstract (translated): There is a type of network in which all components are mobile devices without any infrastructure, called a MANET. In such networks, the devices cooperate autonomously to determine routes among themselves, and because they are mobile, they compute more than one route instead of a single one to reduce the probability of transmission failure: if one route fails, the others remain intact. On the other hand, because the programs and services these devices run differ in importance, Quality of Service (QoS) lets the user assign priorities over the available resources. The common approach is to place rate limits on less important programs so that more resources remain available to the more important ones. This approach has many problems in MANETs: route characteristics are unknown and unstable, and may drop below the limits set for unimportant programs, so that less important and more important programs end up treated equally, meaning QoS fails. While studying different network types, we found that InfiniBand applies QoS by changing the number of messages each program may send: devices send more messages belonging to important programs than to less important ones, using separate queues for important and unimportant traffic. This approach has two important benefits: it does not interfere with the traditional method and can be used alongside it, and, unlike the traditional method, it is unaffected by the characteristics of the computed routes or their changes, since the message ratio stays the same regardless of the routes. After applying this approach, we found improvements of up to 18% in delivery quality and 10% in arrival speed, and QoS did not fail as it does with the traditional method.
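The InfiniBand QoS idea the thesis builds on sends packets from separate priority queues in a fixed ratio, so the share given to critical traffic does not depend on link characteristics. A minimal weighted round-robin sketch (queue contents and weights are illustrative, not from the thesis):

```python
from collections import deque

# Sketch of IB-style arbitration: queue i receives weights[i] send slots
# per round, fixing the packet ratio between priority classes.
def weighted_arbiter(queues, weights):
    while any(queues):
        for q, w in zip(queues, weights):
            for _ in range(w):
                if q:
                    yield q.popleft()

high = deque(f"H{i}" for i in range(6))  # critical-application packets
low = deque(f"L{i}" for i in range(6))   # background packets
order = list(weighted_arbiter([high, low], [2, 1]))
```

With weights 2:1 the arbiter emits two high-priority packets for each low-priority one until a queue drains, regardless of how fast or unstable the underlying links are.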

    FlexVC: Flexible virtual channel management in low-diameter networks

    Deadlock avoidance mechanisms for lossless low-distance networks typically increase the virtual channel (VC) index with each hop. This restricts the number of buffer resources depending on the routing mechanism and limits performance through inefficient buffer use. Dynamic buffer organizations increase implementation complexity and provide only small gains in this context, because a significant amount of buffering must be allocated statically to avoid congestion. We introduce FlexVC, a simple buffer management mechanism that permits a more flexible use of VCs. It combines statically partitioned buffers, opportunistic routing, and a relaxed distance-based deadlock avoidance policy. FlexVC mitigates head-of-line blocking and reduces memory requirements by up to 50%. Simulation results in a Dragonfly network show congestion reduction and up to 37.8% throughput improvement, outperforming more complex dynamic approaches. FlexVC merges different traffic flows in the same buffers, which in some cases makes it more difficult to identify the traffic pattern needed to support nonminimal adaptive routing.
An alternative denoted FlexVCminCred improves congestion sensing for adaptive routing by separately tracking packets routed minimally and nonminimally, raising throughput by up to 20.4% with 25% savings in buffer area.
This work has been supported by the Spanish Government (grant SEV2015-0493 of the Severo Ochoa Program), the Spanish Ministry of Economy, Industry and Competitiveness (contract TIN2015-65316), the Spanish Research Agency (AEI/FEDER, UE - TIN2016-76635-C2-2-R), the Spanish Ministry of Education (FPU grant FPU13/00337), the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), the European Union FP7 programme (RoMoL ERC Advanced Grant GA 321253), the European HiPEAC Network of Excellence, and the European Union's Horizon 2020 research and innovation programme (Mont-Blanc project, grant agreement No 671697).
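One plausible reading of the relaxed distance-based policy (my interpretation of the abstract, not the paper's exact rule) is that a conventional scheme pins a packet at hop h to VC h, while a relaxed scheme lets it use any VC with index at or above h, so packets on short paths can spread over more buffers without creating cyclic dependencies:

```python
# Hedged sketch: which VCs a packet at hop `hop` may occupy, for a router
# with `num_vcs` virtual channels. `relaxed=True` models a FlexVC-like
# policy; `relaxed=False` models strict per-hop VC assignment.
def allowed_vcs(hop, num_vcs, relaxed):
    if relaxed:
        return list(range(hop, num_vcs))   # any VC at or above the hop count
    return [hop] if hop < num_vcs else []  # exactly one VC per hop

strict = allowed_vcs(0, 4, relaxed=False)
flex = allowed_vcs(0, 4, relaxed=True)
```

Both policies keep VC indices non-decreasing along a path, which preserves deadlock freedom, but the relaxed one gives the first hops far more buffer choices.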

    Homa: A Receiver-Driven Low-Latency Transport Protocol Using Network Priorities (Complete Version)

    Homa is a new transport protocol for datacenter networks. It provides exceptionally low latency, especially for workloads with a high volume of very short messages, and it also supports large messages and high network utilization. Homa uses in-network priority queues to ensure low latency for short messages; priority allocation is managed dynamically by each receiver and integrated with a receiver-driven flow-control mechanism. Homa also uses controlled overcommitment of receiver downlinks to ensure efficient bandwidth utilization at high load. Our implementation of Homa delivers 99th-percentile round-trip times of less than 15 µs for short messages on a 10 Gbps network running at 80% load. These latencies are almost 100x lower than the best published measurements of an implementation. In simulations, Homa's latency is roughly equal to pFabric's and significantly better than that of pHost, PIAS, and NDP for almost all message sizes and workloads. Homa can also sustain higher network loads than pFabric, pHost, or PIAS.
    Comment: This paper is an extended version of the paper on Homa that was published in ACM SIGCOMM 2018. Material had to be removed from Sections 5.1 and 5.2 to meet the SIGCOMM page restrictions; this version restores the missing material. The paper is 18 pages, plus two pages of references.
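The receiver-driven priority idea can be sketched as a mapping from a message's remaining size to one of a few network priority levels, with shorter messages getting higher priority. The cutoffs below are made up for illustration; they are not Homa's actual unscheduled/scheduled split or its workload-derived thresholds:

```python
import bisect

# Illustrative sketch only: map remaining message size to a priority
# level, 0 being the highest. Real Homa receivers compute such cutoffs
# dynamically from the observed message-size distribution.
def priority_level(remaining_bytes, cutoffs):
    return bisect.bisect_left(cutoffs, remaining_bytes)

cutoffs = [1_500, 10_000, 100_000]   # bytes; hypothetical boundaries
short = priority_level(200, cutoffs)        # small message
medium = priority_level(5_000, cutoffs)
bulk = priority_level(1_000_000, cutoffs)   # large transfer
```

Because each receiver picks the levels for its own incoming messages, short messages cut ahead of bulk traffic in the switch priority queues without any sender-side coordination.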

    Predictive and distributed routing balancing (PR-DRB): high speed interconnection networks

    Current parallel applications running on clusters require an interconnection network to perform communication among all available compute nodes. Communication imbalance can produce network congestion, reducing throughput, increasing latency, and degrading overall system performance. On the other hand, parallel applications running on these networks possess representative stages that allow their characterization, as well as repetitive behavior that can be identified on the basis of this characterization. This work presents Predictive and Distributed Routing Balancing (PR-DRB), a new method developed to gradually control network congestion based on path expansion, traffic distribution, and effective traffic load, in order to maintain low latency. PR-DRB monitors message latencies at intermediate routers, makes decisions about alternative paths, and records communication-pattern information encountered during congestion situations. Based on the repetitiveness of applications, the best recorded solutions are reapplied when a saved communication pattern reappears. Traffic-congestion experiments were conducted to evaluate the performance of the method, and improvements were observed.
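The record-and-reapply idea behind PR-DRB can be sketched as a small cache keyed by communication pattern, keeping the lowest-latency path set seen so far. Class and method names below are mine, not PR-DRB's:

```python
# Hypothetical sketch of PR-DRB's core loop: remember the path set that
# resolved congestion for a communication pattern, and reapply it when
# the same pattern reappears.
class PathMemory:
    def __init__(self):
        self._best = {}  # pattern signature -> (paths, latency)

    def record(self, pattern, paths, latency):
        prev = self._best.get(pattern)
        if prev is None or latency < prev[1]:
            self._best[pattern] = (paths, latency)

    def lookup(self, pattern):
        entry = self._best.get(pattern)
        return entry[0] if entry else None

mem = PathMemory()
mem.record(("nodeA", "nodeB"), ["path1", "path2"], latency=4.2)
mem.record(("nodeA", "nodeB"), ["path1"], latency=6.0)  # worse; ignored
```

On a repeat of the pattern, `lookup` returns the previously successful path set immediately, skipping the gradual path-expansion search.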

    Routing on the Channel Dependency Graph: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks

    In the pursuit of ever-increasing compute power, and with Moore's law slowly coming to an end, high-performance computing has started to scale out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on the total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not use this important resource efficiently. Topology-aware routing algorithms are becoming increasingly inapplicable due to irregular topologies, which either are irregular by design or, most often, result from hardware failures. Exchanging faulty network components potentially requires whole-system downtime, further increasing the cost of a failure. This management approach is becoming impractical given the scale of today's networks and the accompanying steady decrease in the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, in terms of both hardware and software management, are necessary to mitigate the negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables. The fail-in-place strategy, a well-established method in storage systems of repairing only critical component failures, is a feasible solution for current and future HPC interconnects, as well as for other large-scale installations such as data-center networks. However, an appropriate combination of topology and routing algorithm is required to minimize throughput degradation for the entire system.
This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while the system is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent, property-preserving routing updates to optimize path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to the routing-deadlock problem. This thesis therefore further advances the state of the art by introducing a novel concept of routing on the channel dependency graph, which allows the design of a universally applicable destination-based routing capable of optimizing path balancing without exceeding a given number of virtual channels, a common hardware limitation. This innovation enables implicit deadlock avoidance during path calculation, instead of solving both problems separately as all previous solutions do.
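The classical criterion underlying this approach, due to Dally and Seitz, is that a routing function on a lossless network is deadlock-free if its channel dependency graph is acyclic: nodes are channels, and an edge u -> v means some packet may hold channel u while waiting for channel v. A minimal sketch of that check (the example dependency sets are mine):

```python
# Cycle detection on a channel dependency graph via depth-first search.
# `deps` maps each channel to the channels it may wait on.
def has_cycle(deps):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {c: WHITE for c in deps}

    def visit(c):
        color[c] = GRAY                    # on the current DFS path
        for d in deps.get(c, ()):
            if color.get(d, WHITE) == GRAY:
                return True                # back edge: dependency cycle
            if color.get(d, WHITE) == WHITE and d in deps and visit(d):
                return True
        color[c] = BLACK                   # fully explored, no cycle here
        return False

    return any(color[c] == WHITE and visit(c) for c in deps)

ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c0"]}   # cyclic: can deadlock
chain = {"c0": ["c1"], "c1": ["c2"], "c2": []}      # acyclic: deadlock-free
```

Routing on the channel dependency graph turns this from an after-the-fact test into a constraint honored during path computation, so deadlock freedom falls out of the search itself rather than from extra virtual channels.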