
    Online detection of pathological TCP flows with retransmissions in high-speed networks

    Online Quality of Service (QoS) assessment in high-speed networks is a key concern for service providers, namely to detect QoS degradation on the fly as soon as possible and avoid customer complaints. In this regard, a Key Performance Indicator (KPI) is the number of TCP retransmissions per flow, which is related to packet losses or increased network and/or client/server latency. However, accurately detecting TCP retransmissions requires tracking the whole sequence number list, which is a challenging task in multi-Gb/s networks. In this paper we show that the simplest approach, counting as a retransmission any packet whose sequence number is smaller than the previous one, is enough to detect pathological flows with severe retransmissions. Such a lightweight approach eliminates the need to track the whole TCP flow history, which severely restricts traffic analysis throughput. Our findings show that low False Positive Rates (FPR) and False Negative Rates (FNR) can be achieved in the detection of such pathological flows with severe retransmissions, which are of paramount importance for QoS monitoring. Most importantly, we show that live detection of such pathological flows at a rate of 10 Gb/s per processing core is feasible. This work has been partially funded by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund under the projects TRÁFICA (MINECO/FEDER TEC2015-69417-C2-1-R), Preproceso Inteligente de Tráfico (MINECO/FEDER TEC2015-69417-C2-2-R) and RACING DRONES (MINECO/FEDER RTC-2016-4744-7).
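    The lightweight heuristic described above can be sketched in a few lines: a packet counts as a retransmission when its sequence number falls below the highest one seen so far in the flow, so no per-segment history is kept. The flow-state layout and the pathological-flow ratio threshold below are illustrative assumptions, not the paper's exact parameters.

```python
# Minimal sketch of the sequence-number-regression heuristic: per-flow
# state is just the highest sequence number seen and two counters.

def make_flow_tracker():
    """Return per-flow state: highest seq seen, retransmission and packet counts."""
    return {"max_seq": -1, "retx": 0, "packets": 0}

def observe_packet(flow, seq):
    """Update flow state with one TCP segment's sequence number."""
    flow["packets"] += 1
    if seq < flow["max_seq"]:
        flow["retx"] += 1          # sequence went backwards -> likely retransmission
    else:
        flow["max_seq"] = seq

def is_pathological(flow, ratio=0.05):
    """Flag flows whose retransmission ratio exceeds an assumed threshold."""
    return flow["packets"] > 0 and flow["retx"] / flow["packets"] > ratio
```

    Because only one integer per flow is compared per packet, this is the kind of bookkeeping that can keep up with multi-Gb/s line rates, at the cost of missing reorderings that a full sequence-list tracker would disambiguate.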

    Efficient implementations of networking devices on multicore hardware platforms

    The dissertation focuses on performance and efficiency improvements of multi-core networking devices. The main motivation for the improvements presented in this work is the constantly growing demand for fast and reliable network communication between all types of electronic devices. These devices offer users an ever-increasing number of network-based services, which require continuous connectivity to computer networks. Service providers, who must maintain these large networks, demand efficient networking devices so that they can provide cost-effective, high-quality services. To meet these demands, single-core networking devices are being replaced by multi-core devices, which offer significantly more performance while consuming less power. Multi-core hardware by itself, however, does not improve performance unless the software is adapted to utilize all available hardware resources. Therefore, to make multi-core networking devices efficient, the processing of network traffic must be parallelized, which can be achieved in two ways: a) by distributing the network traffic among the available processor cores, and b) by parallelizing the network functionalities the device offers. The main goal of our work is to develop and evaluate two innovative improvements to networking devices implemented on multi-core architectures, which increase the performance and efficiency of networking. The first improvement is an adaptive network-traffic-distribution method that combines packet-based and flow-based traffic distribution. In this method, each core is assigned a specific number of tokens, which represent the number of network packets that the core is allowed to process. Each processed packet consumes one token.
    The tokens are redistributed periodically according to the average core load, so that the burden of packet processing is balanced among the available cores. If a core runs out of tokens, it assigns the packet to the nearest adjacent core, preferably one sharing cache memory, which minimizes inter-core communication time. We experimentally validated the method by integrating it into the Linux Bridge and running tests with a “worst case” scenario, with one dominant flow, and a “backbone-link” scenario, with a large number of flows with similar packet rates. In the first case, traffic throughput improves by a factor of 2.8 using four processing cores. In the second case, with a large number of traffic flows, the performance remains similar to that of existing state-of-the-art flow-based methods. The second improvement is a parallelization of the network-traffic encryption process, which is used to ensure the security and privacy of network communications. We combined functional and data decomposition to split common encryption algorithm implementations into many tasks that can run in parallel. These tasks, however, must be synchronized, which reduces the efficiency of the parallelization. Because encryption algorithms have low computational complexity, even a small synchronization overhead can nullify the gains of parallelization. We therefore minimized inter-task communication time by assigning the tasks to adjacent cores with shared cache memory and by using atomic variables and lock-free queues for network packet storage. The evaluation results show that, on a computer with twelve cores, we achieve speedups of 1.9, 6.3 and 7.6 with the AES, 3DES and RSA encryption algorithms, respectively. In addition to the two presented methods, we also define a methodology for the systematic efficiency evaluation of multi-core networking devices.
    We defined key criteria that include standard performance and quality-of-service metrics as well as indicators that evaluate the utilization of system resources, e.g. core load, cache hit ratio, speedup and parallel efficiency. We described the steps required to perform the systematic evaluation, which include establishing a testing environment, preparing testing tools, conducting testing procedures defined according to the evaluation criteria, and analyzing the results. The established testing methodology was used to compare different implementations of networking devices, focusing on the comparison of traditional hardware-defined networking devices with the emerging software-defined networking devices, which are implemented entirely in software and run on commercial off-the-shelf hardware. The results show that hardware-defined networking devices achieve higher performance and are also significantly more energy efficient than software-defined devices. The latter, on the other hand, are much more flexible, which makes their development simpler and more cost-effective and allows them to reuse existing computer hardware. Due to this flexibility, performance-improvement methods such as those presented in this work can be embedded in software-defined devices more easily, and such devices fit naturally into contemporary networking concepts such as network functions virtualization.
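    The token-based distribution scheme in the abstract can be sketched as follows: each core holds a token budget, processing a packet consumes one token, a core that runs out forwards the packet to an adjacent core, and budgets are periodically rebalanced against average core load. The neighbor mapping, the inverse-load weighting and the budget sizes are illustrative assumptions, not the dissertation's exact formulas.

```python
# Sketch of token-based packet dispatch with periodic rebalancing.

def dispatch(packet_cores, tokens, neighbor):
    """Assign each packet's preferred core, falling back to a neighbor
    (the nearest core sharing a cache) when the budget is exhausted."""
    assignments = []
    for core in packet_cores:
        if tokens[core] > 0:
            tokens[core] -= 1
            assignments.append(core)
        else:
            fallback = neighbor[core]   # assumed adjacency: cache-sharing core
            tokens[fallback] -= 1
            assignments.append(fallback)
    return assignments

def rebalance(tokens, load, total):
    """Periodically redistribute 'total' tokens, giving lightly loaded
    cores larger budgets (inverse-load weighting, an assumption)."""
    weights = [1.0 / (1.0 + l) for l in load]
    s = sum(weights)
    for core in range(len(tokens)):
        tokens[core] = int(total * weights[core] / s)
```

    The fallback-to-neighbor step is what blends packet-based and flow-based distribution: a flow normally sticks to one core (cache-friendly), but overload spills individual packets to a cache-adjacent core instead of stalling.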

    Non-minimal adaptive routing for efficient interconnection networks

    The interconnection network is a key component of any parallel computing system. The first aspect that defines an interconnection network is its topology. Typically, power- and cost-efficient scalable networks with low diameter rely on topologies that approach the Moore bound, in which there is no minimal-path diversity. Once the topology is defined, the performance bounds of the network are implicitly determined, so a suitable routing algorithm should be designed to approach those limits as closely as possible; due to the lack of minimal-path diversity, it must also exploit non-minimal paths when the traffic pattern is adversarial. Such routing algorithms usually select between minimal and non-minimal paths based on network conditions, with the non-minimal paths built according to Valiant's load-balancing algorithm. This implies that non-minimal paths double the length of minimal ones, increasing the latency experienced by packets. Regarding the technology, since its introduction to HPC systems in the early 2000s, Ethernet has been used in a significant fraction of systems. This dissertation introduces a realistic and competitive implementation of a scalable lossless Ethernet network for HPC environments, considering low-diameter, low-power topologies and achieving up to 54% power savings. Furthermore, it proposes a routing scheme on top of the cited architecture, hereafter QCN-Switch, which selects between minimal and non-minimal paths per packet based on explicit congestion notifications instead of credits.
    Once the misrouting decision is implemented, two mechanisms for selecting the intermediate switch are introduced to develop a source-adaptive routing algorithm capable of adapting the number of hops in the non-minimal paths. This routing, hereafter ACOR, is topology-agnostic and improves average latency by up to 28%. Finally, a topology-dependent routing, hereafter LIAN, is introduced to optimize the number of hops in the non-minimal paths based on live network conditions. Evaluations show that LIAN obtains near-optimal latency and outperforms state-of-the-art adaptive routing algorithms, reducing latency by up to 30% and providing stable throughput and fairness. This work has been supported by the Spanish Ministry of Education, Culture and Sports under grant FPU14/02253, the Spanish Ministry of Economy, Industry and Competitiveness under contracts TIN2010-21291-C02-02, TIN2013-46957-C2-2-P, and TIN2013-46957-C2-2-P (AEI/FEDER, UE), the Spanish Research Agency under contract PID2019-105660RBC22/AEI/10.13039/501100011033, the European Union under agreements FP7-ICT-2011-7-288777 (Mont-Blanc 1) and FP7-ICT-2013-10-610402 (Mont-Blanc 2), the University of Cantabria under project PAR.30.P072.64004, and by the European HiPEAC Network of Excellence through an internship grant supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No. H2020-ICT-2015-687689.
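    The minimal/non-minimal choice described above is commonly sketched as a UGAL-style comparison: take the minimal path unless its congestion-weighted cost exceeds that of a Valiant detour through a random intermediate switch, which roughly doubles the hop count. The queue-length congestion signal and the cost model below are generic assumptions, not the dissertation's QCN-based mechanism.

```python
# Sketch of congestion-aware selection between minimal and Valiant routes.
import random

def choose_route(min_queue, nonmin_queue, min_hops, nonmin_hops):
    """UGAL-style decision: compare queue-length-weighted path costs,
    preferring the minimal route on ties."""
    min_cost = min_queue * min_hops
    nonmin_cost = nonmin_queue * nonmin_hops
    return "minimal" if min_cost <= nonmin_cost else "nonminimal"

def valiant_path(src, dst, switches):
    """Valiant load balancing: route via a random intermediate switch,
    roughly doubling the minimal path length but spreading adversarial load."""
    mid = random.choice([s for s in switches if s not in (src, dst)])
    return [src, mid, dst]
```

    ACOR and LIAN refine exactly the part this sketch leaves crude: instead of a fixed random intermediate, they adapt how many extra hops the detour takes, either topology-agnostically or using live network state.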

    Congestion control algorithms of TCP in emerging networks

    In this dissertation we examine some of the challenges faced by the congestion control algorithms of TCP in emerging networks. We focus on three main issues. First, we propose TCP with delayed congestion response (TCP-DCR) for improving performance in the presence of non-congestion events. TCP-DCR delays the congestion response for a short interval of time, allowing local recovery mechanisms to handle the event, if possible. If the event persists at the end of the delay, it is treated as congestion loss. We evaluate TCP-DCR through analysis and simulations. Results show significant performance improvements in the presence of non-congestion events, with marginal impact in their absence. TCP-DCR maintains fairness with standard TCP variants that respond immediately. Second, we propose Layered TCP (LTCP), which modifies a TCP flow to behave as a collection of virtual flows (or layers) to improve efficiency in high-speed networks. The number of layers is determined by dynamic network conditions. Convergence properties and RTT-unfairness remain similar to those of TCP. We provide the intuition and the design of the LTCP protocol, and evaluation results based on both simulations and a Linux implementation. Results show that LTCP is about an order of magnitude faster than TCP at utilizing high-bandwidth links while maintaining promising convergence properties. Third, we study the feasibility of employing congestion avoidance algorithms in TCP. We show that end-host-based congestion prediction is more accurate than previously characterized. However, uncertainties in congestion prediction may be unavoidable. To address these uncertainties, we propose an end-host-based mechanism called Probabilistic Early Response TCP (PERT). PERT emulates the probabilistic response function of the router-based scheme RED/ECN in the congestion response function of the end host.
    We show through extensive simulations that, similar to router-based RED/ECN, PERT provides fair bandwidth sharing with low queuing delays and negligible packet losses, without requiring router support. It exhibits better characteristics than TCP-Vegas, the illustrative end-host scheme. PERT can also be used to emulate other router schemes; we illustrate this through preliminary results for emulating the router-based mechanism REM/ECN. Finally, we show the interactions and benefits of combining the different proposed mechanisms.
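    The core idea behind PERT can be sketched as a RED-like response computed at the end host: estimate queuing delay from RTT samples, then reduce the congestion window with a probability that ramps linearly between two delay thresholds. The threshold values and the linear ramp below are illustrative assumptions in the spirit of RED, not PERT's published parameters.

```python
# Sketch of an end-host RED/ECN-style probabilistic congestion response.

def queuing_delay(srtt, rtt_min):
    """Estimate queuing delay as smoothed RTT above the minimum observed RTT."""
    return max(0.0, srtt - rtt_min)

def response_probability(delay, min_th=0.005, max_th=0.025, max_p=0.1):
    """RED-style piecewise-linear probability (in seconds of queuing delay):
    zero below min_th, capped at max_p above max_th, linear in between."""
    if delay <= min_th:
        return 0.0
    if delay >= max_th:
        return max_p
    return max_p * (delay - min_th) / (max_th - min_th)
```

    Drawing a random number against this probability on each RTT sample, and backing off when it hits, mimics a RED/ECN router's early marking without any router changes, which is the point of the scheme.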

    Improving latency for interactive, thin-stream applications over reliable transport

    A large number of network services use IP and reliable transport protocols. For applications with a constant pressure of data, loss is handled satisfactorily, even if the application is latency-sensitive. For applications whose data streams consist of intermittently sent small packets, users experience extreme latencies more frequently. Because such thin-stream applications are commonly interactive and time-dependent, increased delay may severely reduce the experienced quality of the application. When TCP is used for thin-stream applications, events of highly increased latency are common, caused by the way retransmissions are handled. Other transport protocols deployed in the Internet, like SCTP, model their congestion control and reliability on TCP, as do many frameworks that provide reliability on top of unreliable transport. We have tested several application- and transport-layer solutions, and based on our findings, we propose sender-side enhancements that reduce the application-layer latency in a manner that is compatible with unmodified receivers. We have implemented the mechanisms as modifications to the Linux kernel, both for TCP and SCTP. The mechanisms are triggered dynamically, so that they are active only when the kernel identifies the stream as thin. To evaluate the performance of our modifications, we conducted a wide range of experiments using replayed thin-stream traces captured from real applications as well as artificially generated thin-stream data patterns. From these experiments, the effects on latency, redundancy and fairness were evaluated. The analysis shows great improvements in latency for thin streams when the modifications are applied. Surveys in which users evaluated their experience of several applications' quality under the modified transport mechanisms confirmed the improvements seen in the statistical analysis.
    The positive effects of our modifications were shown to be achievable without notable effects on the fairness of competing streams. We therefore conclude that it is advisable to handle thin streams separately, using our modifications, when transmitting over reliable protocols, in order to reduce retransmission latency.
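    The dynamic triggering mentioned above can be sketched as a simple classifier plus a modified timeout rule: a stream counts as thin when it has too few packets in flight to trigger fast retransmit and its segments are small, and for such streams exponential retransmission backoff is skipped so retransmissions stay prompt. The payload threshold and the backoff rule below are illustrative assumptions; the Linux kernel's own thin-stream test is essentially "fewer than four packets in flight".

```python
# Sketch of thin-stream classification and backoff suppression.

def is_thin_stream(packets_in_flight, avg_payload, mss=1460):
    """Classify a flow as thin: too few packets in flight to generate the
    duplicate ACKs needed for fast retransmit, and segments well below one MSS
    (the half-MSS payload cutoff is an assumption for illustration)."""
    return packets_in_flight < 4 and avg_payload < mss // 2

def retransmit_timeout(base_rto, retries, thin):
    """Thin streams skip exponential backoff so each retransmission fires
    after the base RTO; regular streams double the timeout per retry."""
    return base_rto if thin else base_rto * (2 ** retries)
```

    Gating the behavior on the classifier is what keeps the scheme fair: bulk streams still back off exponentially, so only flows that cannot hurt competing traffic get the aggressive retransmission treatment.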