185 research outputs found

    Progressive congestion management based on packet marking and validation techniques

    Full text link
    Congestion management in multistage interconnection networks is a serious problem that is not yet completely solved. To avoid the degradation of network performance when congestion appears, several congestion management mechanisms have been proposed. Most of these mechanisms are based on explicit congestion notification: switches detect congestion and, depending on the applied strategy, mark packets to warn the source hosts. In response, source hosts apply corrective actions to adjust their packet injection rate. Although these proposals seem quite effective, they either exhibit drawbacks or are partial solutions. Some of them penalize flows that are not responsible for congestion, whereas others can cope only with congestion situations that last for a short time. In this paper, we present an overview of the different strategies to detect and correct congestion in multistage interconnection networks, and propose a new mechanism, referred to as Marking and Validation Congestion Management (MVCM), targeted at this kind of lossless network and based on a more refined packet-marking strategy combined with a fair set of corrective actions, which together make the mechanism able to effectively manage congestion regardless of the congestion degree. Evaluation results show the effectiveness and robustness of the proposed mechanism.

    This work was supported by the Spanish MEC and MICINN, as well as European Commission FEDER funds, under Grants CSD2006-00046 and TIN2009-14475-C04-01.

    Ferrer Pérez, JL.; Baydal Cardona, ME.; Robles Martínez, A.; López Rodríguez, PJ.; Duato Marín, JF. (2012). Progressive congestion management based on packet marking and validation techniques. IEEE Transactions on Computers. 61(9):1296-1309. doi:10.1109/TC.2011.146
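    As a rough illustration of the explicit-congestion-notification loop the abstract describes, the sketch below assumes a simplified switch that marks packets once its queue exceeds a threshold, and a source that backs off multiplicatively on marked acknowledgements. The class names, thresholds, and rate rules are illustrative assumptions, not the MVCM algorithm itself.

```python
# Illustrative packet-marking/rate-adjustment sketch (not MVCM itself).
# Names and thresholds are hypothetical; MVCM uses a more refined marking
# strategy and a fair set of corrective actions.

class Switch:
    def __init__(self, mark_threshold=16):
        self.queue = []
        self.mark_threshold = mark_threshold

    def forward(self, packet):
        # Mark the packet when queue occupancy suggests congestion is building.
        if len(self.queue) >= self.mark_threshold:
            packet["marked"] = True
        self.queue.append(packet)

class Source:
    def __init__(self, rate=1.0, min_rate=0.05):
        self.rate = rate          # packets per time unit
        self.min_rate = min_rate

    def on_ack(self, packet):
        if packet.get("marked"):
            # Corrective action: back off multiplicatively on congestion marks...
            self.rate = max(self.min_rate, self.rate * 0.5)
        else:
            # ...and recover additively while the path stays unmarked.
            self.rate = min(1.0, self.rate + 0.01)
```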

    Design of Efficient Packet Marking-Based Congestion Management Techniques for Cluster Interconnects

    Full text link
    The growth of parallel computers based on high-performance networks has increased the interest and effort of the research community in developing new techniques that allow the best performance to be obtained from these networks; in particular, techniques that enable efficient routing and reduce packet latency, thereby increasing network throughput. However, high network utilization can lead to what is known as "network congestion", which can cause performance degradation. Congestion management in multistage networks is an important problem that is not completely solved. In order to avoid the degradation of network performance when congestion appears, different congestion management mechanisms have been proposed. Many of these mechanisms are based on explicit congestion notification. For this purpose, switches detect congestion and, depending on the applied strategy, packets are marked in order to warn the source nodes. In response, the source nodes apply corrective actions to adjust their packet injection rate. The purpose of this thesis is to analyze the different strategies for detecting and correcting congestion in multistage networks, and to propose new congestion management mechanisms targeted at this kind of lossless network. The new proposals are based on a more refined packet-marking strategy in combination with a set of fair corrective actions that make the mechanism able to manage congestion effectively regardless of the congestion degree and traffic conditions.

    Ferrer Pérez, JL. (2012). Design of efficient packet marking-based congestion management techniques for cluster interconnects [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18197

    Adaptive Response System for Distributed Denial-of-Service Attacks

    No full text
    The continued prevalence and severe damaging effects of Distributed Denial of Service (DDoS) attacks in today's Internet raise growing security concerns and call for immediate responses and better solutions to tackle DDoS attacks. Current DDoS prevention mechanisms are usually inflexible, and determined attackers with knowledge of these mechanisms can work around them. Most existing detection and response mechanisms are standalone systems which do not rely on adaptive updates to mitigate attacks. As different responses vary in their "leniency" in treating detected attack traffic, there is a need for an adaptive response system. We designed and implemented our DDoS Adaptive ResponsE (DARE) System, a distributed DDoS mitigation system capable of executing appropriate detection and mitigation responses automatically and adaptively according to the attacks. It supports easy integration of both signature-based and anomaly-based detection modules. Additionally, the design of DARE's individual components takes into consideration the strengths and weaknesses of existing defence mechanisms, and the characteristics and possible future mutations of DDoS attacks. These components consist of an Enhanced TCP SYN Attack Detector and Bloom-based Filter, a DDoS Flooding Attack Detector and Flow Identifier, and a Non Intrusive IP Traceback mechanism. The components work together interactively to adapt the detections and responses in accordance with the attack types. Experiments conducted on DARE show that attack detection and mitigation are successfully completed within seconds, with about 60% to 86% of the attack traffic being dropped, while availability for legitimate and new legitimate requests is maintained. DARE is able to detect and trigger appropriate responses in accordance with the attacks being launched, with high accuracy, effectiveness and efficiency. We also designed and implemented a Traffic Redirection Attack Protection System (TRAPS), a stand-alone DDoS attack detection and mitigation system for IPv6 networks. In TRAPS, the victim under attack verifies the authenticity of the source by performing virtual relocations to differentiate the legitimate traffic from the attack traffic. TRAPS requires minimal deployment effort and does not require modifications to the Internet infrastructure, thanks to its incorporation of the Mobile IPv6 protocol. Experiments to test the feasibility of TRAPS were carried out in a testbed environment to verify that it would work with the existing Mobile IPv6 implementation. It was observed that the operations of each module functioned correctly and TRAPS was able to successfully mitigate an attack launched with spoofed source IP addresses.
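    A toy sketch of the kind of Bloom-based SYN filtering that a detector like DARE's builds on, assuming the standard half-open-connection heuristic (SYNs whose flows never complete with an ACK accumulate under attack). The hash scheme, sizes, and flow keys are illustrative assumptions, not DARE's actual implementation.

```python
import hashlib

# Toy Bloom filter tracking half-open TCP connections (illustrative only;
# DARE's Enhanced TCP SYN Attack Detector is more elaborate).

class BloomFilter:
    def __init__(self, size=8192, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size)

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

half_open = BloomFilter()
pending_syns = 0

def on_packet(src, dst, flags):
    """Count SYNs that are never completed by an ACK from the same flow."""
    global pending_syns
    flow = f"{src}->{dst}"
    if flags == "SYN":
        half_open.add(flow)
        pending_syns += 1
    elif flags == "ACK" and flow in half_open:
        pending_syns = max(0, pending_syns - 1)
    # A sustained high pending_syns count suggests a SYN flood in progress.
```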

    Methodologies for the analysis of value from delay-tolerant inter-satellite networking

    Get PDF
    In a world that is becoming increasingly connected, both in the sense of people and devices, it is no surprise that users of the data enabled by satellites are exploring the potential brought about by a more connected Earth orbit environment. Lower data latency, higher revisit rates and higher volumes of information are the order of the day, and inter-connectivity is one of the ways in which this could be achieved. Within this dissertation, three main topics are investigated and built upon. First, the process of routing data through intermittently connected delay-tolerant networks is examined and a new routing protocol introduced, called Spae. The consideration of downstream resource limitations forms the heart of this novel approach, which is shown to provide improvements in data routing that closely match those of a theoretically optimal scheme. Next, the value of inter-satellite networking is derived in such a way that removes the difficult task of costing the enabling inter-satellite link technology. Instead, value is defined as the price one should be willing to pay for the technology while retaining a mission value greater than that of its non-networking counterpart. This is achieved through the use of multi-attribute utility theory, trade-space analysis and system modelling, and demonstrated in two case studies. Finally, the effects of uncertainty in the form of sub-system failure are considered. Inter-satellite networking is shown to increase a system's resilience to failure through the introduction of additional, partially failed states, made possible by data relay. The lifetime value of a system is then captured using a semi-analytical approach exploiting Markov chains, validated with a numerical Monte Carlo simulation approach. It is evident that while inter-satellite networking may offer more value in general, it does not necessarily result in a decrease in the loss of utility over the lifetime.
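    A small sketch of the semi-analytical idea described above, assuming a toy three-state Markov chain (fully operational; partially failed but still networked; failed) with invented transition probabilities and per-state utilities. Expected lifetime utility falls out of iterating the state distribution, and a Monte Carlo run validates it, mirroring the validation approach the abstract mentions.

```python
import numpy as np

# Toy Markov-chain lifetime-utility model (illustrative states and numbers,
# not the dissertation's actual model). States: 0 = fully operational,
# 1 = partially failed but still networked, 2 = failed (absorbing).
P = np.array([[0.98, 0.015, 0.005],
              [0.0,  0.97,  0.03],
              [0.0,  0.0,   1.0]])
utility = np.array([1.0, 0.6, 0.0])   # per-epoch utility in each state

def expected_lifetime_utility(epochs=1000):
    """Semi-analytical: propagate the state distribution through the chain."""
    state = np.array([1.0, 0.0, 0.0])  # start fully operational
    total = 0.0
    for _ in range(epochs):
        total += state @ utility
        state = state @ P
    return total

def monte_carlo(epochs=1000, runs=500, seed=0):
    """Numerical validation: sample trajectories and average their utility."""
    rng = np.random.default_rng(seed)
    totals = []
    for _ in range(runs):
        s, total = 0, 0.0
        for _ in range(epochs):
            total += utility[s]
            s = rng.choice(3, p=P[s])
        totals.append(total)
    return np.mean(totals)

print(expected_lifetime_utility(), monte_carlo())  # the two should agree closely
```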

    Predictive and distributed routing balancing (PR-DRB) : high speed interconnection networks

    Get PDF
    Current parallel applications running on clusters require the use of an interconnection network to perform communications among all available computing nodes. An imbalance in communications can produce network congestion, reducing throughput and increasing latency, and thereby degrading overall system performance. On the other hand, parallel applications running on these networks possess representative stages which allow their characterization, as well as repetitive behavior that can be identified on the basis of this characterization. This work presents Predictive and Distributed Routing Balancing (PR-DRB), a new method developed to gradually control network congestion, based on path expansion, traffic distribution, and effective traffic load, in order to maintain low latency values. PR-DRB monitors message latencies on intermediate routers, makes decisions about alternative paths, and records communication pattern information encountered during congestion situations. Based on the concept of application repetitiveness, the best solutions recorded are reapplied when a saved communication pattern reappears. Traffic congestion experiments were conducted to evaluate the performance of the method, and improvements were observed.
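    A compact sketch of the control loop the abstract outlines, assuming a router that tracks per-path latency, gradually expands to alternative paths while latency stays above a threshold, and memoizes the chosen path set per communication pattern so it can be reapplied when the pattern recurs. Names and thresholds are illustrative assumptions, not PR-DRB's published logic.

```python
# Illustrative PR-DRB-style loop (hypothetical names/thresholds, not the
# published algorithm): monitor latency, expand paths under congestion,
# and memoize the best solution per communication pattern.

LATENCY_THRESHOLD = 5.0          # ms; illustrative congestion trigger
best_solution = {}               # communication pattern -> recorded path set

def select_paths(pattern, paths, latency_of):
    """Pick the path set for a traffic pattern, reusing a recorded solution."""
    if pattern in best_solution:
        return best_solution[pattern]         # repetitive behavior: reapply
    # Start from the single lowest-latency path and expand while congested.
    candidates = sorted(paths, key=latency_of)
    chosen = [candidates[0]]
    for path in candidates[1:]:
        avg = sum(map(latency_of, chosen)) / len(chosen)
        if avg <= LATENCY_THRESHOLD:
            break                             # congestion relieved; stop expanding
        chosen.append(path)                   # gradual path expansion
    best_solution[pattern] = chosen           # record solution for this pattern
    return chosen
```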

    Congestion control, energy efficiency and virtual machine placement for data centers

    Get PDF
    Data centers, facilities with communications network equipment and servers for data processing and/or storage, are prevalent and essential for providing a myriad of services and applications for various private, non-profit, and government systems, and they also form the foundation of cloud computing, which is transforming the technological landscape of the Internet. With the rapid deployment of modern high-speed, low-latency, large-scale data centers, many issues have emerged, such as data center architecture design, congestion control, energy efficiency, virtual machine placement, and load balancing. The objective of this thesis is multi-fold. First, an enhanced Quantized Congestion Notification (QCN) algorithm, called fair QCN (FQCN), is proposed to improve the fairness of rate allocation among multiple flows sharing one bottleneck link in data center networks. A detailed analysis of FQCN and simulation results are provided to validate fair rate allocation while maintaining queue length stability. Furthermore, the effects of congestion notification algorithms, including QCN, AF-QCN and FQCN, are investigated with respect to TCP throughput collapse. The results show that FQCN can significantly enhance TCP throughput performance, and achieves better TCP throughput than QCN and AF-QCN in a TCP Incast setting. Second, a unified congestion detection, notification and control system for data center networks is designed to efficiently resolve network congestion in a uniform solution and to ensure convergence to statistical fairness with "no state" switches. The architecture of the proposed system is described in detail and the FQCN algorithm is implemented in the proposed framework. Simulation results for the FQCN algorithm implemented in the proposed framework validate the robustness and efficiency of the proposed congestion control system. Third, a two-level power optimization model, namely Hierarchical EneRgy Optimization (HERO), is established to reduce the power consumption of data center networks by switching off network switches and links while still guaranteeing full connectivity and maximizing link utilization. The power-saving performance of the proposed HERO model is evaluated by simulations with different traffic patterns. The simulation results show that HERO can reduce the power consumption of data center networks effectively and with reduced complexity. Last, several heterogeneity-aware, dominant-resource-assisted heuristic algorithms, namely dominant residual resource aware first-fit decreasing (DRR-FFD), individual DRR-FFD (iDRR-FFD) and dominant residual resource based bin fill (DRR-BinFill), are proposed for virtual machine (VM) consolidation. The proposed heuristic algorithms exploit the heterogeneity of the VMs' requirements for different resources by capturing the differences among VMs' demands, and the heterogeneity of the physical machines' resource capacities by capturing the differences among physical machines' resources. The performance of the proposed heuristic algorithms is evaluated with different classes of synthetic workloads under different VM requirement heterogeneity conditions, and the simulation results demonstrate that the proposed heuristics achieve consolidation performance similar to that of dimension-aware heuristics, with almost the same computational cost as single-dimensional heuristics.
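    A brief sketch of first-fit-decreasing VM consolidation keyed on the dominant residual resource, in the spirit of the DRR-FFD heuristic named above. The data layout, normalization, and scoring are simplified assumptions, not the thesis's exact formulation.

```python
# Simplified DRR-FFD-style VM consolidation sketch (assumed formulation).
# VMs are sorted by their dominant (largest) normalized resource demand,
# then placed first-fit onto hosts with sufficient residual capacity.

def dominant_demand(vm):
    return max(vm["cpu"], vm["mem"])      # normalized demands in [0, 1]

def fits(vm, host):
    return vm["cpu"] <= host["cpu"] and vm["mem"] <= host["mem"]

def drr_ffd(vms, hosts):
    placement = {}
    for vm in sorted(vms, key=dominant_demand, reverse=True):
        for host in hosts:
            if fits(vm, host):
                host["cpu"] -= vm["cpu"]   # consume residual capacity
                host["mem"] -= vm["mem"]
                placement[vm["name"]] = host["name"]
                break
    return placement

vms = [{"name": "vm1", "cpu": 0.6, "mem": 0.2},
       {"name": "vm2", "cpu": 0.3, "mem": 0.7},
       {"name": "vm3", "cpu": 0.2, "mem": 0.2}]
hosts = [{"name": "h1", "cpu": 1.0, "mem": 1.0},
         {"name": "h2", "cpu": 1.0, "mem": 1.0}]
print(drr_ffd(vms, hosts))  # packs vm2 and vm1 onto h1, vm3 onto h2
```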

    Machine Learning and Big Data Methodologies for Network Traffic Monitoring

    Get PDF
    Over the past 20 years, the Internet has seen exponential growth in traffic, users, services and applications. Currently, it is estimated that the Internet is used every day by more than 3.6 billion users, who generate 20 TB of traffic per second. Such a huge amount of data challenges network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set theory methodologies, machine learning and big data approaches have emerged to build systems that aim at automatically extracting information from the raw data that network monitoring infrastructures offer. In this thesis I will address different network monitoring solutions, evaluating several methodologies and scenarios. I will show how, following a common workflow, it is possible to exploit mathematical, statistical, set theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention will be given to machine learning and big data methodologies such as DBSCAN, and the Apache Spark big data framework. The results show that, despite being able to take advantage of mathematical, statistical, and set theory tools to characterize a problem, machine learning methodologies are very useful for discovering hidden information in the raw data. Using the DBSCAN clustering algorithm, I will show how to use YouLighter, an unsupervised methodology to group caches serving YouTube traffic into edge nodes, and later, by using the notion of Pattern Dissimilarity, how to identify changes in their usage over time. By using YouLighter over 10-month-long traces, I will pinpoint sudden changes in YouTube edge-node usage, changes that also impair the end users' Quality of Experience. I will also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. By using SeLINA, I will show how to automatically detect the changes in the YouTube CDN previously highlighted by YouLighter. Along with these machine learning studies, I will show how to use mathematical and set theory methodologies to investigate the browsing habits of Internet users. Using a two-week dataset, I will show how, over this period, users continue discovering new websites, and that it is hard to build a reliable profiler using only DNS information. By exploiting mathematical and statistical tools, I will instead show how to characterize Anycast-enabled CDNs (A-CDNs). I will show that A-CDNs are widely used for both stateless and stateful services; that A-CDNs are quite popular, as more than 50% of web users contact an A-CDN every day; and that stateful services can benefit from A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their Round Trip Time. Finally, I will conclude by showing how I used BGPStream, an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. By using BGPStream in real-time mode, I will show how I detected a Multiple Origin AS (MOAS) event, and how I studied black-holing community propagation, showing the effect of this community on the network. Then, by using BGPStream in historical mode with the Apache Spark big data framework over 16 years of data, I will show results such as the continuous growth of IPv4 prefixes and the growth of MOAS events over time. All these studies aim to show that monitoring is a fundamental task in different scenarios, highlighting in particular the importance of machine learning and big data methodologies.
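    A minimal sketch of the DBSCAN-based grouping idea behind YouLighter, assuming scikit-learn and a toy feature matrix in which each row describes a cache. The two features here (median RTT and mean throughput) are invented placeholders, not the thesis's actual feature set.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy example: cluster YouTube-like caches into edge nodes with DBSCAN.
# Columns are placeholder features, e.g. median RTT (ms) and mean
# throughput (Mbps); YouLighter's real feature set differs.
caches = np.array([
    [10.1, 40.0], [10.4, 41.5], [9.8, 39.2],   # likely one edge node
    [55.0, 12.0], [54.2, 11.5],                # a second edge node
    [120.0, 5.0],                              # an outlier cache
])

clustering = DBSCAN(eps=3.0, min_samples=2).fit(caches)
print(clustering.labels_)  # e.g. [0 0 0 1 1 -1]; -1 marks noise/outliers
```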

    Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip

    Get PDF
    The sustained demand for faster, more powerful chips has been met by the availability of chip manufacturing processes allowing for the integration of increasing numbers of computation units onto a single die. The resulting outcome, especially in the embedded domain, has often been called System-on-Chip (SoC) or Multi-Processor System-on-Chip (MPSoC). MPSoC design brings to the foreground a large number of challenges, one of the most prominent of which is the design of the chip interconnection. With the number of on-chip blocks presently ranging in the tens, and quickly approaching the hundreds, the novel issue of how best to provide on-chip communication resources is clearly felt. Networks-on-Chip (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they guarantee a structured answer to present and future communication requirements. The point-to-point connection and packet switching paradigms they involve are also of great help in minimizing wiring overhead and physical routing issues. However, as with any technology of recent inception, NoC design is still an evolving discipline. Several main areas of interest require deep investigation for NoCs to become viable solutions:

    • The design of the NoC architecture needs to strike the best tradeoff among performance, features, and the tight area and power constraints of the on-chip domain.
    • Simulation and verification infrastructure must be put in place to explore, validate and optimize NoC performance.
    • NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions.
    • Even more so given their global, distributed nature, it is essential to evaluate the physical implementation of NoCs to assess their suitability for next-generation designs and their area and power costs.

    This dissertation performs a design space exploration of network-on-chip architectures, in order to point out the trade-offs associated with the design of each individual network building block and with the design of the network topology overall. The design space exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics with themselves and with early network-on-chip prototypes. The ultimate objective is to point out the key advantages that NoC realizations provide with respect to state-of-the-art communication infrastructures, and the challenges that lie ahead in order to make this new interconnect technology come true. Among the latter, technology-related challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy, in particular leakage power dissipation and the containment of process variations and their effects. The achievement of the above objectives was enabled by means of a NoC simulation environment for cycle-accurate modelling and simulation, and by means of a back-end facility for the study of NoC physical implementation effects. Overall, all the results provided by this work have been validated on actual silicon layout.
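    For a flavor of what pruning such a design space looks like in practice, here is a toy enumeration over a few NoC topology parameters with an invented figure of merit; the parameters, weights, and cost model are placeholders, not the dissertation's models or tools.

```python
from itertools import product

# Toy NoC design-space sweep (invented parameters and cost model).
topologies = ["mesh", "torus"]
link_widths = [32, 64, 128]          # bits
buffer_depths = [2, 4, 8]            # flits per virtual channel

def score(topology, width, depth):
    """Placeholder figure of merit: throughput proxy over area/power proxy."""
    throughput = width * depth * (1.2 if topology == "torus" else 1.0)
    area_power = width * depth * (1.5 if topology == "torus" else 1.0) + 0.5 * depth
    return throughput / area_power

# Exhaustive sweep; real tools prune the space instead of enumerating it.
best = max(product(topologies, link_widths, buffer_depths),
           key=lambda cfg: score(*cfg))
print("best configuration:", best)
```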

    Improving efficiency and resilience in large-scale computing systems through analytics and data-driven management

    Full text link
    Applications running in large-scale computing systems such as high performance computing (HPC) or cloud data centers are essential to many aspects of modern society, from weather forecasting to financial services. As the number and size of data centers increase with the growing computing demand, scalable and efficient management becomes crucial. However, data center management is a challenging task due to the complex interactions between applications, middleware, and hardware layers such as processors, network, and cooling units. This thesis claims that to improve robustness and efficiency of large-scale computing systems, significantly higher levels of automated support than what is available in today's systems are needed, and this automation should leverage the data continuously collected from various system layers. Towards this claim, we propose novel methodologies to automatically diagnose the root causes of performance and configuration problems and to improve efficiency through data-driven system management. We first propose a framework to diagnose software and hardware anomalies that cause undesired performance variations in large-scale computing systems. We show that by training machine learning models on resource usage and performance data collected from servers, our approach successfully diagnoses 98% of the injected anomalies at runtime in real-world HPC clusters with negligible computational overhead. We then introduce an analytics framework to address another major source of performance anomalies in cloud data centers: software misconfigurations. Our framework discovers and extracts configuration information from cloud instances such as containers or virtual machines. This is the first framework to provide comprehensive visibility into software configurations in multi-tenant cloud platforms, enabling systematic analysis for validating the correctness of software configurations. This thesis also contributes to the design of robust and efficient system management methods that leverage continuously monitored resource usage data. To improve performance under power constraints, we propose a workload- and cooling-aware power budgeting algorithm that distributes the available power among servers and cooling units in a data center, achieving up to 21% improvement in throughput per Watt compared to the state-of-the-art. Additionally, we design a network- and communication-aware HPC workload placement policy that reduces communication overhead by up to 30% in terms of hop-bytes compared to existing policies.
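    A skeletal example of the diagnosis approach the abstract outlines, assuming a supervised classifier trained on per-server resource-usage features with labeled anomaly classes. The features, labels, synthetic data, and model choice are stand-ins, not the thesis's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in anomaly-diagnosis sketch: rows are windows of per-server telemetry
# (CPU %, memory %, network MB/s -- invented features); labels name the
# injected anomaly ("healthy", "memleak", "cpu_hog"). Not the thesis pipeline.
rng = np.random.default_rng(42)
X_healthy = rng.normal([30, 40, 5], 5, size=(100, 3))
X_memleak = rng.normal([30, 85, 5], 5, size=(100, 3))
X_cpuhog  = rng.normal([95, 40, 5], 5, size=(100, 3))
X = np.vstack([X_healthy, X_memleak, X_cpuhog])
y = ["healthy"] * 100 + ["memleak"] * 100 + ["cpu_hog"] * 100

# Train on labeled runs, then diagnose a fresh telemetry window at runtime.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(model.predict([[32, 88, 6]]))  # likely diagnosed as 'memleak'
```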