
    Steroid OpenFlow Service Scalability Analysis

    Modern cloud applications are hosted in data centers spread across vast geographical distances and exchange large amounts of data continuously. The Transmission Control Protocol (TCP) is the most popular protocol for reliable data transfer; however, due to TCP's congestion control mechanism, the maximum achievable throughput across a large bandwidth-delay product (BDP) network is limited. Various solutions exist to enhance data transfer throughput, but they usually require non-trivial, explicit installation and tuning of specialized software on both ends, which limits deployment. Steroid OpenFlow Service (SOS), a software-defined networking (SDN) based solution, was developed to transparently enhance network performance across large BDP networks using multiple parallel TCP connections. OpenFlow is used to transparently redirect user traffic to nearby service machines called SOS agents, and these agents use multiple TCP connections to move data quickly across the large BDP network. While SOS has shown significant improvements in data transfer throughput, multiple factors affect its performance. This study focuses on SOS scalability analysis targeting four critical factors: CPU utilization of the SOS agents, the sockets used for parallel TCP connections, how OpenFlow is used, and network configuration. As part of this study, the SOS agent code was revamped for performance. Experiments were conducted on the National Science Foundation's CloudLab platform to assess the effect of these factors on SOS performance. Results show an improvement in throughput per SOS session from 10.96 Gbps to 12.82 Gbps after removing a CPU bottleneck on a 25 Gbps network. SOS deployment over an InfiniBand network showed a linear increase in throughput up to 23.22 Gbps with optimal network configuration. Using OpenFlow to support multiple client connections to the same server increased throughput from 12.17 Gbps to 17.20 Gbps. The study showed that with code-level improvements and optimal network configuration, SOS performance can be improved substantially.
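
    The core mechanism SOS relies on, striping one transfer across several parallel TCP connections so that no single congestion window limits throughput, can be illustrated with a minimal C sketch. This is not the SOS agent code; the connection count, the port 5001, and the agent address 192.0.2.10 are illustrative assumptions.

    /*
     * Minimal sketch of striping a payload across parallel TCP connections.
     * Each thread owns one connection and pushes its own stripe; together
     * the connections can fill a large-BDP path that a single TCP
     * congestion window cannot.
     */
    #include <arpa/inet.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define NCONN  8             /* number of parallel TCP connections (illustrative) */
    #define STRIPE (1 << 20)     /* bytes sent per connection in this demo */

    static void *send_stripe(void *arg)
    {
        (void)arg;
        char *buf = calloc(1, STRIPE);          /* stand-in for one stripe of the payload */

        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(5001) };
        inet_pton(AF_INET, "192.0.2.10", &dst.sin_addr);   /* example agent address */

        if (connect(fd, (struct sockaddr *)&dst, sizeof dst) == 0) {
            ssize_t off = 0;
            while (off < STRIPE) {              /* push this connection's stripe */
                ssize_t n = send(fd, buf + off, STRIPE - off, 0);
                if (n <= 0)
                    break;
                off += n;
            }
        }
        close(fd);
        free(buf);
        return NULL;
    }

    int main(void)
    {
        pthread_t th[NCONN];

        for (int i = 0; i < NCONN; i++)         /* one sender thread per connection */
            pthread_create(&th[i], NULL, send_stripe, NULL);
        for (int i = 0; i < NCONN; i++)
            pthread_join(th[i], NULL);
        return 0;
    }

    A receiving agent would accept the NCONN connections and reassemble the stripes in order before relaying the data to the server over a single connection, which is where the study's socket handling and CPU-utilization concerns come in.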

    InfiniBand-Based Mechanism to Enhance Multipath QoS in MANETs

    In Mobile Ad-hoc Networks (MANETs), the continuous changes in topology and the large amounts of data exchanged across the network make it difficult for a single routing algorithm to route data efficiently between nodes. MANETs usually suffer from high packet loss rates and high link failure rates, which also makes it difficult to exchange data in an effective and reliable fashion. These challenges tend to increase congestion on some links while other links remain almost free. In this thesis, we propose a novel mechanism to enhance QoS in multipath routing protocols in MANETs based on the InfiniBand (IB) QoS architecture. The basic idea of our approach is to improve path balancing in order to reduce congestion on overloaded links. This mechanism enables us to give critical applications higher priority when routing their packets across the network, to manage frequent connections and disconnections effectively and thus help reduce link failure and packet loss rates, and to reduce overall power consumption as a consequence of the previous gains. We have tested the scheme on the IBMGTSim simulator and achieved significant improvements in QoS parameters compared to two well-known routing protocols: AODV and AOMDV.

    There is a type of network in which all components are mobile devices without any infrastructure, called a MANET. In this type of network the devices cooperate autonomously to determine routes among themselves, and because they are mobile they compute more than one path instead of a single one to reduce the probability of transmission failure: if one path fails, the remaining paths stay intact. On the other hand, because the applications and services these devices provide vary in importance, there is the notion of Quality of Service, in which the user assigns priorities governing how applications and services consume the available resources. The common approach is for the user to place rate limits on the network usage of less important applications so that more resources remain available to the more important ones. This solution has many problems in this type of network, since path characteristics are unknown and unstable and may fall below the limits set for the unimportant applications; the less important applications then end up on equal footing with the more important ones, which means QoS fails. While searching for solutions and studying different types of networks, we found the way QoS is applied in InfiniBand, where it is enforced by changing the number of messages dispatched per application: devices send a larger number of messages belonging to important applications than to less important ones. This is done using queues, with messages from important applications placed in a different queue from the one holding messages of unimportant applications. This solution has two important advantages: first, it does not interfere with the traditional method and can be used alongside it; second, unlike the traditional method, it is not affected by the characteristics of the computed paths or by changes in those characteristics, since the ratio between message counts stays the same regardless of the paths and their properties. After applying this approach we observed an improvement in transmission performance of up to 18% in delivery ratio and 10% in arrival speed, while QoS did not fail as it does with the traditional method.
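
    The InfiniBand QoS idea the thesis borrows, serving more packets per arbitration round from high-priority queues than from low-priority ones so that the priority ratio holds regardless of path characteristics, can be sketched as a simple weighted round-robin. The class names, weights, and backlogs below are illustrative, not values from the thesis.

    /*
     * Minimal sketch of InfiniBand-style weighted arbitration: each traffic
     * class gets a fixed number of packet slots per round, so critical
     * traffic keeps its share no matter what the paths look like.
     */
    #include <stdio.h>

    #define NCLASS 3

    int main(void)
    {
        const char *name[NCLASS] = { "critical", "normal", "bulk" };
        int weight[NCLASS]       = { 6, 3, 1 };     /* packet slots per round, by priority */
        int backlog[NCLASS]      = { 20, 20, 20 };  /* queued packets per class */

        int round = 0;
        while (backlog[0] + backlog[1] + backlog[2] > 0) {
            printf("round %d:", ++round);
            for (int c = 0; c < NCLASS; c++) {
                int quota = weight[c];
                while (quota-- > 0 && backlog[c] > 0)
                    backlog[c]--;                   /* dequeue up to this class's weight */
                printf("  %s left=%d", name[c], backlog[c]);
            }
            printf("\n");
        }
        return 0;
    }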

    What does fault tolerant Deep Learning need from MPI?

    Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithms for large-scale data analysis. DL algorithms are computationally expensive - even distributed DL implementations that use MPI require days of training (model learning) time on commonly studied datasets. Long-running DL applications become susceptible to faults, requiring the development of a fault tolerant system infrastructure in addition to fault tolerant DL algorithms. This raises an important question: what is needed from MPI for designing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification through an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion of the suitability of different parallelism types (model, data and hybrid); the need (or lack thereof) for checkpointing of any critical data structures; and, most importantly, several fault tolerance proposals in MPI (user-level fault mitigation (ULFM), Reinit) and their applicability to fault tolerant DL implementations. We leverage a distributed-memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approach by extending MaTEx-Caffe to use a ULFM-based implementation. Our evaluation using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using Open MPI-based ULFM.
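
    The ULFM recovery pattern the paper evaluates can be outlined in a short sketch, assuming an MPI library that ships the ULFM extensions (e.g., Open MPI with MPIX_Comm_revoke and MPIX_Comm_shrink). The training step is a placeholder allreduce rather than MaTEx-Caffe code.

    /*
     * Minimal sketch of ULFM-style recovery for a data-parallel training loop:
     * when a rank fails, the survivors revoke the communicator, shrink it,
     * and keep training on the smaller team.
     */
    #include <mpi.h>
    #include <mpi-ext.h>   /* MPIX_Comm_revoke / MPIX_Comm_shrink (ULFM extensions) */
    #include <stdio.h>

    static int train_step(MPI_Comm comm)
    {
        /* Placeholder for one data-parallel step: an allreduce of gradients. */
        double grad = 1.0, sum = 0.0;
        return MPI_Allreduce(&grad, &sum, 1, MPI_DOUBLE, MPI_SUM, comm);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        MPI_Comm comm = MPI_COMM_WORLD;
        MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);   /* do not abort on failure */

        for (int step = 0; step < 100; step++) {
            if (train_step(comm) != MPI_SUCCESS) {
                /* A peer failed or the communicator was revoked elsewhere:
                 * make the failure visible to every rank, then rebuild the
                 * team from the survivors.  For brevity, the sketch does not
                 * free the revoked communicator. */
                MPIX_Comm_revoke(comm);
                MPI_Comm shrunk;
                MPIX_Comm_shrink(comm, &shrunk);
                comm = shrunk;
                MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);

                int survivors;
                MPI_Comm_size(comm, &survivors);
                printf("recovered, %d ranks remain; resuming training\n", survivors);
            }
        }
        MPI_Finalize();
        return 0;
    }

    In a data-parallel setting this continue-with-survivors strategy only works if the model replicas on the remaining ranks stay consistent, which is exactly why the paper weighs checkpointing needs against the parallelism type.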

    Management, Optimization and Evolution of the LHCb Online Network

    The LHCb experiment is one of the four large particle detectors running at the Large Hadron Collider (LHC) at CERN. It is a forward single-arm spectrometer dedicated to testing the Standard Model through precision measurements of Charge-Parity (CP) violation and rare decays in the b quark sector. The LHCb experiment will operate at a luminosity of 2x10^32 cm^-2 s^-1, with a proton-proton bunch crossing rate of approximately 10 MHz. To select the interesting events, a two-level trigger scheme is applied: the first level trigger (L0) and the high level trigger (HLT). The L0 trigger is implemented in custom hardware, while the HLT is implemented in software running on the CPUs of the Event Filter Farm (EFF). The L0 trigger rate is defined at about 1 MHz, and the size of each event is about 35 kByte, so handling the resulting data rate of 35 GByte/s is a serious challenge. The Online system is a key part of the LHCb experiment, providing all the IT services. It consists of three major components: the Data Acquisition (DAQ) system, the Timing and Fast Control (TFC) system and the Experiment Control System (ECS). To provide these services, two large dedicated networks based on Gigabit Ethernet are deployed: one for DAQ and another for ECS, referred to collectively as the Online network. A large network needs sophisticated monitoring for its successful operation. Commercial network management systems are quite expensive and difficult to integrate into the LHCb ECS. A custom network monitoring system has therefore been implemented based on a Supervisory Control And Data Acquisition (SCADA) system called PVSS, which is used by the LHCb ECS, making the monitoring system a homogeneous part of the ECS. This thesis demonstrates how a large-scale network can be monitored and managed using tools originally made for industrial supervisory control. The thesis is organized as follows: Chapter 1 gives a brief introduction to the LHC and B physics at the LHC, then describes all sub-detectors and the trigger and DAQ system of LHCb, from structure to performance. Chapter 2 first introduces the LHCb Online system and the dataflow, then focuses on the Online network design and its optimization. In Chapter 3, the SCADA system PVSS is introduced briefly, then the architecture and implementation of the network monitoring system are described in detail, including the front-end processes, the data communication and the supervisory layer. Chapter 4 first discusses packet sampling theory and one of the packet sampling mechanisms, sFlow, then demonstrates the application of sFlow to network troubleshooting, traffic monitoring and anomaly detection. In Chapter 5, the upgrade of the LHC and LHCb is introduced, the possible architecture of the DAQ is discussed, and two candidate internetworking technologies (high speed Ethernet and InfiniBand) are compared in different aspects for the DAQ; three schemes based on 10 Gigabit Ethernet are presented and studied. Chapter 6 is a general summary of the thesis.
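
    The sFlow mechanism discussed in Chapter 4 rests on 1-in-N random packet sampling, with the sampled observations scaled back by N to estimate total traffic. The following standalone sketch illustrates that estimate on a synthetic packet stream; the sampling rate and packet sizes are made up, not LHCb values.

    /*
     * Minimal sketch of sFlow-style 1-in-N packet sampling: sample each
     * packet with probability 1/N and scale the sampled byte count by N
     * to estimate the traffic actually carried by the link.
     */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const int  N = 256;                /* sampling rate: one packet in 256 */
        const long packets = 1000000;      /* synthetic packet stream */
        long sampled = 0, true_bytes = 0, sampled_bytes = 0;

        srand(42);
        for (long i = 0; i < packets; i++) {
            int len = 64 + rand() % 1436;  /* synthetic packet length, 64..1499 bytes */
            true_bytes += len;
            if (rand() % N == 0) {         /* sample with probability 1/N */
                sampled++;
                sampled_bytes += len;
            }
        }
        printf("true bytes      : %ld\n", true_bytes);
        printf("estimated bytes : %ld (from %ld sampled packets)\n",
               sampled_bytes * (long)N, sampled);
        return 0;
    }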

    Enhancing HPC on Virtual Systems in Clouds through Optimizing Virtual Overlay Networks

    Virtual Ethernet overlay provides a powerful model for realizing virtual distributed and parallel computing systems with strong isolation, portability, and recoverability properties. However, in extremely high throughput and low latency networks, such overlays can suffer from bandwidth and latency limitations, which is of particular concern in HPC environments. Through a careful and quantitative analysis, I identify three core issues limiting performance: delayed and excessive virtual interrupt delivery into guests, copies between host and guest data buffers during encapsulation, and the semantic gap between virtual Ethernet features and underlying physical network features. I propose three novel optimizations in response: optimistic timer-free virtual interrupt injection, zero-copy cut-through data forwarding, and virtual TCP offload. These optimizations improve the latency and bandwidth of the overlay network on 10 Gbps Ethernet and InfiniBand interconnects, resulting in near-native performance for a wide range of microbenchmarks and MPI application benchmarks.
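
    The cost that zero-copy cut-through forwarding targets can be seen in a small sketch: the naive overlay path copies each guest frame behind an encapsulation header before sending, while a scatter-gather sendmsg() hands the header and the frame to the kernel as two iovecs without the extra copy. The header layout, destination address, and UDP port below are illustrative assumptions, not the dissertation's encapsulation format.

    /*
     * Minimal sketch contrasting a copying encapsulation path with a
     * scatter-gather one.  The "naive" path memcpy()s the guest frame
     * behind the overlay header; the sendmsg() path passes header and
     * frame as two iovecs and avoids the extra copy.
     */
    #include <arpa/inet.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <unistd.h>

    #define FRAME_LEN 1500

    int main(void)
    {
        char frame[FRAME_LEN] = {0};        /* stand-in for a guest Ethernet frame */
        char hdr[16] = "OVERLAYHDR";        /* stand-in for the overlay header */

        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(4789) };
        inet_pton(AF_INET, "192.0.2.20", &dst.sin_addr);   /* example remote host */

        /* Naive path: one extra copy of the whole frame per packet. */
        char copybuf[sizeof hdr + FRAME_LEN];
        memcpy(copybuf, hdr, sizeof hdr);
        memcpy(copybuf + sizeof hdr, frame, FRAME_LEN);
        sendto(fd, copybuf, sizeof copybuf, 0, (struct sockaddr *)&dst, sizeof dst);

        /* Scatter-gather path: header and frame go out as two iovecs, no copy. */
        struct iovec iov[2] = {
            { .iov_base = hdr,   .iov_len = sizeof hdr },
            { .iov_base = frame, .iov_len = FRAME_LEN  },
        };
        struct msghdr msg = {
            .msg_name = &dst, .msg_namelen = sizeof dst,
            .msg_iov  = iov,  .msg_iovlen = 2,
        };
        sendmsg(fd, &msg, 0);

        close(fd);
        return 0;
    }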