781 research outputs found

    Enhancing HPC on Virtual Systems in Clouds through Optimizing Virtual Overlay Networks

    Virtual Ethernet overlay provides a powerful model for realizing virtual distributed and parallel computing systems with strong isolation, portability, and recoverability properties. However, in extremely high throughput and low latency networks, such overlays can suffer from bandwidth and latency limitations, which is of particular concern in HPC environments. Through a careful and quantitative analysis, I identify three core issues limiting performance: delayed and excessive virtual interrupt delivery into guests, copies between host and guest data buffers during encapsulation, and the semantic gap between virtual Ethernet features and underlying physical network features. I propose three novel optimizations in response: optimistic timer-free virtual interrupt injection, zero-copy cut-through data forwarding, and virtual TCP offload. These optimizations improve the latency and bandwidth of the overlay network on 10 Gbps Ethernet and InfiniBand interconnects, resulting in near-native performance for a wide range of microbenchmarks and MPI application benchmarks.

    Cost effective RISC core supporting the large sending offload

    Since the release of IEEE P802.3ba, Ethernet speeds for sending and receiving frames have increased to the 40 to 100 Gbps range. Industry and academia have focused on scaling up TCP/IP protocol processing for 40-100 Gbps. Large Send Offload (LSO) is a de facto standard that is offloaded to the network interface for sending packets at up to 10 Gbps. It is not clear whether a network interface can support this function at the new 40-100 Gbps rates. The wide use of hardware-based NICs, such as fully customized logic-based network interfaces, can be attributed to two reasons: it is still not clear whether a General Purpose Processor (GPP) can provide the processing required for line rates beyond 10 Gbps, and the GPP's clock rate limits its ability to support network interface processing. However, using a RISC core engine to offload the LSO function can bring important benefits to network interface design, such as simplicity, scalability, and a shorter development cycle. In this paper, we investigate using a specialized RISC core to process the LSO functions for TCP/IP and UDP/IP at communication rates up to 100 Gbps. To achieve this, we enhance the LSO algorithm to scale it to 100 Gbps. A fast DMA is used to support data transfer in the network interface. The LSO processing methodology on the network is presented. In addition, the RISC core's performance and data movement at communication rates up to 100 Gbps have been measured. A 148 MHz RISC core can support sending-side processing at up to 100 Gbps for the TCP/IP and UDP/IP protocols when the standard MTU (1500 bytes) is used. A DMA clock of 3759 MHz is required to eliminate idle cycles while transferring data over the 64-bit local bus.
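To give a feel for the time scales in the abstract above, the following is a rough back-of-the-envelope sketch, not the paper's own accounting. It assumes the line carries only the packet bytes (Ethernet preamble, inter-frame gap, and header overheads are ignored), and the function names `packet_time_ns` and `cycles_per_packet` are hypothetical helpers introduced for illustration.

```python
def packet_time_ns(packet_bytes: int, line_rate_gbps: float) -> float:
    """Time one packet occupies the line, in nanoseconds.

    bits / (Gbit/s) yields nanoseconds directly.
    Assumption: framing overhead is ignored.
    """
    return packet_bytes * 8 / line_rate_gbps


def cycles_per_packet(packet_bytes: int, line_rate_gbps: float,
                      clock_mhz: float) -> float:
    """Clock cycles that elapse at clock_mhz during one packet time."""
    # ns * MHz = 1e-9 s * 1e6 1/s = 1e-3 cycles, hence the division by 1000.
    return packet_time_ns(packet_bytes, line_rate_gbps) * clock_mhz / 1000.0


if __name__ == "__main__":
    t = packet_time_ns(1500, 100)
    c = cycles_per_packet(1500, 100, 148)
    print(f"packet time: {t:.1f} ns, cycles at 148 MHz: {c:.2f}")
```

Under these simplifying assumptions, a 1500-byte packet occupies a 100 Gbps line for 120 ns, which corresponds to roughly 17.8 cycles of a 148 MHz clock. The abstract's own figures come from the authors' measurements and may account for overheads this estimate leaves out.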

    EbbRT: Elastic Building Block Runtime - overview

    EbbRT provides a lightweight runtime that enables the construction of reusable, low-level system software which can integrate with existing, general purpose systems. It achieves this by providing a library that can be linked into a process on an existing OS, and as a small library OS that can be booted directly on an IaaS node

    Design of a scalable network interface to support enhanced TCP and UDP processing for high speed networks

    Communication networks have advanced rapidly in providing additional services, with improvements made to their bandwidth and the integration of advanced technology. As the speed of networks exceeds 10 Gbps, the time frame for completing the processing of TCP and UDP packets has become extremely short. The design and implementation of high performance Network Interfaces (NIs) that can support offload protocol functions for current and next-generation networks is challenging. In this thesis two software approaches are presented to enhance protocol processing of TCP and UDP in the network interface. A novel software Large Receive Offload (LRO) approach for enhancing the receiving side has been proposed. The LRO works by aggregating the incoming TCP and UDP packets into larger packets inside the NI’s buffer. The receiving side software has been improved to support out-of-order packets. The second proposed software solution is applied on the Large Send Offload (LSO). The proposed LSO function processing is implemented by segmenting TCP and UDP messages that are larger than the Maximum Transmission Unit to the Maximum Segment Size. New packet headers are generated for each new outgoing packet. A scalable programmable NI based 32-bit RISC core is presented that can support 100 Gbps network speeds. Acceleration of the processing time frame required at the NI has been implemented to prevent hazards (such as Data Hazard and Control Hazard) during the execution of the LRO and the LSO functions. An R2000/3000 RISC has been used in order to test the LRO and LSO functions and to discover the instruction set that is most suitable. Following this the VHDL NI was implemented with three pipeline RISC cores, a simple DMA controller and Content Addressable Memory. An evaluation of the desired RISC clock rate that is required to process TCP and UDP streams at 100 Gbps was conducted. 
It was determined that a RISC core running at 752 MHz with a DMA clock of 3753 MHz was able to process packets 512 bytes or larger fast enough to support 100 Gbps network speeds
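The LSO step described in this abstract (splitting a message larger than the MTU into segments of at most the Maximum Segment Size and generating a new header for each outgoing packet) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `lso_segment` and the dictionary-based header with `seq` and `len` fields are hypothetical, and real TCP/IP headers, checksums, and flags are omitted.

```python
from typing import Dict, List, Tuple

Segment = Tuple[Dict[str, int], bytes]


def lso_segment(payload: bytes, mss: int, start_seq: int = 0) -> List[Segment]:
    """Split payload into chunks of at most mss bytes, each with its own header.

    The header here is a simplified stand-in (sequence number and length only).
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments: List[Segment] = []
    for offset in range(0, len(payload), mss):
        chunk = payload[offset:offset + mss]
        # A fresh header per outgoing packet; the sequence number advances
        # by the number of payload bytes already sent.
        header = {"seq": start_seq + offset, "len": len(chunk)}
        segments.append((header, chunk))
    return segments


if __name__ == "__main__":
    message = bytes(3000)
    for header, chunk in lso_segment(message, mss=1460):
        print(header, len(chunk))
```

Concatenating the chunks in sequence-number order recovers the original message, which is the inverse of what a receive-side aggregation step such as LRO would do.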

    EbbRT: Elastic Building Block Runtime - case studies

    We present a new systems runtime, EbbRT, for cloud hosted applications. EbbRT takes a different approach to the role operating systems play in cloud computing. It supports stitching application functionality across nodes running commodity OSs and nodes running specialized application specific software that only execute what is necessary to accelerate core functions of the application. In doing so, it allows tradeoffs between efficiency, developer productivity, and exploitation of elasticity and scale. EbbRT, as a software model, is a framework for constructing applications as collections of standard application software and Elastic Building Blocks (Ebbs). Elastic Building Blocks are components that encapsulate runtime software objects and are implemented to exploit the raw access, scale and elasticity of IaaS resources to accelerate critical application functionality. This paper presents the EbbRT architecture, our prototype and experimental evaluation of the prototype under three different application scenarios

    An Innovative RAN Architecture for Emerging Heterogeneous Networks: The Road to the 5G Era

    The global demand for mobile-broadband data services has experienced phenomenal growth over the last few years, driven by the rapid proliferation of smart devices such as smartphones and tablets. This growth is expected to continue unabated as mobile data traffic is predicted to grow anywhere from 20 to 50 times over the next 5 years. Exacerbating the problem is that such an unprecedented surge in smartphone usage, which is characterized by frequent short on/off connections and mobility, generates a heavy signaling traffic load in the network ("signaling storms"). This consumes a disproportionate amount of network resources, compromising network throughput and efficiency, and in extreme cases can cause the Third-Generation (3G) or 4G (long-term evolution (LTE) and LTE-Advanced (LTE-A)) cellular networks to crash. As the conventional approaches of improving the spectral efficiency and/or allocating additional spectrum are fast approaching their theoretical limits, there is a growing consensus that current 3G and 4G (LTE/LTE-A) cellular radio access technologies (RATs) won't be able to meet the anticipated growth in mobile traffic demand. To address these challenges, the wireless industry and standardization bodies have initiated a roadmap for transition from 4G to 5G cellular technology with a key objective to increase capacity by 1000x by 2020. Even though the technology hasn't been invented yet, the hype around 5G networks has begun to bubble. The emerging consensus is that 5G is not a single technology, but rather a synergistic collection of interworking technical innovations and solutions that collectively address the challenge of traffic growth. The core emerging ingredients that are widely considered the key enabling technologies to realize the envisioned 5G era, listed in the order of importance, are: 1) Heterogeneous networks (HetNets); 2) flexible backhauling; 3) efficient traffic offload techniques; and 4) Self Organizing Networks (SONs). 
The anticipated solutions delivered by efficient interworking/integration of these enabling technologies are not simply about throwing more resources and/or spectrum at the challenge. The envisioned solution, however, requires radically different cellular RAN and mobile core architectures that efficiently and cost-effectively deploy and manage radio resources as well as offload mobile traffic from the overloaded core network. The main objective of this thesis is to address the key techno-economics challenges facing the transition from current Fourth-Generation (4G) cellular technology to the 5G era in the context of proposing a novel high-risk revolutionary direction to the design and implementation of the envisioned 5G cellular networks. The ultimate goal is to explore the potential and viability of cost-effectively implementing the 1000x capacity challenge while continuing to provide adequate mobile broadband experience to users. Specifically, this work proposes and devises a novel PON-based HetNet mobile backhaul RAN architecture that: 1) holistically addresses the key techno-economics hurdles facing the implementation of the envisioned 5G cellular technology, specifically, the backhauling and signaling challenges; and 2) enables, for the first time to the best of our knowledge, the support of efficient ground-breaking mobile data and signaling offload techniques, which significantly enhance the performance of both the HetNet-based RAN and LTE-A's core network (Evolved Packet Core (EPC) per 3GPP standard), ensure that core network equipment is used more productively, and moderate the evolving 5G's signaling growth and optimize its impact. To address the backhauling challenge, we propose a cost-effective fiber-based small cell backhaul infrastructure, which leverages existing fibered and powered facilities associated with a PON-based fiber-to-the-Node/Home (FTTN/FTTH) residential access network. 
Due to the sharing of existing valuable fiber assets, the proposed PON-based backhaul architecture, in which the small cells are collocated with existing FTTN remote terminals (optical network units (ONUs)), is much more economical than conventional point-to-point (PTP) fiber backhaul designs. A fully distributed ring-based EPON architecture is utilized here as the fiber-based HetNet backhaul. The techno-economics merits of utilizing the proposed PON-based FTTx access HetNet RAN architecture versus that of traditional 4G LTE-A's RAN will be thoroughly examined and quantified. Specifically, we quantify the techno-economics merits of the proposed PON-based HetNet backhaul by comparing its performance versus that of a conventional fiber-based PTP backhaul architecture as a benchmark. It is shown that the purposely selected ring-based PON architecture along with the supporting distributed control plane enable the proposed PON-based FTTx RAN architecture to support several key salient networking features that collectively significantly enhance the overall performance of both the HetNet-based RAN and 4G LTE-A's core (EPC) compared to that of the typical fiber-based PTP backhaul architecture in terms of handoff capability, signaling overhead, overall network throughput and latency, and QoS support. It will also be shown that the proposed HetNet-based RAN architecture is not only capable of providing the typical macro-cell offloading gain (RAN gain) but also can provide ground-breaking EPC offloading gain. The simulation results indicate that the overall capacity of the proposed HetNet scales with the number of deployed small cells, thanks to LTE-A's advanced interference management techniques. For example, if there are 10 deployed outdoor small cells for every macrocell in the network, then the overall capacity gain will be approximately 10-11x over a macro-only network. 
To reach the 1000x capacity goal, numerous small cells including 3G, 4G, and WiFi (femtos, picos, metros, relays, remote radio heads, distributed antenna systems) need to be deployed indoors and outdoors, at all possible venues (residences and enterprises)

    APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters

    We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology plus hardware support for a RDMA programming model and experimental acceleration of GPU networking; this design allows us to build a low latency, high bandwidth PC cluster, the APEnet+ network, the new generation of our cost-effective, tens-of-thousands-scalable cluster network architecture. Some test results and characterization of data transmission of a complete testbench, based on a commercial development card mounting an Altera FPGA, are provided. Comment: 6 pages, 7 figures, proceeding of CHEP 2010, Taiwan, October 18-2

    Unified radio and network control across heterogeneous hardware platforms

    Experimentation is an important step in the investigation of techniques for handling spectrum scarcity or the development of new waveforms in future wireless networks. However, it is impractical and not cost effective to construct custom platforms for each future network scenario to be investigated. This problem is addressed by defining Unified Programming Interfaces that allow common access to several platforms for experimentation-based prototyping, research, and development purposes. The design of these interfaces is driven by a diverse set of scenarios that capture the functionality relevant to future network implementations while trying to keep them as generic as possible. Herein, the definition of this set of scenarios is presented as well as the architecture for supporting experimentation-based wireless research over multiple hardware platforms. The proposed architecture for experimentation incorporates both local and global unified interfaces to control any aspect of a wireless system while being completely agnostic to the actual technology incorporated. Control is feasible from the low-level features of individual radios to the entire network stack, including hierarchical control combinations. A testbed to enable the use of the above architecture is utilized that uses a backbone network in order to be able to extract measurements and observe the overall behaviour of the system under test without imposing further communication overhead to the actual experiment. Based on the aforementioned architecture, a system is proposed that is able to support the advancement of intelligent techniques for future networks through experimentation while decoupling promising algorithms and techniques from the capabilities of a specific hardware platform