1,614 research outputs found

    Cost effective RISC core supporting the large sending offload

    Get PDF
    The Ethernet speed has increased sending and receiving frames from 40 to 100 Gbps after the IEEE P802.3ba released. The industry and academia have focused scaling up the TCP/IP protocol processing for 40-100 Gbps. LSO is a de facto standard, which is offloaded to network interface for sending packets up to 10 Gbps. It not clears whether a network interface can support such function for new 40-100 Gbps. The widely use of the hardware-based NIC such as the use of a fully customized logic based network interface can be due to the following reasons; Still it is not clear whether the General Purpose Processor (GPP) can provide the processing required for high-speed line beyond the 10 Gbps. Also, the limit of the GPP's clock in supporting the processing of network interfaces. However, using a RISC core engine for offloading the LSO function can deliver some important features to network interfaces design, such as simplicity, scalability, shorter developing cycle time. In this paper, we have investigated using a specialized RISC core to process the LSO functions for TCP/IP and UDP/IP for high-speed communications rate up to 100 Gbps. To achieve this, we have enhanced the LSO algorithm to scale it to 100 Gbps. A fast DMA is used to support transferring data in the network interface. The LSO processing methodology on the network has presented. In addition, the RISC's performance and data movements for high communication rate up to 100 Gbps have been measured. A 148 MHz RISC core can support the sending-side processing for up to 100 Gbps transmission speed for the TCP/IP and UDP/IP protocol when the MTU is applied (1500 bytes). A DMA with 3759 MHz is required to eliminate the idle cycles while transferring data over the 64-bit local bus

    Diluting the Scalability Boundaries: Exploring the Use of Disaggregated Architectures for High-Level Network Data Analysis

    Get PDF
    Traditional data centers are designed with a rigid architecture of fit-for-purpose servers that provision resources beyond the average workload in order to deal with occasional peaks of data. Heterogeneous data centers are pushing towards more cost-efficient architectures with better resource provisioning. In this paper we study the feasibility of using disaggregated architectures for intensive data applications, in contrast to the monolithic approach of server-oriented architectures. Particularly, we have tested a proactive network analysis system in which the workload demands are highly variable. In the context of the dReDBox disaggregated architecture, the results show that the overhead caused by using remote memory resources is significant, between 66\% and 80\%, but we have also observed that the memory usage is one order of magnitude higher for the stress case with respect to average workloads. Therefore, dimensioning memory for the worst case in conventional systems will result in a notable waste of resources. Finally, we found that, for the selected use case, parallelism is limited by memory. Therefore, using a disaggregated architecture will allow for increased parallelism, which, at the same time, will mitigate the overhead caused by remote memory.Comment: 8 pages, 6 figures, 2 tables, 32 references. Pre-print. The paper will be presented during the IEEE International Conference on High Performance Computing and Communications in Bangkok, Thailand. 18 - 20 December, 2017. To be published in the conference proceeding

    Performance Optimization and Dynamics Control for Large-scale Data Transfer in Wide-area Networks

    Get PDF
    Transport control plays an important role in the performance of large-scale scientific and media streaming applications involving transfer of large data sets, media streaming, online computational steering, interactive visualization, and remote instrument control. In general, these applications have two distinctive classes of transport requirements: large-scale scientific applications require high bandwidths to move bulk data across wide-area networks, while media streaming applications require stable bandwidths to ensure smooth media playback. Unfortunately, the widely deployed Transmission Control Protocol is inadequate for such tasks due to its performance limitations. The purpose of this dissertation is to conduct rigorous analytical study of the design and performance of transport solutions, and develop an integrated transport solution in a systematical way to overcome the limitations of current transport methods. One of the primary challenges is to explore and compose a set of feasible route options with multiple constraints. Another challenge essentially arises from the randomness inherent in wide-area networks, particularly the Internet. This randomness must be explicitly accounted for to achieve both goodput maximization and stabilization over the constructed routes by suitably adjusting the source rate in response to both network and host dynamics.The superior and robust performance of the proposed transport solution is extensively evaluated in a simulated environment and further verified through real-life implementations and deployments over both Internet and dedicated connections under disparate network conditions in comparison with existing transport methods

    Enhancing HPC on Virtual Systems in Clouds through Optimizing Virtual Overlay Networks

    Get PDF
    Virtual Ethernet overlay provides a powerful model for realizing virtual distributed and parallel computing systems with strong isolation, portability, and recoverability properties. However, in extremely high throughput and low latency networks, such overlays can suffer from bandwidth and latency limitations, which is of particular concern in HPC environments. Through a careful and quantitative analysis, I iden- tify three core issues limiting performance: delayed and excessive virtual interrupt delivery into guests, copies between host and guest data buffers during encapsulation, and the semantic gap between virtual Ethernet features and underlying physical network features. I propose three novel optimizations in response: optimistic timer- free virtual interrupt injection, zero-copy cut-through data forwarding, and virtual TCP offload. These optimizations improve the latency and bandwidth of the overlay network on 10 Gbps Ethernet and InfiniBand interconnects, resulting in near-native performance for a wide range of microbenchmarks and MPI application benchmarks
    • …
    corecore