1,350 research outputs found

    Effects of Communication Protocol Stack Offload on Parallel Performance in Clusters

    Get PDF
    The primary research objective of this dissertation is to demonstrate that the effects of communication protocol stack offload (CPSO) on application execution time can be attributed to the following two complementary sources. First, the application-specific computation may be executed concurrently with the asynchronous communication performed by the communication protocol stack offload engine. Second, the protocol stack processing can be accelerated or decelerated by the offload engine. These two types of performance effects can be quantified with the use of the degree of overlapping Do and degree of acceleration Daccs. The composite communication speedup metrics S_comm(Do, Daccs) can be used in order to quantify the combined effects of the protocol stack offload. This dissertation thesis is validated empirically. The degree of overlapping Do, the degree of acceleration Daccs, and the communication speedup Scomm characteristic of the system configurations under test are derived in the course of experiments performed for the system configurations of interest. It is shown that the proposed metrics adequately describe the effects of the protocol stack offload on the application execution time. Additionally, a set of analytical models of the networking subsystem of a PC-based cluster node is developed. As a result of the modeling, the metrics Do, Daccs, and Scomm are obtained. The models are evaluated as to their complexity and precision by comparing the modeling results with the measured values of Do, Daccs, and Scomm. The primary contributions of this dissertation research are as follows. First, the metric Daccs and Scomm are introduced in order to complement the Do metric in its use for evaluation of the effects of optimizations in the networking subsystem on parallel performance in clusters. The metrics are shown to adequately describe CPSO performance effects. Second, a method for assessing performance effects of CPSO scenarios on application performance is developed and presented. Third, a set of analytical models of cluster node networking subsystems with CPSO capability is developed and characterised as to their complexity and precision of the prediction of the Do and Daccs metrics

    EbbRT: a framework for building per-application library operating systems

    Full text link
    Efficient use of high speed hardware requires operating system components be customized to the application work- load. Our general purpose operating systems are ill-suited for this task. We present EbbRT, a framework for constructing per-application library operating systems for cloud applications. The primary objective of EbbRT is to enable high-performance in a tractable and maintainable fashion. This paper describes the design and implementation of EbbRT, and evaluates its ability to improve the performance of common cloud applications. The evaluation of the EbbRT prototype demonstrates memcached, run within a VM, can outperform memcached run on an unvirtualized Linux. The prototype evaluation also demonstrates an 14% performance improvement of a V8 JavaScript engine benchmark, and a node.js webserver that achieves a 50% reduction in 99th percentile latency compared to it run on Linux

    eBPF-based Content and Computation-aware Communication for Real-time Edge Computing

    Full text link
    By placing computation resources within a one-hop wireless topology, the recent edge computing paradigm is a key enabler of real-time Internet of Things (IoT) applications. In the context of IoT scenarios where the same information from a sensor is used by multiple applications at different locations, the data stream needs to be replicated. However, the transportation of parallel streams might not be feasible due to limitations in the capacity of the network transporting the data. To address this issue, a content and computation-aware communication control framework is proposed based on the Software Defined Network (SDN) paradigm. The framework supports multi-streaming using the extended Berkeley Packet Filter (eBPF), where the traffic flow and packet replication for each specific computation process is controlled by a program running inside an in-kernel Virtual Ma- chine (VM). The proposed framework is instantiated to address a case-study scenario where video streams from multiple cameras are transmitted to the edge processor for real-time analysis. Numerical results demonstrate the advantage of the proposed framework in terms of programmability, network bandwidth and system resource savings.Comment: This article has been accepted for publication in the IEEE International Conference on Computer Communications (INFOCOM Workshops), 201

    Network stack specialization for performance

    Get PDF
    Contemporary network stacks are masterpieces of generality, supporting a range of edge-node and middle-node functions. This generality comes at significant performance cost: current APIs, memory models, and implementations drastically limit the effectiveness of increasingly powerful hardware. Generality has historically been required to allow individual systems to perform many functions. However, as providers have scaled up services to support hundreds of millions of users, they have transitioned toward many thousands (or even millions) of dedicated servers performing narrow ranges of functions. We argue that the overhead of generality is now a key obstacle to effective scaling, making specialization not only viable, but necessary. This paper presents Sandstorm, a clean-slate userspace network stack that exploits knowledge of web server semantics, improving throughput over current off-the-shelf designs while retaining use of conventional operating-system and programming frameworks. Based on Netmap, our novel approach merges application and network-stack memory models, aggressively amortizes stack-internal TCP costs based on application-layer knowledge, tightly couples with the NIC event model, and exploits low-latency hardware access. We compare our approach to the FreeBSD and Linux network stacks with nginx as the web server, demonstrating ∼3.5x throughput improvement, while experiencing low CPU utilization, linear scaling on multicore systems, and saturating current NIC hardware

    BitTorrent Experiments on Testbeds: A Study of the Impact of Network Latencies

    Get PDF
    In this paper, we study the impact of network latency on the time required to download a file distributed using BitTorrent. This study is essential to understand if testbeds can be used for experimental evaluation of BitTorrent. We observe that the network latency has a marginal impact on the time required to download a file; hence, BitTorrent experiments can performed on testbeds
    corecore