Effects of Communication Protocol Stack Offload on Parallel Performance in Clusters
The primary research objective of this dissertation is to demonstrate that the effects of communication protocol stack offload (CPSO) on application execution time can be attributed to two complementary sources. First, application-specific computation may be executed concurrently with the asynchronous communication performed by the offload engine. Second, protocol stack processing can be accelerated or decelerated by the offload engine. These two types of performance effects can be quantified with the degree of overlapping D_o and the degree of acceleration D_accs, and the composite communication speedup metric S_comm(D_o, D_accs) quantifies their combined effect. The thesis is validated empirically: the values of D_o, D_accs, and S_comm characteristic of the system configurations under test are derived from experiments performed on the configurations of interest, and it is shown that the proposed metrics adequately describe the effects of protocol stack offload on application execution time. Additionally, a set of analytical models of the networking subsystem of a PC-based cluster node is developed; the models yield predictions of D_o, D_accs, and S_comm and are evaluated for complexity and precision by comparing the predicted values against the measured ones. The primary contributions of this dissertation research are as follows. First, the metrics D_accs and S_comm are introduced to complement D_o in evaluating how optimizations in the networking subsystem affect parallel performance in clusters; the metrics are shown to adequately describe CPSO performance effects.
Second, a method for assessing the effects of CPSO scenarios on application performance is developed and presented. Third, a set of analytical models of cluster node networking subsystems with CPSO capability is developed and characterised as to its complexity and its precision in predicting the D_o and D_accs metrics.
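The interplay of the two metrics can be illustrated with a toy timing model. The formulas and numbers below are illustrative assumptions for intuition only, not the dissertation's actual definitions of D_o, D_accs, or S_comm:

```python
def communication_speedup(t_comp, t_comm, d_o, d_accs):
    """Toy model of the speedup obtained from protocol stack offload.

    t_comp : application computation time per iteration
    t_comm : baseline (host-based) communication time per iteration
    d_o    : degree of overlapping, 0 (none) .. 1 (fully hidden)
    d_accs : degree of acceleration; >1 means the offload engine
             processes the protocol stack faster than the host would
    """
    # Assumed model: offload scales stack processing time by 1/d_accs,
    # and a fraction d_o of the remaining cost overlaps computation.
    t_comm_offload = t_comm / d_accs
    t_baseline = t_comp + t_comm
    t_offload = t_comp + (1.0 - d_o) * t_comm_offload
    return t_baseline / t_offload

# Full overlap plus 2x stack acceleration hides communication entirely:
print(communication_speedup(t_comp=8.0, t_comm=2.0, d_o=1.0, d_accs=2.0))
```

In this toy model an offload engine that neither overlaps nor accelerates (d_o = 0, d_accs = 1) yields a speedup of exactly 1, matching the intuition that the two sources are complementary and each can contribute independently.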
EbbRT: a framework for building per-application library operating systems
Efficient use of high-speed hardware requires that operating system components be customized to the application workload. Our general-purpose operating systems are ill-suited for this task. We present EbbRT, a framework for constructing per-application library operating systems for cloud applications. The primary objective of EbbRT is to enable high performance in a tractable and maintainable fashion. This paper describes the design and implementation of EbbRT, and evaluates its ability to improve the performance of common cloud applications. The evaluation of the EbbRT prototype demonstrates that memcached, run within a VM, can outperform memcached run on unvirtualized Linux. The prototype evaluation also demonstrates a 14% performance improvement on a V8 JavaScript engine benchmark, and a node.js webserver that achieves a 50% reduction in 99th-percentile latency compared to running on Linux.
eBPF-based Content and Computation-aware Communication for Real-time Edge Computing
By placing computation resources within a one-hop wireless topology, the recent edge computing paradigm is a key enabler of real-time Internet of Things (IoT) applications. In IoT scenarios where the same information from a sensor is used by multiple applications at different locations, the data stream needs to be replicated. However, transporting parallel streams might not be feasible due to limitations in the capacity of the network carrying the data. To address this issue, a content- and computation-aware communication control framework is proposed based on the Software Defined Network (SDN) paradigm. The framework supports multi-streaming using the extended Berkeley Packet Filter (eBPF), where the traffic flow and packet replication for each specific computation process are controlled by a program running inside an in-kernel Virtual Machine (VM). The proposed framework is instantiated to address a case-study scenario where video streams from multiple cameras are transmitted to the edge processor for real-time analysis. Numerical results demonstrate the advantage of the proposed framework in terms of programmability, network bandwidth, and system resource savings.
Comment: This article has been accepted for publication in the IEEE International Conference on Computer Communications (INFOCOM Workshops), 201
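The bandwidth argument for in-kernel replication can be sketched with a back-of-the-envelope calculation. The helper and numbers below are illustrative assumptions, not figures from the paper:

```python
def uplink_load(stream_rate_mbps, n_consumers, replicate_at_edge):
    """Traffic (Mbit/s) a single sensor stream places on the
    constrained wireless uplink.

    Without edge replication, the source must push one copy of the
    stream per consuming application; with eBPF-based replication at
    the edge, one copy crosses the uplink and is fanned out in-kernel.
    """
    copies = 1 if replicate_at_edge else n_consumers
    return stream_rate_mbps * copies

# Assumed example: a 10 Mbit/s camera stream with 4 analysis consumers.
print(uplink_load(10, 4, replicate_at_edge=False))  # one copy per consumer
print(uplink_load(10, 4, replicate_at_edge=True))   # a single shared copy
```

Under these assumptions the uplink load stays constant in the number of consumers when replication happens at the edge, which is the saving the framework exploits.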
Network stack specialization for performance
Contemporary network stacks are masterpieces of generality, supporting a range of edge-node and middle-node functions. This generality comes at significant performance cost: current APIs, memory models, and implementations drastically limit the effectiveness of increasingly powerful hardware. Generality has historically been required to allow individual systems to perform many functions. However, as providers have scaled up services to support hundreds of millions of users, they have transitioned toward many thousands (or even millions) of dedicated servers performing narrow ranges of functions. We argue that the overhead of generality is now a key obstacle to effective scaling, making specialization not only viable, but necessary. This paper presents Sandstorm, a clean-slate userspace network stack that exploits knowledge of web server semantics, improving throughput over current off-the-shelf designs while retaining use of conventional operating-system and programming frameworks. Based on Netmap, our novel approach merges application and network-stack memory models, aggressively amortizes stack-internal TCP costs based on application-layer knowledge, tightly couples with the NIC event model, and exploits low-latency hardware access. We compare our approach to the FreeBSD and Linux network stacks with nginx as the web server, demonstrating ∼3.5x throughput improvement, while maintaining low CPU utilization, scaling linearly on multicore systems, and saturating current NIC hardware.
BitTorrent Experiments on Testbeds: A Study of the Impact of Network Latencies
In this paper, we study the impact of network latency on the time required to download a file distributed using BitTorrent. This study is essential to understand whether testbeds can be used for experimental evaluation of BitTorrent. We observe that network latency has a marginal impact on the time required to download a file; hence, BitTorrent experiments can be performed on testbeds.
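The observation that latency is marginal is consistent with a simple bandwidth-dominated transfer model. The model and numbers below are an illustrative sketch, not measurements from the paper:

```python
def download_time(file_mb, rate_mbps, rtt_ms, control_exchanges=5):
    """Rough model of a peer-to-peer download: a bandwidth-bound
    bulk-transfer term plus a handful of RTT-bound control exchanges
    (handshake, piece requests) per peer connection."""
    bandwidth_s = file_mb * 8 / rate_mbps   # time to move the bytes
    latency_s = control_exchanges * rtt_ms / 1000.0
    return bandwidth_s + latency_s

# Assumed example: a 100 MB file at 10 Mbit/s takes 80 s of transfer;
# even raising the RTT from 2 ms to 200 ms adds only ~1 s overall.
print(download_time(100, 10, rtt_ms=2))
print(download_time(100, 10, rtt_ms=200))
```

Because the RTT term enters only through a small fixed number of control exchanges while the bandwidth term scales with file size, the relative contribution of latency shrinks as files grow, matching the paper's observation.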