176 research outputs found
Recommended from our members
System Design for Software Packet Processing
The role of software in computer networks has never been more crucial than today, with the advent of Internet-scale services and cloud computing. The trend toward software-based network dataplane—as in network function virtualization—requires software packet processing to meet challenging perfomance requirements, such as supporting exponentially increasing link bandwidth and microsecond-order latency. Many architectural aspects of existing software systems for packet processing, however, are decades old and ill-suited totoday’s network I/O workloads.In this dissertation, we explore the design space of high-performance software packet processing systems in the context of two application domains, . First, we start by discussingthe limitations of BSD Socket, which is a de-facto standard in network I/O for server applications. We quantify its performance limitations and propose a clean-slate API, called MegaPipe, as an alternative to BSD Socket. In the second part of this dissertation, we switch our focus to in-network software systems for network functions, such as network switches and middleboxes. We present Berkeley Extensible Software Switch (BESS), a modular framework for building extensible network functions. BESS introduces various novel techniques to achieve high-performance software packet processing, without compromising on either programmability or flexibility
A Survey on Transactional Stream Processing
Transactional stream processing (TSP) strives to create a cohesive model that
merges the advantages of both transactional and stream-oriented guarantees.
Over the past decade, numerous endeavors have contributed to the evolution of
TSP solutions, uncovering similarities and distinctions among them. Despite
these advances, a universally accepted standard approach for integrating
transactional functionality with stream processing remains to be established.
Existing TSP solutions predominantly concentrate on specific application
characteristics and involve complex design trade-offs. This survey intends to
introduce TSP and present our perspective on its future progression. Our
primary goals are twofold: to provide insights into the diverse TSP
requirements and methodologies, and to inspire the design and development of
groundbreaking TSP systems
Adaptive Controller to Identify Misconfigurations and Optimize the Performance of Kubernetes Clusters and IoT Edge Devices
Kubernetes default configurations do not always provide optimal security and performance for all clusters and IoT edge devices deployed, affecting the scalability of a given workload and making them vulnerable to security breaches and information leakage if misconfigured. We present an adaptive controller to identify the type of misconfiguration and its consequence threat to optimize the system behavior. Our work differs from existing approaches as it is fully automated and can diagnose various errors on the fly. The controller is evaluated in terms of quality and accuracy of identification. The results show that the controller can identify around 90% of the total number of configuration values with a reasonable average identification overhead
Building interactive distributed processing applications at a global scale
Along with the continuous engagement with technology, many latency-sensitive interactive applications have emerged, e.g., global content sharing in social networks, adaptive lights/temperatures in smart buildings, and online multi-user games. These applications typically process a massive amount of data at a global scale. In this cases, distributing storage and processing is key to handling the large scale. Distribution necessitates handling two main aspects: a) the placement of data/processing and b) the data motion across the distributed locations. However, handling the distribution while meeting latency guarantees at large scale comes with many challenges around hiding heterogeneity and diversity of devices and workload, handling dynamism in the environment, providing continuous availability despite failures, and supporting persistent large state.
In this thesis, we show how latency-driven designs for placement and data-motion can be used to build production infrastructures for interactive applications at a global scale, while also being able to address myriad challenges on heterogeneity, dynamism, state, and availability. We demonstrate a latency-driven approach is general and applicable at all layers of the stack: from storage, to processing, down to networking.
We designed and built four distinct systems across the spectrum. We have developed Ambry (collaboration with LinkedIn), a geo-distributed storage system for interactive data sharing across the globe. Ambry is LinkedIn's mainstream production system for all its media content running across 4 datacenters and over 500 million users. Ambry minimizes user perceived latency via smart data placement and propagation. Second, we have built two processing systems, a traditional model, Samza, and the avant-garde model, Steel. Samza (collaboration with LinkedIn) is a production stream processing framework used at 15 companies (including LinkedIn, Uber, Netflix, and TripAdvisor), powering >200 pipelines at LinkedIn alone. Samza minimizes the impact of data motion on the end-to-end latency, thus, enabling large persistent state (100s of TB) along with processing. Steel (collaboration with Microsoft) extends processing to the emerging edge. Integrated with Azure, Steel dynamically optimizes placement and data-motion across the entire edge-cloud environment. Finally, we have designed FreeFlow, a high performance networking mechanisms for containers. Using the container placement, FreeFlow opportunistically bypasses networking layers, minimizing data motion and reducing latency (up to 3 orders of magnitude)
Workload Management for Data-Intensive Services
<p>Data-intensive web services are typically composed of three tiers: i) a display tier that interacts with users and serves rich content to them, ii) a storage tier that stores the user-generated or machine-generated data used to create this content, and iii) an analytics tier that runs data analysis tasks in order to create and optimize new content. Each tier has different workloads and requirements that result in a diverse set of systems being used in modern data-intensive web services.</p><p>Servers are provisioned dynamically in the display tier to ensure that interactive client requests are served as per the latency and throughput requirements. The challenge is not only deciding automatically how many servers to provision but also when to provision them, while ensuring stable system performance and high resource utilization. To address these challenges, we have developed a new control policy for provisioning resources dynamically in coarse-grained units (e.g., adding or removing servers or virtual machines in cloud platforms). Our new policy, called proportional thresholding, converts a user-specified performance target value into a target range in order to account for the relative effect of provisioning a server on the overall workload performance.</p><p>The storage tier is similar to the display tier in some respects, but poses the additional challenge of needing redistribution of stored data when new storage nodes are added or removed. Thus, there will be some delay before the effects of changing a resource allocation will appear. Moreover, redistributing data can cause some interference to the current workload because it uses resources that can otherwise be used for processing requests. We have developed a system, called Elastore, that addresses the new challenges found in the storage tier. Elastore not only coordinates resource allocation and data redistribution to preserve stability during dynamic resource provisioning, but it also finds the best tradeoff between workload interference and data redistribution time.</p><p>The workload in the analytics tier consists of data-parallel workflows that can either be run in a batch fashion or continuously as new data becomes available. Each workflow is composed of smaller units that have producer-consumer relationships based on data. These workflows are often generated from declarative specifications in languages like SQL, so there is a need for a cost-based optimizer that can generate an efficient execution plan for a given workflow. There are a number of challenges when building a cost-based optimizer for data-parallel workflows, which includes characterizing the large execution plan space, developing cost models to estimate the execution costs, and efficiently searching for the best execution plan. We have built two cost-based optimizers: Stubby for batch data-parallel workflows running on MapReduce systems, and Cyclops for continuous data-parallel workflows where the choice of execution system is made a part of the execution plan space.</p><p>We have conducted a comprehensive evaluation that shows the effectiveness of each tier's automated workload management solution.</p>Dissertatio
Optimizing Virtual Machine I/O Performance in Virtualized Cloud by Differenciated-frequency Scheduling and Functionality Offloading
Many enterprises are increasingly moving their applications to private cloud environments or public cloud platforms. A key technology driving cloud computing is virtualization which can serve multiple VMs in one physical machine hence providing better management flexibility and significant savings in operational costs. However, one important consequence of virtualized hosts in the cloud is the negative impact it has on the I/O performance of the applications running in the VMs
- …