
    Automated Control of Aggressive Prefetching for HTTP Streaming Video Servers


    Understanding and Efficiently Servicing HTTP Streaming Video Workloads

    Live and on-demand video streaming has emerged as the most popular application on the Internet. One reason for this success is the pragmatic decision to use HTTP to deliver video content. However, while all web servers are capable of servicing HTTP streaming video workloads, web servers were not originally designed or optimized for them. Web server research has concentrated on requests for small items that exhibit high locality, while video files are much larger and have a popularity distribution with a long tail of less popular content. Given the large number of servers needed to service millions of streaming video clients, even small improvements in servicing HTTP streaming video workloads offer large potential benefits.

    To investigate how web server implementations can be improved, we require a benchmark with which to analyze existing web servers and test alternate implementations, but no such HTTP streaming video benchmark exists. One reason for this is that video delivery is undergoing rapid evolution, so we devise a flexible methodology and tools for creating benchmarks that can be readily adapted to changes in HTTP video streaming methods. Using our methodology, we characterize YouTube traffic from early 2011 using several published studies and implement a benchmark to replicate this workload. We then demonstrate that three widely used web servers (Apache, nginx, and the userver) are all poorly suited to servicing streaming video workloads.

    We modify the userver to use asynchronous serialized aggressive prefetching (ASAP). Aggressive prefetching uses a single large disk access to service multiple small sequential requests, and serialization prevents the kernel from interleaving disk accesses; together, these greatly increase throughput. Using the modified userver, we show that characteristics of the workload and server affect the best prefetch size to use, and we provide an algorithm that automatically finds a good prefetch size for a variety of workloads and server configurations.

    We also conduct our own characterization of an HTTP streaming video workload, using server logs obtained from Netflix. We study this workload because, in 2015, Netflix alone accounted for 37% of peak-period North American Internet traffic. Netflix clients employ DASH (Dynamic Adaptive Streaming over HTTP) to switch between different bit rates based on changes in network and server conditions. We introduce the notion of chains of sequential requests to represent the spatial locality of workloads and find that, even with DASH clients, the majority of bytes are requested sequentially. We characterize rate adaptation by separating sessions into transient, stable, and inactive phases, each with distinct patterns of requests. We find that playback sessions are surprisingly stable; in aggregate, 5% of total session duration is spent in transient phases, 79% in stable phases, and 16% in inactive phases.

    Finally, we evaluate prefetch algorithms that exploit knowledge of workload characteristics by simulating the servicing of the Netflix workload. We show that the workload can be serviced with either 13% lower hard drive utilization or 48% less system memory than a prefetch algorithm that makes no use of workload characteristics.
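
    To make the ASAP idea concrete, the C sketch below shows the two mechanisms the abstract describes: one large disk access servicing many small sequential requests, and a lock that prevents the kernel from interleaving disk accesses issued for concurrent streams. This is our illustration only; the names (prefetch_read, PREFETCH_SIZE) and the fixed 8 MiB window are assumptions rather than the thesis's code, and the thesis's contribution includes an algorithm that tunes the prefetch size automatically instead of fixing it.

        #define _POSIX_C_SOURCE 200809L
        #include <pthread.h>
        #include <string.h>
        #include <sys/types.h>
        #include <unistd.h>

        #define PREFETCH_SIZE (8 * 1024 * 1024)  /* assumed size; tuned in the thesis */

        /* One lock serializes disk accesses so reads issued on behalf of
         * different clients are not interleaved at the disk. */
        static pthread_mutex_t disk_lock = PTHREAD_MUTEX_INITIALIZER;

        /* Per-open-file prefetch window; heap-allocate (the buffer is large). */
        struct prefetch_buf {
            off_t  start;                /* file offset the window begins at */
            size_t len;                  /* valid bytes in data[] */
            char   data[PREFETCH_SIZE];
        };

        /* Service a small request: hit the window when possible, otherwise
         * issue a single large serialized pread() and refill the window. */
        ssize_t prefetch_read(int fd, struct prefetch_buf *pb,
                              void *dst, size_t count, off_t offset)
        {
            if (offset < pb->start ||
                offset + (off_t)count > pb->start + (off_t)pb->len) {
                pthread_mutex_lock(&disk_lock);
                ssize_t n = pread(fd, pb->data, PREFETCH_SIZE, offset);
                pthread_mutex_unlock(&disk_lock);
                if (n <= 0)
                    return n;            /* EOF or error */
                pb->start = offset;
                pb->len   = (size_t)n;
            }
            size_t avail = (size_t)(pb->start + (off_t)pb->len - offset);
            size_t n = count < avail ? count : avail;
            memcpy(dst, pb->data + (offset - pb->start), n);
            return (ssize_t)n;
        }

    The point of the sketch is that a run of small sequential client requests maps onto one disk access, while the mutex keeps concurrent streams from turning sequential disk reads into random ones.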

    Building Efficient Software to Support Content Delivery Services

    Many content delivery services use key components such as web servers, databases, and key-value stores to serve content over the Internet. These services, and their component systems, face unique modern challenges. Services now operate at massive scale, serving large files to wide user bases. Additionally, resource contention is more prevalent than ever due to large file sizes, cloud-hosted and collocated services, and the use of resource-intensive features like content encryption. Existing systems have difficulty adapting to these challenges while still performing efficiently. For instance, streaming video web servers work well with small data but struggle to service large, concurrent requests from disk. Our goal is to demonstrate how software can be augmented or replaced to improve the performance and efficiency of select components of content delivery services.

    We first introduce Libception, a system designed to improve disk throughput for web servers that process numerous concurrent disk requests for large content. By using serialization and aggressive prefetching, Libception improves the throughput of the Apache and nginx web servers by a factor of 2 on FreeBSD and 2.5 on Linux when serving HTTP streaming video content. Notably, this improvement is achieved without changing the source code of either web server. We additionally show that Libception's benefits translate into performance gains for other workloads, reducing the runtime of a microbenchmark using the diff utility by 50% (again without modifying the application's source code).

    We next implement Nessie, a distributed, RDMA-based, in-memory key-value store. Nessie decouples data from indexing metadata, and its protocol consumes CPU only on servers that initiate operations. This design makes Nessie resilient against CPU interference, allows it to perform well with large data values, and conserves energy during periods of non-peak load. We find that Nessie doubles throughput versus other approaches when CPU contention is introduced, and has 70% higher throughput when managing large data in write-oriented workloads. It also provides 41% power savings (over idle power consumption) versus other approaches when system load is at 20% of peak throughput.

    Finally, we develop RocketStreams, a framework that facilitates the dissemination of live streaming video. RocketStreams exposes an easy-to-use API to applications, obviating the need for services to manually implement complicated data management and networking code. RocketStreams' TCP-based dissemination compares favourably to an alternative solution, reducing CPU utilization on delivery nodes by 54% and increasing viewer throughput by 27% versus the Redis data store. Additionally, when RDMA-enabled hardware is available, RocketStreams provides RDMA-based dissemination that further increases overall performance, decreasing CPU utilization by 95% and increasing concurrent viewer throughput by 55% versus Redis.
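
    The claim that Libception changes server behaviour without touching Apache or nginx source suggests library interposition. The LD_PRELOAD-style shim below is a minimal sketch of that general technique under our own assumptions, not Libception's actual code: it serializes read() calls as the abstract describes, and a realistic shim would also prefetch in large units and restrict itself to file (not socket) descriptors.

        /* Build: cc -shared -fPIC shim.c -o shim.so -ldl -lpthread
         * Run:   LD_PRELOAD=./shim.so <unmodified web server>     */
        #define _GNU_SOURCE
        #include <dlfcn.h>
        #include <pthread.h>
        #include <sys/types.h>
        #include <unistd.h>

        static pthread_mutex_t disk_lock = PTHREAD_MUTEX_INITIALIZER;

        /* Interposed read(): the real symbol is looked up lazily, then
         * wrapped in a lock so concurrent reads are serialized rather
         * than interleaved. */
        ssize_t read(int fd, void *buf, size_t count)
        {
            static ssize_t (*real_read)(int, void *, size_t);
            if (!real_read)
                real_read = (ssize_t (*)(int, void *, size_t))
                                dlsym(RTLD_NEXT, "read");

            pthread_mutex_lock(&disk_lock);
            ssize_t n = real_read(fd, buf, count);
            pthread_mutex_unlock(&disk_lock);
            return n;
        }

    Because the shim is loaded by the dynamic linker, the server binary runs unchanged, which matches the abstract's "without changing the source code" result; the specific calls Libception interposes and its prefetching policy are not described here, so this sketch should be read as the shape of the mechanism, not its details.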