Physically Dense Server Architectures.
Abstract
Distributed, in-memory key-value stores have emerged as one of today's most
important data center workloads. Because key-value stores are critical to the
scalability of modern web services, vast resources are dedicated to them in
order to ensure that quality-of-service guarantees are met. These resources include:
many server racks to store terabytes of key-value data, the power necessary to
run all of the machines, networking equipment and bandwidth, and the data center
warehouses used to house the racks.
There is, however, a mismatch between the key-value store software and the
commodity servers on which it is run, leading to inefficient use of resources.
The primary cause of inefficiency is the overhead incurred from processing
individual network packets, which typically carry small payloads and require
minimal compute resources. Thus, one of the key challenges as we enter the
exascale era is how to best adjust to the paradigm shift from compute-centric
to storage-centric data centers.
This dissertation presents a hardware/software solution that addresses the
inefficiency issues present in the modern data centers on which key-value
stores are currently deployed. First, it proposes two physical server
designs, both of which use 3D-stacking technology and low-power CPUs to improve
density and efficiency. The first 3D architecture---Mercury---consists of stacks
of low-power CPUs with 3D-stacked DRAM. The second
architecture---Iridium---replaces DRAM with 3D NAND Flash to improve density.
The second portion of this dissertation proposes an enhanced version of the
Mercury server design---called KeyVault---that incorporates integrated,
zero-copy network interfaces along with an integrated switching fabric. In order
to utilize the integrated networking hardware, as well as reduce the
response time of requests, a custom networking protocol is proposed. Unlike
prior works on accelerating key-value stores---e.g., by completely bypassing the
CPU and OS when processing requests---this work only bypasses the CPU and OS
when placing network payloads into a process' memory. The insight behind this is
that because most of the overhead comes from processing packets in the OS
kernel---and not the request processing itself---direct placement of a packet's
payload is sufficient to provide higher throughput and lower latency than prior
approaches.

- PhD
- Computer Science and Engineering
- University of Michigan, Horace H. Rackham School of Graduate Studies
- http://deepblue.lib.umich.edu/bitstream/2027.42/111414/1/atgutier_1.pd
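The direct-payload-placement idea summarized in the abstract can be illustrated with a toy model: the network hardware writes each payload straight into a ring buffer that lives in the process's own memory, and the application reads it in place, so the receive path involves exactly one data movement. This is a minimal sketch only; the names (`PayloadRing`, `nic_deliver`, `app_poll`) are illustrative and not taken from the dissertation.

```python
class PayloadRing:
    """Preallocated per-process receive ring; payloads land in place."""

    def __init__(self, slots, slot_size):
        self.slots = slots
        self.slot_size = slot_size
        # In a real design this would be a pinned, NIC-visible region.
        self.buf = bytearray(slots * slot_size)
        self.head = 0  # next slot the NIC fills
        self.tail = 0  # next slot the application consumes

    def nic_deliver(self, payload):
        """Model the NIC DMA-ing one payload into the next free slot."""
        assert len(payload) <= self.slot_size
        off = (self.head % self.slots) * self.slot_size
        self.buf[off:off + len(payload)] = payload  # the only copy of the data
        self.head += 1

    def app_poll(self):
        """Application reads the payload in place via a zero-copy view."""
        if self.tail == self.head:
            return None  # ring is empty
        off = (self.tail % self.slots) * self.slot_size
        self.tail += 1
        # memoryview aliases self.buf: no copy on the consume path either.
        return memoryview(self.buf)[off:off + self.slot_size]


ring = PayloadRing(slots=8, slot_size=64)
ring.nic_deliver(b"GET key123")
view = ring.app_poll()
```

Because `view` aliases the ring buffer rather than copying it, the kernel's per-packet work (socket buffer allocation, copy to user space) disappears from the critical path, which is the source of the throughput and latency gains the abstract claims.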