With the continuous data explosion of the big data era, traditional software and hardware stacks face unprecedented challenges in operating at such data scale. Designing new architectures and efficient systems for data-oriented applications has therefore become increasingly critical. This motivates us to rethink conventional storage system design and re-architect both software and hardware to meet the challenges of scale.
Besides the rapid growth of data volume, the increasing demands of storage applications such as video streaming and data analytics are pushing high-performance flash-based storage devices to replace traditional spinning disks. This all-flash era raises data reliability concerns due to the endurance limitations of flash devices. Key-value stores (KVS) are an important storage infrastructure for handling fast-growing unstructured data and have been widely deployed in a variety of scale-out enterprise applications such as online retail, big data analytics, and social networks. How to efficiently manage data redundancy in key-value stores to provide data reliability, and how to efficiently support range queries to accelerate analytics-oriented applications under emerging key-value store system architectures, have become important research problems.
In this research, we focus on how to design new software/hardware architectures for key-value store applications to provide reliability and improve query performance. To address the different issues identified in this dissertation, we propose to employ a logical key management layer: a thin layer above the KV devices that maps logical keys into physical keys on the devices.
We show how such a layer can enable multiple solutions that improve the performance and reliability of KVSSD-based storage systems. First, we present KVRAID, a high-performance, write-efficient erasure coding management scheme for emerging key-value SSDs. The core innovation of KVRAID is a logical key management layer that maps logical keys to physical keys in order to efficiently pack similar-sized KV objects and dynamically manage the membership of erasure coding groups. Unlike existing schemes, which manage erasure codes at the block level, KVRAID manages them at the KV object level. To achieve better storage efficiency for variable-sized objects, KVRAID predefines multiple fixed sizes (slabs) for the erasure code according to the object size distribution, and uses logical-to-physical key conversion to pack KV objects of similar size into a parity group. For object updates, KVRAID uses a lazy deletion mechanism with a garbage collector.
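The slab-and-packing idea above can be sketched roughly as follows. This is a minimal illustration, not KVRAID's actual implementation: the slab sizes, group width, and class names are assumptions, and the parity shown is a simple XOR (RAID-5 style) stand-in for a general erasure code.

```python
# Illustrative sketch of a logical key management layer in the spirit of
# KVRAID. All names and constants are hypothetical, not the dissertation's API.

SLAB_SIZES = [256, 1024, 4096]   # predefined slab sizes (assumed)
K = 4                            # data objects per parity group (assumed)

def slab_for(size):
    """Pick the smallest predefined slab that fits the object."""
    for s in SLAB_SIZES:
        if size <= s:
            return s
    raise ValueError("object larger than largest slab")

class LogicalKeyMap:
    def __init__(self):
        self.l2p = {}                               # logical key -> (group, slot)
        self.pending = {s: [] for s in SLAB_SIZES}  # open group per slab class
        self.next_group = 0

    def put(self, lkey, value):
        slab = slab_for(len(value))
        padded = value.ljust(slab, b"\x00")   # pad object up to its slab size
        self.pending[slab].append((lkey, padded))
        if len(self.pending[slab]) == K:      # group is full: seal it
            gid = self.next_group
            self.next_group += 1
            parity = bytes(slab)              # XOR parity as a stand-in
            for slot, (k, v) in enumerate(self.pending[slab]):
                self.l2p[k] = (gid, slot)     # logical -> physical mapping
                parity = bytes(a ^ b for a, b in zip(parity, v))
            # physical keys would be derived from (gid, slot) and written,
            # together with the parity object, to the KV devices
            self.pending[slab] = []
            return gid, parity
        return None
```

An update under this scheme would write the new object into a fresh group and lazily mark the old physical key for the garbage collector, rather than rewriting the sealed parity group in place.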
Our experiments show that in the 100% put case, KVRAID outperforms software block RAID by 18x in throughput and reduces write amplification (WAF) by 15x with only ~5% CPU utilization. In mixed update/get workloads, KVRAID achieves ~4x better throughput with ~23% CPU utilization and reduces storage overhead and WAF by 3.6x and 11.3x on average, respectively.
Second, we present KVRangeDB, an ordered, log-structured-tree-based key index that supports range queries on a hash-based KVSSD. In addition, we propose to pack smaller application records into a larger physical record on the device through the logical key management layer. We compared the performance of KVRangeDB against a RocksDB implementation on KVSSD and the state-of-the-art software KV store WiscKey on a block device, on three types of real-world applications: cloud-serving workloads, the TABLEFS filesystem, and time-series databases. For cloud-serving applications, KVRangeDB achieves 8.3x and 1.7x better 99.9th-percentile write tail latency compared to the RocksDB implementation on KVSSD and WiscKey on block SSD, respectively. On the query side, KVRangeDB performs worse only for very long scans, while providing fast point queries and closed-range queries. The experiments on TABLEFS demonstrate that, for four metadata-intensive workloads, using KVRangeDB for metadata indexing boosts performance by ~6.3x on average and reduces CPU cost by ~3.9x compared to the RocksDB implementation on KVSSD. Compared to WiscKey, KVRangeDB improves performance by ~2.6x on average and reduces CPU usage by ~1.7x.
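The two KVRangeDB ideas, keeping an ordered key index beside a hash-based device and packing small application records into one larger physical record, can be sketched as below. This is an assumption-laden toy: the real system uses a log-structured tree, not the in-memory sorted list shown here, and the packing factor and names are illustrative.

```python
# Toy sketch of an ordered index over a hash-based KV device, with small
# records packed into larger physical records. Hypothetical names throughout.
import bisect

PACK = 4  # application records per packed physical record (assumed)

class RangeIndex:
    def __init__(self, device):
        self.device = device   # dict standing in for the hash-based KVSSD
        self.keys = []         # sorted logical keys (LSM tree in reality)
        self.loc = {}          # logical key -> (physical key, offset in pack)
        self.buf = []          # records waiting to be packed

    def put(self, key, value):
        self.buf.append((key, value))
        if len(self.buf) == PACK:                  # flush one packed record
            pkey = "pack-%d" % len(self.device)    # derived physical key
            payload = []
            for off, (k, v) in enumerate(self.buf):
                bisect.insort(self.keys, k)        # keep index ordered
                self.loc[k] = (pkey, off)
                payload.append(v)
            self.device[pkey] = payload            # one large device write
            self.buf = []

    def scan(self, start, end):
        """Closed range query [start, end] served by the ordered index."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_right(self.keys, end)
        out = []
        for k in self.keys[lo:hi]:
            pkey, off = self.loc[k]
            out.append((k, self.device[pkey][off]))
        return out
```

The design point this illustrates is that the device itself only ever sees a few large, hashed physical records, while range semantics live entirely in the thin ordered layer above it.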
Third, we propose a generic FPGA accelerator for encoding/decoding of emerging Minimum Storage Regenerating (MSR) codes, which maximizes computation parallelism and minimizes data movement between off-chip DRAM and the on-chip SRAM buffers. To demonstrate the efficiency of the proposed accelerator, we implemented the encoding/decoding algorithms for a specific MSR code, Zigzag code, on a Xilinx VCU1525 acceleration card. Our evaluation shows the proposed accelerator achieves ~2.4-3.1x better throughput and ~4.2-5.7x better power efficiency compared to a state-of-the-art multi-core CPU implementation, and ~2.8-3.3x better throughput and ~4.2-5.3x better power efficiency compared to a modern GPU accelerator.
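The compute kernel that such an accelerator parallelizes is finite-field arithmetic. As a plain-software baseline, the sketch below shows GF(2^8) multiplication with log/antilog tables and a parity block computed as a linear combination of data blocks; the FPGA replaces these table lookups with parallel XOR networks. The primitive polynomial 0x11d is a common choice for such codes, and `encode_parity` is a generic illustration, not Zigzag's concrete layout.

```python
# GF(2^8) arithmetic via log/antilog tables: the scalar baseline for the
# erasure-coding kernel an FPGA accelerator parallelizes. Names are ours.

PRIM = 0x11d                 # a commonly used primitive polynomial
EXP = [0] * 512              # antilog table, doubled to skip a modulo
LOG = [0] * 256

x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1                  # multiply by the generator (2)
    if x & 0x100:
        x ^= PRIM            # reduce modulo the primitive polynomial
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    """Multiply two GF(2^8) elements via log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def encode_parity(data_blocks, coeffs):
    """One parity block as a GF(2^8) linear combination of data blocks."""
    parity = bytearray(len(data_blocks[0]))
    for c, block in zip(coeffs, data_blocks):
        for j, byte in enumerate(block):
            parity[j] ^= gf_mul(c, byte)   # addition in GF(2^8) is XOR
    return bytes(parity)
```

Every output byte depends only on bytes at the same offset of each data block, which is what makes the kernel embarrassingly parallel across offsets and amenable to wide on-chip datapaths with streaming DRAM access.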