Benefit of enabling fine-grained routing and load balancing

Abstract

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. By Aubrey Lynn Tatarowicz. Cataloged from PDF version of thesis. Includes bibliographical references (p. 61-63).

Data volumes are exploding, and storing such large amounts of data requires multiple machines. Storage systems like databases therefore need to be distributed across many machines. Transactions that access only a few tuples, often seen in web workloads such as Twitter, do not run optimally under traditional partitioning schemes [25]. For these workloads, adding machines often creates a bottleneck instead of improving throughput. Fine-grained partitioning can fix the scale-out problem introduced by simplistic partitioning schemes.

In this thesis, I introduce a design for a distributed query execution system that handles fine-grained partitioning using look-up tables. A look-up table is a mapping from a tuple attribute to the back-end location of that tuple, which makes fine-grained partitioning possible. I show through both synthetic and real data that fine-grained partitioning enabled by look-up tables can increase the throughput of a distributed database system. My goal is scale-out with the number of machines used in the distributed database, and my experiments show that scale-out can be reached if an ideal partitioning can be created. I test my implementation on a Wikipedia data set, where it achieves three times better performance than the optimal hash partitioning scheme with eight back-ends and shows signs of continued scale-out with more machines. Through the use of large data sets, and by projecting my results onto even larger data sets, I show that look-up tables can be used to represent complex partitioning schemes for databases containing billions of tuples.
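The look-up table idea in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the class name `LookupTableRouter`, the helper functions, the integer keys, and the fall-back to modulo hashing for keys missing from the table are all assumptions made for illustration. The sketch only shows how an explicit key-to-back-end mapping can place the tuples of one transaction on a single back-end, where hash partitioning would spread them over several.

```python
class LookupTableRouter:
    """Hypothetical router: maps a tuple attribute (key) to a back-end id."""

    def __init__(self, num_backends, table=None):
        self.num_backends = num_backends
        # Explicit fine-grained placement: key -> back-end id.
        self.table = dict(table or {})

    def backend_for(self, key):
        # Use the look-up table when the key has an explicit placement.
        if key in self.table:
            return self.table[key]
        # Assumption: keys without an entry fall back to hash partitioning.
        return hash_backend(key, self.num_backends)


def hash_backend(key, num_backends):
    """Baseline hash partitioning on integer keys (simple modulo)."""
    return key % num_backends


def backends_touched(route, keys):
    """Return the set of back-ends a transaction over `keys` must contact."""
    return {route(k) for k in keys}


# A transaction that reads three tuples which are usually accessed together.
txn_keys = [3, 10, 17]

# Hash partitioning over 4 back-ends scatters the three tuples.
hash_set = backends_touched(lambda k: hash_backend(k, 4), txn_keys)

# A look-up table can co-locate them on back-end 0.
router = LookupTableRouter(4, table={3: 0, 10: 0, 17: 0})
lookup_set = backends_touched(router.backend_for, txn_keys)

print(len(hash_set), len(lookup_set))
```

With hash partitioning the transaction contacts three back-ends; with the look-up table it contacts one. Choosing a good table (the partitioning itself) is a separate problem that this sketch does not address.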
