HYDRA: A Dynamic Big Data Regenerator

Abstract

A core requirement of database engine testing is the ability to create synthetic versions of the customer’s data warehouse at the vendor site. Prior work on synthetic data regeneration suffers from critical limitations with regard to (a) scaling to large data volumes, (b) handling complex query workloads, and (c) producing data on demand. In this demo, we present HYDRA, a workload-dependent dynamic data regenerator, that materially addresses these limitations. It introduces the concept of dynamic regeneration by constructing a minuscule memory-resident database summary that can on-the-fly regenerate databases of arbitrary size during query execution. Further, since the data is generated in memory, the velocity of generation can be closely regulated. Finally, to complement dynamic regeneration, Hydra also ensures that the process of summary construction is data-scale-free

    Similar works