Database Learning: Toward a Database that Becomes Smarter Every Time

Acharya S.; Agrawal S.; Bishop C. M.; Carbonell J. G.; Carlson A.; Condie T.; Ganti V.; Idreos S.; Lawrence N.; Meliou A.; Micchelli C. A.; Mozafari B.; Olston C.; Park Y.; Rusu F.; Sarawagi S.; Sidirourgos L.; Skilling J.; Wasserman L.; Williams C. K.

Publication date: 28 March 2017
Publisher: Association for Computing Machinery (ACM)

Abstract

In today's databases, previous query answers rarely benefit the answering of future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some knowledge about the answers to other queries, because all of these answers stem from the same underlying distribution that produced the entire dataset. Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Moreover, processing more queries should continuously improve our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries. We call this novel idea, learning from past query answers, Database Learning. We exploit the principle of maximum entropy to produce answers that are guaranteed, in expectation, to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine, called Verdict, on top of Spark SQL. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that Verdict supports 73.7% of these queries and speeds them up by up to 23.0x for the same accuracy level compared to existing AQP systems.

Comment: This manuscript is an extended report of the work published in ACM SIGMOD conference 201
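To make the core idea concrete, here is a minimal sketch of how a past approximate answer can sharpen a new one. It assumes, as an illustration only, that the true answers to related queries follow the maximum-entropy distribution for a given mean vector and covariance matrix (which is the multivariate Gaussian), and that each raw sample-based answer carries independent Gaussian error of known variance. The prior mean, covariance values, and the function names `solve` and `improve_last_answer` are hypothetical; this is generic Gaussian conditioning, not a reproduction of Verdict's actual inference procedure or covariance model.

```python
"""Sketch: improve a new approximate answer using past approximate answers.

Hypothetical assumptions: true query answers are jointly Gaussian with a
known prior mean and covariance, and each raw sample-based answer equals
the true answer plus independent Gaussian error with known variance.
"""
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy so the caller's matrix and vector are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def improve_last_answer(
    prior_mean: List[float],
    prior_cov: List[List[float]],
    raw_answers: List[float],
    raw_err_var: List[float],
) -> Tuple[float, float]:
    """Return (improved answer, its variance) for the last (newest) query.

    With K = prior_cov + diag(raw_err_var) and k_n the last column of
    prior_cov, Gaussian conditioning gives
        mean = mu_n + k_n^T K^{-1} (y - mu)
        var  = Sigma_nn - k_n^T K^{-1} k_n
    """
    n = len(raw_answers)
    k = [[prior_cov[i][j] + (raw_err_var[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_n = [prior_cov[i][n - 1] for i in range(n)]
    resid = [raw_answers[i] - prior_mean[i] for i in range(n)]
    alpha = solve(k, resid)  # K^{-1} (y - mu)
    w = solve(k, k_n)        # K^{-1} k_n
    mean = prior_mean[n - 1] + sum(k_n[i] * alpha[i] for i in range(n))
    var = prior_cov[n - 1][n - 1] - sum(k_n[i] * w[i] for i in range(n))
    return mean, var


if __name__ == "__main__":
    # One past query (raw error variance 25) and one new query
    # (raw error variance 100), with strongly correlated true answers.
    mean, var = improve_last_answer(
        prior_mean=[100.0, 100.0],
        prior_cov=[[400.0, 300.0], [300.0, 400.0]],
        raw_answers=[110.0, 130.0],
        raw_err_var=[25.0, 100.0],
    )
    print(round(mean, 2), round(var, 2))  # prints 122.04 65.31
```

In this toy run, the new query's variance drops from its raw sampling variance of 100 to about 65.31 because the correlated past answer contributes information. The conditioned variance can never exceed the raw error variance, which mirrors, in this simplified setting, the abstract's claim that the inferred answers are at least as accurate in expectation as the sample-based ones.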

Available versions: Crossref (info:doi/10.1145/3035918.306...)

Last time updated on 11/12/2019