Query2Vec: An Evaluation of NLP Techniques for Generalized Workload Analytics
We consider methods for learning vector representations of SQL queries to
support generalized workload analytics tasks, including workload summarization
for index selection and predicting queries that will trigger memory errors. We
consider vector representations of both raw SQL text and optimized query plans,
and evaluate these methods on synthetic and real SQL workloads. We find that
general algorithms based on vector representations can outperform existing
approaches that rely on specialized features. For index recommendation, we
cluster the vector representations to compress large workloads with no loss in
performance from the recommended index. For error prediction, we train a
classifier over learned vectors that can automatically relate subtle syntactic
patterns to specific errors raised during query execution. Surprisingly, we
also find that these methods enable transfer learning, where a model trained on
one SQL corpus can be applied to an unrelated corpus and still enable good
performance. We find that these general approaches, when trained on a large
corpus of SQL queries, provide a robust foundation for a variety of workload
analysis tasks and database features, without requiring application-specific
feature engineering.
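
The workload summarization idea described above (cluster query vectors, then keep one representative per cluster) can be illustrated with a minimal sketch. It is not the authors' implementation: normalized token-count vectors stand in for the learned embeddings, a plain k-means with farthest-point initialization stands in for whatever clustering method was actually used, and every function name (`tokenize`, `vectorize`, `kmeans`, `summarize`) is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sql):
    """Lowercase a SQL string and keep identifier-like tokens (literals are dropped)."""
    return re.findall(r"[a-z_][a-z0-9_]*", sql.lower())


def vectorize(queries):
    """ASSUMPTION: unit-length token-count vectors replace learned embeddings."""
    vocab = sorted({t for q in queries for t in tokenize(q)})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for q in queries:
        v = [0.0] * len(vocab)
        for token, count in Counter(tokenize(q)).items():
            v[index[token]] = float(count)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Seed the next centroid with the vector farthest from all current ones.
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    for _ in range(iterations):
        assign = [nearest(v, centroids) for v in vectors]
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(v, centroids) for v in vectors]


def summarize(queries, k):
    """Return one representative query per non-empty cluster."""
    vectors = vectorize(queries)
    centroids, assign = kmeans(vectors, k)
    summary = []
    for j in range(k):
        members = [i for i, a in enumerate(assign) if a == j]
        if members:
            best = min(members, key=lambda i: sq_dist(vectors[i], centroids[j]))
            summary.append(queries[best])
    return summary


workload = [
    "SELECT name FROM users WHERE id = 1",
    "SELECT name FROM users WHERE id = 2",
    "SELECT name FROM users WHERE id = 3",
    "SELECT sum(total) FROM orders GROUP BY region",
    "SELECT sum(total) FROM orders GROUP BY country",
]
compressed = summarize(workload, k=2)
```

On this toy workload, `compressed` holds two queries: one of the `users` lookups and one of the `orders` aggregates. The compressed list could then be handed to an index advisor in place of the full workload. Whether the recommended index performs as well as one derived from the full workload depends on the quality of the vector representation, which is what the paper evaluates.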