The Natural Language Interface to Databases (NLIDB) empowers non-technical
users with database access through intuitive natural language (NL)
interactions. Advanced approaches, utilizing neural sequence-to-sequence models
or large-scale language models, typically employ auto-regressive decoding to
generate a single SQL query token by token. While these translation models have
greatly improved the overall translation accuracy, surpassing 70% on NLIDB
benchmarks, the use of auto-regressive decoding to generate single SQL queries
may result in sub-optimal outputs, potentially leading to erroneous
translations. In this paper, we propose Metasql, a unified generate-then-rank
framework that can be flexibly incorporated with existing NLIDBs to
consistently improve their translation accuracy. Metasql introduces query
metadata to control the generation of better SQL query candidates and uses
learning-to-rank algorithms to retrieve globally optimized queries.
Specifically, Metasql first breaks down the meaning of the given NL query into
a set of possible query metadata, representing the basic concepts of the
semantics. These metadata are then used as language constraints to steer the
underlying translation model toward generating a set of candidate SQL queries.
Finally, Metasql ranks the candidates to identify the best matching one for the
given NL query. Extensive experiments are performed to study Metasql on two
public NLIDB benchmarks. The results show that the performance of the
translation models can be effectively improved using Metasql.
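
The generate-then-rank flow described above can be illustrated with a minimal sketch. It is not the Metasql implementation: the function names (`generate_then_rank`, `toy_metadata`, `toy_translate`, `toy_score`), the metadata labels, and the scoring rule are all hypothetical stand-ins, chosen only to show how metadata-conditioned generation and candidate ranking fit together.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Metadata = FrozenSet[str]


def generate_then_rank(
    nl_query: str,
    propose_metadata: Callable[[str], Iterable[Metadata]],
    translate: Callable[[str, Metadata], str],
    score: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Return candidate SQL queries sorted from best to worst score."""
    candidates: List[str] = []
    seen = set()
    # Step 1 and 2: each metadata set steers the translator to one candidate.
    for metadata in propose_metadata(nl_query):
        sql = translate(nl_query, metadata)
        if sql not in seen:  # drop duplicate candidates
            seen.add(sql)
            candidates.append(sql)
    # Step 3: rank all candidates against the NL query.
    scored = [(sql, score(nl_query, sql)) for sql in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins (assumptions for illustration, not learned models).
def toy_metadata(nl_query: str) -> List[Metadata]:
    return [frozenset({"project"}), frozenset({"project", "where"})]


def toy_translate(nl_query: str, metadata: Metadata) -> str:
    sql = "SELECT name FROM singer"
    if "where" in metadata:
        sql += " WHERE age > 30"
    return sql


def toy_score(nl_query: str, sql: str) -> float:
    # Reward candidates whose use of a filter matches the question.
    wants_filter = "older" in nl_query
    has_filter = "WHERE" in sql
    return 1.0 if wants_filter == has_filter else 0.0


ranked = generate_then_rank(
    "singers older than 30", toy_metadata, toy_translate, toy_score
)
best_sql, best_score = ranked[0]
print(best_sql)
```

In a real system the three callables would be replaced by a metadata predictor, a metadata-conditioned translation model, and a learned ranking model, respectively; the control flow stays the same.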