Text-to-SQL simplifies database interactions by enabling non-experts to
convert their natural language (NL) questions into Structured Query Language
(SQL) queries. While recent advances in large language models (LLMs) have
improved the zero-shot text-to-SQL paradigm, existing methods face scalability
challenges when dealing with massive, dynamically changing databases. This
paper introduces DBCopilot, a framework that addresses these challenges by
employing a compact and flexible copilot model for routing across massive
databases. Specifically, DBCopilot decouples the text-to-SQL process into
schema routing and SQL generation, leveraging a lightweight
sequence-to-sequence neural network-based router to formulate database
connections and navigate natural language questions through databases and
tables. The routed schemas and questions are then fed into LLMs for efficient
SQL generation. Furthermore, DBCopilot also introduced a reverse
schema-to-question generation paradigm, which can learn and adapt the router
over massive databases automatically without requiring manual intervention.
Experimental results demonstrate that DBCopilot is a scalable and effective
solution for real-world text-to-SQL tasks, providing a significant advancement
in handling large-scale schemas.Comment: Code and data are available at https://github.com/tshu-w/DBCopilo