Turing: an Accurate and Interpretable Multi-Hypothesis Cross-Domain Natural Language Database Interface
A natural language database interface (NLDB) can democratize data-driven
insights for non-technical users. However, existing Text-to-SQL semantic
parsers cannot achieve high enough accuracy in the cross-database setting to
allow good usability in practice. This work presents Turing, an NLDB system
aimed at bridging this gap. Turing's cross-domain semantic parser, combined with
our novel value prediction method, achieves high execution accuracy and
top-5 beam execution accuracy on the Spider validation set. To benefit
from the higher beam accuracy, we design an interactive system where the SQL
hypotheses in the beam are explained step-by-step in natural language, with
their differences highlighted. The user can then compare the hypotheses and
select the one that reflects their intention, if any. The English
explanations of SQL queries in Turing are produced by our high-precision
natural language generation system based on synchronous grammars. (Comment: ACL 2021 demonstration track)
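The interactive comparison of beam hypotheses described above can be illustrated with a minimal sketch. This is not Turing's implementation (which explains SQL in natural language via synchronous grammars); it only shows the simpler idea of surfacing the token-level differences between ranked SQL hypotheses so a user can compare them. All names here are hypothetical.

```python
import difflib

def highlight_differences(hypotheses):
    """Given ranked SQL hypotheses from a beam, mark the tokens in each
    lower-ranked hypothesis that differ from the top hypothesis, so a
    user can compare the candidates and pick the intended one."""
    base = hypotheses[0].split()
    marked = [hypotheses[0]]
    for hyp in hypotheses[1:]:
        tokens = hyp.split()
        sm = difflib.SequenceMatcher(a=base, b=tokens)
        out = []
        for op, i1, i2, j1, j2 in sm.get_opcodes():
            segment = tokens[j1:j2]
            if op == "equal":
                out.extend(segment)
            else:
                # bracket tokens that diverge from the top hypothesis
                out.extend(f"[{t}]" for t in segment)
        marked.append(" ".join(out))
    return marked

beam = [
    "SELECT name FROM singer ORDER BY age DESC LIMIT 1",
    "SELECT name FROM singer ORDER BY age ASC LIMIT 1",
]
for line in highlight_differences(beam):
    print(line)
```

Here the second hypothesis is shown with `[ASC]` bracketed, making the single point of disagreement with the top hypothesis immediately visible.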
MT-Teql: Evaluating and Augmenting Consistency of Text-to-SQL Models with Metamorphic Testing
Text-to-SQL is the task of generating SQL queries from human utterances. However,
due to the variability of natural language, two semantically equivalent
utterances may differ at the lexical level. Likewise, user
preferences (e.g., the choice of normal forms) can lead to dramatic changes in
table structures when expressing conceptually identical schemas. Envisioning
the general difficulty for text-to-SQL models to preserve prediction
consistency against linguistic and schema variations, we propose MT-Teql, a
Metamorphic Testing-based framework for systematically evaluating and
augmenting the consistency of Text-to-SQL models. Inspired by the principles of
software metamorphic testing, MT-Teql delivers a model-agnostic framework which
implements a comprehensive set of metamorphic relations (MRs) to conduct
semantics-preserving transformations on utterances and schemas. Model
inconsistency is exposed when the original and transformed inputs induce
different SQL queries. In addition, we leverage the transformed inputs to
retrain models and further boost their robustness. Our experiments show that
our framework exposes thousands of prediction errors in SOTA models and
enriches existing datasets by an order of magnitude, eliminating over 40% of
inconsistency errors without compromising standard accuracy.
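The metamorphic-testing idea behind MT-Teql can be sketched in a few lines. The sketch below is not MT-Teql's framework; it illustrates one hypothetical utterance-level metamorphic relation (synonym substitution, which preserves question semantics) and the consistency oracle: a model is flagged as inconsistent when the original and transformed inputs induce different SQL. `toy_model` is a deliberately brittle stand-in for a real text-to-SQL model.

```python
def apply_mr_synonym(utterance, synonyms):
    """A simple utterance-level metamorphic relation: substitute words
    with synonyms, which should leave the question's meaning unchanged."""
    return " ".join(synonyms.get(t.lower(), t) for t in utterance.split())

def check_consistency(model, utterance, schema, mr, **mr_kwargs):
    """Consistency oracle: the original and the semantics-preserving
    transformed input should induce the same SQL query."""
    original_sql = model(utterance, schema)
    transformed_sql = model(mr(utterance, **mr_kwargs), schema)
    return original_sql == transformed_sql

def toy_model(utterance, schema):
    # hypothetical stand-in for a text-to-SQL model; it brittly keys
    # off a single surface keyword, so synonym substitution breaks it
    if "show" in utterance.lower():
        return "SELECT name FROM singer"
    return "SELECT * FROM singer"

ok = check_consistency(
    toy_model, "Show the singer names", "singer(name)",
    apply_mr_synonym, synonyms={"show": "display"},
)
print(ok)  # the toy model is inconsistent under this relation
```

A real framework in this style would implement a whole catalog of such relations, including schema-level transformations (e.g. normal-form changes), and count every mismatch as an exposed inconsistency.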
Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing
We present BRIDGE, a powerful sequential architecture for modeling
dependencies between natural language questions and relational databases in
cross-DB semantic parsing. BRIDGE represents the question and DB schema in a
tagged sequence where a subset of the fields are augmented with cell values
mentioned in the question. The hybrid sequence is encoded by BERT with minimal
subsequent layers and the text-DB contextualization is realized via the
fine-tuned deep attention in BERT. Combined with a pointer-generator decoder
with schema-consistency driven search space pruning, BRIDGE attained
state-of-the-art performance on popular cross-DB text-to-SQL benchmarks, Spider
(71.1\% dev, 67.5\% test with ensemble model) and WikiSQL (92.6\% dev, 91.9\%
test). Our analysis shows that BRIDGE effectively captures the desired
cross-modal dependencies and has the potential to generalize to more text-DB
related tasks. Our implementation is available at
\url{https://github.com/salesforce/TabularSemanticParsing}. (Comment: EMNLP Findings 2020 long paper, extended; 23 pages)
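The hybrid tagged-sequence encoding that the BRIDGE abstract describes can be sketched as follows. This is a simplified illustration, not BRIDGE's actual serialization: the special tokens (`[T]`, `[C]`, `[V]`) and the function name are hypothetical, and a real system would use BERT's vocabulary and subword tokenization. It shows the core data-structure idea: the question and the DB schema are flattened into one sequence, and columns whose cell values are mentioned in the question are augmented with those values.

```python
def serialize_bridge_style_input(question, schema, cell_matches):
    """Flatten a question and a DB schema into one tagged sequence.

    schema:       {table_name: [column_name, ...]}
    cell_matches: {(table_name, column_name): [matched cell values]}
                  i.e. values from the DB that appear in the question
    """
    parts = ["[CLS]", question, "[SEP]"]
    for table, columns in schema.items():
        parts += ["[T]", table]
        for col in columns:
            parts += ["[C]", col]
            # augment the column with cell values mentioned in the question
            for value in cell_matches.get((table, col), []):
                parts += ["[V]", value]
    parts.append("[SEP]")
    return " ".join(parts)

schema = {"singer": ["name", "country"]}
matches = {("singer", "country"): ["France"]}
print(serialize_bridge_style_input(
    "How many singers are from France?", schema, matches))
```

Feeding such a sequence to a pretrained encoder lets its attention layers contextualize question tokens against schema items directly, which is the cross-modal grounding the abstract refers to.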