Improving Text-to-SQL Evaluation Methodology
To be informative, an evaluation must measure how well systems generalize to
realistic unseen data. We identify limitations of and propose improvements to
current evaluations of text-to-SQL systems. First, we compare human-generated
and automatically generated questions, characterizing properties of queries
necessary for real-world applications. To facilitate evaluation on multiple
datasets, we release standardized and improved versions of seven existing
datasets and one new text-to-SQL dataset. Second, we show that the current
division of data into training and test sets measures robustness to variations
in the way questions are asked, but only partially tests how well systems
generalize to new queries; therefore, we propose a complementary dataset split
for evaluation of future work. Finally, we demonstrate how the common practice
of anonymizing variables during evaluation removes an important challenge of
the task. Our observations highlight key difficulties, and our methodology
enables effective measurement of future development.
Comment: To appear at ACL 201
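The complementary split described in the abstract above can be illustrated with a minimal sketch. It assumes that "new queries" means SQL queries whose literal-masked form never appears in training. The helper names `query_template` and `query_split`, the masking rules, and the example data are hypothetical and are not taken from the paper.

```python
import re
from collections import defaultdict


def query_template(sql: str) -> str:
    """Crude canonical form (an assumption): mask literals, normalize case and spaces."""
    sql = re.sub(r"'[^']*'|\"[^\"]*\"", "<str>", sql)  # quoted string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "<num>", sql)      # numeric literals
    return " ".join(sql.lower().split())


def query_split(examples, test_fraction=0.25):
    """Split (question, sql) pairs so that no query template is shared across sides."""
    groups = defaultdict(list)
    for question, sql in examples:
        groups[query_template(sql)].append((question, sql))
    templates = sorted(groups)  # sorted only to make the sketch deterministic
    n_test = max(1, round(len(templates) * test_fraction))
    test_templates = set(templates[:n_test])
    train = [ex for t in templates if t not in test_templates for ex in groups[t]]
    test = [ex for t in templates if t in test_templates for ex in groups[t]]
    return train, test


# Hypothetical toy data: the first two questions share one query template.
examples = [
    ("Which flights leave from Boston?", "SELECT id FROM flight WHERE origin = 'Boston'"),
    ("Show flights departing Denver", "SELECT id FROM flight WHERE origin = 'Denver'"),
    ("How many airports are there?", "SELECT COUNT(*) FROM airport"),
    ("List airlines with more than 10 planes", "SELECT name FROM airline WHERE planes > 10"),
]
train, test = query_split(examples)
```

A question-based split would instead shuffle the individual pairs, so paraphrases of the same query could land on both sides. Grouping by template before splitting is what prevents that here.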
Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing
Research on parsing language to SQL has largely ignored the structure of the
database (DB) schema, either because the DB was very simple, or because it was
observed at both training and test time. In Spider, a recently released
text-to-SQL dataset, new and complex DBs are given at test time, and so the
structure of the DB schema can inform the predicted SQL query. In this paper,
we present an encoder-decoder semantic parser, where the structure of the DB
schema is encoded with a graph neural network, and this representation is later
used at both encoding and decoding time. Evaluation shows that encoding the
schema structure improves our parser accuracy from 33.8% to 39.4%, dramatically
above the current state of the art, which is at 19.7%.
Comment: Accepted as a short paper at ACL 201
Good-Enough Compositional Data Augmentation
We propose a simple data augmentation protocol aimed at providing a
compositional inductive bias in conditional and unconditional sequence models.
Under this protocol, synthetic training examples are constructed by taking real
training examples and replacing (possibly discontinuous) fragments with other
fragments that appear in at least one similar environment. The protocol is
model-agnostic and useful for a variety of tasks. Applied to neural
sequence-to-sequence models, it reduces error rate by as much as 87% on
diagnostic tasks from the SCAN dataset and 16% on a semantic parsing task.
Applied to n-gram language models, it reduces perplexity by roughly 1% on small
corpora in several languages.
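The fragment-replacement idea in the abstract above can be sketched in a heavily simplified form. This sketch assumes fragments are single tokens and an environment is the whole sentence with that token removed, whereas the abstract allows discontinuous multi-token fragments. The function name `augment` and the toy sentences are hypothetical.

```python
from collections import defaultdict


def augment(sentences):
    """Swap tokens that share at least one environment into each other's environments."""
    env_to_frags = defaultdict(set)  # environment -> tokens observed in it
    frag_to_envs = defaultdict(set)  # token -> environments it was observed in
    for sentence in sentences:
        toks = sentence.split()
        for i, tok in enumerate(toks):
            env = (tuple(toks[:i]), tuple(toks[i + 1:]))
            env_to_frags[env].add(tok)
            frag_to_envs[tok].add(env)

    original = set(sentences)
    synthetic = set()
    for frags in env_to_frags.values():
        for a in frags:
            for b in frags:
                if a == b:
                    continue
                # a and b share this environment, so place b wherever a occurred.
                for left, right in frag_to_envs[a]:
                    candidate = " ".join(left + (b,) + right)
                    if candidate not in original:
                        synthetic.add(candidate)
    return sorted(synthetic)


sentences = [
    "she picks the wug up",
    "she puts the wug down",
    "she picks the cat up",
]
new_examples = augment(sentences)
```

Here "wug" and "cat" both occur in "she picks the _ up", so "cat" is substituted into the other environment of "wug", yielding the synthetic example "she puts the cat down".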
IncSQL: Training Incremental Text-to-SQL Parsers with Non-Deterministic Oracles
We present a sequence-to-action parsing approach for the natural language to
SQL task that incrementally fills the slots of a SQL query with feasible
actions from a pre-defined inventory. To account for the fact that typically
there are multiple correct SQL queries with the same or very similar semantics,
we draw inspiration from syntactic parsing techniques and propose to train our
sequence-to-action models with non-deterministic oracles. We evaluate our
models on the WikiSQL dataset and achieve an execution accuracy of 83.7% on the
test set, a 2.1% absolute improvement over the models trained with traditional
static oracles assuming a single correct target SQL query. When further
combined with the execution-guided decoding strategy, our model sets a new
state-of-the-art performance at an execution accuracy of 87.1%.
Dependency-based Hybrid Trees for Semantic Parsing
We propose a novel dependency-based hybrid tree model for semantic parsing,
which converts natural language utterances into machine-interpretable meaning
representations. Unlike previous state-of-the-art models, the semantic
information is interpreted as the latent dependency between the natural
language words in our joint representation. Such dependency information can
capture the interactions between the semantics and natural language words. We
integrate a neural component into our model and propose an efficient
dynamic-programming algorithm to perform tractable inference. Through extensive
experiments on the standard multilingual GeoQuery dataset with eight languages,
we demonstrate that our proposed approach is able to achieve state-of-the-art
performance across several languages. Analysis also justifies the effectiveness
of using our new dependency-based representation.
Comment: Accepted by EMNLP 201
Semantic Evaluation for Text-to-SQL with Distilled Test Suites
We propose test suite accuracy to approximate semantic accuracy for
Text-to-SQL models. Our method distills a small test suite of databases that
achieves high code coverage for the gold query from a large number of randomly
generated databases. At evaluation time, it computes the denotation accuracy of
the predicted queries on the distilled test suite, hence calculating a tight
upper bound for semantic accuracy efficiently. We use our proposed method to
evaluate 21 models submitted to the Spider leaderboard and manually verify
that our method is always correct on 100 examples. In contrast, the current
Spider metric leads to a 2.5% false negative rate on average and 8.1% in the
worst case, indicating that test suite accuracy is needed. Our implementation,
along with distilled test suites for eleven Text-to-SQL datasets, is publicly
available.
Comment: EMNLP 2020 Long Pape
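The evaluation step described in the abstract above (comparing denotations of the gold and predicted queries on several databases) can be sketched with Python's built-in sqlite3 module. The schema, the rows, and the helper names `make_db`, `denotation`, and `passes_test_suite` are hypothetical. The distillation step that selects the databases is not shown, and row order is ignored here as a simplification.

```python
import sqlite3


def make_db(rows):
    """Build a small in-memory database with a hypothetical one-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", rows)
    return conn


def denotation(conn, sql):
    """Return the query result as an order-insensitive list, or None on error."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def passes_test_suite(gold_sql, pred_sql, suite):
    """The prediction counts as correct only if it matches gold on every database."""
    for conn in suite:
        gold = denotation(conn, gold_sql)
        pred = denotation(conn, pred_sql)
        if pred is None or pred != gold:
            return False
    return True


gold = "SELECT name FROM singer WHERE age > 30"
pred = "SELECT name FROM singer WHERE age >= 30"  # subtly different semantics

db_a = make_db([("Ann", 25), ("Bob", 40)])  # cannot tell the two queries apart
db_b = make_db([("Cy", 30), ("Di", 35)])    # contains the boundary value 30

single_db_verdict = passes_test_suite(gold, pred, [db_a])
suite_verdict = passes_test_suite(gold, pred, [db_a, db_b])
```

On `db_a` alone the wrong prediction is accepted, because no row has age 30. Adding `db_b` exposes the difference, which is the motivation for evaluating on a suite rather than on a single database.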
Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation
We present a neural approach called IRNet for complex and cross-domain
Text-to-SQL. IRNet aims to address two challenges: 1) the mismatch between
intents expressed in natural language (NL) and the implementation details in
SQL; 2) the challenge in predicting columns caused by the large number of
out-of-domain words. Instead of synthesizing a SQL query end to end, IRNet
decomposes the synthesis process into three phases. In the first phase, IRNet
performs schema linking over a question and a database schema. Then, IRNet
adopts a grammar-based neural model to synthesize a SemQL query, which is an
intermediate representation that we design to bridge NL and SQL. Finally, IRNet
deterministically infers a SQL query from the synthesized SemQL query with
domain knowledge. On the challenging Text-to-SQL benchmark Spider, IRNet
achieves 46.7% accuracy, obtaining a 19.5% absolute improvement over previous
state-of-the-art approaches. At the time of writing, IRNet achieves the first
position on the Spider leaderboard.
Comment: To appear in ACL 201
An Investigation Between Schema Linking and Text-to-SQL Performance
Text-to-SQL is a crucial task toward developing methods for understanding
natural language by computers. Recent neural approaches deliver excellent
performance; however, models that are difficult to interpret inhibit future
developments. Hence, this study aims to provide a better approach toward the
interpretation of neural models. We hypothesize that the internal behavior of
the models at hand becomes much easier to analyze if we measure the detailed
performance of schema linking alongside the text-to-SQL performance. We provide
ground-truth schema linking annotations for the Spider dataset. We demonstrate
the usefulness of the annotated data and show how it can be used to analyze the
current state-of-the-art neural models.
Quda: Natural Language Queries for Visual Data Analytics
Visualization-oriented natural language interfaces (V-NLIs) have been
explored and developed in recent years. One challenge faced by V-NLIs is the
formation of effective design decisions, which usually requires a deep
understanding of user queries. Learning-based approaches have shown potential
in V-NLIs and reached state-of-the-art performance in various NLP tasks.
However, because of the lack of sufficient training samples that cater to
visual data analytics, cutting-edge techniques have rarely been employed to
facilitate the development of V-NLIs. We present a new dataset, called Quda, to
help V-NLIs understand free-form natural language. Our dataset contains 14,035
diverse user queries annotated with 10 low-level analytic tasks that assist in
the deployment of state-of-the-art techniques for parsing complex human
language. We achieve this goal by first gathering seed queries with data
analysts who are target users of V-NLIs. Then we employ an extensive crowd workforce
for paraphrase generation and validation. We demonstrate the usefulness of Quda
in building V-NLIs by creating a prototype that makes effective design
decisions for free-form user queries. We also show that Quda can be beneficial
for a wide range of applications in the visualization community by analyzing
the design tasks described in academic publications.
Comment: This work isn't sufficiently exhaustive. We need to do some new work on thi
Learning to Synthesize Data for Semantic Parsing
Synthesizing data for semantic parsing has gained increasing attention
recently. However, most methods require handcrafted (high-precision) rules in
their generative process, hindering the exploration of diverse unseen data. In
this work, we propose a generative model which features a (non-neural) PCFG
that models the composition of programs (e.g., SQL), and a BART-based
translation model that maps a program to an utterance. Due to the simplicity of
PCFG and pre-trained BART, our generative model can be efficiently learned from
existing data at hand. Moreover, explicitly modeling compositions using PCFG
leads to a better exploration of unseen programs, thus generating more diverse
data. We evaluate our method in both in-domain and out-of-domain settings of
text-to-SQL parsing on the standard benchmarks of GeoQuery and Spider,
respectively. Our empirical results show that the synthesized data generated
from our model can substantially help a semantic parser achieve better
compositional and domain generalization.
Comment: NAACL 2021 short pape
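The program-side half of the generative model in the abstract above (a PCFG over SQL) can be illustrated by sampling from a toy grammar. The grammar symbols, rules, and probabilities below are invented for illustration. In the abstract the PCFG is learned from existing data, and the BART-based program-to-utterance model is not shown here.

```python
import random

# Hypothetical toy PCFG: nonterminal -> list of (probability, right-hand side).
# Any symbol that is not a key of GRAMMAR is treated as a terminal token.
GRAMMAR = {
    "QUERY": [
        (0.6, ["SELECT", "COL", "FROM", "TABLE"]),
        (0.4, ["SELECT", "COL", "FROM", "TABLE", "WHERE", "COND"]),
    ],
    "COL": [(0.5, ["name"]), (0.3, ["population"]), (0.2, ["COUNT(*)"])],
    "TABLE": [(0.7, ["city"]), (0.3, ["state"])],
    "COND": [
        (0.7, ["population", ">", "NUM"]),
        (0.3, ["COND", "AND", "COND"]),  # recursion yields compositional programs
    ],
    "NUM": [(0.5, ["1000"]), (0.5, ["50000"])],
}


def sample(symbol, rng):
    """Expand a symbol top-down, choosing each rule according to its probability."""
    if symbol not in GRAMMAR:
        return [symbol]
    weights = [prob for prob, _ in GRAMMAR[symbol]]
    bodies = [rhs for _, rhs in GRAMMAR[symbol]]
    rhs = rng.choices(bodies, weights=weights, k=1)[0]
    tokens = []
    for child in rhs:
        tokens.extend(sample(child, rng))
    return tokens


rng = random.Random(0)  # fixed seed so the sketch is repeatable
programs = [" ".join(sample("QUERY", rng)) for _ in range(5)]
```

Because the `COND` rule can recurse, the sampler can produce conjunctions that never appeared together in any single training query, which is the sense in which a PCFG explores unseen programs.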