ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought
Large Language Models (LLMs) have recently been shown to have strong
abilities in various domains and tasks. We study the problem of prompt
design in the text-to-SQL task and attempt to improve the LLMs' reasoning
ability when generating SQL queries. Besides the trivial few-shot in-context
learning setting, we design our chain-of-thought (CoT) prompt with a method
similar to schema linking. We provide a method named ACT-SQL to automatically
generate auto-CoT exemplars, so the whole process needs no manual
labeling. Our approach is cost-saving since it calls the LLM API only once
when generating one SQL query. Furthermore, we extend our in-context learning
method to the multi-turn text-to-SQL task. The experimental results show that the
LLMs' performance can benefit from our ACT-SQL approach. Our approach achieves
SOTA performance on the Spider dev set among existing in-context learning
approaches.
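The idea of building chain-of-thought exemplars automatically from existing (question, SQL) pairs can be illustrated with a minimal sketch. This is not the ACT-SQL implementation: the function names `build_cot_exemplar` and `build_prompt`, the prompt wording, and the linking rule (plain word overlap between column names and the question) are all assumptions made for illustration only.

```python
import re


def build_cot_exemplar(question, sql, schema):
    """Turn one (question, SQL) pair into a chain-of-thought exemplar.

    `schema` maps each table name to its column names. The linking rule
    (case-insensitive word overlap) is an illustrative assumption.
    """
    sql_tokens = set(re.findall(r"[a-z_]+", sql.lower()))
    question_words = set(re.findall(r"[a-z]+", question.lower()))
    steps = []
    for table, columns in schema.items():
        if table.lower() not in sql_tokens:
            continue  # table does not appear in the gold SQL
        steps.append(f"The query reads from table {table}.")
        for column in columns:
            if column.lower() not in sql_tokens:
                continue  # column does not appear in the gold SQL
            # Link the column to the question words it shares, if any.
            shared = sorted(set(column.lower().split("_")) & question_words)
            if shared:
                phrase = " ".join(shared)
                steps.append(
                    f'The question mentions "{phrase}", '
                    f"so column {table}.{column} is needed."
                )
            else:
                steps.append(f"Column {table}.{column} is needed.")
    reasoning = " ".join(steps)
    return (
        f"Question: {question}\n"
        f"Let's think step by step. {reasoning}\n"
        f"SQL: {sql}"
    )


def build_prompt(exemplars, new_question):
    """Join the exemplars and the new question into a single prompt string."""
    tail = f"Question: {new_question}\nLet's think step by step."
    return "\n\n".join(exemplars + [tail])
```

Because the exemplars are derived from the gold SQL without human annotation, and the final prompt is a single string, one LLM call per query is enough in this sketch, which mirrors the cost argument in the abstract.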
ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL
Text-to-SQL aims to generate an executable SQL program given the user
utterance and the corresponding database schema. To ensure the well-formedness
of the output SQL, one prominent approach adopts a grammar-based recurrent decoder
to produce the equivalent SQL abstract syntax tree (AST). However, previous
methods mainly utilize RNN-based decoders, which 1) are time-consuming and
inefficient and 2) introduce very few structural priors. In this work, we
propose an AST structure-aware Transformer decoder (ASTormer) to replace
traditional RNN cells. The structural knowledge, such as node types and
positions in the tree, is seamlessly incorporated into the decoder via both
absolute and relative position embeddings. In addition, the proposed framework is
compatible with different traversal orders, even with adaptive node
selection. Extensive experiments on five text-to-SQL benchmarks demonstrate the
effectiveness and efficiency of our structured decoder compared to competitive
baselines.
A BiRGAT Model for Multi-intent Spoken Language Understanding with Hierarchical Semantic Frames
Previous work on spoken language understanding (SLU) mainly focuses on
single-intent settings, where each input utterance contains only one user
intent. This configuration significantly limits the surface form of user
utterances and the capacity of output semantics. In this work, we first propose
a Multi-Intent dataset which is collected from a realistic in-Vehicle dialogue
System, called MIVS. The target semantic frame is organized in a 3-layer
hierarchical structure to tackle the alignment and assignment problems in
multi-intent cases. Accordingly, we devise a BiRGAT model to encode the
hierarchy of ontology items, the backbone of which is a dual relational graph
attention network. Coupled with the 3-way pointer-generator decoder, our method
outperforms traditional sequence labeling and classification-based schemes by a
large margin.
On the Structural Generalization in Text-to-SQL
Exploring the generalization of a text-to-SQL parser is essential for a
system to automatically adapt to real-world databases. Previous work has provided
investigations focusing on lexical diversity, including the influence of
synonyms and perturbations in both natural language questions and databases.
However, research on the structural variety of database schemas (DS) is
lacking. Specifically, given the same input question, the target
SQL is likely to be expressed differently when the DS has a different
structure. In this work, we provide in-depth discussions about the structural
generalization of text-to-SQL tasks. We observe that current datasets are too
templated to study structural generalization. To collect eligible test data, we
propose a framework to generate novel text-to-SQL data via automatic and
synchronous (DS, SQL) pair altering. In the experiments, the significant
performance drop of well-trained text-to-SQL models on the
synthetic samples demonstrates the limitation of current research regarding
structural generalization. Based on a comprehensive analysis, we suggest that the
underlying cause is overfitting to (NL, SQL) patterns.
Comment: The experiment results of T5 and T5-Picard in Table 5 and Table 6 are
not correct because we made mistakes in the evaluation code.