ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought
Large Language Models (LLMs) have recently demonstrated strong abilities across various domains and tasks. We study prompt design for the text-to-SQL task and attempt to improve the LLMs' reasoning ability when generating SQL queries. Beyond the plain few-shot in-context learning setting, we design our chain-of-thought (CoT) prompt with a method similar to schema linking. We propose ACT-SQL, a method that automatically generates auto-CoT exemplars, so the whole process requires no manual labeling. Our approach is also cost-saving, since we call the LLM API only once when generating each SQL query. Furthermore, we extend our in-context learning method to the multi-turn text-to-SQL task. Experimental results show that LLM performance benefits from our ACT-SQL approach, which achieves SOTA performance on the Spider dev set among existing in-context learning approaches.
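A minimal sketch of how such auto-CoT exemplars might be assembled from a (question, gold SQL) pair via naive schema linking; the function names and the reasoning template below are hypothetical illustrations, not the paper's implementation:

```python
# Hypothetical sketch of automatic CoT exemplar construction for text-to-SQL,
# in the spirit of ACT-SQL: link question spans to schema items, then turn the
# links into a reasoning string, with no manual labeling.

def link_schema(question: str, schema: dict[str, list[str]]) -> list[str]:
    """Naive schema linking: collect tables/columns whose names appear
    (case-insensitively) in the question text."""
    mentions, q = [], question.lower()
    for table, columns in schema.items():
        if table.lower() in q:
            mentions.append(table)
        mentions += [f"{table}.{c}" for c in columns if c.lower() in q]
    return mentions

def build_cot_exemplar(question: str, sql: str, schema: dict) -> str:
    """Turn a (question, gold SQL) pair into a CoT exemplar automatically,
    using the linked schema items as the intermediate reasoning steps."""
    links = link_schema(question, schema)
    steps = "; ".join(f"the question mentions {m}" for m in links) \
            or "no direct schema mention"
    return (f"Question: {question}\n"
            f"Reasoning: {steps}; therefore the query uses these items.\n"
            f"SQL: {sql}")

schema = {"singer": ["name", "age"], "concert": ["year"]}
print(build_cot_exemplar("What is the average age of all singers?",
                         "SELECT avg(age) FROM singer", schema))
```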
Collaborative Group Learning
Collaborative learning has successfully applied knowledge transfer to guide a pool of small student networks towards robust local minima. However, previous approaches typically suffer from drastically aggravated student homogenization as the number of students grows. In this paper, we propose Collaborative Group Learning, an efficient framework that diversifies feature representations and provides effective regularization. Intuitively, similar to the human group-study mechanism, we induce students to learn and exchange different parts of course knowledge as collaborative groups. First, each student is established by random routing on a modular neural network, which facilitates flexible knowledge communication between students through random levels of representation sharing and branching. Second, to resist student homogenization, students first compose diverse feature sets by exploiting the inductive bias from subsets of the training data, and then aggregate and distill complementary knowledge by imitating a random sub-group of students at each time step. Overall, these mechanisms allow the student population to be maximized, further improving model generalization without sacrificing computational efficiency. Empirical evaluations on both image and text tasks indicate that our method significantly outperforms various state-of-the-art collaborative approaches while enhancing computational efficiency. (Accepted by AAAI 2021; camera-ready version.)
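A hedged sketch of the sub-group imitation idea: each student is distilled toward the averaged soft predictions of a randomly sampled sub-group of its peers at each step. The KL-based loss and temperature below are illustrative assumptions, not the paper's exact objective:

```python
import random
import torch
import torch.nn.functional as F

def subgroup_distill_loss(logits: list[torch.Tensor], group_size: int = 2,
                          tau: float = 2.0) -> torch.Tensor:
    """Distill each student toward the mean soft prediction of a random
    peer sub-group (assumed loss form; temperature tau is a guess)."""
    loss = 0.0
    for i, s in enumerate(logits):
        peers = random.sample([l for j, l in enumerate(logits) if j != i],
                              group_size)
        # Aggregated complementary knowledge from the sampled sub-group.
        target = torch.stack(peers).mean(dim=0).detach()
        loss = loss + F.kl_div(F.log_softmax(s / tau, dim=-1),
                               F.softmax(target / tau, dim=-1),
                               reduction="batchmean") * tau ** 2
    return loss / len(logits)

# Four students, batch of 8, 10 classes.
students = [torch.randn(8, 10, requires_grad=True) for _ in range(4)]
print(subgroup_distill_loss(students).item())
```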
Large Language Models Are Semi-Parametric Reinforcement Learning Agents
Inspired by insights from cognitive science on human memory and reasoning mechanisms, we propose REMEMBERER, a novel evolvable Large Language Model (LLM)-based agent framework. By equipping the LLM with a long-term experience memory, REMEMBERER can exploit experiences from past episodes even for different task goals, giving it an advantage over LLM-based agents with fixed exemplars or a transient working memory. We further introduce Reinforcement Learning with Experience Memory (RLEM) to update the memory. Thus, the whole system can learn from both successful and failed experiences, and evolve its capability without fine-tuning the parameters of the LLM. In this way, the proposed REMEMBERER constitutes a semi-parametric RL agent. Extensive experiments are conducted on two RL task sets to evaluate the proposed framework. Averaged over different initializations and training sets, the results exceed the prior SOTA success rate by 4% and 2% on the two task sets, demonstrating the superiority and robustness of REMEMBERER.
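A minimal sketch of an experience memory updated with a one-step Q-learning rule, in the spirit of RLEM; the storage keys, update details, and retrieval heuristic are assumptions for illustration (the paper's retrieval is presumably similarity-based):

```python
from dataclasses import dataclass, field

@dataclass
class ExperienceMemory:
    """Hypothetical external memory: (observation, action) -> Q-value.
    The LLM's parameters are never fine-tuned; only this store evolves."""
    alpha: float = 0.1   # assumed learning rate
    gamma: float = 0.9   # assumed discount factor
    records: dict = field(default_factory=dict)

    def update(self, obs, action, reward, next_obs, next_actions):
        """One-step Q-learning update over the stored experiences,
        learning from both success (reward > 0) and failure."""
        q = self.records.get((obs, action), 0.0)
        next_q = max((self.records.get((next_obs, a), 0.0)
                      for a in next_actions), default=0.0)
        self.records[(obs, action)] = q + self.alpha * (
            reward + self.gamma * next_q - q)

    def retrieve(self, k: int = 2):
        """Return the k highest-valued experiences to insert as exemplars
        into the LLM prompt (simplified stand-in for similarity retrieval)."""
        return sorted(self.records.items(), key=lambda kv: -kv[1])[:k]

mem = ExperienceMemory()
mem.update("page=login", "click(login)", 1.0, "page=home", ["click(logout)"])
print(mem.retrieve())
```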
ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL
Text-to-SQL aims to generate an executable SQL program given the user
utterance and the corresponding database schema. To ensure the well-formedness
of output SQLs, one prominent approach adopts a grammar-based recurrent decoder
to produce the equivalent SQL abstract syntax tree (AST). However, previous
methods mainly rely on RNN-based decoders, which 1) are time-consuming and inefficient and 2) introduce few structural priors. In this work, we propose an AST structure-aware Transformer decoder (ASTormer) to replace traditional RNN cells. Structural knowledge, such as node types and positions in the tree, is seamlessly incorporated into the decoder via both absolute and relative position embeddings. Moreover, the proposed framework is compatible with different traversal orders, including adaptive node selection. Extensive experiments on five text-to-SQL benchmarks demonstrate the effectiveness and efficiency of our structured decoder compared to competitive baselines.
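An illustrative sketch of injecting AST structure (node type and tree depth) into a Transformer decoder input via embeddings; the dimensions and sum-based fusion are assumptions, not ASTormer's exact architecture:

```python
import torch
import torch.nn as nn

class ASTNodeEmbedding(nn.Module):
    """Assumed fusion: sum a grammar-node-type embedding and an absolute
    tree-depth embedding so standard decoder layers can consume them."""
    def __init__(self, n_types: int, max_depth: int, d_model: int):
        super().__init__()
        self.type_emb = nn.Embedding(n_types, d_model)     # node type
        self.depth_emb = nn.Embedding(max_depth, d_model)  # position in tree

    def forward(self, node_types: torch.Tensor,
                depths: torch.Tensor) -> torch.Tensor:
        return self.type_emb(node_types) + self.depth_emb(depths)

emb = ASTNodeEmbedding(n_types=32, max_depth=16, d_model=64)
nodes = torch.tensor([[0, 3, 7]])    # e.g. Select -> Column -> Aggregate
depths = torch.tensor([[0, 1, 2]])   # depth of each node in the AST
x = emb(nodes, depths)               # (1, 3, 64) structure-aware inputs
decoder = nn.TransformerDecoderLayer(d_model=64, nhead=4, batch_first=True)
out = decoder(x, memory=torch.randn(1, 5, 64))  # encoder states are dummy
print(out.shape)
```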
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding
The growing prevalence of visually rich documents, such as webpages and
scanned/digital-born documents (images, PDFs, etc.), has led to increased
interest in automatic document understanding and information extraction across
academia and industry. Although various document modalities, including image,
text, layout, and structure, facilitate human information retrieval, the
interconnected nature of these modalities presents challenges for neural
networks. In this paper, we introduce WebLM, a multimodal pre-training network
designed to address the limitations of solely modeling text and structure
modalities of HTML in webpages. Instead of processing document images as
unified natural images, WebLM integrates the hierarchical structure of document
images to enhance the understanding of markup-language-based documents.
Additionally, we propose several pre-training tasks to model the interaction
among text, structure, and image modalities effectively. Empirical results
demonstrate that the pre-trained WebLM significantly surpasses previous
state-of-the-art pre-trained models across several webpage understanding tasks.
The pre-trained models and code are available at
https://github.com/X-LANCE/weblm
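A hedged sketch of fusing the three modalities WebLM models at the token level: text, markup hierarchy (here reduced to the node's depth in the HTML tree), and a visual region feature. The sum-based fusion and all dimensions are assumptions, not WebLM's actual design:

```python
import torch
import torch.nn as nn

class MultimodalTokenEmbedding(nn.Module):
    """Assumed fusion: sum text, structure, and projected image signals."""
    def __init__(self, vocab: int, max_depth: int, img_dim: int, d_model: int):
        super().__init__()
        self.text = nn.Embedding(vocab, d_model)
        self.depth = nn.Embedding(max_depth, d_model)  # hierarchical structure
        self.img_proj = nn.Linear(img_dim, d_model)    # visual region feature

    def forward(self, token_ids, depths, img_feats):
        return self.text(token_ids) + self.depth(depths) \
               + self.img_proj(img_feats)

emb = MultimodalTokenEmbedding(vocab=30522, max_depth=64,
                               img_dim=256, d_model=128)
out = emb(torch.tensor([[101, 2054]]),      # token ids
          torch.tensor([[0, 3]]),           # HTML-tree depths
          torch.randn(1, 2, 256))           # per-token image features
print(out.shape)  # (1, 2, 128)
```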
A BiRGAT Model for Multi-intent Spoken Language Understanding with Hierarchical Semantic Frames
Previous work on spoken language understanding (SLU) mainly focuses on
single-intent settings, where each input utterance contains only one user intent. This configuration significantly limits the surface form of user utterances and the capacity of output semantics. In this work, we first present MIVS, a Multi-Intent dataset collected from a realistic in-Vehicle dialogue System. The target semantic frame is organized in a 3-layer hierarchical structure to tackle the alignment and assignment problems in multi-intent cases. Accordingly, we devise a BiRGAT model to encode the hierarchy of ontology items, whose backbone is a dual relational graph attention network. Coupled with a 3-way pointer-generator decoder, our method outperforms traditional sequence labeling and classification-based schemes by a large margin.
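A minimal sketch of relational graph attention, where attention logits between ontology items are biased by an embedding of the relation type connecting them; this illustrates the general RGAT idea only, not the paper's specific BiRGAT layer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationalGraphAttention(nn.Module):
    """Assumed form: keys are shifted by a learned relation-type vector
    before computing scaled dot-product attention over ontology items."""
    def __init__(self, d: int, n_rels: int):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(d, d) for _ in range(3))
        self.rel = nn.Embedding(n_rels, d)  # one vector per relation type

    def forward(self, x: torch.Tensor, rel_ids: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Bias each key with the relation between node i and node j.
        k_rel = k.unsqueeze(0) + self.rel(rel_ids)          # (N, N, d)
        logits = (q.unsqueeze(1) * k_rel).sum(-1) / x.size(-1) ** 0.5
        return F.softmax(logits, dim=-1) @ v

layer = RelationalGraphAttention(d=32, n_rels=4)
x = torch.randn(5, 32)              # 5 ontology items
rels = torch.randint(0, 4, (5, 5))  # relation type between each item pair
print(layer(x, rels).shape)         # (5, 32)
```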
Adaptive Vague Preference Policy Learning for Multi-round Conversational Recommendation
Conversational recommendation systems (CRS) effectively address information
asymmetry by dynamically eliciting user preferences through multi-turn
interactions. Existing CRSs widely assume that users have clear preferences. Under this assumption, the agent fully trusts user feedback and treats accepted or rejected signals as strong indicators to filter items and reduce the candidate space, which may lead to the problem of over-filtering.
However, in reality, users' preferences are often vague and volatile, with
uncertainty about their desires and changing decisions during interactions.
To address this issue, we introduce a novel scenario called Vague Preference
Multi-round Conversational Recommendation (VPMCR), which considers users' vague
and volatile preferences in CRS. VPMCR employs a soft estimation mechanism to assign a non-zero confidence score to every candidate item to be displayed, naturally avoiding the over-filtering problem. In the VPMCR setting, we introduce a solution called Adaptive Vague Preference Policy Learning (AVPPL), which consists of two main components: Uncertainty-aware Soft Estimation (USE) and Uncertainty-aware Policy Learning (UPL). USE estimates the uncertainty of users' vague feedback and captures their dynamic preferences using a choice-based preference extraction module and a time-aware decaying strategy.
UPL leverages the preference distribution estimated by USE to guide the
conversation and adapt to changes in users' preferences to make recommendations
or ask for attributes.
Our extensive experiments demonstrate the effectiveness of our method in the
VPMCR scenario, highlighting its potential for practical applications and
improving the overall performance and applicability of CRS in real-world
settings, particularly for users with vague or dynamic preferences.
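A hedged sketch of time-aware decaying soft preference estimation in the spirit of USE: each attribute signal keeps a non-zero confidence that decays with the number of dialogue turns since it arrived; the exponential decay and the confidence floor are illustrative assumptions:

```python
import math

def soft_preferences(signals, current_turn, decay=0.5, floor=0.05):
    """signals: list of (turn, attribute, liked) tuples from the dialogue.
    Returns attribute -> confidence in [floor, 1]; nothing is filtered
    to zero, which avoids the over-filtering problem described above."""
    scores = {}
    for turn, attr, liked in signals:
        w = math.exp(-decay * (current_turn - turn))  # recent turns weigh more
        scores[attr] = scores.get(attr, 0.0) + (w if liked else -w)
    # Squash into (0, 1) and keep a non-zero floor for every touched attribute.
    return {a: max(floor, 1 / (1 + math.exp(-s))) for a, s in scores.items()}

history = [(1, "comedy", True), (2, "action", False), (4, "comedy", True)]
print(soft_preferences(history, current_turn=5))
```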