Enabling Large Language Models to Learn from Rules
Large language models (LLMs) have shown impressive performance on a wide range of real-world tasks. The current knowledge learning paradigm of LLMs is mainly based on learning from examples, in which LLMs learn internal rules implicitly from a certain number of supervised examples. However, this paradigm may not learn complicated rules well, especially when the training examples are limited. We are inspired by the observation that humans can learn new tasks or knowledge in another way: by learning from rules. That is, humans can grasp new tasks or knowledge quickly and generalize well given only a detailed rule and a few optional examples. Therefore, in this paper, we explore the feasibility of this new learning paradigm, which encodes rule-based knowledge into LLMs. We propose rule distillation, which first uses the strong in-context abilities of LLMs to extract knowledge from textual rules and then explicitly encodes this knowledge into the LLMs' parameters by learning from the in-context signals produced inside the model. Our experiments show that making LLMs learn from rules with our method is much more efficient than example-based learning in terms of both sample efficiency and generalization ability.
Comment: In progress
Guiding Non-Autoregressive Neural Machine Translation Decoding with Reordering Information
Non-autoregressive neural machine translation (NAT) generates each target word in parallel and has achieved promising inference acceleration. However, existing NAT models still show a large gap in translation quality compared to autoregressive neural machine translation models, due to the enormous decoding search space. To address this problem, we propose a novel NAT framework named ReorderNAT which explicitly models reordering information in the decoding procedure. We further introduce deterministic and non-deterministic decoding strategies that utilize the reordering information to narrow the decoding search space in ReorderNAT. Experimental results on several widely used datasets show that our model outperforms existing NAT models and even achieves translation quality comparable to autoregressive translation models with a significant speedup.
Comment: Accepted by AAAI 202
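To make the reordering idea concrete, here is a small illustrative sketch: a reordering module scores which source position should fill each target slot, and the decoder then predicts all target words in parallel from the reordered source representations. The hard/soft reordering functions and tensor shapes below are our own assumptions for illustration, not the paper's architecture.

```python
# Illustrative sketch of reordering-guided non-autoregressive decoding.
import torch

def deterministic_reorder(reorder_logits: torch.Tensor,
                          src_embeds: torch.Tensor) -> torch.Tensor:
    """Pick one source position per target slot (hard reordering)."""
    # reorder_logits: [tgt_len, src_len], src_embeds: [src_len, dim]
    idx = reorder_logits.argmax(dim=-1)          # [tgt_len]
    return src_embeds[idx]                       # reordered source embeddings

def soft_reorder(reorder_logits: torch.Tensor,
                 src_embeds: torch.Tensor) -> torch.Tensor:
    """Non-deterministic variant: expected source embedding per target slot."""
    weights = reorder_logits.softmax(dim=-1)     # [tgt_len, src_len]
    return weights @ src_embeds                  # [tgt_len, dim]

# Toy usage: 4 source tokens, 5 target slots, 16-dim embeddings.
src_embeds = torch.randn(4, 16)
reorder_logits = torch.randn(5, 4)
guided = deterministic_reorder(reorder_logits, src_embeds)
# All target words could now be predicted in one parallel pass, e.g.:
# target_logits = decoder(guided)   # [tgt_len, vocab]
```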
NumNet: Machine Reading Comprehension with Numerical Reasoning
Numerical reasoning, such as addition, subtraction, sorting, and counting, is a critical skill in human reading comprehension, yet it has not been well considered in existing machine reading comprehension (MRC) systems. To address this issue, we propose a numerical MRC model named NumNet, which utilizes a numerically-aware graph neural network to capture comparison information and perform numerical reasoning over numbers in the question and passage. Our system achieves an EM score of 64.56% on the DROP dataset, outperforming all existing machine reading comprehension models by considering the numerical relations among numbers.
Comment: Accepted to EMNLP 2019; 11 pages, 2 figures, 6 tables
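The following is a compact sketch of the numerically-aware graph idea from this abstract: numbers from the question and passage become nodes, directed "greater-than" edges encode their relative order, and a simple message-passing layer propagates comparison information. It follows the high-level idea only; the layer and names are assumptions, not NumNet's exact architecture.

```python
# Toy numerically-aware graph + one message-passing layer over comparison edges.
import torch
import torch.nn as nn

def comparison_graph(numbers):
    """Adjacency a[i, j] = 1 if numbers[i] > numbers[j]."""
    n = len(numbers)
    a = torch.zeros(n, n)
    for i in range(n):
        for j in range(n):
            if numbers[i] > numbers[j]:
                a[i, j] = 1.0
    return a

class NumGNNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)

    def forward(self, h, adj):
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        neigh = adj @ self.msg(h) / deg          # mean message over "greater-than" neighbours
        return torch.relu(h + neigh)

# Toy usage: numbers extracted from a passage, random initial representations.
numbers = [3, 17, 5, 42]
h = torch.randn(len(numbers), 32)
h = NumGNNLayer(32)(h, comparison_graph(numbers))
```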
CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation
Existing reference-free metrics have obvious limitations for evaluating controlled text generation models. Unsupervised metrics can only provide a task-agnostic evaluation result that correlates weakly with human judgments, whereas supervised ones may overfit task-specific data and generalize poorly to other datasets. In this paper, we propose an unsupervised reference-free metric called CTRLEval, which evaluates controlled text generation from different aspects by formulating each aspect as multiple text infilling tasks. On top of these tasks, the metric assembles generation probabilities from a pre-trained language model without any model training. Experimental results show that our metric has higher correlations with human judgments than other baselines, while generalizing better when evaluating texts generated by different models and of different qualities.
Comment: Accepted by ACL 2022 (Main Conference)
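As a rough illustration of the infilling-style evaluation described above, the sketch below decomposes an aspect into text-infilling items (mask a span, ask a pre-trained LM for the log-probability of the original filler) and averages the per-item log-probabilities into one score. The `log_prob_fn` callable stands in for any pre-trained scorer; it and `infill_items` are assumed interfaces, not CTRLEval's actual code.

```python
# Hedged sketch: aspect score = average log-probability over infilling items.
from typing import Callable, List, Tuple
import math

def infill_items(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """One infilling item per token: context with a blank, plus the gold filler."""
    items = []
    for i, tok in enumerate(tokens):
        masked = tokens[:i] + ["<mask>"] + tokens[i + 1:]
        items.append((masked, tok))
    return items

def infilling_score(text: str,
                    log_prob_fn: Callable[[List[str], str], float]) -> float:
    """Average per-item log-probability; higher means better under this aspect."""
    tokens = text.split()
    items = infill_items(tokens)
    return sum(log_prob_fn(ctx, tgt) for ctx, tgt in items) / len(items)

# Toy usage with a dummy scorer (a real setup would query a pre-trained LM).
dummy = lambda ctx, tgt: math.log(0.5)
print(infilling_score("the generated review is clearly positive", dummy))
```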
Measuring and Relieving the Over-smoothing Problem for Graph Neural Networks from the Topological View
Graph Neural Networks (GNNs) have achieved promising performance on a wide range of graph-based tasks. Despite their success, one severe limitation of GNNs is the over-smoothing issue (indistinguishable representations of nodes in different classes). In this work, we present a systematic and quantitative study of the over-smoothing issue in GNNs. First, we introduce two quantitative metrics, MAD and MADGap, to measure the smoothness and over-smoothness of graph node representations, respectively. Then, we verify that smoothing is intrinsic to GNNs and that the critical factor leading to over-smoothness is the low information-to-noise ratio of the messages received by the nodes, which is partially determined by the graph topology. Finally, we propose two methods to alleviate the over-smoothing issue from the topological view: (1) MADReg, which adds a MADGap-based regularizer to the training objective; (2) AdaGraph, which optimizes the graph topology based on the model predictions. Extensive experiments on 7 widely used graph datasets with 10 typical GNN models show that the two proposed methods effectively relieve the over-smoothing issue and thus improve the performance of various GNN models.
Comment: Accepted by AAAI 2020. This complete version contains the appendix
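A compact sketch of the two metrics named above: MAD is the mean cosine distance between (masked) pairs of node representations, and MADGap is the MAD over remote pairs minus the MAD over neighbouring pairs. The simplification of "remote" to "non-adjacent" and the variable names are ours; the computation otherwise follows the definitions in the abstract.

```python
# MAD / MADGap sketch over node representations h and adjacency adj.
import torch
import torch.nn.functional as F

def mad(h: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Mean average (cosine) distance over the node pairs selected by `mask`."""
    h_norm = F.normalize(h, dim=-1)
    dist = 1.0 - h_norm @ h_norm.t()                       # pairwise cosine distances
    masked = dist * mask
    per_node = masked.sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    return per_node.mean()

def madgap(h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """MAD over remote (here: non-adjacent) pairs minus MAD over adjacent pairs."""
    eye = torch.eye(adj.size(0))
    neighbor = (adj > 0).float() * (1 - eye)
    remote = (adj == 0).float() * (1 - eye)
    return mad(h, remote) - mad(h, neighbor)

# Toy usage: 6 nodes with random representations on a ring-shaped graph.
h = torch.randn(6, 16)
adj = torch.roll(torch.eye(6), 1, dims=1) + torch.roll(torch.eye(6), -1, dims=1)
print(mad(h, 1 - torch.eye(6)), madgap(h, adj))
```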
Stochastic Bridges as Effective Regularizers for Parameter-Efficient Tuning
Parameter-efficient tuning methods (PETs) have achieved promising results in tuning large pre-trained language models (PLMs). By formalizing frozen PLMs and the additional tunable parameters as systems and controls, respectively, PETs can be theoretically grounded in optimal control and viewed as optimizing both the terminal cost and the running cost in the optimal control literature. Despite the elegance of this theoretical grounding, in practice existing PETs often ignore the running cost and optimize only the terminal cost, i.e., the loss function on the output state, regardless of the running cost that depends on the intermediate states. Since it is non-trivial to directly model the intermediate states and design a running cost function, we propose to use latent stochastic bridges to regularize the intermediate states and use this regularization as the running cost of PETs. As the first work to propose regularized PETs that use stochastic bridges as the regularizers (running costs) for the intermediate states, we show the effectiveness and generality of this regularization across different tasks, PLMs, and PETs. In view of its great potential and capacity, we believe more sophisticated regularizers can be designed for PETs and better performance can be achieved in the future. The code is released at https://github.com/thunlp/stochastic-bridge-pet/tree/main.
Comment: ACL 2023 Findings
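The sketch below illustrates one way such a running cost could look: intermediate layer states are projected into a small latent space and penalized by their negative log-likelihood under a Brownian bridge between the first and last layers' latent states. The projection, the bridge parameterization, and all names here are illustrative assumptions rather than the released implementation (see the linked repository for the real code).

```python
# Hedged sketch of a Brownian-bridge running cost over intermediate layer states.
import torch
import torch.nn as nn

def bridge_running_cost(latents: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """latents: [num_layers, dim]; endpoints define the bridge, middle layers pay the cost."""
    num_layers = latents.size(0)
    z0, zT = latents[0], latents[-1]
    total = latents.new_zeros(())
    for t in range(1, num_layers - 1):
        s = t / (num_layers - 1)
        mean = (1 - s) * z0 + s * zT              # bridge mean at relative time s
        var = sigma ** 2 * s * (1 - s) + 1e-6     # bridge variance at relative time s
        total = total + ((latents[t] - mean) ** 2 / (2 * var)).sum()
    return total / max(num_layers - 2, 1)

# Toy usage: project one pooled hidden state per layer into an 8-dim latent space.
proj = nn.Linear(64, 8)
hidden_states = torch.randn(6, 64)
cost = bridge_running_cost(proj(hidden_states))
# A full objective would combine both costs, e.g. task_loss + lambda * cost.
```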
Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models
Prompting, which casts downstream applications as language modeling tasks, has been shown to be more sample-efficient than standard fine-tuning of pre-trained models. However, one pitfall of prompting is the need for manually designed patterns, whose effect can be unintuitive and which require large validation sets to tune. To tackle this challenge, we propose AutoSeq, a fully automatic prompting method: (1) we adopt natural language prompts on sequence-to-sequence models, enabling free-form generation and a larger label search space; (2) we propose label sequences -- phrases of indefinite length that verbalize the labels -- which eliminate the need for manual templates and are more expressive than single label words; (3) we use beam search to automatically generate a large number of label sequence candidates and propose contrastive re-ranking to select the best combinations. AutoSeq significantly outperforms other methods without manual design, such as soft prompt tuning, adapter tuning, and automatic search over single label words; the generated label sequences are even better than curated manual ones on a variety of tasks. Our method reveals the potential of sequence-to-sequence models in few-shot learning and sheds light on a path toward generic and automatic prompting. The source code of this paper can be obtained from https://github.com/thunlp/Seq2Seq-Prompt.
Comment: Accepted to COLING 202
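As a rough sketch of the re-ranking step mentioned in item (3) above, the code below scores each candidate label sequence by how strongly a scorer prefers it on its own class's examples versus the other classes' examples, then keeps the best candidate per class. Candidate generation (beam search over a seq2seq model) is abstracted behind `seq_log_prob`, which is an assumed interface, not the paper's code.

```python
# Illustrative contrastive re-ranking of candidate label sequences.
from typing import Callable, Dict, List

def contrastive_rerank(candidates: Dict[str, List[str]],
                       examples: Dict[str, List[str]],
                       seq_log_prob: Callable[[str, str], float]) -> Dict[str, str]:
    """Pick, for each class, the label sequence with the best contrastive score."""
    best = {}
    for label, cand_list in candidates.items():
        def score(cand: str) -> float:
            pos = sum(seq_log_prob(x, cand) for x in examples[label])
            neg = sum(seq_log_prob(x, cand)
                      for other, xs in examples.items() if other != label
                      for x in xs)
            return pos - neg
        best[label] = max(cand_list, key=score)
    return best

# Toy usage with a word-overlap dummy standing in for a seq2seq LM scorer.
cands = {"positive": ["great movie", "loved it"], "negative": ["terrible", "a waste"]}
exs = {"positive": ["a great joyful ride"], "negative": ["a terrible dull waste"]}
print(contrastive_rerank(cands, exs,
                         lambda x, y: float(len(set(x.split()) & set(y.split())))))
```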