Variational Inference for Learning Representations of Natural Language Edits
Document editing has become a pervasive component of the production of
information, with version control systems enabling edits to be efficiently
stored and applied. In light of this, the task of learning distributed
representations of edits has recently been proposed. Building on this, we
propose a novel approach that employs variational inference to learn a
continuous latent space of vector representations to capture the underlying
semantic information with regard to the document editing process. We achieve
this by introducing a latent variable to explicitly model the aforementioned
features. This latent variable is then combined with a document representation
to guide the generation of an edited version of this document. Additionally, to
facilitate standardized automatic evaluation of edit representations, which has
heavily relied on direct human input thus far, we also propose a suite of
downstream tasks, PEER, specifically designed to measure the quality of edit
representations in the context of natural language processing. Comment: Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21).
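As a rough illustration of the setup this abstract describes, the sketch below shows a VAE-style model in PyTorch that encodes an (original, edited) document pair into a latent edit vector and conditions the decoder on that vector together with the document representation. The module layout, GRU encoders, and dimensions are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class VariationalEditModel(nn.Module):
    """Hypothetical VAE-style edit-representation model (illustrative, not the authors' code)."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, latent_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.doc_enc = nn.GRU(emb_dim, hid_dim, batch_first=True)   # encodes the original document
        self.edit_enc = nn.GRU(emb_dim, hid_dim, batch_first=True)  # encodes the edited document
        # Posterior q(z | original, edited): mean and log-variance of the edit latent.
        self.to_mu = nn.Linear(2 * hid_dim, latent_dim)
        self.to_logvar = nn.Linear(2 * hid_dim, latent_dim)
        # Decoder generates the edited document conditioned on the document state and z.
        self.decoder = nn.GRU(emb_dim + latent_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, original, edited):
        _, h_doc = self.doc_enc(self.embed(original))    # document representation, shape (1, B, H)
        _, h_edit = self.edit_enc(self.embed(edited))
        pair = torch.cat([h_doc[-1], h_edit[-1]], dim=-1)
        mu, logvar = self.to_mu(pair), self.to_logvar(pair)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        # Condition every decoder step on the latent edit vector z (teacher forcing on `edited`).
        z_steps = z.unsqueeze(1).expand(-1, edited.size(1), -1)
        dec_out, _ = self.decoder(torch.cat([self.embed(edited), z_steps], dim=-1), h_doc)
        logits = self.out(dec_out)
        # ELBO terms: token reconstruction loss (from logits) plus this KL regularizer on z.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()
        return logits, kl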
Large Language Models are Zero-Shot Reasoners
Pretrained large language models (LLMs) are widely used in many sub-fields of
natural language processing (NLP) and generally known as excellent few-shot
learners with task-specific exemplars. Notably, chain of thought (CoT)
prompting, a recent technique for eliciting complex multi-step reasoning
through step-by-step answer examples, achieved state-of-the-art performance
on arithmetic and symbolic reasoning, difficult system-2 tasks
that do not follow the standard scaling laws for LLMs. While these successes
are often attributed to LLMs' ability for few-shot learning, we show that LLMs
are decent zero-shot reasoners by simply adding "Let's think step by step"
before each answer. Experimental results demonstrate that our Zero-shot-CoT,
using the same single prompt template, significantly outperforms zero-shot LLM
performance on diverse benchmark reasoning tasks including arithmetic
(MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin
Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled
Objects), without any hand-crafted few-shot examples, e.g. increasing the
accuracy on MultiArith from 17.7% to 78.7% and on GSM8K from 10.4% to 40.7% with
the 175B-parameter InstructGPT model, and achieving improvements of similar magnitude
with another off-the-shelf large model, the 540B-parameter PaLM. The versatility of
this single prompt across very diverse reasoning tasks hints at untapped and
understudied fundamental zero-shot capabilities of LLMs, suggesting high-level,
multi-task broad cognitive capabilities may be extracted by simple prompting.
We hope our work not only serves as the minimal yet strongest zero-shot baseline
for the challenging reasoning benchmarks, but also highlights the importance of
carefully exploring and analyzing the enormous zero-shot knowledge hidden
inside LLMs before crafting finetuning datasets or few-shot exemplars. Comment: Accepted to NeurIPS 2022. Our code is available at
https://github.com/kojima-takeshi188/zero_shot_co
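As a concrete illustration of the two-stage prompting described above, a minimal sketch follows; query_llm is a placeholder for whatever text-completion function is in use, and the exact answer-extraction wording is an assumption rather than the paper's verbatim prompt.

def zero_shot_cot(question, query_llm):
    """Two-stage zero-shot chain-of-thought prompting; query_llm is any text-completion function."""
    # Stage 1: reasoning extraction -- append the trigger phrase so the model writes a rationale.
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    rationale = query_llm(reasoning_prompt)
    # Stage 2: answer extraction -- feed the rationale back and ask for the final answer.
    answer_prompt = reasoning_prompt + " " + rationale + "\nTherefore, the answer is"
    return query_llm(answer_prompt)

# Example usage (my_completion_fn is a placeholder for an actual LLM call):
# answer = zero_shot_cot("If there are 3 cars and each car has 4 wheels, how many wheels are there?",
#                        my_completion_fn)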
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer
Despite remarkable advancements in few-shot generalization in natural
language processing, most models are developed and evaluated primarily in
English. To facilitate research on few-shot cross-lingual transfer, we
introduce a new benchmark, called BUFFET, which unifies 15 diverse tasks across
54 languages in a sequence-to-sequence format and provides a fixed set of
few-shot examples and instructions. BUFFET is designed to establish a rigorous
and equitable evaluation framework for few-shot cross-lingual transfer across a
broad range of tasks and languages. Using BUFFET, we perform thorough
evaluations of state-of-the-art multilingual large language models with
different transfer methods, namely in-context learning and fine-tuning. Our
findings reveal significant room for improvement in few-shot in-context
cross-lingual transfer. In particular, ChatGPT with in-context learning often
performs worse than much smaller mT5-base models fine-tuned on English task
data and few-shot in-language examples. Our analysis suggests various avenues
for future research in few-shot cross-lingual transfer, such as improved
pretraining, understanding, and evaluation. Comment: The data and code are available at https://buffetfs.github.io
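For readers unfamiliar with the in-context-learning side of the comparison, the sketch below shows how a fixed instruction-plus-demonstrations prompt might be assembled for a target-language input; the field names and prompt layout are illustrative assumptions, not the BUFFET format itself.

def build_fewshot_prompt(instruction, demonstrations, target_input):
    """Assemble an instruction, fixed few-shot demonstrations, and one unlabeled
    target-language input into a single sequence-to-sequence prompt."""
    parts = [instruction.strip()]
    for ex in demonstrations:  # fixed few-shot examples, e.g. English or in-language
        parts.append(f"Input: {ex['input']}\nOutput: {ex['output']}")
    parts.append(f"Input: {target_input}\nOutput:")  # the model completes this line
    return "\n\n".join(parts)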
Learning to Model Editing Processes
Most existing sequence generation models produce outputs in one pass, usually
left-to-right. However, this contrasts with the more natural approach humans
take when generating content: iterative refinement and editing. Recent work
has introduced edit-based models for various tasks (such as neural machine
translation and text style transfer), but these generally model a single edit
step. In this work, we propose modeling editing processes: the whole process
of iteratively generating sequences. We form a conceptual framework to
describe the likelihood of multi-step edits, and describe neural models that
can learn a generative model of sequences based on these multi-step edits. We
introduce baseline results and metrics on this task, finding that modeling
editing processes improves performance on a variety of axes on both our
proposed task and related downstream tasks compared to previous single-step
models of edits.
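The conceptual framework mentioned above can be pictured as factorizing the likelihood of a revision history over single edit steps; the sketch below assumes a simple Markov decomposition with placeholder scoring functions, which is our simplification for illustration rather than the paper's exact formulation.

def edit_process_log_likelihood(revisions, log_p_initial, log_p_edit_step):
    """Log-likelihood of a revision history [x_1, ..., x_T], factorized over edit steps:
    log p(x_1, ..., x_T) = log p(x_1) + sum_t log p(x_t | x_{t-1})."""
    total = log_p_initial(revisions[0])               # probability of the initial version
    for prev, curr in zip(revisions, revisions[1:]):  # one term per edit step
        total += log_p_edit_step(prev, curr)
    return total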
Can Wikipedia Help Offline Reinforcement Learning?
Fine-tuning reinforcement learning (RL) models has been challenging because
of a lack of large-scale off-the-shelf datasets, as well as high variance in
transferability among different environments. Recent work has looked at
tackling offline RL from the perspective of sequence modeling, with improved
results following the introduction of the Transformer architecture. However,
when the model is trained from scratch, it suffers from slow convergence.
In this paper, we take advantage of this formulation of
reinforcement learning as sequence modeling and investigate the transferability
of pre-trained sequence models on other domains (vision, language) when
finetuned on offline RL tasks (control, games). To this end, we also propose
techniques to improve transfer between these domains. Results show consistent
performance gains in terms of both convergence speed and reward on a variety of
environments, accelerating training by 3-6x and achieving state-of-the-art
performance in a variety of tasks using Wikipedia-pretrained and GPT2 language
models. We hope that this work not only sheds light on the potential of
leveraging generic sequence modeling techniques and pre-trained models for RL,
but also inspires future work on sharing knowledge between generative modeling
tasks in completely different domains.
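A rough sketch of the general recipe the abstract points at: warm-starting a Decision-Transformer-style trajectory model from pre-trained GPT-2 weights via Hugging Face transformers, then fine-tuning it on offline trajectories. The projection heads and trajectory layout are assumptions for illustration, not the authors' exact architecture.

import torch
import torch.nn as nn
from transformers import GPT2Model

class PretrainedTrajectoryModel(nn.Module):
    """Decision-Transformer-style policy warm-started from pre-trained GPT-2 (illustrative sketch)."""
    def __init__(self, state_dim, act_dim, hidden=768):
        super().__init__()
        # Reuse language-pretrained weights instead of training the Transformer from scratch.
        self.backbone = GPT2Model.from_pretrained("gpt2")
        # Linear maps from (return-to-go, state, action) into the LM's embedding space.
        self.embed_rtg = nn.Linear(1, hidden)
        self.embed_state = nn.Linear(state_dim, hidden)
        self.embed_action = nn.Linear(act_dim, hidden)
        self.predict_action = nn.Linear(hidden, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim).
        B, T = states.shape[0], states.shape[1]
        # Interleave (return-to-go, state, action) tokens along time, Decision-Transformer style.
        tokens = torch.stack([self.embed_rtg(rtg),
                              self.embed_state(states),
                              self.embed_action(actions)], dim=2).reshape(B, 3 * T, -1)
        hidden_states = self.backbone(inputs_embeds=tokens).last_hidden_state
        # Predict the next action from the hidden state at each state-token position.
        return self.predict_action(hidden_states[:, 1::3])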