Variational Inference for Learning Representations of Natural Language Edits
Document editing has become a pervasive component of the production of
information, with version control systems enabling edits to be efficiently
stored and applied. In light of this, the task of learning distributed
representations of edits has recently been proposed. Building on this, we
propose a novel approach that employs variational inference to learn a
continuous latent space of vector representations to capture the underlying
semantic information with regard to the document editing process. We achieve
this by introducing a latent variable to explicitly model the aforementioned
features. This latent variable is then combined with a document representation
to guide the generation of an edited version of this document. Additionally, to
facilitate standardized automatic evaluation of edit representations, which has
heavily relied on direct human input thus far, we also propose a suite of
downstream tasks, PEER, specifically designed to measure the quality of edit
representations in the context of natural language processing. Comment: Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21).
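As a rough illustration of the setup this abstract describes, the sketch below shows a VAE-style model in PyTorch that encodes an (original, edited) document pair into a latent edit vector and conditions the decoder on that vector together with the document representation. The module layout, GRU encoders, and dimensions are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class VariationalEditModel(nn.Module):
    """Hypothetical VAE-style edit-representation model (illustrative, not the authors' code)."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, latent_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.doc_enc = nn.GRU(emb_dim, hid_dim, batch_first=True)   # encodes the original document
        self.edit_enc = nn.GRU(emb_dim, hid_dim, batch_first=True)  # encodes the edited document
        # Posterior q(z | original, edited): mean and log-variance of the edit latent.
        self.to_mu = nn.Linear(2 * hid_dim, latent_dim)
        self.to_logvar = nn.Linear(2 * hid_dim, latent_dim)
        # Decoder generates the edited document conditioned on the document state and z.
        self.decoder = nn.GRU(emb_dim + latent_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, original, edited):
        _, h_doc = self.doc_enc(self.embed(original))    # document representation, shape (1, B, H)
        _, h_edit = self.edit_enc(self.embed(edited))
        pair = torch.cat([h_doc[-1], h_edit[-1]], dim=-1)
        mu, logvar = self.to_mu(pair), self.to_logvar(pair)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        # Condition every decoder step on the latent edit vector z (teacher forcing on `edited`).
        z_steps = z.unsqueeze(1).expand(-1, edited.size(1), -1)
        dec_out, _ = self.decoder(torch.cat([self.embed(edited), z_steps], dim=-1), h_doc)
        logits = self.out(dec_out)
        # ELBO terms: token reconstruction loss (from logits) plus this KL regularizer on z.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()
        return logits, kl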
Large Language Models are Zero-Shot Reasoners
Pretrained large language models (LLMs) are widely used in many sub-fields of
natural language processing (NLP) and generally known as excellent few-shot
learners with task-specific exemplars. Notably, chain of thought (CoT)
prompting, a recent technique for eliciting complex multi-step reasoning
through step-by-step answer examples, achieved state-of-the-art performance
on arithmetic and symbolic reasoning, difficult system-2 tasks
that do not follow the standard scaling laws for LLMs. While these successes
are often attributed to LLMs' ability for few-shot learning, we show that LLMs
are decent zero-shot reasoners by simply adding "Let's think step by step"
before each answer. Experimental results demonstrate that our Zero-shot-CoT,
using the same single prompt template, significantly outperforms zero-shot LLM
performance on diverse benchmark reasoning tasks including arithmetic
(MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin
Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled
Objects), without any hand-crafted few-shot examples, e.g. increasing the
accuracy on MultiArith from 17.7% to 78.7% and on GSM8K from 10.4% to 40.7% with
the 175B-parameter InstructGPT model, and achieving improvements of similar magnitude
with another off-the-shelf large model, the 540B-parameter PaLM. The versatility of
this single prompt across very diverse reasoning tasks hints at untapped and
understudied fundamental zero-shot capabilities of LLMs, suggesting high-level,
multi-task broad cognitive capabilities may be extracted by simple prompting.
We hope our work not only serves as the minimal yet strongest zero-shot baseline
for the challenging reasoning benchmarks, but also highlights the importance of
carefully exploring and analyzing the enormous zero-shot knowledge hidden
inside LLMs before crafting finetuning datasets or few-shot exemplars. Comment: Accepted to NeurIPS 2022. Our code is available at
https://github.com/kojima-takeshi188/zero_shot_co
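As a concrete illustration of the two-stage prompting described above, a minimal sketch follows; query_llm is a placeholder for whatever text-completion function is in use, and the exact answer-extraction wording is an assumption rather than the paper's verbatim prompt.

def zero_shot_cot(question, query_llm):
    """Two-stage zero-shot chain-of-thought prompting; query_llm is any text-completion function."""
    # Stage 1: reasoning extraction -- append the trigger phrase so the model writes a rationale.
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    rationale = query_llm(reasoning_prompt)
    # Stage 2: answer extraction -- feed the rationale back and ask for the final answer.
    answer_prompt = reasoning_prompt + " " + rationale + "\nTherefore, the answer is"
    return query_llm(answer_prompt)

# Example usage (my_completion_fn is a placeholder for an actual LLM call):
# answer = zero_shot_cot("If there are 3 cars and each car has 4 wheels, how many wheels are there?",
#                        my_completion_fn)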
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer
Despite remarkable advancements in few-shot generalization in natural
language processing, most models are developed and evaluated primarily in
English. To facilitate research on few-shot cross-lingual transfer, we
introduce a new benchmark, called BUFFET, which unifies 15 diverse tasks across
54 languages in a sequence-to-sequence format and provides a fixed set of
few-shot examples and instructions. BUFFET is designed to establish a rigorous
and equitable evaluation framework for few-shot cross-lingual transfer across a
broad range of tasks and languages. Using BUFFET, we perform thorough
evaluations of state-of-the-art multilingual large language models with
different transfer methods, namely in-context learning and fine-tuning. Our
findings reveal significant room for improvement in few-shot in-context
cross-lingual transfer. In particular, ChatGPT with in-context learning often
performs worse than much smaller mT5-base models fine-tuned on English task
data and few-shot in-language examples. Our analysis suggests various avenues
for future research in few-shot cross-lingual transfer, such as improved
pretraining, understanding, and evaluation. Comment: The data and code are available at https://buffetfs.github.io
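For readers unfamiliar with the in-context-learning side of the comparison, the sketch below shows how a fixed instruction-plus-demonstrations prompt might be assembled for a target-language input; the field names and prompt layout are illustrative assumptions, not the BUFFET format itself.

def build_fewshot_prompt(instruction, demonstrations, target_input):
    """Assemble an instruction, fixed few-shot demonstrations, and one unlabeled
    target-language input into a single sequence-to-sequence prompt."""
    parts = [instruction.strip()]
    for ex in demonstrations:  # fixed few-shot examples, e.g. English or in-language
        parts.append(f"Input: {ex['input']}\nOutput: {ex['output']}")
    parts.append(f"Input: {target_input}\nOutput:")  # the model completes this line
    return "\n\n".join(parts)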
Learning to Model Editing Processes
Most existing sequence generation models produce outputs in one pass, usually
left-to-right. However, this contrasts with the more natural approach humans
take when generating content: iterative refinement and editing. Recent work
has introduced edit-based models for various tasks (such as neural machine
translation and text style transfer), but these generally model a single edit
step. In this work, we propose modeling editing processes: the whole process
of iteratively generating sequences. We form a conceptual framework to
describe the likelihood of multi-step edits, and describe neural models that
can learn a generative model of sequences based on these multi-step edits. We
introduce baseline results and metrics on this task, finding that modeling
editing processes improves performance on a variety of axes on both our
proposed task and related downstream tasks compared to previous single-step
models of edits.
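The conceptual framework mentioned above can be pictured as factorizing the likelihood of a revision history over single edit steps; the sketch below assumes a simple Markov decomposition with placeholder scoring functions, which is our simplification for illustration rather than the paper's exact formulation.

def edit_process_log_likelihood(revisions, log_p_initial, log_p_edit_step):
    """Log-likelihood of a revision history [x_1, ..., x_T], factorized over edit steps:
    log p(x_1, ..., x_T) = log p(x_1) + sum_t log p(x_t | x_{t-1})."""
    total = log_p_initial(revisions[0])               # probability of the initial version
    for prev, curr in zip(revisions, revisions[1:]):  # one term per edit step
        total += log_p_edit_step(prev, curr)
    return total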
Can Wikipedia Help Offline Reinforcement Learning?
Fine-tuning reinforcement learning (RL) models has been challenging because
of a lack of large-scale off-the-shelf datasets, as well as high variance in
transferability among different environments. Recent work has looked at
tackling offline RL from the perspective of sequence modeling, with improved
results following the introduction of the Transformer architecture. However,
when the model is trained from scratch, it suffers from slow convergence.
In this paper, we take advantage of this formulation of
reinforcement learning as sequence modeling and investigate the transferability
of pre-trained sequence models on other domains (vision, language) when
finetuned on offline RL tasks (control, games). To this end, we also propose
techniques to improve transfer between these domains. Results show consistent
performance gains in terms of both convergence speed and reward on a variety of
environments, accelerating training by 3-6x and achieving state-of-the-art
performance in a variety of tasks using Wikipedia-pretrained and GPT2 language
models. We hope that this work not only sheds light on the potential of
leveraging generic sequence modeling techniques and pre-trained models for RL,
but also inspires future work on sharing knowledge between generative modeling
tasks in completely different domains.
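A rough sketch of the general recipe the abstract points at: warm-starting a Decision-Transformer-style trajectory model from pre-trained GPT-2 weights via Hugging Face transformers, then fine-tuning it on offline trajectories. The projection heads and trajectory layout are assumptions for illustration, not the authors' exact architecture.

import torch
import torch.nn as nn
from transformers import GPT2Model

class PretrainedTrajectoryModel(nn.Module):
    """Decision-Transformer-style policy warm-started from pre-trained GPT-2 (illustrative sketch)."""
    def __init__(self, state_dim, act_dim, hidden=768):
        super().__init__()
        # Reuse language-pretrained weights instead of training the Transformer from scratch.
        self.backbone = GPT2Model.from_pretrained("gpt2")
        # Linear maps from (return-to-go, state, action) into the LM's embedding space.
        self.embed_rtg = nn.Linear(1, hidden)
        self.embed_state = nn.Linear(state_dim, hidden)
        self.embed_action = nn.Linear(act_dim, hidden)
        self.predict_action = nn.Linear(hidden, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim).
        B, T = states.shape[0], states.shape[1]
        # Interleave (return-to-go, state, action) tokens along time, Decision-Transformer style.
        tokens = torch.stack([self.embed_rtg(rtg),
                              self.embed_state(states),
                              self.embed_action(actions)], dim=2).reshape(B, 3 * T, -1)
        hidden_states = self.backbone(inputs_embeds=tokens).last_hidden_state
        # Predict the next action from the hidden state at each state-token position.
        return self.predict_action(hidden_states[:, 1::3])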