
    Incorporating External Knowledge through Pre-training for Natural Language to Code Generation

    Open-domain code generation aims to generate code in a general-purpose programming language (such as Python) from natural language (NL) intents. Motivated by the intuition that developers usually retrieve resources on the web when writing code, we explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow, and programming language API documentation. Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa. The code and resources are available at https://github.com/neulab/external-knowledge-codegen.
    Comment: Accepted by ACL 2020
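    The retrieval-based re-sampling idea lends itself to a short illustration. The sketch below is a minimal interpretation, not the authors' exact procedure: mined NL-code pairs are scored by TF-IDF cosine similarity against intents from the target distribution and sampled in proportion to that score, so pairs that resemble the testbed are seen more often during pre-training. The function name and toy data are invented for illustration.

```python
# A minimal sketch of retrieval-based data re-sampling (not the authors' exact
# procedure): mined pairs whose intents look like the target distribution are
# sampled more often. All names and toy data here are illustrative.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def resample_mined_pairs(mined_pairs, target_intents, k, seed=0):
    """Sample k mined (intent, code) pairs, weighted by each pair's best
    TF-IDF cosine similarity to any intent from the target distribution."""
    intents = [nl for nl, _ in mined_pairs]
    vec = TfidfVectorizer().fit(intents + target_intents)
    scores = cosine_similarity(vec.transform(intents),
                               vec.transform(target_intents)).max(axis=1)
    probs = scores / scores.sum()  # assumes at least one nonzero score
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(mined_pairs), size=k, replace=True, p=probs)
    return [mined_pairs[i] for i in idx]

mined = [("sort a list in reverse order", "xs.sort(reverse=True)"),
         ("read a file line by line", "lines = open('f.txt').readlines()"),
         ("join a list of strings", "', '.join(xs)")]
print(resample_mined_pairs(mined, ["sort a list in descending order"], k=2))
```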

    WebArena: A Realistic Web Environment for Building Autonomous Agents

    With advances in generative AI, there is now potential for autonomous agents to manage daily tasks via natural language commands. However, current agents are primarily created and tested in simplified synthetic environments, leading to a disconnect with real-world scenarios. In this paper, we build an environment for language-guided agents that is highly realistic and reproducible. Specifically, we focus on agents that perform tasks on the web, and create an environment with fully functional websites from four common domains: e-commerce, social forum discussions, collaborative software development, and content management. Our environment is enriched with tools (e.g., a map) and external knowledge bases (e.g., user manuals) to encourage human-like task-solving. Building upon our environment, we release a set of benchmark tasks focusing on evaluating the functional correctness of task completions. The tasks in our benchmark are diverse, long-horizon, and designed to emulate tasks that humans routinely perform on the internet. We experiment with several baseline agents, integrating recent techniques such as reasoning before acting. The results demonstrate that solving complex tasks is challenging: our best GPT-4-based agent achieves an end-to-end task success rate of only 14.41%, significantly lower than the human performance of 78.24%. These results highlight the need for further development of robust agents, show that current state-of-the-art large language models are far from perfect performance on these real-life tasks, and establish WebArena as a way to measure such progress.
    Comment: Our code, data, environment reproduction resources, and video demonstrations are publicly available at https://webarena.dev
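    The "reasoning before acting" baseline pattern can be sketched generically. The skeleton below is a hypothetical stand-in, not WebArena's actual API or action grammar (see https://webarena.dev for the real harness): the agent repeatedly shows the model the task, its recent actions, and the current page rendering, then executes the single action the model proposes.

```python
# A hypothetical skeleton of a reason-then-act web agent; the env/llm
# interfaces are illustrative stand-ins, NOT WebArena's actual API or
# action grammar (see https://webarena.dev for the real harness).
from dataclasses import dataclass

@dataclass
class Observation:
    url: str
    page_text: str  # e.g., a text/accessibility-tree rendering of the page

def run_agent(env, llm, task, max_steps=30):
    """env.reset(task) -> Observation; env.step(action) -> (Observation, done).
    llm(prompt) -> a single action string."""
    obs = env.reset(task)
    history = []
    for _ in range(max_steps):
        # Reason before acting: show the task, recent actions, and the page,
        # then ask for exactly one next action.
        prompt = (f"Task: {task}\n"
                  f"Recent actions: {history[-5:]}\n"
                  f"Page ({obs.url}):\n{obs.page_text}\n"
                  "Think step by step, then output ONE action, e.g. "
                  "click [id] / type [id] [text] / stop [answer].")
        action = llm(prompt)
        history.append(action)
        obs, done = env.step(action)
        if done:
            return action  # the final 'stop [answer]' action
    return None  # ran out of step budget
```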

    Neutrino Oscillations and the Supernova 1987A Signal

    We study the impact of neutrino oscillations on the interpretation of the supernova (SN) 1987A neutrino signal by means of a maximum-likelihood analysis. We focus on oscillations of ν̄_e with ν̄_μ or ν̄_τ with those mixing parameters that would solve the solar neutrino problem. For the small-angle MSW solution (Δm² ≈ 10⁻⁵ eV², sin²2Θ₀ ≈ 0.007), there are no significant oscillation effects on the Kelvin-Helmholtz cooling signal; we confirm previous best-fit values for the neutron-star binding energy and average spectral ν̄_e temperature. There is only marginal overlap between the upper end of the 95.4% CL inferred range of ⟨E_ν̄e⟩ and the lower end of the range of theoretical predictions. Any admixture of the stiffer ν̄_μ spectrum by oscillations aggravates the conflict between experimentally inferred and theoretically predicted spectral properties. For mixing parameters in the neighborhood of the large-angle MSW solution (Δm² ≈ 10⁻⁵ eV², sin²2Θ₀ ≈ 0.7), the oscillations in the SN are adiabatic, but one needs to include the regeneration effect in the Earth, which causes the Kamiokande and IMB detectors to observe different ν̄_e spectra. For the solar vacuum solution (Δm² ≈ 10⁻¹⁰ eV², sin²2Θ₀ ≈ 1) the oscillations in the SN are nonadiabatic; vacuum oscillations take place between the SN and the detector. If either of the large-angle solutions were borne out by the upcoming round of solar neutrino experiments, one would have to conclude that the SN 1987A ν̄_μ and/or ν̄_e spectra had been much softer than predicted by current…
    Comment: Final version with very minor wording changes, to be published in Phys. Rev.
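    For readers outside the subfield, the adiabatic/nonadiabatic distinction above turns on standard two-flavor quantities; the textbook forms are reproduced below and are not specific to this paper's analysis.

```latex
% Standard two-flavor forms (textbook material, not this paper's notation).
% Vacuum survival probability after propagating a distance L at energy E:
P(\bar\nu_e \to \bar\nu_e) = 1 - \sin^2 2\Theta_0 \,\sin^2\!\left(\frac{\Delta m^2 L}{4E}\right)
% MSW adiabaticity parameter at the resonance density; \gamma \gg 1 means the
% matter-induced level crossing is adiabatic, \gamma \lesssim 1 nonadiabatic:
\gamma = \frac{\Delta m^2}{2E}\,\frac{\sin^2 2\Theta_0}{\cos 2\Theta_0}\,
         \left|\frac{\mathrm{d}\ln n_e}{\mathrm{d}r}\right|^{-1}
```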

    A Systematic Evaluation of Large Language Models of Code

    Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not publicly available, leaving many questions about their model and data design decisions. We aim to fill in some of these blanks through a systematic evaluation of the largest existing models: Codex, GPT-J, GPT-Neo, GPT-NeoX-20B, and CodeParrot, across various programming languages. Although Codex itself is not open-source, we find that existing open-source models achieve close results in some programming languages, even though they are targeted mainly at natural language modeling. We further identify an important missing piece in the form of a large open-source model trained exclusively on a multi-lingual corpus of code. We release a new model, PolyCoder, with 2.7B parameters based on the GPT-2 architecture, which was trained on 249GB of code across 12 programming languages on a single machine. In the C programming language, PolyCoder outperforms all models, including Codex. Our trained models are open-source and publicly available at https://github.com/VHellendoorn/Code-LMs, which enables future research and application in this area.
    Comment: DL4C@ICLR 2022, and MAPS@PLDI 2022
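    Since the checkpoints are public, trying the model takes only a few lines with Hugging Face transformers. The model id below is an assumption (a community conversion of the released 2.7B checkpoint); the canonical weights live in the GitHub repository above.

```python
# A minimal completion sketch with Hugging Face transformers. The model id is
# an assumption (a community conversion of the released 2.7B checkpoint);
# verify it, or convert the canonical weights from the GitHub repo yourself.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "NinedayWang/PolyCoder-2.7B"  # assumed Hugging Face id
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

prompt = "def binary_search(arr, target):\n"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```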

    Active Retrieval Augmented Generation

    Despite the remarkable ability of large language models (LMs) to comprehend and generate language, they have a tendency to hallucinate and create factually inaccurate output. Augmenting LMs by retrieving information from external knowledge resources is one promising solution. Most existing retrieval-augmented LMs employ a retrieve-and-generate setup that only retrieves information once, based on the input. This is limiting, however, in more general scenarios involving generation of long texts, where continually gathering information throughout the generation process is essential. There have been some past efforts to retrieve information multiple times while generating outputs, which mostly retrieve documents at fixed intervals using the previous context as queries. In this work, we provide a generalized view of active retrieval augmented generation: methods that actively decide when and what to retrieve across the course of the generation. We propose Forward-Looking Active REtrieval augmented generation (FLARE), a generic retrieval-augmented generation method that iteratively uses a prediction of the upcoming sentence to anticipate future content; if that sentence contains low-confidence tokens, it is used as a query to retrieve relevant documents, and the sentence is regenerated. We test FLARE along with baselines comprehensively over 4 long-form, knowledge-intensive generation tasks/datasets. FLARE achieves superior or competitive performance on all tasks, demonstrating the effectiveness of our method. Code and datasets are available at https://github.com/jzbjyb/FLARE
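    The control flow is simple enough to sketch. The following is a simplified reading of the method, not the authors' implementation (which also supports masking uncertain tokens before querying); the lm and retriever callables and the toy stubs are invented for illustration.

```python
# A simplified sketch of FLARE's forward-looking loop (not the authors'
# implementation): draft the next sentence; if any token is low-confidence,
# retrieve with the draft as the query and regenerate the sentence.
def flare_generate(lm, retriever, question, theta=0.6, max_sentences=10):
    """lm(prompt) -> (next_sentence, per_token_probs); retriever(q) -> docs."""
    answer = ""
    for _ in range(max_sentences):
        draft, probs = lm(f"{question}\n{answer}")
        if not draft:
            break  # the model chose to stop
        if probs and min(probs) < theta:
            # Low-confidence draft: use it as a retrieval query (the paper
            # also considers masking the uncertain tokens first), then
            # regenerate the sentence with retrieved documents in context.
            docs = retriever(draft)
            draft, _ = lm("\n".join(docs) + f"\n{question}\n{answer}")
        answer += draft + " "
    return answer.strip()

# Toy stubs exercising both paths: the first draft is low-confidence, so it
# triggers retrieval and regeneration; the next call ends generation.
drafts = iter([
    ("He studied somewhere in Delaware, maybe.", [0.3, 0.5, 0.4]),
    ("He attended the University of Delaware.", [0.9, 0.95, 0.9]),
])
fake_lm = lambda prompt: next(drafts, ("", []))
fake_retriever = lambda q: ["Joe Biden attended the University of Delaware."]
print(flare_generate(fake_lm, fake_retriever, "Where did Biden study?"))
```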