Teaching Large Language Models to Self-Debug
Large language models (LLMs) have achieved impressive performance on code
generation. However, for complex programming tasks, generating the correct
solution in one go becomes challenging, so some prior works have designed
program repair approaches to improve code generation performance. In this work,
we propose Self-Debugging, which teaches a large language model to debug its
predicted program via few-shot demonstrations. In particular, we demonstrate
that Self-Debugging can teach the large language model to perform rubber duck
debugging; i.e., without any feedback on the code correctness or error
messages, the model is able to identify its mistakes by explaining the
generated code in natural language. Self-Debugging achieves
state-of-the-art performance on several code generation benchmarks, including
the Spider dataset for text-to-SQL generation, TransCoder for C++-to-Python
translation, and MBPP for text-to-Python generation. On the Spider benchmark
where there are no unit tests to verify the correctness of predictions,
Self-Debugging with code explanation consistently improves the baseline by
2-3%, and improves the prediction accuracy on problems of the hardest label by
9%. On TransCoder and MBPP where unit tests are available, Self-Debugging
improves the baseline accuracy by up to 12%. Meanwhile, by leveraging feedback
messages and reusing failed predictions, Self-Debugging notably improves sample
efficiency, and can match or outperform baseline models that generate more than
10x candidate programs.
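
The feedback loop described above can be pictured with a short sketch. The following is an illustration only, not the paper's released code: generate is a hypothetical callable wrapping an LLM, and the actual few-shot demonstrations and prompt formats used in Self-Debugging are not reproduced here.

# Minimal sketch of a Self-Debugging-style loop (illustrative only).
# Assumptions: `generate` is a hypothetical LLM call; the paper's real
# prompts and few-shot demonstrations are omitted.
from typing import Callable, List, Tuple

def run_tests(code: str, unit_tests: List[Tuple[str, object]]) -> str:
    """Execute the candidate program and report the first failure, if any."""
    env: dict = {}
    try:
        exec(code, env)  # sketch only; assumes trusted or sandboxed code
        for expr, expected in unit_tests:
            result = eval(expr, env)
            if result != expected:
                return f"{expr} returned {result!r}, expected {expected!r}"
    except Exception as exc:
        return f"execution error: {exc}"
    return "all tests passed"

def self_debug(problem: str,
               generate: Callable[[str], str],
               unit_tests: List[Tuple[str, object]],
               max_turns: int = 3) -> str:
    """Generate a program, then iteratively explain and repair it."""
    code = generate(f"Write a Python function for:\n{problem}")
    for _ in range(max_turns):
        feedback = run_tests(code, unit_tests)
        if feedback == "all tests passed":
            break
        # Rubber-duck step: the model explains its own code; the explanation
        # and any test feedback are fed back to produce a revised program.
        explanation = generate(f"Explain this code line by line:\n{code}")
        code = generate(
            f"Problem:\n{problem}\n\nCurrent code:\n{code}\n\n"
            f"Explanation:\n{explanation}\n\nFeedback:\n{feedback}\n\n"
            "Rewrite the code so it solves the problem."
        )
    return code
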
*-CFQ: Analyzing the Scalability of Machine Learning on a Compositional Task
We present *-CFQ ("star-CFQ"): a suite of large-scale datasets of varying
scope based on the CFQ semantic parsing benchmark, designed for principled
investigation of the scalability of machine learning systems in a realistic
compositional task setting. Using this suite, we conduct a series of
experiments investigating the ability of Transformers to benefit from increased
training size under conditions of fixed computational cost. We show that
compositional generalization remains a challenge at all training sizes, and we
show that increasing the scope of natural language leads to consistently higher
error rates, which are only partially offset by increased training data. We
further show that while additional training data from a related domain improves
the accuracy in data-starved situations, this improvement is limited and
diminishes as the distance from the related domain to the target domain
increases.
Large Language Models Can Be Easily Distracted by Irrelevant Context
Large language models have achieved impressive performance on various natural
language processing tasks. However, so far they have been evaluated primarily
on benchmarks where all information in the input context is relevant for
solving the task. In this work, we investigate the distractibility of large
language models, i.e., how their problem-solving accuracy can be influenced
by irrelevant context. In particular, we introduce Grade-School Math with
Irrelevant Context (GSM-IC), an arithmetic reasoning dataset with irrelevant
information in the problem description. We use this benchmark to measure the
distractibility of cutting-edge prompting techniques for large language models,
and find that model performance drops dramatically when irrelevant
information is included. We also identify several approaches for mitigating
this deficiency, such as decoding with self-consistency and adding an
instruction to the prompt that tells the language model to ignore the
irrelevant information.
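
As a rough illustration of those two mitigations (not the paper's exact prompt wording or code), the sketch below combines an "ignore irrelevant information" instruction with self-consistency, i.e., majority voting over several sampled answers; sample_answer is a hypothetical stand-in for one sampled LLM completion.

# Illustrative sketch of the two mitigations mentioned above; the actual
# instruction wording and decoding setup in the paper may differ.
from collections import Counter
from typing import Callable

IGNORE_INSTRUCTION = ("Solve the grade-school math problem. Ignore any "
                      "information that is irrelevant to the question.")

def answer_with_self_consistency(question: str,
                                 sample_answer: Callable[[str], str],
                                 n_samples: int = 20) -> str:
    """Sample several reasoning paths and return the majority final answer."""
    prompt = f"{IGNORE_INSTRUCTION}\n\nQ: {question}\nA:"
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
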
Compositional Semantic Parsing with Large Language Models
Humans can reason compositionally when presented with new tasks. Previous
research shows that appropriate prompting techniques enable large language
models (LLMs) to solve artificial compositional generalization tasks such as
SCAN. In this work, we identify additional challenges in more realistic
semantic parsing tasks with larger vocabulary and refine these prompting
techniques to address them. Our best method is based on least-to-most
prompting: it decomposes the problem using prompting-based syntactic parsing,
then uses this decomposition to select appropriate exemplars and to
sequentially generate the semantic parse. This method allows us to set a new
state of the art for CFQ while requiring only 1% of the training data used by
traditional approaches. Due to the general nature of our approach, we expect
similar efforts will lead to new results in other tasks and domains, especially
for knowledge-intensive applications.
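
A rough outline of such a least-to-most pipeline is sketched below. It is not the authors' implementation: generate is a hypothetical LLM callable, the exemplar pool is a placeholder, and exemplar selection is reduced to simple token overlap rather than the paper's decomposition-based matching.

# Sketch of a least-to-most prompting pipeline for semantic parsing
# (illustrative; decomposition, prompts, and exemplar selection are simplified).
from typing import Callable, Dict, List

def least_to_most_parse(question: str,
                        generate: Callable[[str], str],
                        exemplar_pool: List[Dict[str, str]]) -> str:
    # 1. Decomposition: prompt the model to split the question into simpler
    #    subquestions (the paper uses prompting-based syntactic parsing).
    decomposition = generate(f"Decompose into simpler subquestions:\n{question}")
    subquestions = [line.strip() for line in decomposition.splitlines() if line.strip()]

    # 2. Exemplar selection: pick demonstrations that best match the
    #    subquestions (crude token overlap here, as a placeholder).
    def overlap(exemplar: Dict[str, str]) -> int:
        return sum(token in exemplar["question"]
                   for sq in subquestions for token in sq.split())
    exemplars = sorted(exemplar_pool, key=overlap, reverse=True)[:4]
    demos = "\n\n".join(f"Q: {ex['question']}\nParse: {ex['parse']}" for ex in exemplars)

    # 3. Sequential generation: parse the subquestions one by one, feeding
    #    earlier question/parse pairs back into the prompt.
    context, parse = demos, ""
    for sq in subquestions:
        parse = generate(f"{context}\n\nQ: {sq}\nParse:")
        context += f"\n\nQ: {sq}\nParse: {parse}"
    return parse
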
Virtualization Support for Dynamic Core Library Update
Dynamically updating the language runtime and core libraries such as collections and threading is challenging, since the update mechanism uses those libraries at the same time that it modifies them. To tackle this challenge, we present Dynamic Core Library Update (DCU), an extension of Dynamic Software Update (DSU), and an approach based on a virtualization architecture. Our solution supports updating core libraries like any other library, avoiding circular dependencies between the updater and the core libraries. Our benchmarks show no evident performance overhead compared with a default execution. Finally, we show that our approach can be applied to a real-life scenario by introducing a critical update into a web application with 20 simulated concurrent users. Acknowledgments: We thank the European Smalltalk User Group for their support (www.esug.org).
Traits: Composing Classes from Behavioral Building Blocks
Inheritance is well-known and accepted as a fundamental mechanism for reuse in object-oriented languages. Unfortunately, the main variants (single inheritance, multiple inheritance, and mixin inheritance) all suffer from conceptual and practical problems related to software reuse and robustness with respect to changes. In the first part of this thesis, we identify and illustrate these problems. To overcome them, we then present traits, a simple compositional model that extends single inheritance. A trait is essentially a (parameterized) set of methods; it serves as a behavioral building block for classes and is the primitive unit of code reuse. We develop a formal model of traits that establishes how traits can be composed to form other traits or classes, and we describe how we implemented traits in Squeak Smalltalk by bootstrapping a new language kernel. We present our experimental validation, in which we apply traits to refactor parts of the Smalltalk kernel and library, and we develop a programming methodology around the use of traits and the trait browser, the tool we implemented to take full advantage of traits in the Squeak programming environment.
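
Traits were designed for, and implemented in, Smalltalk; as a loose analogy only, the Python sketch below treats a trait as a named set of methods, composes traits with conflict detection, and builds a class from the result plus glue code supplying state and a required method. It does not model Smalltalk-specific aspects such as flattening, aliasing, or exclusion.

# Loose Python analogy of the trait model (the actual work targets Squeak
# Smalltalk); a trait here is just a named dictionary of methods.

class Trait:
    """A (parameterized) set of methods: a behavioral building block."""
    def __init__(self, name, methods):
        self.name = name
        self.methods = methods  # selector -> function

    def __add__(self, other):
        # Symmetric composition: the same selector with two different
        # bodies is a conflict and must be resolved by the composer.
        conflicts = {s for s in self.methods
                     if s in other.methods and self.methods[s] is not other.methods[s]}
        if conflicts:
            raise TypeError(f"trait conflict on {sorted(conflicts)}")
        return Trait(f"{self.name}+{other.name}", {**self.methods, **other.methods})

def class_from_trait(name, trait, state=()):
    """Build a class from a composed trait plus local state (glue code)."""
    def __init__(self, **kwargs):
        for slot in state:
            setattr(self, slot, kwargs.get(slot))
    return type(name, (object,), {"__init__": __init__, **trait.methods})

# Usage: compose two building blocks; the class supplies the required 'lt'.
TComparable = Trait("TComparable", {"ge": lambda self, other: not self.lt(other)})
TPrintable = Trait("TPrintable", {"describe": lambda self: f"{type(self).__name__}({self.value})"})
Magnitude = class_from_trait("Magnitude", TComparable + TPrintable, state=("value",))
Magnitude.lt = lambda self, other: self.value < other.value

a, b = Magnitude(value=1), Magnitude(value=2)
print(a.describe(), a.ge(b))  # Magnitude(1) False
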
Object Encapsulation for Dynamically Typed Languages
A version of this paper has been submitted to the 200