4 research outputs found
No Zombie Types: Liveness-Based Justification For Monotonic Gradual Types
Gradual type systems with the monotonic dynamic semantics, such as HiggsCheck, an implementation of SafeTypeScript, achieve decent performance, making them a viable option for JavaScript programmers seeking run-time-checkable type annotations.
However, the type restrictions for objects in the monotonic dynamic semantics are, as the name suggests, monotonic.
Once a typed reference is defined or assigned to refer to an object, the contract carrying the type obligation of the reference is part of the object for the remainder of execution.
In some cases, such contracts become "zombies": the reference that justifies a contract is out of scope, yet the object still retains the type obligation.
In this thesis, we propose a novel idea of contract liveness and its implementation.
Briefly speaking, contracts must be justified by live stack references defined with associated type obligations.
Our implementation, taking inspiration from how garbage collectors approximate object liveness by reachability of objects, approximates contract liveness by reachability of contracts.
Then, to achieve a much closer approximation to contract liveness, we introduce a poisoning process:
we nullify the stack references justifying the violated contract, and associate the location that triggered the contract violation with a poisoned reference for blame.
We compare our implementation with the original implementation of HiggsCheck. The comparison shows that our system is fully compatible with code that raises no errors, at the cost of a small performance penalty: an average slowdown of 8.14%.
We also discuss the performance of the contract removal process, and possible worst cases for the liveness-based system.
We also modify the semantics of HiggsCheck's SafeTypeScript to formalize the liveness-based type system.
Our work demonstrates that relaxing contractual obligations in a gradually typed system with the monotonic semantics is viable and realistic.
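The zombie scenario described above can be sketched in a few lines of Python. Note that the real system lives inside a JavaScript VM; the `Monitored` class and its API below are illustrative assumptions, not HiggsCheck's actual design. The sketch shows a contract that stays attached to an object after the stack reference that justified it has gone out of scope:

```python
# Illustrative sketch (not HiggsCheck's implementation) of a "zombie"
# contract: a monotonic type obligation that outlives the reference
# that justified it. Class and method names are assumptions.

class Monitored:
    """An object that accumulates type obligations monotonically."""
    def __init__(self):
        # Obligations are only ever added, never removed (monotonicity).
        object.__setattr__(self, "_contracts", [])

    def add_contract(self, field, typ):
        self._contracts.append((field, typ))

    def __setattr__(self, name, value):
        # Every write is checked against every accumulated contract.
        for field, typ in self._contracts:
            if field == name and not isinstance(value, typ):
                raise TypeError(f"contract violated: {name} must be {typ.__name__}")
        object.__setattr__(self, name, value)

def typed_use(obj):
    # A typed stack reference obligates obj.x to be an int...
    obj.add_contract("x", int)
    obj.x = 1
    # ...but when this function returns, the reference dies while the
    # contract stays attached to obj: it has become a zombie.

o = Monitored()
typed_use(o)
try:
    o.x = "hello"        # no live typed reference justifies this check,
except TypeError:        # yet the monotonic contract still fires
    print("zombie contract fired")
```

In a liveness-based system, the contract on `o` would be removable once `typed_use` returns, because no live stack reference justifies it any longer.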
A Scalable and Extensible Approach to Benchmarking NL2Code for 18 Programming Languages
Large language models have demonstrated the ability to condition on and
generate both natural language and programming language text. Such models open
up the possibility of multi-language code generation: could code generation
models generalize knowledge from one language to another? Although contemporary
code generation models can generate semantically correct Python code, little is
known about their abilities with other languages. We facilitate the exploration
of this topic by proposing MultiPL-E, the first multi-language parallel
benchmark for natural-language-to-code generation.
MultiPL-E extends the HumanEval benchmark (Chen et al., 2021) to support 18
more programming languages, encompassing a range of programming paradigms and
popularity. We evaluate two state-of-the-art code generation models on
MultiPL-E: Codex and InCoder. We find that on several languages, Codex matches
and even exceeds its performance on Python. The range of programming languages
represented in MultiPL-E allows us to explore the impact of language frequency
and language features on model performance. Finally, the MultiPL-E approach of
compiling code generation benchmarks to new programming languages is both
scalable and extensible. We describe a general approach for easily adding
support for new benchmarks and languages to MultiPL-E.
MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation
Large language models have demonstrated the ability to generate both natural language and programming language text. Although contemporary code generation models are trained on corpora with several programming languages, they are tested using benchmarks that are typically monolingual. The most widely used code generation benchmarks only target Python, so there is little quantitative evidence of how code generation models perform on other programming languages. We propose MultiPL-E, a system for translating unit test-driven code generation benchmarks to new languages. We create the first massively multilingual code generation benchmark by using MultiPL-E to translate two popular Python code generation benchmarks, HumanEval (Chen et al., 2021) and MBPP (Austin et al., 2021), to 18 additional programming languages that encompass a range of programming paradigms and popularity. Using these new parallel benchmarks, we evaluate the multi-language performance of three state-of-the-art code generation models: Codex (Chen et al., 2021), CodeGen (Nijkamp et al., 2022), and InCoder (Fried et al., 2022). We find that Codex matches or even exceeds its performance on Python for several other languages. The range of programming languages represented in MultiPL-E allows us to explore the impact of language frequency and language features on model performance. Finally, the MultiPL-E approach of compiling code generation benchmarks to new programming languages is both scalable and extensible, making it straightforward to evaluate new models, benchmarks, and languages.
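HumanEval-style benchmarks such as MultiPL-E report results with the unbiased pass@k estimator of Chen et al. (2021): given n sampled completions of which c pass the unit tests, estimate the probability that at least one of k random samples passes. A minimal sketch of that published formula:

```python
# Unbiased pass@k estimator (Chen et al., 2021):
#   pass@k = 1 - C(n-c, k) / C(n, k)
# computed as a numerically stable product instead of raw binomials.
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = total samples, c = samples passing the tests, k = budget."""
    if n - c < k:
        return 1.0  # fewer than k failures: every k-subset contains a pass
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

print(pass_at_k(200, 50, 1))   # ≈ 0.25: with 50/200 passing, pass@1 is 50/200
```

The product form avoids overflow for large n; it is algebraically identical to the binomial ratio, since C(n-c, k)/C(n, k) telescopes into a product of (i - k)/i terms.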
StarCoder: may the source be with you!
The BigCode community, an open-scientific collaboration working on the
responsible development of Large Language Models for Code (Code LLMs),
introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context
length, infilling capabilities and fast large-batch inference enabled by
multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced
from The Stack, a large collection of permissively licensed GitHub repositories
with inspection tools and an opt-out process. We fine-tuned StarCoderBase on
35B Python tokens, resulting in the creation of StarCoder. We perform the most
comprehensive evaluation of Code LLMs to date and show that StarCoderBase
outperforms every open Code LLM that supports multiple programming languages
and matches or outperforms the OpenAI code-cushman-001 model. Furthermore,
StarCoder outperforms every model that is fine-tuned on Python, can be prompted
to achieve 40% pass@1 on HumanEval, and still retains its performance on other
programming languages. We take several important steps towards a safe
open-access model release, including an improved PII redaction pipeline and a
novel attribution tracing tool, and make the StarCoder models publicly
available under a more commercially viable version of the Open Responsible AI
Model license.
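The multi-query attention mentioned above (Shazeer, 2019) is what enables StarCoder's fast large-batch inference: all query heads share a single key/value head, shrinking the key/value cache by a factor of the head count. A minimal numpy sketch of the idea (shapes, names, and the omission of causal masking are simplifying assumptions, not StarCoder's actual code):

```python
# Multi-query attention sketch: n_heads query heads, ONE shared K/V head.
# Causal masking and batching are omitted for brevity.
import numpy as np

def multi_query_attention(x, wq, wk, wv, n_heads):
    t, d = x.shape
    hd = d // n_heads                     # per-head dimension
    q = (x @ wq).reshape(t, n_heads, hd)  # one query projection per head
    k = x @ wk                            # single shared key head: (t, hd)
    v = x @ wv                            # single shared value head: (t, hd)
    scores = np.einsum("thd,sd->ths", q, k) / np.sqrt(hd)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    out = np.einsum("ths,sd->thd", weights, v)      # (t, n_heads, hd)
    return out.reshape(t, d)

rng = np.random.default_rng(0)
t, d, h = 4, 8, 2
y = multi_query_attention(rng.standard_normal((t, d)),
                          rng.standard_normal((d, d)),
                          rng.standard_normal((d, d // h)),
                          rng.standard_normal((d, d // h)), h)
print(y.shape)  # (4, 8)
```

During autoregressive decoding only `k` and `v` need caching, and here they are `(t, hd)` rather than `(t, n_heads, hd)`, which is why the cache shrinks by `n_heads`.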