CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
Code completion models have made significant progress in recent years, yet
current popular evaluation datasets, such as HumanEval and MBPP, predominantly
focus on code completion tasks within a single file. This oversimplified
setting falls short of representing real-world software development, where
repositories span multiple files with numerous cross-file dependencies, and
where accessing and understanding cross-file context is often required to
complete the code correctly.
To fill this gap, we propose CrossCodeEval, a diverse and multilingual
code completion benchmark that necessitates an in-depth cross-file contextual
understanding to complete the code accurately. CrossCodeEval is built on a
diverse set of real-world, open-sourced, permissively-licensed repositories in
four popular programming languages: Python, Java, TypeScript, and C#. To create
examples that strictly require cross-file context for accurate completion, we
propose a straightforward yet efficient static-analysis-based approach to
pinpoint the use of cross-file context within the current file.
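To make the general idea concrete, here is a minimal sketch of such a static check, not the paper's actual tooling: using Python's ast module, it flags lines that reference names imported from other modules in the same repository. The function names and the repo_modules parameter are illustrative assumptions.

import ast

def local_import_names(tree, repo_modules):
    """Names bound by imports that resolve to other files in the same repo.

    repo_modules: set of dotted module paths known to live in this repository
    (an assumed input; a real tool would derive it from the file tree).
    """
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            if node.module in repo_modules:
                names.update(a.asname or a.name for a in node.names)
        elif isinstance(node, ast.Import):
            for a in node.names:
                if a.name in repo_modules:
                    names.add(a.asname or a.name.split(".")[0])
    return names

def cross_file_usage_lines(source, repo_modules):
    """Line numbers whose code references a symbol defined in another file."""
    tree = ast.parse(source)
    targets = local_import_names(tree, repo_modules)
    return sorted({n.lineno for n in ast.walk(tree)
                   if isinstance(n, ast.Name) and n.id in targets})

Lines flagged this way are candidate completion points that strictly require cross-file context, since the referenced symbol is defined elsewhere in the repository.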
Extensive experiments on state-of-the-art code language models like CodeGen
and StarCoder demonstrate that CrossCodeEval is extremely challenging when the
relevant cross-file context is absent, and we see clear improvements when this
context is added to the prompt. Even with these improvements, however, the
highest-performing model remains far from ceiling performance, indicating that
CrossCodeEval can also assess a model's capability to leverage extensive
context for better code completion. Finally, we benchmark various methods for
retrieving cross-file context and show that CrossCodeEval can also be used to
measure the capability of code retrievers.

Comment: To appear at NeurIPS 2023 (Datasets and Benchmarks Track)
ALEX: An Updatable Adaptive Learned Index
Recent work on "learned indexes" has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as "models" that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+ tree by a factor of up to three in search time and by an order of magnitude in memory footprint. However, it is limited to static, read-only workloads. In this paper, we present a new learned index called ALEX which addresses practical issues that arise when implementing learned indexes for workloads that contain a mix of point lookups, short range queries, inserts, updates, and deletes. ALEX effectively combines the core insights from learned indexes with proven storage and indexing techniques to achieve high performance and low memory footprint. On read-only workloads, ALEX beats the learned index from Kraska et al. by up to 2.2X in performance with up to 15X smaller index size. Across the spectrum of read-write workloads, ALEX beats B+ trees by up to 4.1X while never performing worse, with up to 2000X smaller index size. We believe ALEX presents a key step towards making learned indexes practical for a broader class of database workloads with dynamic updates.
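To make the "index as model" idea concrete, here is a toy sketch of the basic learned-index recipe, not ALEX's adaptive, updatable structure: a single linear model predicts a key's position in a sorted array, and a bounded local search corrects the prediction. The class name is hypothetical.

import bisect

class LinearLearnedIndex:
    """Toy learned index: position ~ slope * key + intercept, then local search."""

    def __init__(self, keys):
        self.keys = sorted(keys)  # assumes a non-empty list of numeric keys
        n = len(self.keys)
        # Least-squares fit of position as a linear function of the key.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys) or 1.0
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error bounds the search window at lookup time.
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        pos = int(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the key's position in the sorted array, or None if absent."""
        pos = self._predict(key)
        lo = max(0, pos - self.err)
        hi = min(len(self.keys), pos + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None

# Usage: LinearLearnedIndex(list(range(0, 100, 2))).lookup(42) -> 21

This static, read-only recipe is exactly the setting the abstract says the original learned index was limited to; ALEX's contribution is making the structure hold up under inserts, updates, and deletes as well.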