Recently, code language models have achieved notable advances on a
diverse array of essential code comprehension and generation
tasks. Yet, the field still lacks a thorough understanding of the code
embeddings produced by multilingual code models. In this paper, we present a
comprehensive study on multilingual code embeddings, focusing on the
cross-lingual capabilities of these embeddings across different programming
languages. Through probing experiments, we demonstrate that code embeddings
comprise two distinct components: one deeply tied to the nuances and syntax of
a specific language, and the other remaining agnostic to these details,
primarily focusing on semantics. Further, we show that isolating and
removing this language-specific component yields significant improvements
in downstream code retrieval tasks, with an absolute increase of up to
+17 in Mean Reciprocal Rank (MRR).
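
A minimal sketch of one way the language-specific component could be removed, assuming it is approximated by each language's mean embedding (the paper's exact isolation procedure may differ; `remove_language_component` and its inputs are illustrative names):

```python
import numpy as np

def remove_language_component(embeddings, languages):
    """Subtract each snippet's per-language mean embedding so that only the
    (approximately) language-agnostic, semantic component remains.

    embeddings: (N, D) array of code embeddings
    languages:  length-N list of language labels, e.g. "python", "java"
    """
    embeddings = np.asarray(embeddings, dtype=np.float64)
    cleaned = embeddings.copy()
    for lang in set(languages):
        idx = [i for i, l in enumerate(languages) if l == lang]
        lang_mean = embeddings[idx].mean(axis=0)
        cleaned[idx] -= lang_mean  # remove the language-specific offset
    # re-normalize so cosine-similarity retrieval is unaffected by scale
    norms = np.linalg.norm(cleaned, axis=1, keepdims=True)
    return cleaned / np.clip(norms, 1e-12, None)
```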