InfeRE: Step-by-Step Regex Generation via Chain of Inference
Automatically generating regular expressions (regexes) from natural language
descriptions (NL2RE) has been an emerging research area. Prior studies treat a
regex as a linear sequence of tokens and generate the final expression
autoregressively in a single pass, without taking into account the step-by-step
internal text-matching process behind the final result. This significantly
hinders the efficacy and interpretability of regex generation by neural
language models. In this paper, we propose a new paradigm called InfeRE,
which decomposes the generation of regexes into chains of step-by-step
inference. To enhance robustness, we introduce a self-consistency decoding
mechanism that ensembles multiple outputs sampled from different models. We
evaluate InfeRE on two publicly available datasets, NL-RX-Turk and KB13, and
compare the results with state-of-the-art approaches and the popular tree-based
generation approach TRANX. Experimental results show that InfeRE substantially
outperforms previous baselines, yielding 16.3% and 14.7% improvements in DFA@5
accuracy on the two datasets, respectively. In particular, InfeRE outperforms
the tree-based generation approach by 18.1% and 11.3% in DFA@5 accuracy on the
two datasets, respectively.
Comment: This paper has been accepted by ASE'2
Diet Code Is Healthy: Simplifying Programs for Pre-trained Models of Code
Pre-trained code representation models such as CodeBERT have demonstrated
superior performance in a variety of software engineering tasks, yet they are
often computationally heavy: their complexity grows quadratically with the
length of the input sequence.
Our empirical analysis of CodeBERT's attention reveals that CodeBERT pays more
attention to certain types of tokens and statements such as keywords and
data-relevant statements. Based on these findings, we propose DietCode, which
aims to leverage large pre-trained models for source code in a lightweight way.
DietCode simplifies the input program of CodeBERT with three strategies,
namely, word dropout, frequency filtering, and an attention-based strategy
that selects the statements and tokens receiving the highest attention weights
during pre-training. This yields a substantial reduction in computational
cost without hampering model performance. Experimental
results on two downstream tasks show that DietCodeBERT provides comparable
results to CodeBERT with 40% less computational cost in fine-tuning and
testing.
Comment: Accepted to be published in ESEC/FSE 202
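The attention-based simplification strategy can be sketched as keeping only the highest-attention tokens under a budget. The function, token list, and attention weights below are hypothetical (not taken from CodeBERT), purely to illustrate the pruning step.

```python
def prune_by_attention(tokens, attn_weights, keep_ratio=0.6):
    """Keep the highest-attention tokens, preserving original order.

    A sketch of attention-based input simplification: rank tokens by the
    attention weight they receive and drop the rest. Weights here are
    assumed to be given; a real system would extract them from the model.
    """
    k = max(1, int(len(tokens) * keep_ratio))
    # Indices of the top-k tokens by attention weight
    top = sorted(range(len(tokens)),
                 key=lambda i: attn_weights[i], reverse=True)[:k]
    keep = set(top)
    return [t for i, t in enumerate(tokens) if i in keep]

# Hypothetical tokenization and attention weights for a tiny function
tokens = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
weights = [0.9, 0.8, 0.1, 0.5, 0.05, 0.5, 0.1, 0.2, 0.9, 0.6, 0.7, 0.6]
print(prune_by_attention(tokens, weights, keep_ratio=0.5))
# -> ['def', 'add', 'return', 'a', '+', 'b']
```

Keywords and data-relevant tokens survive the cut while punctuation is dropped, mirroring the attention pattern the abstract describes.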
On the Evaluation of Neural Code Translation: Taxonomy and Benchmark
In recent years, neural code translation has gained increasing attention.
While most of the research focuses on improving model architectures and
training processes, we notice that the evaluation process and benchmark for
code translation models are severely limited: they primarily treat source code
as natural language and provide a holistic accuracy score, disregarding the
full spectrum of model capabilities across different translation types and
complexity levels. In this paper, we present a comprehensive investigation of four
state-of-the-art models and analyze in-depth the advantages and limitations of
three existing benchmarks. Based on the empirical results, we develop a
taxonomy that categorizes code translation tasks into four primary types
according to their complexity and knowledge dependence: token level (type 1),
syntactic level (type 2), library level (type 3), and algorithm level (type 4).
We then conduct a thorough analysis of how existing approaches perform across
these four categories. Our findings indicate that while state-of-the-art code
translation models excel in type-1 and type-2 translations, they struggle with
knowledge-dependent ones such as type-3 and type-4. Existing benchmarks are
biased towards trivial translations, such as keyword mapping. To overcome these
limitations, we construct G-TransEval, a new benchmark by manually curating
type-3 and type-4 translation pairs and unit test cases. Results on our new
benchmark suggest that G-TransEval exposes the capabilities of code translation
models more comprehensively and at a finer granularity, thus providing a more
rigorous evaluation. Our studies also provide more insightful findings and
suggestions for future research, such as building type-3 and type-4 training
data and ensembling multiple pretraining approaches.
Comment: accepted by ASE202
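The four-type taxonomy can be sketched as a rule-based labeler over translation-pair properties. The predicates below are assumptions chosen for illustration, not the paper's actual curation criteria.

```python
def translation_type(syntax_differs, uses_library, uses_algorithm):
    """Assign a translation pair to one of the four taxonomy types.

    A hypothetical labeler illustrating the taxonomy from the abstract:
    type 4 (algorithm level) and type 3 (library level) are the
    knowledge-dependent categories; types 1 and 2 are not.
    """
    if uses_algorithm:
        return 4  # algorithm level: requires algorithmic knowledge
    if uses_library:
        return 3  # library level: depends on library/API knowledge
    if syntax_differs:
        return 2  # syntactic level: structural rewriting across languages
    return 1      # token level: near one-to-one token/keyword mapping

# A keyword-mapping pair is type 1; a pair calling a third-party API is type 3
print(translation_type(False, False, False))  # -> 1
print(translation_type(True, True, False))    # -> 3
```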
Are the code snippets what we are searching for? A benchmark and an empirical study on code search with natural-language queries
Ministry of Education, Singapore under its Academic Research Funding Tier