Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts
Most of the JavaScript code deployed in the wild has been minified, a process
in which identifier names are replaced with short, arbitrary and meaningless
names. Minified code occupies less space, but also makes the code extremely
difficult to manually inspect and understand. This paper presents Context2Name,
a deep learning-based technique that partially reverses the effect of
minification by predicting natural identifier names for minified names. The
core idea is to predict from the usage context of a variable a name that
captures the meaning of the variable. The approach combines a lightweight,
token-based static analysis with an auto-encoder neural network that summarizes
usage contexts and a recurrent neural network that predicts natural names for a
given usage context. We evaluate Context2Name with a large corpus of real-world
JavaScript code and show that it successfully predicts 47.5% of all minified
identifiers while taking only 2.9 milliseconds on average to predict a name. A
comparison with the state-of-the-art tools JSNice and JSNaughty shows that our
approach performs comparably in terms of accuracy while improving in terms of
efficiency. Moreover, Context2Name complements the state-of-the-art by
predicting 5.3% additional identifiers that are missed by both existing tools.
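
The token-based usage context described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the regex tokenizer, the function names `tokenize` and `usage_contexts`, and the window size are all assumptions chosen for illustration. The sketch only shows the general idea of collecting the tokens that surround each occurrence of a minified name.

```python
import re

# Hypothetical, simplified tokenizer: identifiers, integer literals,
# and single punctuation characters. Real JavaScript lexing is more involved.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|[^\sA-Za-z0-9_$]")


def tokenize(source):
    """Split a source string into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def usage_contexts(tokens, name, window=2):
    """Collect the tokens before and after every occurrence of `name`.

    `window` (an assumed parameter) is the number of tokens kept on each side.
    """
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == name:
            before = tokens[max(0, i - window):i]
            after = tokens[i + 1:i + 1 + window]
            contexts.append((before, after))
    return contexts


if __name__ == "__main__":
    minified = "function f(a){return a.length>0}"
    for before, after in usage_contexts(tokenize(minified), "a"):
        print(before, "a", after)
```

In a pipeline of the kind the abstract describes, contexts like these would then be encoded and passed to the learned models; that part is omitted here.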
UNGOML: Automated Classification of unsafe Usages in Go
The Go programming language offers strong protection from memory corruption.
As an escape hatch from these protections, it provides the unsafe package.
Previous studies identified that this unsafe package is frequently used in
real-world code for several purposes, e.g., serialization or casting types. Due
to the variety of these reasons, it may be possible to refactor specific usages
to avoid potential vulnerabilities. However, the classification of unsafe
usages is challenging and requires the context of the call and the program's
structure. In this paper, we present the first automated classifier for unsafe
usages in Go, UNGOML, to identify what is done with the unsafe package and why
it is used. For UNGOML, we built four custom deep learning classifiers trained
on a manually labeled data set. We represent Go code as enriched control-flow
graphs (CFGs) and solve the label prediction task with one single-vertex and
three context-aware classifiers. All three context-aware classifiers achieve a
top-1 accuracy of more than 86% for both dimensions, WHAT and WHY. Furthermore,
in a set-valued conformal prediction setting, we achieve accuracies of more
than 93% with mean label set sizes of 2 for both dimensions. Thus, UNGOML can
be used to efficiently filter unsafe usages for use cases such as refactoring
or a security audit. UNGOML: https://github.com/stg-tud/ungoml Artifact:
https://dx.doi.org/10.6084/m9.figshare.22293052
Comment: 13 pages, accepted at the 2023 IEEE/ACM 20th International Conference
on Mining Software Repositories (MSR 2023)
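
The set-valued conformal prediction setting mentioned in this abstract can be sketched as follows. The abstract does not state which conformal procedure was used, so this sketch assumes standard split conformal prediction with the nonconformity score 1 - p(true label). The function names and the toy calibration data are hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Compute the score threshold from a held-out calibration set.

    cal_probs:  list of per-class probability lists from some classifier
    cal_labels: list of true class indices
    alpha:      target miscoverage rate (e.g. 0.1 for roughly 90% coverage)
    """
    # Nonconformity score: one minus the probability given to the true label.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every label.
        return 1.0
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Return every label whose nonconformity score is within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


if __name__ == "__main__":
    # Toy calibration data (made up for illustration only).
    cal_probs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.4, 0.5, 0.1], [0.2, 0.1, 0.7]]
    cal_labels = [0, 1, 0, 2]
    threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
    print(prediction_set([0.5, 0.45, 0.05], threshold))
```

A lower `alpha` raises the threshold and produces larger label sets, which is the trade-off between coverage and set size that the abstract's reported accuracy and mean set size reflect.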