SLACC: Simion-based Language Agnostic Code Clones
Successful cross-language clone detection could enable researchers and
developers to create robust language migration tools, facilitate learning
additional programming languages once one is mastered, and promote reuse of
code snippets over a broader codebase. However, cross-language clones pose
special challenges for clone detection. Without a common underlying
representation between arbitrary languages, detecting
clones requires one of the following solutions: 1) a static analysis framework
replicated across each targeted language with annotations matching language
features across all languages, or 2) a dynamic analysis framework that detects
clones based on runtime behavior.
In this work, we demonstrate the feasibility of the latter solution with
SLACC, a dynamic analysis approach for cross-language clone detection. Like
prior clone detection techniques, we use input/output behavior to match clones,
but we overcome limitations of prior work by amplifying the number of inputs
and covering more data types, and as a result achieve better clusters than
prior attempts. Because clusters are generated from input/output behavior,
SLACC supports cross-language clone detection. As an added challenge, we target
a statically typed language, Java, and a dynamically typed language, Python. Compared
to HitoshiIO, a recent clone detection tool for Java, SLACC retrieves 6 times
as many clusters and has higher precision (86.7% vs. 30.7%).
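
The idea of clustering by input/output behavior can be illustrated with a minimal sketch. This is not SLACC's implementation: the helper names (behavior_signature, cluster_by_behavior), the sample functions, and the input generator are all hypothetical, and the sketch runs only Python functions, whereas a cross-language setting would execute each candidate in its own runtime and compare the recorded outputs.

```python
import random
from collections import defaultdict


def behavior_signature(func, inputs):
    """Run func on every input and record its outputs (or the exception type)."""
    outputs = []
    for args in inputs:
        try:
            outputs.append(repr(func(*args)))
        except Exception as exc:  # a crash is also observable behavior
            outputs.append("error:" + type(exc).__name__)
    return tuple(outputs)


def cluster_by_behavior(funcs, inputs):
    """Group function names whose outputs agree on every input."""
    clusters = defaultdict(list)
    for name, func in funcs.items():
        clusters[behavior_signature(func, inputs)].append(name)
    return [sorted(names) for names in clusters.values()]


# Hypothetical candidates: two behavioral clones and one non-clone.
def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total


def sum_builtin(xs):
    return sum(xs)


def max_of(xs):
    return max(xs)


# One hand-written seed input plus randomly generated ones (assumed generator),
# standing in for the idea of amplifying the number of inputs.
rng = random.Random(0)
inputs = [([1, 2, 3],)]
inputs += [
    ([rng.randint(-50, 50) for _ in range(rng.randint(1, 8))],)
    for _ in range(64)
]

candidates = {"sum_loop": sum_loop, "sum_builtin": sum_builtin, "max_of": max_of}
clusters = cluster_by_behavior(candidates, inputs)
print(sorted(clusters))
```

In this sketch, more inputs make it less likely that two functions with different behavior land in the same cluster by coincidence, which is the intuition behind amplifying inputs.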
This is the first work to perform clone detection for dynamically typed languages
(precision = 87.3%) and the first to perform clone detection across languages
that lack a common underlying representation (precision = 94.1%). It provides a
first step towards the larger goal of scalable language migration tools.
Comment: 11 pages, 3 figures, accepted at ICSE 2020 technical track