Identifying Functionally Similar Code in Complex Codebases


Identifying similar code in software systems can assist many software engineering tasks, including program understanding. While most approaches focus on identifying code that looks alike, some researchers propose instead to detect code that functions alike, known as functional clones. However, previous work has highlighted the technical challenges of detecting these functional clones in object-oriented languages such as Java. We propose In-Vivo Clone Detection, a novel, language-agnostic technique that detects functional clones in arbitrary programs by observing and mining their inputs and outputs. We implemented this technique for programs that run on the JVM, creating HitoshiIO (freely available on GitHub), a tool to detect functional code clones. Our experimental results show that it is effective at detecting these functional clones, finding 185 methods that are functionally similar across a corpus of 118 projects, even when only very few inputs are available. In a random sample of the detected clones, HitoshiIO achieves a 68+% true positive rate, while the false positive rate is only 15%.
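
The general idea of comparing code by observed behavior rather than by appearance can be illustrated with a small sketch. This is not HitoshiIO's algorithm, which works on JVM programs and is not detailed in this abstract. It is a simplified stand-in that assumes single-argument functions, a hand-picked list of sample inputs, and Jaccard similarity over recorded input/output pairs. All names (`io_profile`, `similarity`, `find_functional_clones`, the 0.8 threshold) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def io_profile(func, inputs):
    """Run func on each input and record the observed (input, output) pairs."""
    profile = set()
    for value in inputs:
        try:
            result = func(value)
        except Exception as exc:  # record failures as an observable outcome
            result = ("raised", type(exc).__name__)
        profile.add((repr(value), repr(result)))
    return profile


def similarity(profile_a, profile_b):
    """Jaccard similarity between two sets of (input, output) pairs."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


def find_functional_clones(funcs, inputs, threshold=0.8):
    """Return pairs of function names whose I/O profiles are similar enough.

    The threshold is an assumed value, not one taken from the paper.
    """
    profiles = {f.__name__: io_profile(f, inputs) for f in funcs}
    return [
        (a, b)
        for a, b in combinations(profiles, 2)
        if similarity(profiles[a], profiles[b]) >= threshold
    ]


# Two functions that look different but behave alike, plus one that does not.
def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total


def sum_builtin(xs):
    return sum(xs)


def product(xs):
    result = 1
    for x in xs:
        result *= x
    return result


SAMPLE_INPUTS = [[1, 2, 3], [4, 5], [], [10], [2, 2, 2]]
clones = find_functional_clones([sum_loop, sum_builtin, product], SAMPLE_INPUTS)
print(clones)
```

In this sketch, `sum_loop` and `sum_builtin` are reported as a pair because they agree on every sample input, while `product` agrees with them on only two of the five inputs and falls below the assumed threshold. With few inputs, coincidental agreement of this kind is one plausible source of false positives.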



Columbia University Academic Commons


This paper was published in Columbia University Academic Commons.
