Detecting Similarity in Multi-procedure Student Programs Using only Static Code Structure

Abstract

Plagiarism is prevalent in most undergraduate programming courses, including those where more advanced programming is taught. Typical strategies used to avoid detection include renaming variables and adding whitespace or comments to the code. Although these changes alter the visual appearance of the source code, its underlying structure remains the same, and this structural similarity can indicate plagiarism. A system has been developed to detect similarity in the structure of student programs. The detection system works in two phases: the first phase parses the source code of each program and creates a syntax tree representing its syntactic structure, while the second takes two program syntax trees as input and applies various comparison algorithms to detect their similarity. The outcome of the comparison allows the system to report one of four similarity categories: identical structure, isomorphic structure, containing many structural similarities, and containing few structural similarities. Empirical tests on small sample programs show that the prototype implementation is effective in detecting plagiarism in source code, although in some cases manual checking is needed to confirm the presence of plagiarism.
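The two-phase design described above can be illustrated with a minimal sketch. This is not the prototype's implementation: it assumes Python source as input, uses the standard `ast` module for the parsing phase, treats "isomorphic" as equality of trees after ignoring sibling order, and separates "many" from "few" similarities with a hypothetical 0.5 threshold on shared subtrees. The function names (`shape`, `canonical`, `subtree_counts`, `classify`) are illustrative only.

```python
import ast
from collections import Counter


def shape(node):
    """Phase 1: reduce a syntax tree to nested (node type, children) tuples.

    Identifiers and literal values are string/number fields, not child nodes,
    so renamed variables, comments, and whitespace leave the shape unchanged.
    """
    return (type(node).__name__,
            tuple(shape(child) for child in ast.iter_child_nodes(node)))


def canonical(s):
    """Sort children recursively so that sibling order no longer matters."""
    name, kids = s
    return (name, tuple(sorted(canonical(k) for k in kids)))


def subtree_counts(s, counts=None):
    """Count every subtree of a shape (as a multiset)."""
    counts = Counter() if counts is None else counts
    counts[s] += 1
    for kid in s[1]:
        subtree_counts(kid, counts)
    return counts


def classify(src_a, src_b, threshold=0.5):
    """Phase 2: compare two programs and return one of four categories.

    The threshold is an assumed cut-off, not a value from the paper.
    """
    a, b = shape(ast.parse(src_a)), shape(ast.parse(src_b))
    if a == b:
        return "identical structure"
    ca, cb = canonical(a), canonical(b)
    if ca == cb:
        return "isomorphic structure"
    na, nb = subtree_counts(ca), subtree_counts(cb)
    shared = sum((na & nb).values())  # multiset intersection
    score = shared / max(sum(na.values()), sum(nb.values()))
    if score >= threshold:
        return "containing many structural similarities"
    return "containing few structural similarities"


original = "def f(x):\n    return x + 1\n\ndef g(y):\n    print(y)\n"
renamed = "# copied\ndef a(b):\n    return b + 2\n\ndef c(d):\n    print(d)\n"
reordered = "def p(q):\n    print(q)\n\ndef r(s):\n    return s + 1\n"

print(classify(original, renamed))
print(classify(original, reordered))
```

In this sketch, renaming identifiers and adding comments does not change the reported category, while swapping the order of two procedures moves the result from identical to isomorphic. As the abstract notes, a borderline result would still call for manual checking.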
