Detecting Disguised Plagiarism
Source code plagiarism detection is a problem that has been addressed several
times before, and several tools have been developed for that purpose. In this
research project we investigated a set of possible disguises that can be
mechanically applied to plagiarized source code to defeat plagiarism detection
tools. We propose a preprocessor to be used with existing plagiarism detection
tools to "normalize" source code before checking it, thus making such disguises
ineffective.
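As an illustration of the normalization idea only, the sketch below canonicalizes Python source before comparison. The abstract does not list the disguises studied or the language targeted, so the three handled here (renamed identifiers, edited comments, changed whitespace) and the function name `normalize` are assumptions, not the authors' preprocessor.

```python
import io
import keyword
import tokenize

# Token types that carry no information for comparison purposes
# (assumption: layout and comments are treated as disguise-prone).
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalize(source: str) -> str:
    """Return a canonical token string for a piece of Python source.

    Hypothetical helper: identifiers are replaced by placeholders numbered
    in order of first appearance, and comments and layout are dropped.
    The result is a fingerprint for comparison, not runnable code.
    """
    placeholders = {}  # original identifier -> canonical placeholder
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Same identifier always maps to the same placeholder.
            out.append(placeholders.setdefault(tok.string, f"ID{len(placeholders)}"))
        else:
            out.append(tok.string)
    return " ".join(out)


original = "def add(x, y):\n    # sum\n    return x + y\n"
disguised = "def plus(first, second):\n\n    return first + second  # total\n"
print(normalize(original) == normalize(disguised))
```

The normalized strings could then be handed to an existing detection tool in place of the raw files. Disguises that change program structure, such as reordering statements, would need a deeper transformation than this token-level pass.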
Evolving Similarity Functions for Code Plagiarism Detection
Detecting whether computer program code is a student’s original work or has been copied from another student or some other source is a major problem for many universities. Detection methods based on the information retrieval concepts of indexing and similarity matching scale well to large collections of files, but require appropriate similarity functions for good performance. We have used particle swarm optimization and genetic programming to evolve similarity functions that are suited to computer program code. Using a training set of plagiarised and non-plagiarised programs, we have evolved better parameter values for the previously published Okapi BM25 similarity function. We have then used genetic programming to evolve completely new similarity functions that do not conform to any predetermined structure. We found that the evolved similarity functions outperformed the human-developed Okapi BM25 function. We also found that a detection system using the evolved functions was more accurate than the best code plagiarism detection system in use today, and scaled much better to large collections of files. The evolutionary computing techniques have been extremely useful in finding similarity functions that advance the state of the art in code plagiarism detection.
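For readers unfamiliar with Okapi BM25, the sketch below shows one common textbook form of the function and the two parameters, `k1` and `b`, that an optimizer could tune. It is an assumption that the paper uses this variant; the defaults 1.2 and 0.75 are conventional starting values, not the evolved values, and the function name `bm25_scores` and the token lists are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`.

    Assumed variant: term-frequency saturation controlled by k1, length
    normalization controlled by b, and a smoothed IDF that stays positive.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length-dependent part of the denominator.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            n_t = doc_freq[term]
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Hypothetical token streams standing in for indexed program files.
docs = [
    ["for", "i", "in", "range"],
    ["while", "x", "print"],
    ["for", "i", "in", "list"],
]
query = ["for", "i", "in", "range"]
print(bm25_scores(query, docs))
```

Tuning in this setting means searching over `k1` and `b` for values that best separate the plagiarised from the non-plagiarised pairs in a training set. The genetic programming step described above goes further and replaces the fixed formula itself.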