Automatic generation of benchmarks for plagiarism detection tools using grammatical evolution
This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in {Source Publication}, http://dx.doi.org/10.1145/1276958.1277388

An extended version of this poster is available at arXiv. See: http://arxiv.org/abs/cs/0703134v4

Student plagiarism is a major problem in universities worldwide. In this
paper, we focus on plagiarism in answers to computer programming assignments,
where students mix and/or modify one or more original solutions
to obtain counterfeits. Although several software tools have been
developed to help with the tedious and time-consuming task of detecting plagiarism,
little has been done to assess their quality, because determining the
real authorship of the whole submission corpus is practically impossible
for graders. In this article we present a Grammatical Evolution technique
which generates benchmarks for testing plagiarism detection tools. Given
a programming language, our technique generates a set of original solutions
to an assignment, together with a set of plagiarisms of the former
set which mimic the basic plagiarism techniques performed by students.
The authorship of the submission corpus is predefined by the user, providing
a base for the assessment and further comparison of copy-catching
tools. We give empirical evidence of the suitability of our approach by
studying the behavior of one state-of-the-art detection tool (AC) on four
benchmarks coded in APL2, generated with our technique.

Work supported by grant TSI2005-08255-C07-06 of the Spanish Ministry of Education and Science.
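To make the approach concrete, the core idea can be sketched as follows. This is a toy illustration, not the paper's actual system: the grammar, genome, and the renaming transformation below are all hypothetical. In grammatical evolution, a genome of integer codons is mapped through a BNF-style grammar to a program (here, an "original solution"), and simple rewrites, such as consistent identifier renaming, mimic a basic plagiarism technique, yielding a corpus whose authorship is known by construction.

```python
# Toy grammatical-evolution sketch (hypothetical grammar and transformation;
# the paper's benchmarks target APL2 and richer plagiarism operations).

# A tiny BNF-style grammar: nonterminals map to lists of production rules.
GRAMMAR = {
    "<prog>": [["<stmt>", "\n", "<stmt>"]],
    "<stmt>": [["<var>", " = ", "<expr>"]],
    "<expr>": [["<var>", " + ", "<var>"], ["<var>", " * ", "<var>"], ["<var>"]],
    "<var>": [["a"], ["b"], ["c"]],
}

def map_genome(genome, grammar, start="<prog>", max_steps=200):
    """Map integer codons to a program string via leftmost derivation.

    Each time a nonterminal is expanded, the next codon (mod the number of
    applicable rules) selects the production, as in grammatical evolution.
    """
    symbols = [start]   # sentential form, expanded left to right
    out = []            # emitted terminal symbols
    i = 0               # codon index (wraps around the genome)
    for _ in range(max_steps):
        if not symbols:
            break
        sym = symbols.pop(0)
        if sym in grammar:
            rules = grammar[sym]
            choice = rules[genome[i % len(genome)] % len(rules)]
            i += 1
            symbols = list(choice) + symbols
        else:
            out.append(sym)
    return "".join(out)

def plagiarize_rename(program, mapping):
    """Mimic one basic plagiarism move: consistent identifier renaming."""
    return "".join(mapping.get(ch, ch) for ch in program)

# Generate an "original" solution and a renamed counterfeit of it.
original = map_genome([3, 1, 4, 1, 5, 9, 2, 6], GRAMMAR)
copy = plagiarize_rename(original, {"a": "x", "b": "y", "c": "z"})
```

Because the derivation is fully determined by the genome, the same genome always regenerates the same original, and the transformation record fixes the true authorship of every counterfeit, which is exactly what allows a detection tool's output to be scored against ground truth.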