Tackling the PAN’09 External Plagiarism Detection Corpus with a Desktop Plagiarism Detector

Lane, P.C.R.; Malcolm, J.

Authors: P.C.R. Lane, J. Malcolm
Publication date: 1 January 2009

Abstract

Ferret is a fast and effective tool for detecting similarities in a group of files. Applying it to the PAN’09 corpus required modifications to meet the requirements of the competition, mainly to deal with the very large number of files and the large size of some of them, and to automate some of the decisions that would normally be made by a human operator. Ferret was able to detect numerous files in the development corpus that contain substantial similarities not marked as plagiarism, but it also identified many pairs where random similarities masked actual plagiarism. An improved metric is therefore indicated if the “plagiarised” or “not plagiarised” decision is to be automated.

Available Versions

University of Hertfordshire Research Archive

oai:uhra.herts.ac.uk:2299/3911

Last time updated on 12/04/2012