Towards the Exploitation of Statistical Language Models for Plagiarism Detection with Reference

Authors: Alberto Barrón-Cedeño, Paolo Rosso
Publication date: 01/01/2008

Abstract. To plagiarise is to rob another person of credit for their work. In particular, plagiarism in text means including text fragments (or even an entire document) from an author without giving them the corresponding credit. In this work we describe our first attempt to detect plagiarised segments in a text using statistical Language Models (LMs) and perplexity. The preliminary experiments, carried out on two specialised and literary corpora (including original, part-of-speech and stemmed versions), show that the perplexity of a text segment, given a Language Model calculated over an author's text, could be a relevant feature in plagiarism detection.
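
The idea of scoring a segment by its perplexity under an author-specific Language Model can be illustrated with a minimal sketch. The abstract does not state the n-gram order, smoothing method, or tokenisation the authors used, so the unigram model, add-one smoothing, whitespace tokenisation, toy sentences, and all function names below are assumptions made purely for illustration. A higher perplexity suggests the segment is less typical of the author's text.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Count tokens in the author's text (assumed unigram model)."""
    counts = Counter(tokens)
    return counts, len(tokens)


def perplexity(segment, counts, total, vocab_size):
    """Perplexity of a token segment under an add-one smoothed unigram LM."""
    if not segment:
        raise ValueError("segment must contain at least one token")
    log_prob = 0.0
    for tok in segment:
        # Add-one smoothing (an assumption) gives unseen tokens non-zero mass.
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log2(p)
    return 2 ** (-log_prob / len(segment))


# Toy data, invented for this example only.
author_text = "the cat sat on the mat and the cat slept".split()
counts, total = train_unigram(author_text)
vocab_size = len(counts) + 1  # one extra slot reserved for unseen tokens

familiar = "the cat sat".split()
foreign = "quantum flux capacitor".split()

pp_familiar = perplexity(familiar, counts, total, vocab_size)
pp_foreign = perplexity(foreign, counts, total, vocab_size)
print(pp_familiar < pp_foreign)
```

In a detection setting one would compare each segment's perplexity against the rest of the document, treating unusually high values as candidates for closer inspection; how to set such a threshold is not specified in the abstract.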

CiteSeerX

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna