Plagiarism Detection in arXiv

Gehrke, Johannes; Ginsparg, Paul; Sorokina, Daria; Warner, Simeon

Authors: Johannes Gehrke, Paul Ginsparg, Daria Sorokina, Simeon Warner
Publication date: 1 January 2006
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Abstract

We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.

Comment: Sixth International Conference on Data Mining (ICDM'06), Dec 2006
