An approach to source-code plagiarism detection investigation using latent semantic analysis

By Georgina Cosma


This thesis looks at three aspects of source-code plagiarism. The first aspect of the thesis is concerned with creating a definition of source-code plagiarism; the second aspect is concerned with describing the findings gathered from investigating the Latent Semantic Analysis information retrieval algorithm for source-code similarity detection; and the final aspect of the thesis is concerned with the proposal and evaluation of a new algorithm that combines Latent Semantic Analysis with plagiarism detection tools.

A recent review of the literature revealed that there is no commonly agreed definition of what constitutes source-code plagiarism in the context of student assignments. This thesis first analyses the findings from a survey carried out to gain an insight into the perspectives of UK Higher Education academics who teach programming on computing courses. Based on the survey findings, a detailed definition of source-code plagiarism is proposed.

Secondly, the thesis investigates the application of an information retrieval technique, Latent Semantic Analysis, to derive semantic information from source-code files. Various parameters drive the effectiveness of Latent Semantic Analysis. The performance of Latent Semantic Analysis using various parameter settings and its effectiveness in retrieving similar source-code files when optimising those parameters are evaluated.

Finally, an algorithm for combining Latent Semantic Analysis with plagiarism detection tools is proposed and a tool is created and evaluated. The proposed tool, PlaGate, is a hybrid model that allows for the integration of Latent Semantic Analysis with plagiarism detection tools in order to enhance plagiarism detection. In addition, PlaGate has a facility for investigating the importance of source-code fragments with regard to their contribution towards proving plagiarism. PlaGate provides graphical output that indicates the clusters of suspicious files and source-code fragments.
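
To make the idea of using Latent Semantic Analysis for source-code similarity concrete, the following is a minimal sketch of the general technique, not the thesis's own implementation and not PlaGate. It assumes NumPy is available, uses raw term frequencies with no weighting scheme, and treats identifier-like tokens as terms. The function names, the toy files, and the choice of k = 2 dimensions are all illustrative assumptions.

```python
import re
import numpy as np

def tokenize(source):
    """Split source code into lowercase identifier-like tokens (an assumed, simplistic tokenizer)."""
    return [t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)]

def term_file_matrix(files):
    """Build a raw term-frequency matrix: one row per term, one column per file."""
    vocab = sorted({t for src in files for t in tokenize(src)})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(files)))
    for j, src in enumerate(files):
        for t in tokenize(src):
            matrix[index[t], j] += 1.0
    return matrix

def lsa_file_vectors(matrix, k):
    """Represent each file in a k-dimensional space using a truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    # Scale the top-k right singular vectors by their singular values;
    # the result has one row per file.
    return (np.diag(s[:k]) @ vt[:k, :]).T

def cosine_similarities(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty files
    unit = vectors / norms
    return unit @ unit.T

# Hypothetical toy corpus: the second file is the first with the function renamed.
files = [
    "int sum(int a, int b) { return a + b; }",
    "int total(int a, int b) { return a + b; }",
    "void print(String msg) { System.out.println(msg); }",
]
sim = cosine_similarities(lsa_file_vectors(term_file_matrix(files), k=2))
```

In this toy corpus the renamed copy ends up far closer to the original than the unrelated file does. The number of retained dimensions k is one of the parameters whose setting influences how well Latent Semantic Analysis retrieves similar files; term weighting and preprocessing choices, omitted here, are others.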

Topics: QA76