
19,431 research outputs found

A Machine Learning Approach for Plagiarism Detection

Author: Al-Sallal Muna
Publication date: 01/01/2016

Plagiarism detection is gaining increasing importance due to requirements for integrity in education. Existing research has investigated the problem of plagiarism detection with varying degrees of success. The literature reveals two main methods for detecting plagiarism, namely extrinsic and intrinsic. This thesis develops two novel approaches to address both of these methods. Firstly, a novel extrinsic method for detecting plagiarism is proposed. The method is based on four well-known techniques, namely Bag of Words (BOW), Latent Semantic Analysis (LSA), stylometry and Support Vector Machines (SVM). The LSA application was fine-tuned to take in the stylometric features (most common words) in order to characterise document authorship, as described in chapter 4. The results revealed that LSA-based stylometry outperformed the traditional LSA application. Support vector machine based algorithms were used to perform the classification procedure in order to predict which author wrote a particular book being tested. The proposed method successfully addressed the limitations of semantic characteristics and identified the document source by assigning the book being tested to the right author in most cases. Secondly, the intrinsic detection method relies on the statistical properties of the most common words. LSA was applied in this method to a group of most common words (MCWs) to extract their usage patterns based on the transitivity property of LSA. The feature sets of the intrinsic model were based on the frequency of the most common words, their relative frequencies in series, and the deviation of these frequencies across all books for a particular author. The intrinsic method aims to generate a model of author “style” by revealing a set of certain features of authorship.
The model’s generation procedure focuses on just one author in an attempt to summarise aspects of an author’s style in a definitive and clear-cut manner. The thesis also proposes a novel experimental methodology for testing the performance of both extrinsic and intrinsic methods for plagiarism detection. This methodology relies upon the CEN (Corpus of English Novels) training dataset, but divides that dataset into training and test datasets in a novel manner. Both approaches have been evaluated using the well-known leave-one-out cross-validation method. Results indicated that by integrating deep analysis (LSA) and stylometric analysis, hidden changes can be identified whether or not a reference collection exists.
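The idea of attributing a book to an author from most-common-word (MCW) frequencies, evaluated with leave-one-out cross-validation, can be illustrated with a minimal sketch. This is not the thesis's implementation: it omits the LSA step entirely and swaps the SVM for a simple nearest-centroid classifier so that it runs on the standard library alone. All function names, the tokenizer, and the default `k=20` are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def most_common_words(texts, k):
    """Return the k most frequent words across all texts (the MCW list)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(k)]


def mcw_profile(text, mcws):
    """Relative frequency of each MCW within a single text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty texts
    return [counts[word] / total for word in mcws]


def centroid(vectors):
    """Component-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leave_one_out_accuracy(books, k=20):
    """books is a list of (author, text) pairs.

    Each book is held out in turn; the MCW list and the per-author
    centroids are rebuilt from the remaining books, and the held-out
    book is assigned to the nearest author centroid (a stand-in for
    the SVM classifier, assumed here for simplicity).
    """
    correct = 0
    for i, (held_author, held_text) in enumerate(books):
        train = books[:i] + books[i + 1:]
        mcws = most_common_words([text for _, text in train], k)
        profiles = {}
        for author, text in train:
            profiles.setdefault(author, []).append(mcw_profile(text, mcws))
        centroids = {a: centroid(v) for a, v in profiles.items()}
        probe = mcw_profile(held_text, mcws)
        predicted = min(centroids, key=lambda a: distance(probe, centroids[a]))
        correct += predicted == held_author
    return correct / len(books)
```

Rebuilding the MCW list inside each fold keeps the held-out book from influencing the feature set, which is the point of leave-one-out evaluation. Each author needs at least two books, otherwise the held-out book's author has no centroid left to match.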

Coventry University Pure Portal

Plagiarism Detection: Keeping Check on Misuse of Intellectual Property

Author: Joshi Nisheeth
Mathur Iti
Publication date: 01/11/2011

Today, plagiarism has become a menace. Every journal editor or conference organizer has to deal with this problem. Simply copying or rephrasing text without giving due credit to the original author has become more common. This is considered to be intellectual property theft. We are developing a plagiarism detection tool to deal with this problem. In this paper, we discuss the common tools available to detect plagiarism, their shortcomings, and the advantages of our tool over them.

arXiv.org e-Print Archive

CogPrints Cognitive Sciences Eprint Archive

Ethical judgement and intent in business school students: the role of the psyche?

Author: Conway Elaine
Kotera Yasuhiro
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

The aim of this paper is to highlight how business schools can improve the ethical behaviour of future managers. It assesses the positions of ethical judgement and ethical intent within a sample of UK business students, together with an analysis of underlying explanatory factors to those positions, such as levels of depression, anxiety, stress, motivation and self-compassion. A range of scales were used to evaluate the ethical stance and psychological characteristics of a group of UK business students. The results indicate that feelings of self-compassion, a sense of self-direction and mental health (in particular, depression) affect the ethical judgement and intent of students in a range of business and university scenarios. It is recommended that in addition to more formal ethics education, universities consider the mental health and psyche of their students to improve the efficacy of ethical training.

UDORA - University of Derby Online Research Archive

Forget About Cheating, What About Learning?

Author: Roberts Jane
Publication venue: 'University of Worcester'
Publication date: 01/05/2009

This paper will argue that academics need to re-focus on what really matters when developing policies to prevent plagiarism (used here in a broad sense to include unauthorised collaboration in assessment) and deal with its occurrence. Too often, institutions adopt an approach based on the concepts of dishonesty and theft. A focus on learning, I will argue, can be fairer to students and more effective in terms of plagiarism prevention, whilst resulting in a system with strengthened resilience to litigation.

University of Worcester Research and Publications

Open Research Online (The Open University)

Towards the detection of cross-language source code reuse

Author: Barrón Cedeño Luis Alberto
Flores Sáez Enrique
Moreno Boronat Lidia Ana
Rosso Paolo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

The Internet has made huge amounts of information available, including source code. Source code repositories and, in general, programming-related websites facilitate its reuse. In this work, we propose a simple approach to the detection of cross-language source code reuse, a barely investigated problem. Our preliminary experiments, based on character n-gram comparison, show that considering different sections of the code (i.e., comments, code, reserved words, etc.) leads to different results. When considering three programming languages (C++, Java, and Python), the best result is obtained when comments are discarded and the entire source code is considered.

This work has been developed with the support of the project TEXT-ENTERPRISE 2.0: Text comprehension techniques applied to the needs of the Enterprise 2.0 (MICINN, Spain TIN2009-13391-C04-03 (PlanI+D+i)).

Flores Sáez, E.; Barrón Cedeño, LA.; Rosso, P.; Moreno Boronat, LA. (2011). Towards the detection of cross-language source code reuse. In Natural Language Processing and Information Systems. Springer Verlag (Germany). 6716:250-253. https://doi.org/10.1007/978-3-642-22327-3_31
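The character n-gram comparison mentioned in the abstract can be sketched as follows. This is only an illustrative reading, not the authors' implementation: the n-gram length `n=3`, the whitespace normalization, the use of cosine similarity, and all function names are assumptions, and the sketch does not separate comments, code, and reserved words as the paper's experiments do.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after lowercasing the text
    and collapsing runs of whitespace into single spaces."""
    normalized = " ".join(text.lower().split())
    return Counter(
        normalized[i:i + n] for i in range(len(normalized) - n + 1)
    )


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text is too short to yield n-grams
    return dot / (norm_a * norm_b)


def code_similarity(source_a, source_b, n=3):
    """Similarity score in [0, 1] between two source code fragments."""
    return cosine_similarity(char_ngrams(source_a, n), char_ngrams(source_b, n))
```

Because the comparison works on raw characters, identifiers and string fragments shared between, say, a Java and a Python version of the same routine contribute to the score even though the surrounding syntax differs.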

RiuNet

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Citation sentence reuse behavior of scientists: A case study on massive bibliographic text dataset of computer science

Author: Bakshi Nikhil Angad
Goyal Pawan
Gupta Divyansh
Mukherjee Animesh
Niranjan Abhishek
Singh Mayank
Publication date: 06/05/2017

Our current knowledge of scholarly plagiarism is largely based on the similarity between full-text research articles. In this paper, we propose a novel conceptualization of scholarly plagiarism in the form of reuse of explicit citation sentences in scientific research articles. Note that while full-text plagiarism is an indicator of gross-level behavior, copying of citation sentences is a more nuanced micro-scale phenomenon observed even for well-known researchers. The current work poses several interesting questions and attempts to answer them by empirically investigating a large bibliographic text dataset from computer science containing millions of lines of citation sentences. In particular, we report evidence of massive copying behavior. We also present several striking real examples throughout the paper to showcase the widespread adoption of this undesirable practice. In contrast to popular perception, we find that the copying tendency increases as an author matures. The copying behavior is reported to exist in all fields of computer science; however, the theoretical fields indicate more copying than the applied fields.

arXiv.org e-Print Archive

Crossref

Deep Investigation of Cross-Language Plagiarism Detection Methods

Author: Agnes Frederic
Besacier Laurent
Ferrero Jeremy
Schwab Didier
Publication date: 24/05/2017

This paper is a deep investigation of cross-language plagiarism detection methods on a recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to draw robust conclusions on the best methods while deeply analyzing correlations across document styles and languages. Comment: Accepted to BUCC (10th Workshop on Building and Using Comparable Corpora) colocated with ACL 201

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes