19,431 research outputs found
A Machine Learning Approach for Plagiarism Detection
Plagiarism detection is gaining increasing importance due to requirements for integrity in education. Existing research has investigated the problem of plagiarism detection with varying degrees of success. The literature revealed that there are two main methods for detecting plagiarism, namely extrinsic and intrinsic.
This thesis has developed two novel approaches to address both of these methods. Firstly, a novel extrinsic method for detecting plagiarism is proposed. The method is based on four well-known techniques, namely Bag of Words (BOW), Latent Semantic Analysis (LSA), Stylometry and Support Vector Machines (SVM). The LSA application was fine-tuned to take in the stylometric features (most common words) in order to characterise the document authorship as described in chapter 4. The results revealed that LSA-based stylometry outperformed the traditional LSA application. Support vector machine based algorithms were used to perform the classification procedure in order to predict which author has written a particular book being tested. The proposed method has successfully addressed the limitations of semantic characteristics and identified the document source by assigning the book being tested to the right author in most cases.
Secondly, the intrinsic detection method has relied on the use of the statistical properties of the most common words. LSA was applied in this method to a group of most common words (MCWs) to extract their usage patterns based on the transitivity property of LSA. The feature sets of the intrinsic model were based on the frequency of the most common words, their relative frequencies in series, and the deviation of these frequencies across all books for a particular author.
The intrinsic method aims to generate a model of author “style” by revealing a set of certain features of authorship. The model’s generation procedure focuses on just one author in an attempt to summarise aspects of that author’s style in a definitive and clear-cut manner.
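The feature sets described above (frequencies of the most common words, their relative frequencies, and the deviation of those frequencies across one author's books) can be illustrated with a minimal sketch. The tokenizer, the function names and the choice of population standard deviation are assumptions made for illustration, not the thesis's implementation, and the LSA step is omitted.

```python
from collections import Counter
from statistics import pstdev
import re


def tokenize(text):
    # Assumption: lowercase alphabetic tokens are a good enough word model.
    return re.findall(r"[a-z']+", text.lower())


def most_common_words(books, k):
    # Pool all books of one author and keep the k most frequent words.
    counts = Counter()
    for text in books:
        counts.update(tokenize(text))
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def relative_frequencies(text, vocabulary):
    # Share of the book's tokens taken up by each vocabulary word.
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in vocabulary]


def author_profile(books, k=50):
    # Hypothetical "style" summary: per-word mean and deviation across books.
    vocabulary = most_common_words(books, k)
    rows = [relative_frequencies(book, vocabulary) for book in books]
    means = [sum(col) / len(col) for col in zip(*rows)]
    deviations = [pstdev(col) for col in zip(*rows)]
    return vocabulary, means, deviations
```

A passage whose relative frequencies fall far outside the author's usual deviation could then be flagged for closer inspection. Any threshold for "far outside" would have to be chosen empirically.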
The thesis has also proposed a novel experimental methodology for testing the performance of both extrinsic and intrinsic methods for plagiarism detection. This methodology relies upon the CEN (Corpus of English Novels) training dataset, but divides that dataset up into training and test datasets in a novel manner. Both approaches have been evaluated using the well-known leave-one-out cross-validation method. Results indicated that by integrating deep analysis (LSA) and stylometric analysis, hidden changes can be identified whether or not a reference collection exists.
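Leave-one-out cross-validation, the evaluation scheme named above, can be sketched as follows. Each sample (for example, one book's feature vector) is held out in turn and predicted from the rest. The nearest-centroid classifier here is only a stand-in chosen to keep the example dependency-free. The thesis reports using SVMs, and its specific way of dividing the corpus is not reproduced here.

```python
import math


def nearest_centroid_predict(train, query):
    # Stand-in classifier (assumption): assign the label of the closest class mean.
    sums, counts = {}, {}
    for vector, label in train:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, math.inf
    for label in sorted(sums):
        centroid = [s / counts[label] for s in sums[label]]
        dist = math.dist(centroid, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(samples, predict=nearest_centroid_predict):
    # samples: list of (feature_vector, author_label) pairs.
    correct = 0
    for i, (vector, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # everything except sample i
        if predict(train, vector) == label:
            correct += 1
    return correct / len(samples)
```

Passing a different `predict` function with the same signature swaps in another classifier without changing the evaluation loop.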
Plagiarism Detection: Keeping Check on Misuse of Intellectual Property
Today, plagiarism has become a menace. Every journal editor or conference organizer has to deal with this problem. Simply copying or rephrasing text without giving due credit to the original author has become more common. This is considered to be intellectual property theft. We are developing a Plagiarism Detection Tool which would deal with this problem. In this paper we discuss the common tools available to detect plagiarism, their shortcomings, and the advantages of our tool over these tools.
Ethical judgement and intent in business school students: the role of the psyche?
The aim of this paper is to highlight how business schools can improve the ethical behaviour of future managers. It assesses the positions of ethical judgement and ethical intent within a sample of UK business students, together with an analysis of underlying explanatory factors to those positions, such as levels of depression, anxiety, stress, motivation and self-compassion. A range of scales were used to evaluate the ethical stance and psychological characteristics of a group of UK business students. The results indicate that feelings of self-compassion, a sense of self-direction and mental health (in particular, depression) affect the ethical judgement and intent of students in a range of business and university scenarios. It is recommended that in addition to more formal ethics education, universities consider the mental health and psyche of their students to improve the efficacy of ethical training.
Forget About Cheating, What About Learning?
This paper will argue that academics need to re-focus on what really matters when developing policies to prevent plagiarism (used here in a broad sense to include unauthorised collaboration in assessment) and deal with its occurrence. Too often, institutions adopt an approach based on the concepts of dishonesty and theft. A focus on learning, I will argue, can be fairer to students and more effective in terms of plagiarism prevention, whilst resulting in a system with strengthened resilience to litigation.
Towards the detection of cross-language source code reuse
The Internet has made available huge amounts of information,
including source code. Source code repositories and, in general,
programming-related websites facilitate its reuse. In this work, we propose a simple
approach to the detection of cross-language source code reuse, a scarcely
investigated problem. Our preliminary experiments, based on character
n-grams comparison, show that considering different sections of the
code (e.g., comments, code, reserved words) leads to different results.
When considering three programming languages: C++, Java, and
Python, the best result is obtained when comments are discarded and
the entire source code is considered.
This work has been developed with the support of the project TEXT-ENTERPRISE 2.0: Text comprehension techniques applied to the needs of the Enterprise 2.0 (MICINN, Spain TIN2009-13391-C04-03 (PlanI+D+i)).
Flores Sáez, E.; Barrón Cedeño, LA.; Rosso, P.; Moreno Boronat, LA. (2011). Towards the detection of cross-language source code reuse. In Natural Language Processing and Information Systems. Springer Verlag (Germany). 6716:250-253. https://doi.org/10.1007/978-3-642-22327-3_31
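The character n-gram comparison described in this abstract can be illustrated with a small sketch: comments are discarded, the remaining source is turned into a character trigram profile, and two profiles are compared. The cosine measure, the regex-based comment stripping and all names below are assumptions for illustration. The abstract does not specify the paper's weighting or similarity function.

```python
from collections import Counter
import math
import re

# Hypothetical, deliberately naive comment patterns per language. They do not
# account for comment markers that appear inside string literals.
COMMENT_PATTERNS = {
    "cpp": [r"/\*.*?\*/", r"//[^\n]*"],
    "java": [r"/\*.*?\*/", r"//[^\n]*"],
    "python": [r"#[^\n]*"],
}


def strip_comments(source, language):
    for pattern in COMMENT_PATTERNS[language]:
        source = re.sub(pattern, " ", source, flags=re.DOTALL)
    return source


def char_ngrams(text, n=3):
    # Lowercase and collapse whitespace before counting overlapping n-grams.
    text = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def reuse_score(src_a, lang_a, src_b, lang_b, n=3):
    # Compare two programs, possibly in different languages, without comments.
    profile_a = char_ngrams(strip_comments(src_a, lang_a), n)
    profile_b = char_ngrams(strip_comments(src_b, lang_b), n)
    return cosine_similarity(profile_a, profile_b)
```

A score close to 1 suggests heavily overlapping character sequences, while a score close to 0 suggests little shared text. Deciding what counts as reuse would require a threshold tuned on labelled data.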
Citation sentence reuse behavior of scientists: A case study on massive bibliographic text dataset of computer science
Our current knowledge of scholarly plagiarism is largely based on the
similarity between full text research articles. In this paper, we propose an
innovative and novel conceptualization of scholarly plagiarism in the form of
reuse of explicit citation sentences in scientific research articles. Note that
while full-text plagiarism is an indicator of a gross-level behavior, copying
of citation sentences is a more nuanced micro-scale phenomenon observed even
for well-known researchers. The current work poses several interesting
questions and attempts to answer them by empirically investigating a large
bibliographic text dataset from computer science containing millions of lines
of citation sentences. In particular, we report evidence of massive copying
behavior. We also present several striking real examples throughout the paper
to showcase widespread adoption of this undesirable practice. In contrast to
the popular perception, we find that copying tendency increases as an author
matures. The copying behavior is reported to exist in all fields of computer
science; however, the theoretical fields indicate more copying than the applied
fields.
Deep Investigation of Cross-Language Plagiarism Detection Methods
This paper is a deep investigation of cross-language plagiarism detection
methods on a new recently introduced open dataset, which contains parallel and
comparable collections of documents with multiple characteristics (different
genres, languages and sizes of texts). We investigate cross-language plagiarism
detection methods for 6 language pairs on 2 granularities of text units in
order to draw robust conclusions on the best methods while deeply analyzing
correlations across document styles and languages.
Comment: Accepted to BUCC (10th Workshop on Building and Using Comparable
Corpora) co-located with ACL 201