
Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection

Author: Karnalim Oscar
Sulistiani Lisan
Publication date: 28/10/2018

To address the time-inefficiency issue in string-matching-based source code plagiarism detection, only potential pairs are compared. Potentiality is defined through a fast but order-insensitive similarity measurement (adapted from Information Retrieval), and only pairs whose similarity degrees are higher than or equal to a particular threshold are selected. Defining such a threshold is not trivial, since it should yield a high efficiency improvement with a low effectiveness reduction (if some reduction is unavoidable). This paper proposes two thresholding mechanisms, namely a range-based and a pair-count-based mechanism, that dynamically tune the threshold based on the distribution of the resulting similarity degrees. According to our evaluation, both mechanisms are more practical than manual threshold assignment, since they are more proportional to efficiency improvement and effectiveness reduction.
Comment: The 2018 International Conference on Advanced Computer Science and Information Systems (ICACSIS

arXiv.org e-Print Archive

Crossref
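The filtering step in the abstract above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: cosine similarity over token-frequency vectors stands in for the IR-based, order-insensitive measure. The two threshold rules are hypothetical readings of the range-based and pair-count-based mechanisms. `range_based_threshold` places the cut at a fraction `ratio` of the observed similarity range, and `pair_count_based_threshold` keeps the `keep` highest-scoring pairs. All function and parameter names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Order-insensitive similarity between two token-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pair_similarities(files: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Score every unordered pair of files with the cheap IR-style measure."""
    vectors = {name: Counter(tokens) for name, tokens in files.items()}
    return {(x, y): cosine(vectors[x], vectors[y]) for x, y in combinations(sorted(vectors), 2)}


def range_based_threshold(sims: dict, ratio: float) -> float:
    # Hypothetical rule: put the cut `ratio` of the way from the lowest
    # to the highest observed similarity degree.
    lo, hi = min(sims.values()), max(sims.values())
    return lo + ratio * (hi - lo)


def pair_count_based_threshold(sims: dict, keep: int) -> float:
    # Hypothetical rule: choose the cut so that the `keep` most similar pairs
    # pass (pairs tied with the last kept score also pass).
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ranked = sorted(sims.values(), reverse=True)
    return ranked[min(keep, len(ranked)) - 1]


def candidate_pairs(sims: dict, threshold: float) -> list[tuple[str, str]]:
    # Only pairs scoring >= threshold go on to the slower string matching.
    return sorted(pair for pair, score in sims.items() if score >= threshold)


if __name__ == "__main__":
    files = {
        "a.py": ["def", "f", "(", "x", ")", ":", "return", "x"],
        "b.py": ["def", "g", "(", "y", ")", ":", "return", "y"],
        "c.py": ["print", "(", "1", ")"],
    }
    sims = pair_similarities(files)
    print(candidate_pairs(sims, range_based_threshold(sims, 0.5)))     # [('a.py', 'b.py')]
    print(candidate_pairs(sims, pair_count_based_threshold(sims, 1)))  # [('a.py', 'b.py')]
```

Both rules adapt to the observed distribution instead of using a fixed constant. With the toy data, `a.py`/`b.py` scores 0.5 and the other pairs score about 0.316, so either rule forwards only the first pair to string matching.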

TF-IDF Inspired Detection for Cross-Language Source Code Plagiarism and Collusion

Author: Karnalim Oscar
Publication venue: 'AGH University of Science and Technology Press'
Publication date: 01/01/2020

Several computing courses allow students to choose which programming language they use to complete a programming task. This can lead to cross-language code plagiarism and collusion, in which the copied code file is rewritten in another programming language. In response, this paper proposes a detection technique that can accurately compare code files written in various programming languages while requiring only limited effort to accommodate those languages at the development stage. The only language-dependent component of the technique is a source code tokeniser; no code conversion is applied. The impact of coincidental similarity is reduced by applying a TF-IDF inspired weighting in which rare matches are prioritised. Our evaluation shows that the technique outperforms techniques commonly used in academia at handling language-conversion disguises, and that it is comparable to those techniques when dealing with conventional disguises.

AGH (Akademia Górniczo-Hutnicza) University of Science and Technology: Journals

Computer Science Journal (AGH University of Science and Technology, Krakow)

Biblioteka Nauki - repozytorium artykułów
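The TF-IDF inspired weighting in the abstract above can be illustrated with a short Python sketch. It assumes token 3-grams as the unit of comparison and a weighted-overlap score, and the paper's actual tokenisers, n-gram length, and weighting formula may differ. The regex `tokenise` function is a hypothetical placeholder for the language-specific tokeniser the abstract mentions. `log(1 + N/df)` is one common IDF variant, chosen here so that every weight stays positive.

```python
import math
import re
from collections import Counter


def tokenise(code: str) -> list[str]:
    # Hypothetical placeholder tokeniser: identifiers/keywords, integers, and
    # single non-space characters. The paper uses a real per-language tokeniser.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def ngrams(tokens: list[str], n: int = 3) -> set[tuple[str, ...]]:
    """Set of contiguous token n-grams (empty if the file is shorter than n)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def idf_table(corpus: dict[str, set]) -> dict:
    """Inverse document frequency of every n-gram across all submissions."""
    df = Counter(g for grams in corpus.values() for g in grams)
    n_docs = len(corpus)
    # log(1 + N/df) is always positive, so even ubiquitous n-grams keep a small weight.
    return {g: math.log(1 + n_docs / count) for g, count in df.items()}


def weighted_similarity(a: set, b: set, idf: dict) -> float:
    # Each shared n-gram counts in proportion to its rarity, so a match on a rare
    # n-gram outweighs a match on boilerplate that most submissions contain.
    total = sum(idf[g] for g in a | b)
    shared = sum(idf[g] for g in a & b)
    return shared / total if total else 0.0


if __name__ == "__main__":
    submissions = {
        "s1.py": "total = 0\nfor v in values:\n    total += v * v\n",
        "s2.java": "int total = 0; for (int v : values) { total += v * v; }",
        "s3.py": "print(len(values))",
    }
    grams = {name: ngrams(tokenise(src)) for name, src in submissions.items()}
    idf = idf_table(grams)
    for x, y in [("s1.py", "s2.java"), ("s1.py", "s3.py")]:
        print(x, y, round(weighted_similarity(grams[x], grams[y], idf), 3))
```

Because the tokens are compared directly, without translating one language into the other, only the tokeniser would need replacing per language. This matches the abstract's claim that no code conversion is applied.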

Collaboration Versus Cheating

Author: Aiken Alex
Caliskan-Islam Aylin
Eaton Sarah E
Grijalva Therese C
Tabsh Sami W
Zhang Youdan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2018

We outline how we detected programming plagiarism in an introductory online course for a Master of Science in Computer Science program, how we achieved a statistically significant reduction in programming plagiarism by combining a clear explanation of university and class policy on academic honesty, reinforced with a short but formal assessment, and how we evaluated plagiarism rates before and after implementing our policy and assessment.
Comment: 7 pages, 1 figure, 5 tables, SIGCSE 201

arXiv.org e-Print Archive

Crossref

On the Feasibility of Malware Authorship Attribution

Author: A Rahimian
C Kruegel
DE Knuth
DI Holmes
EH Spafford
F Can
G Frantzeskou
I Krsul
J Ferrante
M Fowler
N Pržulj
N Rosenblum
S Alrabaee
S Burrows
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/01/2017

There are many occasions on which the security community is interested in discovering the authorship of malware binaries, either for digital forensic analysis of malware corpora or for thwarting live threats of malware invasion. Such a discovery of authorship might be possible due to stylistic features inherent to software code written by human programmers. Existing studies of authorship attribution for general-purpose software mainly focus on source code, typically relying on the style of programs and their environment. However, those features critically depend on the availability of the program source code, which is usually not the case when dealing with malware binaries. Such binaries often do not retain many semantic or stylistic features because of the compilation process. Therefore, authorship attribution of malware binaries based on features and styles that survive compilation is challenging. This paper surveys the state of the art in this literature and analyzes the features involved in those techniques. Using a case study, we identify features that can survive the compilation process. Finally, we analyze existing works on binary authorship attribution and study their applicability to real malware binaries.
Comment: FPS 201

arXiv.org e-Print Archive

Crossref

Uncovering source code reuse in large-scale academic environments

Author: Arwin
Baxter
Bejarano
Chuda
Clough
Cosma
Faidhi
Feng
Flores
Halstead
Harrison
Hislop
Jankowitz
Koschke
Kuo
Manning
McCabe
McNamee
Menai
Potthast
Prechelt
Robertson
Rosales
Spinellis
Whale
Wise
Witten
Publication venue: 'Wiley'
Publication date: 01/01/2015

The advent of the Internet has caused an increase in content reuse, including source code. The purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good example is academia, where massive courses are taught to students who must demonstrate that they have acquired the knowledge. The need to detect content reuse in quasi real time encourages the development of automatic systems, such as the one described in this paper, for source code reuse detection. Our approach is based on comparing programs at the character level. It is able to find potential cases of reuse across a huge number of assignments, and it achieved better results than JPlag, the most widely used online system for finding similarities among multiple sets of source code. The most common obfuscation operations we found were changes in identifier names, comments, and indentation. © 2014 Wiley Periodicals, Inc.
Flores Sáez, E.; Barrón Cedeño, LA.; Moreno Boronat, LA.; Rosso, P. (2015). Uncovering source code reuse in large-scale academic environments. Computer Applications in Engineering Education. 23(3):383-390. doi:10.1002/cae.21608

Crossref

RiuNet

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
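Character-level comparison, as described in the abstract above, can be approximated by comparing character n-gram profiles. The Python sketch below is an assumption, not the authors' system. It lower-cases the code and collapses whitespace, so that indentation changes (one of the obfuscations they report) do not matter. It then builds character 3-gram frequency vectors and flags pairs whose cosine similarity reaches a hypothetical `threshold` of 0.8.

```python
import math
import re
from collections import Counter


def normalise(code: str) -> str:
    # Assumption: lower-case and collapse all whitespace runs to one space,
    # so re-indentation does not change the n-gram profile.
    return re.sub(r"\s+", " ", code.lower()).strip()


def char_ngrams(code: str, n: int = 3) -> Counter:
    """Frequency profile of overlapping character n-grams."""
    text = normalise(code)
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def flag_pairs(submissions: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (name, name, score) for every pair at or above the hypothetical threshold."""
    profiles = {name: char_ngrams(code) for name, code in submissions.items()}
    names = sorted(profiles)
    flagged = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = cosine(profiles[x], profiles[y])
            if score >= threshold:
                flagged.append((x, y, round(score, 3)))
    return flagged


if __name__ == "__main__":
    subs = {
        "alice.py": "def area(r):\n    return 3.14 * r * r\n",
        "bob.py": "def area(r):\n\treturn 3.14 * r * r\n",
        "carol.py": "import math\nprint(math.pi)\n",
    }
    print(flag_pairs(subs))
```

Identifier renaming still changes some n-grams, so a renamed copy scores below 1.0 rather than being missed outright. The threshold controls how much of that drift is tolerated.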

Plagiarism detection for document

Author: Varun Shukla, Farhana Khan, Komal Mody, Prof. Sarita Rathod
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 26/02/2016

Our project aims to provide plagiarism detection based on semantic analysis and natural language processing techniques. Plagiarism detection for documents is a very effective technique, as students nowadays depend heavily on the Internet. The wide use and availability of electronic resources make it easy for students, authors, and even academics to access any piece of information and embed it into their own work without proper citation. Our project helps authors, writers, and others to secure their files and keep them safe. It lets users upload files easily and detects plagiarism more efficiently, giving more accurate results. This web application helps users upload files and check them for plagiarism more easily and securely.

International Journal on Recent and Innovation Trends in Computing and Communication

Acta Cybernetica: Volume 19, Number 1

Publication date: 01/01/2009

University of Szeged

An Extended Stable Marriage Problem Algorithm for Clone Detection

Author: AlHakami Hosam
Chen Feng
Janicke Helge
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 01/01/2014

Code cloning negatively affects industrial software and threatens intellectual property. This paper presents a novel approach to detecting cloned software using a bijective matching technique. The proposed approach focuses on increasing the range of similarity measures and thus enhancing detection precision. This is achieved by extending the well-known stable marriage problem (SMP) and demonstrating how matches between code fragments of different files can be expressed. A prototype of the proposed approach is demonstrated on a suitable scenario, which shows a noticeable improvement in several aspects of clone detection, such as scalability and accuracy.
Comment: 20 pages, 10 figures, 6 table

arXiv.org e-Print Archive

CiteSeerX
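The bijective matching idea in the abstract above can be illustrated with the classic (unextended) Gale-Shapley stable marriage algorithm. In this setting, fragments of one file propose to fragments of another in order of similarity. This Python sketch does not reproduce the paper's extension. `difflib.SequenceMatcher.ratio` is a stand-in similarity measure, and the `file_similarity` aggregate (sum of matched-pair similarities divided by the larger fragment count) is a hypothetical choice.

```python
from difflib import SequenceMatcher


def fragment_similarity(a: str, b: str) -> float:
    # Stand-in measure (an assumption); the paper defines its own similarity measures.
    return SequenceMatcher(None, a, b).ratio()


def stable_match(frags_a: list[str], frags_b: list[str]) -> dict[int, int]:
    """Classic Gale-Shapley: fragments of A propose to fragments of B.

    Returns a one-to-one mapping {index in A: index in B}; surplus fragments
    on the longer side stay unmatched.
    """
    sim = [[fragment_similarity(a, b) for b in frags_b] for a in frags_a]
    # Each A-fragment's preference list: B indices, most similar first.
    prefs = [sorted(range(len(frags_b)), key=lambda j: -sim[i][j]) for i in range(len(frags_a))]
    next_choice = [0] * len(frags_a)  # position in prefs[i] of the next proposal
    engaged_to: dict[int, int] = {}   # B index -> A index currently holding it
    free = list(range(len(frags_a)))
    while free:
        i = free.pop()
        if next_choice[i] >= len(frags_b):
            continue  # i has proposed to every B-fragment; it stays unmatched
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        current = engaged_to.get(j)
        if current is None:
            engaged_to[j] = i
        elif sim[i][j] > sim[current][j]:
            engaged_to[j] = i     # B-fragment j prefers the more similar proposer
            free.append(current)
        else:
            free.append(i)        # rejected; i will try its next preference
    return dict(sorted((i, j) for j, i in engaged_to.items()))


def file_similarity(frags_a: list[str], frags_b: list[str]) -> float:
    # Hypothetical aggregate: matched-pair similarity, normalised by the larger file.
    pairs = stable_match(frags_a, frags_b)
    if not pairs:
        return 0.0
    total = sum(fragment_similarity(frags_a[i], frags_b[j]) for i, j in pairs.items())
    return total / max(len(frags_a), len(frags_b))


if __name__ == "__main__":
    file_a = ["x = 1", "print(x)"]
    file_b = ["print(y)", "y = 1"]
    print(stable_match(file_a, file_b))               # {0: 1, 1: 0}
    print(round(file_similarity(file_a, file_b), 4))  # 0.8375
```

A bijective (one-to-one) matching prevents a single fragment in one file from being counted as the copy of several fragments in the other. This is one reason a matching formulation can be preferable to simply taking each fragment's best match.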