1,481 research outputs found
Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection
To address time inefficiency, only potential pairs are compared in
string-matching-based source code plagiarism detection, where potentiality is
defined through a fast yet order-insensitive similarity measurement (adapted
from Information Retrieval) and only pairs whose similarity degrees are higher
than or equal to a particular threshold are selected. Defining such a threshold is not
a trivial task, considering that the threshold should lead to a high efficiency
improvement and a low effectiveness reduction (if it is unavoidable). This paper
proposes two thresholding mechanisms, namely the range-based and pair-count-based
mechanisms, that dynamically tune the threshold based on the distribution of
the resulting similarity degrees. According to our evaluation, both mechanisms are
more practical than manual threshold assignment since they are more
proportional to efficiency improvement and effectiveness reduction.
Comment: The 2018 International Conference on Advanced Computer Science and
Information Systems (ICACSIS
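The filtering idea in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption: cosine similarity over token counts stands in for the order-insensitive IR measure, `range_based_threshold` places the threshold at a fraction of the observed similarity range, and `pair_count_based_threshold` picks the similarity of the last pair kept when a fixed fraction of pairs is retained. All function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Order-insensitive similarity between two token-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pairwise_similarities(token_lists):
    """Map each unordered pair of submission names to its cosine similarity."""
    vectors = {name: Counter(tokens) for name, tokens in token_lists.items()}
    return {
        (x, y): cosine_similarity(vectors[x], vectors[y])
        for x, y in combinations(sorted(vectors), 2)
    }


def range_based_threshold(sims, ratio):
    """Assumed rule: a point `ratio` of the way from the lowest to the highest similarity."""
    values = list(sims.values())
    lowest, highest = min(values), max(values)
    return lowest + ratio * (highest - lowest)


def pair_count_based_threshold(sims, keep_fraction):
    """Assumed rule: the similarity of the last pair kept when retaining the top fraction."""
    ranked = sorted(sims.values(), reverse=True)
    keep = max(1, round(keep_fraction * len(ranked)))
    return ranked[keep - 1]


def select_pairs(sims, threshold):
    """Pairs passed on to the slower string-matching comparison."""
    return [pair for pair, score in sims.items() if score >= threshold]
```

Because both thresholds are derived from the similarity distribution of the current submission set, they shift with each assignment, whereas a manually assigned constant does not.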
TF-IDF Inspired Detection for Cross-Language Source Code Plagiarism and Collusion
Several computing courses allow students to choose which programming language they want to use for completing a programming task. This can lead to cross-language code plagiarism and collusion, in which the copied code file is rewritten in another programming language. In response to that, this paper proposes a detection technique which is able to accurately compare code files written in various programming languages, but with limited effort needed to accommodate such languages at the development stage. The only language-dependent feature used in the technique is the source code tokeniser, and no code conversion is applied. The impact of coincidental similarity is reduced by applying a TF-IDF inspired weighting, in which rare matches are prioritised. Our evaluation shows that the technique outperforms common techniques in academia for handling language conversion disguises. Further, it is comparable to those techniques when dealing with conventional disguises.
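One way to picture "rare matches are prioritised" is the Python sketch below. It is not the paper's technique: it assumes each submission has already been reduced to a list of language-neutral tokens by a per-language tokeniser, represents a submission as a set of token n-grams, and weights every n-gram by an IDF-style factor `log(N / df)`. The names `ngrams`, `idf_weights` and `weighted_similarity` are hypothetical.

```python
from math import log


def ngrams(tokens, n=3):
    """Set of contiguous token n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def idf_weights(docs):
    """docs maps a submission name to its n-gram set; returns n-gram -> log(N / df)."""
    n_docs = len(docs)
    df = {}
    for grams in docs.values():
        for gram in grams:
            df[gram] = df.get(gram, 0) + 1
    return {gram: log(n_docs / count) for gram, count in df.items()}


def weighted_similarity(a, b, weights):
    """Weighted Jaccard: shared rare n-grams count more than shared common ones."""
    shared = sum(weights[g] for g in a & b)
    total = sum(weights[g] for g in a | b)
    return shared / total if total else 0.0
```

Under this weighting, an n-gram that occurs in every submission (for example, boilerplate required by the task) gets weight zero, so sharing it contributes nothing to the score.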
Collaboration Versus Cheating
We outline how we detected programming plagiarism in an introductory online
course for a master's of science in computer science program, how we achieved a
statistically significant reduction in programming plagiarism by combining a
clear explanation of university and class policy on academic honesty reinforced
with a short but formal assessment, and how we evaluated plagiarism rates
before and after implementing our policy and assessment.
Comment: 7 pages, 1 figure, 5 tables, SIGCSE 201
On the Feasibility of Malware Authorship Attribution
There are many occasions in which the security community is interested in
discovering the authorship of malware binaries, either for digital forensics
analysis of malware corpora or for thwarting live threats of malware invasion.
Such a discovery of authorship might be possible due to stylistic features
inherent to software code written by human programmers. Existing studies of
authorship attribution of general purpose software mainly focus on source code,
which is typically based on the style of programs and environment. However,
those features critically depend on the availability of the program source
code, which is usually not the case when dealing with malware binaries. Such
program binaries often do not retain many semantic or stylistic features due to
the compilation process. Therefore, authorship attribution in the domain of
malware binaries based on features and styles that will survive the compilation
process is challenging. This paper surveys the state of the art in this
literature. Further, we analyze the features involved in those techniques. By
using a case study, we identify features that can survive the compilation
process. Finally, we analyze existing works on binary authorship attribution
and study their applicability to real malware binaries.
Comment: FPS 201
Uncovering source code reuse in large-scale academic environments
The advent of the Internet has caused an increase in content reuse, including source code. The
purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good
example is academia, where massive courses are taught to students who must demonstrate that they have acquired
the knowledge. The need to detect content reuse in quasi real-time encourages the development of automatic
systems such as the one described in this paper for source code reuse detection. Our approach is based on the
comparison of programs at character level. It is able to find potential cases of reuse across a huge number of
assignments. It achieved better results than JPlag, the most used online system to find similarities among multiple
sets of source code. The most common obfuscation operations we found were changes in identifier names,
comments and indentation. 2014 Wiley Periodicals, Inc. Comput Appl Eng Educ 23:383–390, 2015; View this article
online at wileyonlinelibrary.com/journal/cae; DOI 10.1002/cae.21608
Flores Sáez, E.; Barrón Cedeño, LA.; Moreno Boronat, LA.; Rosso, P. (2015). Uncovering source code reuse in large-scale academic environments. Computer Applications in Engineering Education. 23(3):383-390. doi:10.1002/cae.21608
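A character-level comparison of the kind mentioned in this abstract could look roughly like the Python sketch below. The abstract does not specify the representation, so the choices here are assumptions: whitespace is stripped and case is folded, each program becomes a set of character 3-grams, and Jaccard overlap is used as the score. The function names are hypothetical.

```python
def char_ngrams(source, n=3):
    """Character n-grams of the source after removing whitespace and lowercasing."""
    text = "".join(source.lower().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Share of n-grams common to both sets, relative to their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With this normalisation, changes in indentation and spacing leave the score untouched, while renamed identifiers and edited comments only alter the n-grams that overlap the changed characters.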
Plagiarism detection for document
Our project aims to provide plagiarism detection based on semantic analysis and natural language processing techniques. Plagiarism detection for documents is a very effective technique, as nowadays students are mainly dependent on the Internet. The wide use and availability of electronic resources make it easy for students, authors and even academics to access and use any piece of information and embed it into their own work without proper citation. Our project helps authors, writers and others to secure their files and keep them safe. It helps the user to upload a file easily and detect plagiarism more efficiently, and it gives more accurate results. This web application will help users to upload files and check for plagiarism more easily and securely.
An Extended Stable Marriage Problem Algorithm for Clone Detection
Code cloning negatively affects industrial software and threatens
intellectual property. This paper presents a novel approach to detecting cloned
software by using a bijective matching technique. The proposed approach focuses
on increasing the range of similarity measures and thus enhancing the precision
of the detection. This is achieved by extending a well-known stable-marriage
problem (SMP) and demonstrating how matches between code fragments of different
files can be expressed. A prototype of the proposed approach is provided using
a proper scenario, which shows a noticeable improvement in several features of
clone detection such as scalability and accuracy.
Comment: 20 pages, 10 figures, 6 tables
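For orientation, the classic stable-marriage (Gale-Shapley) procedure that the paper extends can be sketched in Python as a one-to-one matching between code fragments of two files. This is the textbook algorithm, not the paper's extension. The square similarity matrix `sim` and the function name `stable_matching` are hypothetical, and preferences on both sides are assumed to be derived from the same similarity scores.

```python
def stable_matching(sim):
    """sim[i][j] is the similarity of fragment i in file A to fragment j in file B.

    Returns a list `match` in which match[i] is the B fragment paired with A fragment i.
    Assumes a square matrix (equal fragment counts on both sides).
    """
    n = len(sim)
    # Each A fragment ranks B fragments from most to least similar.
    prefs = [sorted(range(n), key=lambda j, i=i: -sim[i][j]) for i in range(n)]
    next_choice = [0] * n        # index of the next B fragment each A fragment proposes to
    partner_of_b = [None] * n    # current A partner of each B fragment
    free = list(range(n))        # A fragments that are still unmatched
    while free:
        i = free.pop()
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        current = partner_of_b[j]
        if current is None:
            partner_of_b[j] = i
        elif sim[i][j] > sim[current][j]:
            # B fragment j prefers the new proposer; the old partner becomes free.
            partner_of_b[j] = i
            free.append(current)
        else:
            free.append(i)
    match = [None] * n
    for j, i in enumerate(partner_of_b):
        match[i] = j
    return match
```

The result is bijective: every fragment of one file is paired with exactly one fragment of the other, and no unmatched pair of fragments would both prefer each other over their assigned partners.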