The System Kato: Detecting Cases of Plagiarism for Answer-Set Programs
Plagiarism detection is a growing need among educational institutions and
solutions for different purposes exist. An important field in this direction is
detecting cases of source-code plagiarism. In this paper, we present the tool
Kato for supporting the detection of this kind of plagiarism in the area of
answer-set programming (ASP). Currently, the tool is implemented for DLV
programs but it is designed to handle other logic-programming dialects as well.
We review the basic features of Kato, introduce its theoretical underpinnings,
and discuss an application of Kato for plagiarism detection in the context of
courses on logic programming at the Vienna University of Technology.
Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection
To address time inefficiency in string-matching-based source code plagiarism
detection, only potential pairs are compared. Potentiality is defined through a
fast yet order-insensitive similarity measurement (adapted from Information
Retrieval), and only pairs whose similarity degrees are higher than or equal to
a particular threshold are selected. Defining such a threshold is not a trivial
task, since the threshold should yield a high efficiency improvement with a low
effectiveness reduction (if any reduction is unavoidable). This paper proposes
two thresholding mechanisms, namely range-based and pair-count-based
mechanisms, that dynamically tune the threshold based on the distribution of
the resulting similarity degrees. According to our evaluation, both mechanisms
are more practical to use than manual threshold assignment since they are more
proportional to efficiency improvement and effectiveness reduction.
Comment: The 2018 International Conference on Advanced Computer Science and
Information Systems (ICACSIS
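The filtering step described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption rather than the paper's method: cosine similarity over token counts stands in for the order-insensitive IR measure, `range_based_threshold` places the threshold at a fixed ratio between the lowest and highest observed similarity, and `pair_count_based_threshold` uses the similarity of the k-th best pair. All function names and parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_similarity(tokens_a, tokens_b):
    """Order-insensitive similarity: cosine of the two token-frequency vectors."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = sqrt(sum(c * c for c in freq_a.values()))
    norm_b = sqrt(sum(c * c for c in freq_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def range_based_threshold(similarities, ratio=0.5):
    """Assumed rule: a point at `ratio` between the min and max similarity."""
    lo, hi = min(similarities), max(similarities)
    return lo + ratio * (hi - lo)


def pair_count_based_threshold(similarities, keep=2):
    """Assumed rule: the similarity of the `keep`-th most similar pair."""
    ranked = sorted(similarities, reverse=True)
    return ranked[min(keep, len(ranked)) - 1]


def select_pairs(submissions, threshold_fn):
    """Return the pairs whose similarity is at or above the tuned threshold.

    `submissions` maps a name to its token list; at least two are required.
    """
    scores = {
        (a, b): cosine_similarity(submissions[a], submissions[b])
        for a, b in combinations(sorted(submissions), 2)
    }
    threshold = threshold_fn(list(scores.values()))
    return sorted(pair for pair, score in scores.items() if score >= threshold)


submissions = {
    "s1": "int a = b + c ;".split(),
    "s2": "int c = b + a ;".split(),  # same tokens as s1, different order
    "s3": "print ( x )".split(),
}
candidates = select_pairs(submissions, range_based_threshold)
```

Only the pairs in `candidates` would then be passed to the slower string-matching comparison. Because both thresholds are derived from the observed similarity distribution, no fixed cut-off has to be chosen by hand.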
The Effectiveness of Low-Level Structure-based Approach Toward Source Code Plagiarism Level Taxonomy
The low-level approach is a novel way to detect source code plagiarism. Such an
approach is proven to be effective when compared to a baseline approach (i.e.,
an approach that relies on source code token subsequence matching) in a
controlled environment. We evaluate the effectiveness of the state of the art
in the low-level approach based on Faidhi & Robinson's plagiarism level
taxonomy; real plagiarism cases are employed as the dataset in this work. Our
evaluation shows that the state of the art in the low-level approach is
effective at handling most plagiarism attacks. Further, it also outperforms its
predecessor and the baseline approach at most plagiarism levels.
Comment: The 6th International Conference on Information and Communication
Technolog
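To give a rough sense of what token subsequence matching means for the baseline mentioned in this abstract, the sketch below finds the longest contiguous run of tokens shared by two programs. It is a simplified stand-in, not the baseline the paper evaluates; the function names and the normalisation by the shorter program are assumptions made for illustration.

```python
def longest_common_token_run(tokens_a, tokens_b):
    """Length of the longest contiguous token sequence found in both inputs."""
    best = 0
    prev = [0] * (len(tokens_b) + 1)
    for tok_a in tokens_a:
        curr = [0] * (len(tokens_b) + 1)
        for j, tok_b in enumerate(tokens_b, start=1):
            if tok_a == tok_b:
                # Extend the run that ended at the previous token of each input.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def run_similarity(tokens_a, tokens_b):
    """Assumed normalisation: longest shared run over the shorter token count."""
    shorter = min(len(tokens_a), len(tokens_b))
    return longest_common_token_run(tokens_a, tokens_b) / shorter if shorter else 0.0


original = "x = a + b ; print ( x )".split()
renamed = "y = a + b ; print ( y )".split()  # variable x renamed to y
shared = longest_common_token_run(original, renamed)
```

Renaming a single variable already breaks the shared run at each occurrence of that variable, which hints at why matching on raw source tokens can be sensitive to simple disguises.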
Source-code plagiarism : an academic perspective
In computing courses, students are often required to complete tutorial and laboratory exercises asking them to produce source-code. Academics may require students to submit source-code produced as part of such exercises in order to monitor their students’ understanding of the material taught on that module, and submitted source-code may be checked for similarities in order to identify instances of plagiarism. In exercises that require students to work individually, source-code plagiarism can occur between students or students may plagiarise by copying material from a book or from other sources. We have conducted a survey of UK academics who teach programming on computing courses, in order to establish what is understood to constitute source-code plagiarism in an undergraduate context. In this report, we analyse the responses received from 59 academics. This report presents a detailed description of what can constitute source-code plagiarism from the perspective of academics who teach programming on computing courses, based on the responses to the survey.
Source-code plagiarism : a UK academic perspective
In computing courses, students are often required to complete tutorial and laboratory exercises asking them to produce source-code. Academics may require students to submit source-code produced as part of such exercises in order to monitor their students' understanding of the material taught on that module, and submitted source-code may be checked for similarities in order to identify instances of plagiarism. In exercises that require students to work individually, source-code plagiarism can occur between students or students may plagiarise by copying material from a book or from other sources. We have conducted a survey of UK academics who teach programming on computing courses, in order to establish what is understood to constitute source-code plagiarism in an undergraduate context. In this report, we analyse the responses received from 59 academics. This report presents a detailed description of what can constitute source-code plagiarism from the perspective of academics who teach programming on computing courses, based on the responses to the survey.
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs
Binary code analysis allows analyzing binary code without having access to
the corresponding source code. A binary, after disassembly, is expressed in an
assembly language. This inspires us to approach binary analysis by leveraging
ideas and techniques from Natural Language Processing (NLP), a rich area
focused on processing text of various natural languages. We notice that binary
code analysis and NLP share many analogous topics, such as semantics
extraction, summarization, and classification. This work utilizes these ideas
to address two important code similarity comparison problems. (I) Given a pair
of basic blocks for different instruction set architectures (ISAs), determining
whether their semantics is similar or not; and (II) given a piece of code of
interest, determining if it is contained in another piece of assembly code for
a different ISA. The solutions to these two problems have many applications,
such as cross-architecture vulnerability discovery and code plagiarism
detection. We implement a prototype system INNEREYE and perform a comprehensive
evaluation. A comparison between our approach and existing approaches to
Problem I shows that our system outperforms them in terms of accuracy,
efficiency and scalability. The case studies utilizing the system
demonstrate that our solution to Problem II is effective. Moreover, this
research showcases how to apply ideas and techniques from NLP to large-scale
binary code analysis.
Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium
201
HIDDEN MARKOV MODELS FOR SOFTWARE PIRACY DETECTION
The unauthorized copying of software is often referred to as software piracy. Software piracy causes billions of dollars of annual losses for companies and governments worldwide. In this project, we analyze a method for detecting software piracy. A metamorphic generator is used to create morphed copies of a base piece of software. A hidden Markov model is trained on the opcode sequences extracted from these morphed copies. The trained model is then used to score suspect software to determine its similarity to the base software. A high score indicates that the suspect software may be a modified version of the base software and, therefore, further investigation is warranted. In contrast, a low score indicates that the suspect software differs significantly from the base software. We show that our approach is robust, in the sense that the base software must be extensively modified before it is not detected.
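The scoring step described in this abstract can be sketched with the standard scaled forward algorithm, which computes the log-likelihood of an opcode sequence under a hidden Markov model. The model parameters below are hand-set toy values, not a trained model, and the per-opcode normalisation, the three-opcode vocabulary, and all names are assumptions made for illustration. In the described method the parameters would instead come from training on the morphed copies.

```python
from math import log


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a non-empty sequence."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then emit obs[t].
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under the model
        log_prob += log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def score_per_opcode(opcodes, vocab, start, trans, emit):
    """Assumed score: log-likelihood divided by sequence length."""
    obs = [vocab[op] for op in opcodes]
    return forward_log_likelihood(obs, start, trans, emit) / len(obs)


# Toy two-state model (hypothetical values, not learned from data).
vocab = {"mov": 0, "add": 1, "jmp": 2}
start = [0.5, 0.5]
trans = [[0.1, 0.9], [0.9, 0.1]]
emit = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]

similar = score_per_opcode(["mov", "add", "mov", "add"], vocab, start, trans, emit)
different = score_per_opcode(["jmp", "jmp", "jmp", "jmp"], vocab, start, trans, emit)
```

Here `similar` comes out higher than `different`, matching the abstract's reading that a high score suggests a possible modified copy and a low score suggests substantially different software. Dividing by the sequence length is one way to make scores comparable across programs of different sizes.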