    The System Kato: Detecting Cases of Plagiarism for Answer-Set Programs

    Plagiarism detection is a growing need among educational institutions, and solutions for different purposes exist. An important field in this direction is detecting cases of source-code plagiarism. In this paper, we present the tool Kato for supporting the detection of this kind of plagiarism in the area of answer-set programming (ASP). Currently, the tool is implemented for DLV programs, but it is designed to handle other logic-programming dialects as well. We review the basic features of Kato, introduce its theoretical underpinnings, and discuss an application of Kato for plagiarism detection in the context of courses on logic programming at the Vienna University of Technology.

    Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection

    To address time inefficiency, string-matching-based source code plagiarism detection compares only potential pairs. Potentiality is defined through a fast yet order-insensitive similarity measurement (adapted from Information Retrieval), and only pairs whose similarity degrees are higher than or equal to a particular threshold are selected. Defining such a threshold is not trivial, since it should yield a large efficiency improvement with little reduction in effectiveness (if any reduction is unavoidable). This paper proposes two thresholding mechanisms, namely range-based and pair-count-based, that dynamically tune the threshold based on the distribution of the resulting similarity degrees. According to our evaluation, both mechanisms are more practical than manual threshold assignment since they are more proportional to efficiency improvement and effectiveness reduction. Comment: The 2018 International Conference on Advanced Computer Science and Information Systems (ICACSIS)
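
    The filtering idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: cosine similarity over token counts stands in for the order-insensitive measure, the range-based threshold is taken as a fixed fraction of the observed similarity range, and the pair-count-based threshold is taken as the similarity of the k-th highest-ranked pair. All function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Order-insensitive similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def pair_similarities(submissions):
    """Score every pair of submissions; `submissions` maps name -> token list."""
    vectors = {name: Counter(tokens) for name, tokens in submissions.items()}
    return {
        (x, y): cosine_similarity(vectors[x], vectors[y])
        for x, y in combinations(sorted(vectors), 2)
    }


def range_based_threshold(scores, fraction=0.5):
    """Assumed rule: a fixed fraction of the way from the lowest to the highest score."""
    values = list(scores.values())
    lowest, highest = min(values), max(values)
    return lowest + fraction * (highest - lowest)


def pair_count_threshold(scores, keep):
    """Assumed rule: the score of the keep-th best pair, so about `keep` pairs pass."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ranked = sorted(scores.values(), reverse=True)
    return ranked[min(keep, len(ranked)) - 1]


def select_pairs(scores, threshold):
    """Keep only pairs whose similarity is higher than or equal to the threshold."""
    return [pair for pair, score in scores.items() if score >= threshold]


# Toy data: "a" and "b" share all tokens, "c" shares none with either.
submissions = {
    "a": ["int", "x", "=", "1"],
    "b": ["int", "x", "=", "1"],
    "c": ["while", "y"],
}
scores = pair_similarities(submissions)
candidates = select_pairs(scores, range_based_threshold(scores))
```

    Only the pairs in `candidates` would then be passed to the slower string-matching comparison. Either threshold adapts to the scores actually observed in a given set of submissions rather than relying on a manually chosen constant.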

    The Effectiveness of Low-Level Structure-based Approach Toward Source Code Plagiarism Level Taxonomy

    The low-level approach is a novel way to detect source code plagiarism. It has proven effective in a controlled environment when compared to a baseline approach (i.e., one that relies on source code token subsequence matching). We evaluate the effectiveness of the state-of-the-art low-level approach against Faidhi & Robinson's plagiarism level taxonomy, using real plagiarism cases as the dataset. Our evaluation shows that the state-of-the-art low-level approach handles most plagiarism attacks effectively. Further, it outperforms its predecessor and the baseline approach at most plagiarism levels. Comment: The 6th International Conference on Information and Communication Technology

    Source-code plagiarism : an academic perspective

    In computing courses, students are often required to complete tutorial and laboratory exercises asking them to produce source-code. Academics may require students to submit source-code produced as part of such exercises in order to monitor their students’ understanding of the material taught on that module, and submitted source-code may be checked for similarities in order to identify instances of plagiarism. In exercises that require students to work individually, source-code plagiarism can occur between students, or students may plagiarise by copying material from a book or from other sources. We have conducted a survey of UK academics who teach programming on computing courses, in order to establish what is understood to constitute source-code plagiarism in an undergraduate context. In this report, we analyse the responses received from 59 academics. This report presents a detailed description of what can constitute source-code plagiarism from the perspective of academics who teach programming on computing courses, based on the responses to the survey.

    Source-code plagiarism : a UK academic perspective

    In computing courses, students are often required to complete tutorial and laboratory exercises asking them to produce source-code. Academics may require students to submit source-code produced as part of such exercises in order to monitor their students' understanding of the material taught on that module, and submitted source-code may be checked for similarities in order to identify instances of plagiarism. In exercises that require students to work individually, source-code plagiarism can occur between students, or students may plagiarise by copying material from a book or from other sources. We have conducted a survey of UK academics who teach programming on computing courses, in order to establish what is understood to constitute source-code plagiarism in an undergraduate context. In this report, we analyse the responses received from 59 academics. This report presents a detailed description of what can constitute source-code plagiarism from the perspective of academics who teach programming on computing courses, based on the responses to the survey.

    Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

    Binary code analysis allows analyzing binary code without having access to the corresponding source code. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas and techniques from Natural Language Processing (NLP), a rich area focused on processing text of various natural languages. We notice that binary code analysis and NLP share many analogous topics, such as semantics extraction, summarization, and classification. This work utilizes these ideas to address two important code similarity comparison problems: (I) given a pair of basic blocks for different instruction set architectures (ISAs), determining whether their semantics are similar or not; and (II) given a piece of code of interest, determining whether it is contained in another piece of assembly code for a different ISA. The solutions to these two problems have many applications, such as cross-architecture vulnerability discovery and code plagiarism detection. We implement a prototype system, INNEREYE, and perform a comprehensive evaluation. A comparison between our approach and existing approaches to Problem I shows that our system outperforms them in terms of accuracy, efficiency, and scalability. The case studies utilizing the system demonstrate that our solution to Problem II is effective. Moreover, this research showcases how to apply ideas and techniques from NLP to large-scale binary code analysis. Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium 201

    HIDDEN MARKOV MODELS FOR SOFTWARE PIRACY DETECTION

    The unauthorized copying of software is often referred to as software piracy. Software piracy causes billions of dollars of annual losses for companies and governments worldwide. In this project, we analyze a method for detecting software piracy. A metamorphic generator is used to create morphed copies of a base piece of software. A hidden Markov model is trained on the opcode sequences extracted from these morphed copies. The trained model is then used to score suspect software to determine its similarity to the base software. A high score indicates that the suspect software may be a modified version of the base software and, therefore, further investigation is warranted. In contrast, a low score indicates that the suspect software differs significantly from the base software. We show that our approach is robust, in the sense that the base software must be extensively modified before it is not detected.
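
    The scoring step described in this abstract can be sketched with the standard forward algorithm, which computes how likely an opcode sequence is under a hidden Markov model. This is only an illustration under stated assumptions: the model parameters below are set by hand rather than trained on morphed copies, the opcode names and probabilities are hypothetical, and normalising the log-likelihood by sequence length is an assumed choice, not a detail taken from the abstract.

```python
import math


def log_likelihood_per_opcode(opcodes, start, trans, emit):
    """Scaled forward algorithm: average log-probability per observed opcode.

    start[i]    probability of starting in hidden state i
    trans[i][j] probability of moving from state i to state j
    emit[i]     dict mapping opcode -> emission probability in state i
    """
    if not opcodes:
        raise ValueError("opcode sequence must not be empty")
    n_states = len(start)
    total = 0.0
    alpha = []
    for t, op in enumerate(opcodes):
        if t == 0:
            alpha = [start[i] * emit[i].get(op, 0.0) for i in range(n_states)]
        else:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * emit[j].get(op, 0.0)
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            # The model assigns zero probability to this sequence.
            return float("-inf")
        total += math.log(scale)
        # Rescale so the values do not underflow on long sequences.
        alpha = [a / scale for a in alpha]
    return total / len(opcodes)


# Hypothetical two-state model, hand-set for illustration (not trained).
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [
    {"mov": 0.6, "add": 0.3, "xor": 0.1},
    {"push": 0.5, "call": 0.4, "xor": 0.1},
]

similar_score = log_likelihood_per_opcode(
    ["mov", "add", "mov", "push", "call"], start, trans, emit
)
different_score = log_likelihood_per_opcode(["xor"] * 5, start, trans, emit)
```

    In this toy setup, the sequence that resembles what the model expects receives the higher score, mirroring the high-score and low-score interpretation given in the abstract. In practice the parameters would come from training, for example with the Baum-Welch algorithm, which is omitted here.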