1,481 research outputs found

    Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection

    Full text link
    To solve time inefficiency issue, only potential pairs are compared in string-matching-based source code plagiarism detection; wherein potentiality is defined through a fast-yet-order-insensitive similarity measurement (adapted from Information Retrieval) and only pairs which similarity degrees are higher or equal to a particular threshold is selected. Defining such threshold is not a trivial task considering the threshold should lead to high efficiency improvement and low effectiveness reduction (if it is unavoidable). This paper proposes two thresholding mechanisms---namely range-based and pair-count-based mechanism---that dynamically tune the threshold based on the distribution of resulted similarity degrees. According to our evaluation, both mechanisms are more practical to be used than manual threshold assignment since they are more proportional to efficiency improvement and effectiveness reduction.Comment: The 2018 International Conference on Advanced Computer Science and Information Systems (ICACSIS

    TF-IDF Inspired Detection for Cross-Language Source Code Plagiarism and Collusion

    Get PDF
    Several computing courses allow students to choose which programming language they want to use for completing a programming task. This can lead to cross-language code plagiarism and collusion, in which the copied code file is rewritten in another programming language. In response to that, this paper proposes a detection technique which is able to accurately compare code files written in various programming languages, but with limited effort in accommodating such languages at development stage. The only language-dependent feature used in the technique is source code tokeniser and no code conversion is applied. The impact of coincidental similarity is reduced by applying a TF-IDF inspired weighting, in which rare matches are prioritised. Our evaluation shows that the technique outperforms common techniques in academia for handling language conversion disguises. Further, it is comparable to those techniques when dealing with conventional disguises

    Collaboration Versus Cheating

    Full text link
    We outline how we detected programming plagiarism in an introductory online course for a master's of science in computer science program, how we achieved a statistically significant reduction in programming plagiarism by combining a clear explanation of university and class policy on academic honesty reinforced with a short but formal assessment, and how we evaluated plagiarism rates before SIGand after implementing our policy and assessment.Comment: 7 pages, 1 figure, 5 tables, SIGCSE 201

    On the Feasibility of Malware Authorship Attribution

    Full text link
    There are many occasions in which the security community is interested to discover the authorship of malware binaries, either for digital forensics analysis of malware corpora or for thwarting live threats of malware invasion. Such a discovery of authorship might be possible due to stylistic features inherent to software codes written by human programmers. Existing studies of authorship attribution of general purpose software mainly focus on source code, which is typically based on the style of programs and environment. However, those features critically depend on the availability of the program source code, which is usually not the case when dealing with malware binaries. Such program binaries often do not retain many semantic or stylistic features due to the compilation process. Therefore, authorship attribution in the domain of malware binaries based on features and styles that will survive the compilation process is challenging. This paper provides the state of the art in this literature. Further, we analyze the features involved in those techniques. By using a case study, we identify features that can survive the compilation process. Finally, we analyze existing works on binary authorship attribution and study their applicability to real malware binaries.Comment: FPS 201

    Uncovering source code reuse in large-scale academic environments

    Full text link
    The advent of the Internet has caused an increase in content reuse, including source code. The purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good example is academia, where massive courses are taught to students who must demonstrate that they have acquired the knowledge. The need of detecting content reuse in quasi real-time encourages the development of automatic systems such as the one described in this paper for source code reuse detection. Our approach is based on the comparison of programs at character level. It is able to find potential cases of reuse across a huge number of assignments. It achieved better results than JPlag, the most used online system to find similarities among multiple sets of source codes. The most common obfuscation operations we found were changes in identifier names, comments and indentation. 2014 Wiley Periodicals, Inc. Comput Appl Eng Educ 23:383–390, 2015; View this article online at wileyonlinelibrary.com/journal/cae; DOI 10.1002/cae.21608Flores Sáez, E.; Barrón Cedeño, LA.; Moreno Boronat, LA.; Rosso, P. (2015). Uncovering source code reuse in large-scale academic environments. Computer Applications in Engineering Education. 23(3):383-390. doi:10.1002/cae.21608S38339023

    Plagiarism detection for document

    Get PDF
    Our project aims to provide plagiarism based on semantic detection and natural language processing technique. Plagiarism detection for document is very effective technique, as nowadays students are mainly dependent on Internet. . The wide use and availability of electronic resources makes it easy for students, authors and even academic people to access and use any piece of information and embed it into his/ her own work without proper citation. Our project help authors, writers etc. to secure their files and make their files safe. It helps the user to upload the file easily and detect plagiarism more efficiently. It gives the more accurate results. This web application will help the users to upload the files and check for the plagiarism more easily and securely

    Acta Cybernetica : Volume 19. Number 1.

    Get PDF

    An Extended Stable Marriage Problem Algorithm for Clone Detection

    Full text link
    Code cloning negatively affects industrial software and threatens intellectual property. This paper presents a novel approach to detecting cloned software by using a bijective matching technique. The proposed approach focuses on increasing the range of similarity measures and thus enhancing the precision of the detection. This is achieved by extending a well-known stable-marriage problem (SMP) and demonstrating how matches between code fragments of different files can be expressed. A prototype of the proposed approach is provided using a proper scenario, which shows a noticeable improvement in several features of clone detection such as scalability and accuracy.Comment: 20 pages, 10 figures, 6 table
    • …
    corecore