12 research outputs found

    Towards the detection of cross-language source code reuse

    The Internet has made huge amounts of information available, including source code. Source code repositories and programming-related websites in general facilitate its reuse. In this work, we propose a simple approach to the detection of cross-language source code reuse, a barely investigated problem. Our preliminary experiments, based on the comparison of character n-grams, show that considering different sections of the code (i.e., comments, code, reserved words, etc.) leads to different results. When considering three programming languages (C++, Java, and Python), the best result is obtained when comments are discarded and the entire source code is considered.
    This work has been developed with the support of the project TEXT-ENTERPRISE 2.0: Text comprehension techniques applied to the needs of the Enterprise 2.0 (MICINN, Spain, TIN2009-13391-C04-03 (Plan I+D+i)).
    Flores Sáez, E.; Barrón Cedeño, LA.; Rosso, P.; Moreno Boronat, LA. (2011). Towards the detection of cross-language source code reuse. In Natural Language Processing and Information Systems. Springer Verlag (Germany). 6716:250-253. https://doi.org/10.1007/978-3-642-22327-3_31
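A comparison based on character n-grams, as mentioned in the abstract above, could be sketched roughly as follows. The abstract does not state the n-gram size, the weighting, or the similarity measure, so the choice of trigrams, raw counts, and cosine similarity here is an assumption, and the function names and sample snippets are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and removing whitespace."""
    normalized = "".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical snippets: a Java loop, a renamed C++ version, and unrelated Python code.
java_snippet = "for (int i = 0; i < n; i++) { total += values[i]; }"
cpp_snippet = "for (int j = 0; j < n; j++) { sum += data[j]; }"
unrelated = "print('hello world')"

sim_related = cosine_similarity(char_ngrams(java_snippet), char_ngrams(cpp_snippet))
sim_unrelated = cosine_similarity(char_ngrams(java_snippet), char_ngrams(unrelated))
```

In this toy case the renamed loop scores higher against the Java snippet than the unrelated code does, because the two loops share n-grams drawn from keywords and punctuation that both languages have in common.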

    The JK System to Detect Plagiarism

    In this research, a system referred to as Jubair-Khaireddin (JK) has been developed to assess the degree of similarity between two programs even when they look superficially dissimilar. The JK system is able to detect deliberate attempts at plagiarism. A reverse engineering technique is used to bring each suspected program back to its initial specification stage. This operation makes it possible to extract the structure of the program, which is an important factor in detecting plagiarism. This is achieved by extracting the Static Execution Tree (SET) of each program. The SET is then transformed into a Terminating Binary Sequence (TBS). The TBSs generated from the tested programs are compared in order to find similar branches. A reengineering technique is then applied to these similar branches in order to compute their entropy (information content). The entropy is computed to prove or disprove the existence of similarities between programs. The JK system has been tested on different Java programs with different modifications, and proved successful in detecting almost all cases, including those of partially plagiarised programs.
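The abstract above mentions computing the entropy (information content) of similar branches but does not say how. As a purely illustrative sketch, the Shannon entropy of a binary string can be computed as below. Treating a branch as a string of "0" and "1" symbols is an assumption, and this is not presented as the JK system's actual encoding or procedure.

```python
from collections import Counter
from math import log2


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy, in bits per symbol, of the symbol distribution in a sequence."""
    if not sequence:
        return 0.0
    total = len(sequence)
    counts = Counter(sequence)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical binary sequences standing in for encoded branches.
balanced = shannon_entropy("0101")   # equal share of both symbols
constant = shannon_entropy("0000")   # a single repeated symbol
```

A sequence with an even mix of symbols carries the maximum of one bit per symbol, while a constant sequence carries none.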

    Uncovering source code reuse in large-scale academic environments

    The advent of the Internet has caused an increase in content reuse, including source code. The purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good example is academia, where massive courses are taught to students who must demonstrate that they have acquired the knowledge. The need to detect content reuse in quasi real-time encourages the development of automatic systems such as the one described in this paper for source code reuse detection. Our approach is based on the comparison of programs at character level. It is able to find potential cases of reuse across a huge number of assignments. It achieved better results than JPlag, the most used online system for finding similarities among multiple sets of source code. The most common obfuscation operations we found were changes in identifier names, comments, and indentation.
    2014 Wiley Periodicals, Inc. Comput Appl Eng Educ 23:383–390, 2015; view this article online at wileyonlinelibrary.com/journal/cae; DOI 10.1002/cae.21608
    Flores Sáez, E.; Barrón Cedeño, LA.; Moreno Boronat, LA.; Rosso, P. (2015). Uncovering source code reuse in large-scale academic environments. Computer Applications in Engineering Education. 23(3):383-390. doi:10.1002/cae.21608

    Survey of Research on Software Clones

    This report summarizes my overview talk on software clone detection research. It first discusses the notions of software redundancy, cloning, duplication, and similarity. It then describes various categorizations of clone types; empirical studies on the root causes of cloning; current opinions and wisdom on the consequences of cloning; empirical studies on the evolution of clones; ways to remove, avoid, and detect them; empirical evaluations of the performance of existing automatic clone detectors (such as recall, precision, and time and space consumption) and their fitness for a particular purpose; benchmarks for clone detector evaluations; presentation issues; and, last but not least, applications of clone detection in other related fields. After each summary of a subarea, I list open research questions.
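The recall and precision measures mentioned above for evaluating clone detectors can be illustrated with a short sketch. The reference set, the reported pairs, and the file names are hypothetical, and representing a clone as an unordered pair of file names is an assumption made only for this example.

```python
from typing import FrozenSet, Set, Tuple

ClonePair = FrozenSet[str]


def precision_recall(reported: Set[ClonePair], reference: Set[ClonePair]) -> Tuple[float, float]:
    """Precision: share of reported pairs that are real clones.
    Recall: share of real clones that the detector reported."""
    true_positives = len(reported & reference)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical detector output and hand-labelled reference clones.
reported_pairs = {frozenset(p) for p in [("a.c", "b.c"), ("a.c", "c.c"), ("d.c", "e.c")]}
reference_pairs = {
    frozenset(p)
    for p in [("a.c", "b.c"), ("d.c", "e.c"), ("f.c", "g.c"), ("h.c", "i.c")]
}

precision, recall = precision_recall(reported_pairs, reference_pairs)
```

Here two of the three reported pairs are in the reference set, and two of the four reference clones were found, so precision is 2/3 and recall is 1/2.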

    Prevention of cybercrimes in smart cities of India: from a citizen's perspective

    Purpose: The purpose of this paper is to identify the factors influencing the citizens of India to prevent cybercrimes in the proposed Smart Cities of India. Design/methodology/approach: A conceptual model has been developed for identifying factors preventing cybercrimes. The conceptual model was validated empirically with a sample of 315 participants from India. Data were analyzed using structural equation modeling with the SPSS and AMOS software packages. Findings: The study reveals that the “awareness of cybercrimes” significantly influences the actual usage of technology to prevent cybercrimes in Smart Cities of India. The study also reveals that government initiative (GI) and legal awareness are less influential in spreading awareness of cybercrimes (AOC) among the citizens of the proposed smart cities. Research limitations/implications: The conceptual model utilizes two constructs from the technology adoption model, namely, perceived usefulness and ease of use. The study draws other factors, such as social media, word of mouth, GIs, legal awareness, and organizations constituting entities that spread awareness, from different related literature. Thereby, a comprehensive theoretical conceptual model has been proposed which helps to identify the factors that may help in preventing cybercrimes. Practical implications: This study gives policy makers insight into several factors influencing the AOC of the citizens of the proposed Smart Cities of India for the prevention of cybercrimes. Originality/value: There are few existing studies analyzing the effect of AOC in mitigating cybercrimes. Thus, this study offers a novel contribution.

    JPlag: finding plagiarisms among a set of programs

    JPlag is a system that finds pairs of similar programs among a given set of programs. It has successfully been used in practice to detect plagiarisms among student Java exercise submissions. Support for the languages C, C++ and Scheme is also available. This report presents the design of JPlag, in particular the comparison algorithm, and carefully evaluates JPlag's performance on 12 rather different sets of Java programs. The results indicate that JPlag will find all plagiarisms with only very few exceptions. The execution time is less than one minute for submissions of 100 programs of several hundred lines each.

    A lightweight, graph-theoretic model of class-based similarity to support object-oriented code reuse.

    The work presented in this thesis is principally concerned with the development of a method and set of tools designed to support the identification of class-based similarity in collections of object-oriented code. Attention is focused on enhancing the potential for software reuse in situations where a reuse process is either absent or informal, and the characteristics of the organisation are unsuitable, or resources unavailable, to promote and sustain a systematic approach to reuse. The approach builds on the definition of a formal, attributed, relational model that captures the inherent structure of class-based, object-oriented code. Based on code-level analysis, it relies solely on the structural characteristics of the code and the peculiarly object-oriented features of the class as an organising principle: classes, those entities comprising a class, and the intra- and inter-class relationships existing between them, are significant factors in defining a two-phase similarity measure as a basis for the comparison process. Established graph-theoretic techniques are adapted and applied via this model to the problem of determining similarity between classes. This thesis illustrates a successful transfer of techniques from the domains of molecular chemistry and computer vision. Both domains provide an existing template for the analysis and comparison of structures as graphs. The inspiration for representing classes as attributed relational graphs, and the application of graph-theoretic techniques and algorithms to their comparison, arose out of a well-founded intuition that a common basis in graph theory was sufficient to enable a reasonable transfer of these techniques to the problem of determining similarity in object-oriented code. The practical application of this work relates to the identification and indexing of instances of recurring, class-based, common structure present in established and evolving collections of object-oriented code. A classification so generated additionally provides a framework for class-based matching over an existing code-base, both from the perspective of newly introduced classes, and search "templates" provided by those incomplete, iteratively constructed and refined classes associated with current and on-going development. The tools and techniques developed here provide support for enabling and improving shared awareness of reuse opportunity, based on analysing structural similarity in past and ongoing development, tools and techniques that can in turn be seen as part of a process of domain analysis, capable of stimulating the evolution of a systematic reuse ethic.