Search CORE

12 research outputs found

Towards the detection of cross-language source code reuse

Author: Barrón Cedeño Luis Alberto
Flores Sáez Enrique
Moreno Boronat Lidia Ana
Rosso Paolo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

The Internet has made huge amounts of information available, including source code. Source code repositories and, more generally, programming-related websites facilitate its reuse. In this work, we propose a simple approach to the detection of cross-language source code reuse, a nearly uninvestigated problem. Our preliminary experiments, based on the comparison of character n-grams, show that considering different sections of the code (i.e., comments, code, reserved words, etc.) leads to different results. When considering three programming languages (C++, Java, and Python), the best result is obtained when comments are discarded and the entire source code is considered.

This work has been developed with the support of the project TEXT-ENTERPRISE 2.0: Text comprehension techniques applied to the needs of the Enterprise 2.0 (MICINN, Spain TIN2009-13391-C04-03 (PlanI+D+i)).

Flores Sáez, E.; Barrón Cedeño, LA.; Rosso, P.; Moreno Boronat, LA. (2011). Towards the detection of cross-language source code reuse. In Natural Language Processing and Information Systems. Springer Verlag (Germany). 6716:250-253. https://doi.org/10.1007/978-3-642-22327-3_31
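
The character n-gram comparison mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the n-gram length, the weighting, or the similarity measure, so the choices below (trigrams, raw counts, cosine similarity) and the function names `char_ngrams` and `cosine_similarity` are assumptions made only for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and removing whitespace."""
    normalized = "".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


# Hypothetical snippets: the same loop in Java and C++, plus unrelated Python code.
java_snippet = "for (int i = 0; i < n; i++) { total += values[i]; }"
cpp_snippet = "for (int i = 0; i < n; ++i) { total += values[i]; }"
python_snippet = "print('hello world')"

sim_close = cosine_similarity(char_ngrams(java_snippet), char_ngrams(cpp_snippet))
sim_far = cosine_similarity(char_ngrams(java_snippet), char_ngrams(python_snippet))
print(f"Java vs C++: {sim_close:.3f}, Java vs Python: {sim_far:.3f}")
```

Because the comparison works on raw characters, it needs no parser for any of the languages involved, which is what makes it applicable across C++, Java, and Python.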

RiuNet

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

The JK System to Detect Plagiarism

Author: Al-Jaafer Jubair J.
Sabri Khair Eddin M.
Publication date: 01/10/2006

In this research, a system referred to as Jubair-Khaireddin (JK) has been developed to assess the degree of similarity between two programs even when they look superficially dissimilar. The JK system is able to detect deliberate attempts at plagiarism. A reverse engineering technique is used to bring each suspected program back to its initial specification stage. This operation makes it possible to extract the structure of the program, which is an important factor in detecting plagiarism. This is achieved by extracting the Static Execution Tree (SET) of each program. The SET is then transformed into a Terminating Binary Sequence (TBS). The TBSs generated from the tested programs are compared in order to find similar branches. A reengineering technique is then applied to these similar branches in order to compute their entropy (information content). The entropy is computed to prove or disprove the existence of similarities between programs. The JK system has been tested on different Java programs with different modifications, and proved successful in detecting almost all cases, including those of partially plagiarised programs. Facultad de Informátic

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Uncovering source code reuse in large-scale academic environments

Author: Flores Sáez Enrique
Barrón Cedeño Luis Alberto
Moreno Boronat Lidia Ana
Rosso Paolo
Publication venue: 'Wiley'
Publication date: 01/01/2015

The advent of the Internet has caused an increase in content reuse, including source code. The purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good example is academia, where massive courses are taught to students who must demonstrate that they have acquired the knowledge. The need to detect content reuse in quasi real-time encourages the development of automatic systems, such as the one described in this paper, for source code reuse detection. Our approach is based on the comparison of programs at the character level. It is able to find potential cases of reuse across a huge number of assignments. It achieved better results than JPlag, the most widely used online system for finding similarities among multiple sets of source code. The most common obfuscation operations we found were changes in identifier names, comments, and indentation.

© 2014 Wiley Periodicals, Inc. Comput Appl Eng Educ 23:383–390, 2015; View this article online at wileyonlinelibrary.com/journal/cae; DOI 10.1002/cae.21608

Flores Sáez, E.; Barrón Cedeño, LA.; Moreno Boronat, LA.; Rosso, P. (2015). Uncovering source code reuse in large-scale academic environments. Computer Applications in Engineering Education. 23(3):383-390. doi:10.1002/cae.21608
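
A character-level comparison across a set of assignments, as this abstract describes, might look like the sketch below. It is not the authors' system: the comment stripping, the use of `difflib.SequenceMatcher` as the similarity measure, and the names `normalize` and `rank_pairs` are all assumptions chosen for illustration. The sketch neutralizes two of the obfuscations the abstract reports (comments and indentation) but does nothing about renamed identifiers.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(source: str) -> str:
    """Strip C-style comments and all whitespace, then lowercase.

    Naive on purpose: a "//" inside a string literal is also treated as a comment.
    """
    no_block = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    no_line = re.sub(r"//[^\n]*", "", no_block)
    return "".join(no_line.split()).lower()


def rank_pairs(submissions: dict[str, str]) -> list[tuple[float, str, str]]:
    """Score every pair of submissions and return them from most to least similar."""
    normalized = {name: normalize(code) for name, code in submissions.items()}
    scored = [
        (SequenceMatcher(None, normalized[a], normalized[b]).ratio(), a, b)
        for a, b in combinations(sorted(normalized), 2)
    ]
    return sorted(scored, reverse=True)


# Hypothetical submissions: "bob" differs from "alice" only in comments and layout.
submissions = {
    "alice": "int sum(int a, int b) {\n    // add two numbers\n    return a + b;\n}",
    "bob": "int sum(int a,int b){ /* copied */ return a+b; }",
    "carol": 'void greet() { printf("hi"); }',
}

ranked = rank_pairs(submissions)
for score, first, second in ranked:
    print(f"{first} vs {second}: {score:.2f}")
```

Comparing all pairs is quadratic in the number of submissions, so a system working at the scale the abstract describes would presumably need indexing or filtering that this sketch omits.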

Crossref

RiuNet

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Survey of Research on Software Clones

Author: Koschke Rainer
Publication venue: Dagstuhl Seminar Proceedings. 06301 - Duplication, Redundancy, and Similarity in Software
Publication date: 01/01/2007

This report summarizes my overview talk on software clone detection research. It first discusses the notions of software redundancy, cloning, duplication, and similarity. Then it describes various categorizations of clone types; empirical studies on the root causes of cloning; current opinions and wisdom on the consequences of cloning; empirical studies on the evolution of clones; ways to remove, avoid, and detect them; empirical evaluations of the performance of existing automatic clone detectors (such as recall, precision, and time and space consumption) and their fitness for a particular purpose; benchmarks for clone detector evaluations; presentation issues; and, last but not least, applications of clone detection in other related fields. After the summary of each subarea, I list open research questions.

Dagstuhl Research Online Publication Server

The JK System to Detect Plagiarism

Author: Al-Jaafer Jubair J.
Sabri Khair Eddin M.
Publication date: 20/05/2008

Servicio de Difusión de la Creación Intelectual

Prevention of cybercrimes in smart cities of India: from a citizen's perspective

Author: Arpan Kumar Kar
Hatice Kizgin
Sheshadri Chatterjee
Yogesh K. Dwivedi
Publication venue: 'Emerald'
Publication date: 17/12/2018

Purpose: The purpose of this paper is to identify the factors influencing the citizens of India to prevent cybercrimes in the proposed Smart Cities of India. Design/methodology/approach: A conceptual model has been developed for identifying factors preventing cybercrimes. The conceptual model was validated empirically with a sample of 315 participants from India. Data were analyzed using structural equation modeling with the SPSS and AMOS software packages. Findings: The study reveals that the “awareness of cybercrimes” significantly influences the actual usage of technology to prevent cybercrimes in Smart Cities of India. The study also reveals that government initiative (GI) and legal awareness are less influential in spreading the awareness of cybercrimes (AOC) to the citizens of the proposed smart cities. Research limitations/implications: The conceptual model utilizes two constructs from the technology adoption model, namely, perceived usefulness and ease of use. The study employs other factors drawn from the related literature, such as social media, word of mouth, GIs, legal awareness, and organizations, as the entities that spread awareness. Thereby, a comprehensive theoretical conceptual model has been proposed which helps to identify the factors that may help in preventing cybercrimes. Practical implications: This study gives policy makers insight into several factors influencing the AOC of the citizens of the proposed Smart Cities of India for the prevention of cybercrimes. Originality/value: There are few existing studies analyzing the effect of AOC in mitigating cybercrimes. Thus, this study offers a novel contribution

Crossref

Cronfa at Swansea University

Bradford Scholars

JPlag: finding plagiarisms among a set of programs

Author: Malpohl Guido
Philippsen Michael
Prechelt Lutz
Publication date: 02/08/2007

JPlag is a system that finds pairs of similar programs among a given set of programs. It has successfully been used in practice to detect plagiarisms among student Java exercise submissions. Support for the languages C, C++ and Scheme is also available. This report presents the design of JPlag, in particular the comparison algorithm, and carefully evaluates JPlag's performance on 12 rather different sets of Java programs. The results indicate that JPlag will find all plagiarisms with only very few exceptions. The execution time is less than one minute for submissions of 100 programs of several hundred lines each.
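
The abstract above does not name the comparison algorithm; JPlag's published design compares token sequences with Greedy String Tiling, and the sketch below is a simplified, unoptimized version of that idea. The function names, the `min_match` default, and the single-letter tokens in the example are assumptions for illustration, not JPlag's actual code or token set.

```python
def greedy_string_tiling(a, b, min_match=3):
    """Return non-overlapping common runs (tiles) as (start_a, start_b, length).

    Simplified cubic-time variant: repeatedly find the longest unmarked common
    runs of at least min_match tokens and mark them as tiles.
    """
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_len = min_match
        matches = []
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_len:
                    matches = [(i, j, k)]  # a strictly longer run replaces earlier ones
                    max_len = k
                elif k == max_len:
                    matches.append((i, j, k))
        if not matches:
            return tiles
        for i, j, k in matches:
            # Skip runs that overlap a tile created earlier in this round.
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for offset in range(k):
                marked_a[i + offset] = True
                marked_b[j + offset] = True
            tiles.append((i, j, k))


def tile_similarity(a, b, min_match=3):
    """Fraction of tokens in both sequences that are covered by tiles."""
    if not a and not b:
        return 0.0
    covered = sum(length for _, _, length in greedy_string_tiling(a, b, min_match))
    return 2 * covered / (len(a) + len(b))


# Hypothetical token streams, one letter per token.
tokens_a = list("ABCDEFG")
tokens_b = list("XABCYDEFG")
tiles = greedy_string_tiling(tokens_a, tokens_b)
similarity = tile_similarity(tokens_a, tokens_b)
print(tiles, similarity)
```

Working on tokens rather than characters makes the comparison insensitive to renamed identifiers and reformatted code, and the minimum match length keeps short coincidental matches from inflating the score.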

KITopen

Reverse Engineering of Web Applications

Author: DE CARLINI UGO
Publication date: 01/01/2004

Archivio della ricerca - Università degli studi di Napoli Federico II

Snitch: a flexible student program plagiarism detector

Author: Schricker Theodor Karl, Jr.
Publication venue: 'Oregon State University'

In most technical fields, a certain amount of practical application is necessary to master the important concepts and acquire basic problem solving skills. Computer science is no different, and this is reflected in the frequent programming assignments given in programming and data structures courses. The basic skills and concepts gained from these assignments cannot be internalized as easily through lectures and readings as they can from working through the problems step-by-step by oneself.

ScholarsArchive@OSU

A lightweight, graph-theoretic model of class-based similarity to support object-oriented code reuse.

Author: MacLean Angus
Publication date: 31/01/2003

The work presented in this thesis is principally concerned with the development of a method and set of tools designed to support the identification of class-based similarity in collections of object-oriented code. Attention is focused on enhancing the potential for software reuse in situations where a reuse process is either absent or informal, and the characteristics of the organisation are unsuitable, or resources unavailable, to promote and sustain a systematic approach to reuse. The approach builds on the definition of a formal, attributed, relational model that captures the inherent structure of class-based, object-oriented code. Based on code-level analysis, it relies solely on the structural characteristics of the code and the peculiarly object-oriented features of the class as an organising principle: classes, those entities comprising a class, and the intra and inter-class relationships existing between them, are significant factors in defining a two-phase similarity measure as a basis for the comparison process. Established graph-theoretic techniques are adapted and applied via this model to the problem of determining similarity between classes. This thesis illustrates a successful transfer of techniques from the domains of molecular chemistry and computer vision. Both domains provide an existing template for the analysis and comparison of structures as graphs. The inspiration for representing classes as attributed relational graphs, and the application of graph-theoretic techniques and algorithms to their comparison, arose out of a well-founded intuition that a common basis in graph-theory was sufficient to enable a reasonable transfer of these techniques to the problem of determining similarity in object-oriented code. The practical application of this work relates to the identification and indexing of instances of recurring, class-based, common structure present in established and evolving collections of object-oriented code. 
A classification generated in this way additionally provides a framework for class-based matching over an existing code-base, both from the perspective of newly introduced classes and from that of search "templates" provided by the incomplete, iteratively constructed and refined classes associated with current and ongoing development. The tools and techniques developed here provide support for enabling and improving shared awareness of reuse opportunity, based on analysing structural similarity in past and ongoing development. These tools and techniques can in turn be seen as part of a process of domain analysis, capable of stimulating the evolution of a systematic reuse ethic.

Open Access Institutional Repository at Robert Gordon University