12 research outputs found
Towards the detection of cross-language source code reuse
The Internet has made huge amounts of information available,
including source code. Source code repositories and, in general, programming-related
websites facilitate its reuse. In this work, we propose a simple
approach to the detection of cross-language source code reuse, a barely
investigated problem. Our preliminary experiments, based on character
n-gram comparison, show that considering different sections of the
code (i.e., comments, code, reserved words, etc.) leads to different results.
When considering three programming languages: C++, Java, and
Python, the best result is obtained when comments are discarded and
the entire source code is considered.
This work has been developed with the support of the project TEXT-ENTERPRISE 2.0: Text comprehension techniques applied to the needs of the Enterprise 2.0 (MICINN, Spain TIN2009-13391-C04-03 (PlanI+D+i)).
Flores Sáez, E.; Barrón Cedeño, LA.; Rosso, P.; Moreno Boronat, LA. (2011). Towards the detection of cross-language source code reuse. In Natural Language Processing and Information Systems. Springer Verlag (Germany). 6716:250-253. https://doi.org/10.1007/978-3-642-22327-3_31
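A minimal sketch of the kind of character n-gram comparison the abstract above describes. The abstract does not give the n-gram length, the weighting, or the similarity measure, so the trigram length, raw counts, and cosine similarity used here are assumptions, and the helper names (`strip_comments`, `ngram_profile`, `cosine_similarity`) are hypothetical. The comment stripper is a rough regular-expression heuristic, not a real lexer.

```python
import math
import re
from collections import Counter


def strip_comments(source: str, line_marker: str) -> str:
    """Drop /* ... */ blocks and line comments starting with line_marker.

    Rough heuristic: it does not understand string literals.
    """
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    return re.sub(re.escape(line_marker) + r"[^\n]*", " ", source)


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and
    collapsing whitespace (n = 3 is an assumed value)."""
    text = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two toy snippets with the same logic in different languages.
java_code = "int total = 0; // running sum\nfor (int i = 0; i < n; i++) { total += i; }"
python_code = "total = 0  # running sum\nfor i in range(n):\n    total += i"

score = cosine_similarity(
    ngram_profile(strip_comments(java_code, "//")),
    ngram_profile(strip_comments(python_code, "#")),
)
print(f"similarity without comments: {score:.3f}")
```

Skipping the `strip_comments` call gives the "comments kept" variant, so the two settings mentioned in the abstract can be compared on the same pair of files.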
The JK System to Detect Plagiarism
In this research, a system referred to as Jubair-Khaireddin (JK) has been developed to assess the degree of similarity between two programs even when they look superficially dissimilar. The JK system is able to detect deliberate attempts at plagiarism. A reverse engineering technique is used to bring each suspected program back to its initial specification stage. This operation makes it possible to extract the structure of the program, which is an important factor in detecting plagiarism. This is achieved by extracting the Static Execution Tree (SET) of each program. The SET is then transformed into a Terminating Binary Sequence (TBS). The TBSs generated from the tested programs are compared in order to find similar branches. A reengineering technique is then applied to these similar branches in order to compute their entropy (information content). The entropy is computed to prove or disprove the existence of similarities between programs. The JK system has been tested on different Java programs with different modifications, and proved successful in detecting almost all cases, including those of partially plagiarised programs.
Facultad de Informátic
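The abstract does not say how the JK system derives entropy from the matched branches. Purely as a generic illustration of "entropy (information content)", the sketch below computes the Shannon entropy of a symbol sequence in bits per symbol. Treating a branch as a string of `0` and `1` characters is an assumption, and `shannon_entropy` is a hypothetical helper, not part of the JK system.

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy of a sequence, in bits per symbol."""
    if not sequence:
        return 0.0
    total = len(sequence)
    # H = sum over symbols of p * log2(1 / p)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(sequence).values()
    )


print(shannon_entropy("0101"))  # 1.0 (both symbols equally likely)
print(shannon_entropy("0000"))  # 0.0 (no uncertainty)
```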
Uncovering source code reuse in large-scale academic environments
The advent of the Internet has caused an increase in content reuse, including source code. The
purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good
example is academia, where massive courses are taught to students who must demonstrate that they have acquired
the knowledge. The need to detect content reuse in quasi real time encourages the development of automatic
systems such as the one described in this paper for source code reuse detection. Our approach is based on the
comparison of programs at character level. It is able to find potential cases of reuse across a huge number of
assignments. It achieved better results than JPlag, the most widely used online system to find similarities among multiple
sets of source code. The most common obfuscation operations we found were changes in identifier names,
comments and indentation.
2014 Wiley Periodicals, Inc. Comput Appl Eng Educ 23:383–390, 2015; View this article online at wileyonlinelibrary.com/journal/cae; DOI 10.1002/cae.21608
Flores Sáez, E.; Barrón Cedeño, LA.; Moreno Boronat, LA.; Rosso, P. (2015). Uncovering source code reuse in large-scale academic environments. Computer Applications in Engineering Education. 23(3):383-390. doi:10.1002/cae.21608
Survey of Research on Software Clones
This report summarizes my overview talk on software clone detection
research. It first discusses the notion of software redundancy, cloning, duplication,
and similarity. Then, it describes various categorizations of clone types, empirical
studies on the root causes for cloning, current opinions and wisdom of consequences
of cloning, empirical studies on the evolution of clones, ways to remove, to avoid,
and to detect them, empirical evaluations of existing automatic clone detector performance
(such as recall, precision, time and space consumption) and their fitness
for a particular purpose, benchmarks for clone detector evaluations, presentation
issues, and last but not least application of clone detection in other related fields.
After each summary of a subarea, I list open research questions
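Recall and precision, named above as evaluation measures for clone detectors, can be illustrated with a short sketch. It assumes that both the detector output and the reference benchmark are available as sets of clone pairs; the function name `precision_recall` and the fragment labels are hypothetical.

```python
def precision_recall(reported: set, reference: set) -> tuple[float, float]:
    """Precision: share of reported clone pairs that are real clones.
    Recall: share of real clone pairs that the detector reported."""
    true_positives = len(reported & reference)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy data: each pair names two code fragments judged to be clones.
reported_pairs = {("a", "b"), ("c", "d")}
reference_pairs = {("a", "b"), ("e", "f"), ("g", "h"), ("i", "j")}

print(precision_recall(reported_pairs, reference_pairs))  # (0.5, 0.25)
```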
Prevention of cybercrimes in smart cities of India: from a citizen's perspective
Purpose: The purpose of this paper is to identify the factors influencing the citizens of India to prevent cybercrimes in the proposed Smart Cities of India.
Design/methodology/approach: A conceptual model has been developed for identifying factors preventing cybercrimes. The conceptual model was validated empirically with a sample size of 315 participants from India. Data were analyzed using structural equation modeling with the SPSS and AMOS software.
Findings: The study reveals that the “awareness of cybercrimes” significantly influences the actual usage of technology to prevent cybercrimes in Smart Cities of India. The study also reveals that government initiative (GI) and legal awareness are less influential in spreading awareness of cybercrimes (AOC) among the citizens of the proposed smart cities.
Research limitations/implications: The conceptual model utilizes two constructs from the technology adoption model, namely, perceived usefulness and ease of use. The study employs other factors such as social media, word of mouth, GIs, legal awareness and organizations constituting entities spreading awareness from different related literature works. Thereby, a comprehensive theoretical conceptual model has been proposed which helps to identify the factors that may help in preventing cybercrimes.
Practical implications: This study gives policy makers insight into several factors influencing the AOC of the citizens of the proposed Smart Cities of India for the prevention of cybercrimes.
Originality/value: There are few existing studies analyzing the effect of AOC to mitigate cybercrimes. Thus, this study offers a novel contribution
JPlag: finding plagiarisms among a set of programs
JPlag is a system that finds pairs of similar programs among a given
set of programs. It has successfully been used in practice to detect
plagiarisms among student Java exercise submissions.
Support for the languages C, C++ and Scheme is also available.
This report presents the design of JPlag, in particular the comparison
algorithm, and carefully evaluates JPlag's performance on 12 rather
different sets of Java programs.
The results indicate that JPlag will find all plagiarisms with only
very few exceptions. The execution time is less than one minute
for submissions of 100 programs of several hundred lines each
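To put the timing figure in context, the sketch below counts how many program pairs a set of 100 submissions contains. It assumes that every pair is compared once, which the abstract does not state explicitly; the file names and the helper `all_pairs` are hypothetical.

```python
from itertools import combinations
from math import comb


def all_pairs(names: list[str]) -> list[tuple[str, str]]:
    """Every unordered pair of submissions, each listed once."""
    return list(combinations(names, 2))


submissions = [f"prog{i:03d}.java" for i in range(100)]
pairs = all_pairs(submissions)

print(len(pairs))    # 4950
print(comb(100, 2))  # 4950, the same count from the closed form n(n-1)/2
```

Under that assumption, finishing in under one minute means handling 4950 pairwise comparisons in that time.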
Snitch: a flexible student program plagiarism detector
In most technical fields, a certain amount of practical application is necessary to master the important concepts and acquire basic problem solving skills. Computer science is no different, and this is reflected in the frequent programming assignments given in programming and data structures courses. The basic skills and concepts gained from these assignments cannot be internalized as easily through lectures and readings as they can from working through the problems step-by-step by oneself
A lightweight, graph-theoretic model of class-based similarity to support object-oriented code reuse.
The work presented in this thesis is principally concerned with the development of a method and set of tools designed to support the identification of class-based similarity in collections of object-oriented code. Attention is focused on enhancing the potential for software reuse in situations where a reuse process is either absent or informal, and the characteristics of the organisation are unsuitable, or resources unavailable, to promote and sustain a systematic approach to reuse. The approach builds on the definition of a formal, attributed, relational model that captures the inherent structure of class-based, object-oriented code. Based on code-level analysis, it relies solely on the structural characteristics of the code and the peculiarly object-oriented features of the class as an organising principle: classes, those entities comprising a class, and the intra and inter-class relationships existing between them, are significant factors in defining a two-phase similarity measure as a basis for the comparison process. Established graph-theoretic techniques are adapted and applied via this model to the problem of determining similarity between classes. This thesis illustrates a successful transfer of techniques from the domains of molecular chemistry and computer vision. Both domains provide an existing template for the analysis and comparison of structures as graphs. The inspiration for representing classes as attributed relational graphs, and the application of graph-theoretic techniques and algorithms to their comparison, arose out of a well-founded intuition that a common basis in graph-theory was sufficient to enable a reasonable transfer of these techniques to the problem of determining similarity in object-oriented code. The practical application of this work relates to the identification and indexing of instances of recurring, class-based, common structure present in established and evolving collections of object-oriented code. 
A classification so generated additionally provides a framework for class-based matching over an existing code-base, both from the perspective of newly introduced classes, and search "templates" provided by those incomplete, iteratively constructed and refined classes associated with current and on-going development. The tools and techniques developed here provide support for enabling and improving shared awareness of reuse opportunity, based on analysing structural similarity in past and ongoing development, tools and techniques that can in turn be seen as part of a process of domain analysis, capable of stimulating the evolution of a systematic reuse ethic