Search CORE

6 research outputs found

Cross-language source code re-use detection using latent semantic analysis

Author: Barrón-Cedeño Luis Alberto
Flores Sáez Enrique
Moreno Boronat Lidia Ana
Rosso Paolo
Publication venue: 'Verlag der Technischen Universität Graz'
Publication date: 01/01/2015

[EN] Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many other resources are just one click away. The temptation to re-use these materials is very high. Even source code is easily available through a simple search on the Web. There is a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source codes in their compiled version. When dealing with cross-language source code re-use, traditional approaches can deal only with the programming languages supported by the compiler. We assume that a source code is a piece of text, with its own syntax and structure, so we aim at applying models for free text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models, being able to distinguish between re-used and related source codes with high performance.

This work was partially supported by Universitat Politècnica de València, WIQ-EI (IRSES grant n. 269180), and DIANA-APPLICATIONS (TIN2012-38603-C02-01) project. The work of the fourth author is also supported by VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.

Flores Sáez, E.; Barrón-Cedeño, LA.; Moreno Boronat, LA.; Rosso, P. (2015). Cross-language source code re-use detection using latent semantic analysis. Journal of Universal Computer Science. 21(13):1708-1725. https://doi.org/10.3217/jucs-021-13-1708
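The LSA-based comparison described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the token-count features, the function names (`build_term_doc`, `lsa_embed`, `cosine`), the toy documents, and the choice of `k` are all assumptions made for illustration. The idea shown is only the general one: treat each source file as text, build a term-document matrix, reduce it with a truncated SVD, and compare documents by cosine similarity in the latent space.

```python
import numpy as np


def build_term_doc(docs):
    """Return (vocabulary, terms x documents count matrix) for token lists."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            matrix[index[tok], j] += 1.0
    return vocab, matrix


def lsa_embed(term_doc, k):
    """Project each document (column) into a k-dimensional latent space."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(k, len(s))
    # Scale the top-k right singular vectors by their singular values.
    return (s[:k, None] * vt[:k]).T  # shape: documents x k


def cosine(x, y):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


# Hypothetical token streams standing in for a Java file, a C file and an
# unrelated file; real input would come from the actual source text.
docs = [
    "int sum = 0 ; for ( i ) sum += i ; return sum ;".split(),
    "int sum = 0 ; for ( i ) sum += i ; printf ( sum ) ;".split(),
    "while ( true ) { read ( line ) ; }".split(),
]
_, term_doc = build_term_doc(docs)
embedded = lsa_embed(term_doc, k=2)
print(cosine(embedded[0], embedded[1]), cosine(embedded[0], embedded[2]))
```

The number of latent dimensions `k` controls how aggressively surface vocabulary is collapsed; on a real corpus it would be tuned rather than fixed at 2.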

RiuNet

PAN@FIRE: Overview of CL-SOCO Track on the Detection of Cross-Language SOurce COde Re-use

Publication date: 05/03/2020

The detection of source code re-use is an important research field for both the software industry and academia. This paper summarizes the goals, organization and results of the second SOCO competitive evaluation campaign for systems that automatically detect the source code re-use phenomenon. The PAN@FIRE shared task, named Cross-Language SOurce COde Re-use (CL-SOCO), focused on the detection of cross-lingual re-used source codes in the C and Java programming languages. Participant systems were asked to annotate several source codes according to whether or not they represent cases of cross-lingual source code re-use. In total, five teams participated and submitted 12 runs. The training and test collections were generated using an automatic translation tool, establishing a standard evaluation framework for future research on cross-language source code re-use detection. Although the results obtained by the participants look promising, the problem requires more effort to be solved accurately.

CiteSeerX

TF-IDF Inspired Detection for Cross-Language Source Code Plagiarism and Collusion

Author: Karnalim Oscar
Publication venue: 'AGHU University of Science and Technology Press'
Publication date: 01/01/2020

Several computing courses allow students to choose which programming language they want to use for completing a programming task. This can lead to cross-language code plagiarism and collusion, in which the copied code file is rewritten in another programming language. In response, this paper proposes a detection technique that can accurately compare code files written in various programming languages while requiring limited effort to accommodate such languages at the development stage. The only language-dependent feature used in the technique is the source code tokeniser, and no code conversion is applied. The impact of coincidental similarity is reduced by applying a TF-IDF inspired weighting, in which rare matches are prioritised. Our evaluation shows that the technique outperforms common techniques in academia for handling language conversion disguises. Further, it is comparable to those techniques when dealing with conventional disguises.
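The idea of prioritising rare matches, mentioned in the abstract above, can be sketched as an IDF-weighted cosine similarity over token counts. This is an illustrative assumption, not the paper's actual technique: the smoothed IDF formula, the function names (`idf_weights`, `weighted_similarity`), and the language-neutral token streams are all hypothetical.

```python
import math
from collections import Counter


def idf_weights(token_lists):
    """Smoothed inverse document frequency: rarer tokens get larger weights."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in df}


def weighted_similarity(a, b, idf):
    """Cosine similarity of two token lists with IDF-weighted counts."""
    ca, cb = Counter(a), Counter(b)
    # Tokens missing from the IDF table fall back to a neutral weight of 1.0.
    wa = {t: c * idf.get(t, 1.0) for t, c in ca.items()}
    wb = {t: c * idf.get(t, 1.0) for t, c in cb.items()}
    dot = sum(wa[t] * wb[t] for t in wa.keys() & wb.keys())
    na = math.sqrt(sum(v * v for v in wa.values()))
    nb = math.sqrt(sum(v * v for v in wb.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical token streams, assumed to be already normalised by a
# per-language tokeniser into a shared, language-neutral vocabulary.
original = ["loop", "id", "<", "id", "id", "+=", "id", "*", "id", "return", "id"]
rewritten = ["loop", "id", "<", "id", "id", "+=", "id", "*", "id", "print", "id"]
unrelated = ["if", "id", "==", "num", "return", "num", "print", "id"]

idf = idf_weights([original, rewritten, unrelated])
print(weighted_similarity(original, rewritten, idf))
print(weighted_similarity(original, unrelated, idf))
```

In this toy corpus the `id` token appears in every file and so receives the lowest weight, while tokens shared by only some files contribute more per match.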

AGH (Akademia Górniczo-Hutnicza) University of Science and Technology: Journals

Computer Science Journal (AGH University of Science and Technology, Krakow)

Biblioteka Nauki - repozytorium artykułów

CroLSSim: Cross‐language software similarity detector using hybrid approach of LSA‐based AST‐MDrep features and CNN‐LSTM model

Author: Alazab M.
Cheng X.
Naeem H.
Naeem M.
Ullah F.
Publication venue: Wiley
Publication date: 01/01/2022

Software similarity across different programming languages is a rapidly evolving field because of its numerous applications in software development, software cloning, software plagiarism, and software forensics. Currently, software researchers and developers search cross-language open-source repositories for similar applications for a variety of reasons, such as reusing programming code, analyzing different implementations, and looking for a better application. However, it is a challenging task because each programming language has a unique syntax and semantic structure. In this paper, a novel tool called Cross-Language Software Similarity (CroLSSim) is designed to detect similar software applications written in different programming languages. First, the Abstract Syntax Tree (AST) features are collected from different programming codes. These are high-quality features that can show the abstract view of each program. Then, Methods Description (MDrep) in combination with AST is used to examine the relationship among different method calls. Second, the Term Frequency Inverse Document Frequency approach is used to retrieve the local and global weights from AST-MDrep features. Third, the Latent Semantic Analysis-based features extraction and selection method is proposed to extract the semantic anchors in reduced dimensional space. Fourth, the Convolutional Neural Network (CNN)-based features extraction method is proposed to mine the deep features. Finally, a hybrid deep learning model of CNN and Long Short-Term Memory (CNN-LSTM) is designed to detect semantically similar software applications from these latent variables. The data set contains approximately 9.5K Java, 8.8K C#, and 7.4K C++ software applications obtained from GitHub. The proposed approach outperforms the state-of-the-art methods.

Middlesex University Research Repository

Cross-Language Source Code Re-Use Detection Using Latent Semantic Analysis

Author: Barrón-Cedeño Alberto
Flores Enrique
Moreno Lidia
Rosso Paolo
Publication venue: Journal of Universal Computer Science
Publication date: 01/01/2015

Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many other resources are just one click away. The temptation to re-use these materials is very high. Even source code is easily available through a simple search on the Web. There is a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source codes in their compiled version. When dealing with cross-language source code re-use, traditional approaches can deal only with the programming languages supported by the compiler. We assume that a source code is a piece of text, with its own syntax and structure, so we aim at applying models for free text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models, being able to distinguish between re-used and related source codes with high performance.

ZENODO

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

ARPHA OAI-PMH Endpoint

ARPHA Preprints