7 research outputs found

Software similarity and classification

Author: Cesare Silvio
Publication venue: Deakin University, Faculty of Science, Engineering and Built Environment, School of Information Technology
Publication date: 01/06/2013

This thesis analyses software programs in the context of their similarity to other software programs. Applications proposed and implemented include detecting malicious software and discovering security vulnerabilities.

Deakin Research Online

A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

Author: Ekhtiarzadeh Masoud
Parsa Saeed
Ramezani Mohammad
Roy Chanchal
Zakeri-Nasrabadi Morteza
Publication date: 28/06/2023

Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper proposes a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while there is no support for many programming languages. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate codes, of which only eight datasets were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance.

Comment: 49 pages, 10 figures, 6 tables

arXiv.org e-Print Archive

Rebooting Research on Detecting Repackaged Android Apps: Literature Review and Benchmark

Author: Bissyande Tegawendé François D Assise
Klein Jacques
Li Li
Publication date: 26/02/2019

Open Repository and Bibliography - Luxembourg

Understanding Android App Piggybacking: A Systematic Study of Malicious Code Grafting

Author: Bissyande Tegawendé François D Assise
Cavallaro Lorenzo
Klein Jacques
Le Traon Yves
Li Daoyuan
Li Li
Lo David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

The Android packaging model offers ample opportunities for malware writers to piggyback malicious code in popular apps, which can then be easily spread to a large user base. Although recent research has produced approaches and tools to identify piggybacked apps, the literature lacks a comprehensive investigation into this phenomenon. We fill this gap by 1) systematically building a large set of piggybacked and benign app pairs, which we release to the community, 2) empirically studying the characteristics of malicious piggybacked apps in comparison with their benign counterparts, and 3) providing insights on piggybacking processes. Among several findings that analysis techniques should build upon to improve the overall detection and classification accuracy of piggybacked apps, we show that piggybacking operations not only concern app code but also extensively manipulate app resource files, largely contradicting common beliefs. We also find that piggybacking is done with little sophistication, in many cases automatically, and often via library code.

Royal Holloway - Pure

Institutional Knowledge at Singapore Management University

King's Research Portal

Open Repository and Bibliography - Luxembourg

Clonewise - detecting package-level clones using machine learning

Author: Cesare Silvio
Xiang Yang
Zhang Jun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Developers sometimes maintain an internal copy of another software package or fork development of an existing project. This practice can lead to software vulnerabilities when the embedded code is not kept up to date with upstream sources. We propose an automated solution to identify clones of packages without any prior knowledge of these relationships. We then correlate clones with vulnerability information to identify outstanding security problems. This approach motivates software maintainers to avoid using cloned packages and link against system wide libraries. We propose over 30 novel features that enable us to use pattern classification to accurately identify package-level clones. To our knowledge, we are the first to consider clone detection as a classification problem. Our results show our system, Clonewise, compares well to manually tracked databases. Based on our work, over 30 unknown package clones and vulnerabilities have been identified and patched.

Deakin Research Online