Search CORE

60 research outputs found

PROGRAM INSPECTION AND TESTING TECHNIQUES FOR CODE CLONES AND REFACTORINGS IN EVOLVING SOFTWARE

Author: Chen Zhiyuan
Publication venue: DigitalCommons@UNO
Publication date: 01/10/2017
Field of study

Developers often perform copy-and-paste activities. This practice causes the similar code fragment (aka code clones) to be scattered throughout a code base. Refactoring for clone removal is beneficial, preventing clones from having negative effects on software quality, such as hidden bug propagation and unintentional inconsistent changes. However, recent research has provided evidence that factoring out clones does not always reduce the risk of introducing defects, and it is often difficult or impossible to remove clones using standard refactoring techniques. To investigate which or how clones can be refactored, developers typically spend a significant amount of their time managing individual clone instances or clone groups scattered across a large code base. To address the problem, this research proposes two techniques to inspect and validate refactoring changes. First, we propose a technique for managing clone refactorings, Pattern-based clone Refactoring Inspection (PRI), using refactoring pattern templates. By matching the refactoring pattern templates against a code base, it summarizes refactoring changes of clones, and detects the clone instances not consistently factored out as potential anomalies. Second, we propose Refactoring Investigation and Testing technique, called RIT. RIT improves the testing efficiency for validating refactoring changes. RIT uses PRI to identify refactorings by analyzing original and edited versions of a program. It then uses the semantic impact of a set of identified refactoring changes to detect tests whose behavior may have been affected and modified by refactoring edits. Given each failed asserts, RIT helps developers focus their attention on logically related program statements by applying program slicing for minimizing each test. For debugging purposes, RIT determines specific failure-inducing refactoring edits, separating from other changes that only affect other asserts or tests

The University of Nebraska, Omaha

apk2vec: Semi-supervised multi-view representation learning for profiling Android applications

Author: Chen Lihui
Liu Yang
Narayanan Annamalai
Soh Charlie
Wang Lipo
Publication venue
Publication date: 01/01/2018
Field of study

Building behavior profiles of Android applications (apps) with holistic, rich and multi-view information (e.g., incorporating several semantic views of an app such as API sequences, system calls, etc.) would help catering downstream analytics tasks such as app categorization, recommendation and malware analysis significantly better. Towards this goal, we design a semi-supervised Representation Learning (RL) framework named apk2vec to automatically generate a compact representation (aka profile/embedding) for a given app. More specifically, apk2vec has the three following unique characteristics which make it an excellent choice for largescale app profiling: (1) it encompasses information from multiple semantic views such as API sequences, permissions, etc., (2) being a semi-supervised embedding technique, it can make use of labels associated with apps (e.g., malware family or app category labels) to build high quality app profiles, and (3) it combines RL and feature hashing which allows it to efficiently build profiles of apps that stream over time (i.e., online learning). The resulting semi-supervised multi-view hash embeddings of apps could then be used for a wide variety of downstream tasks such as the ones mentioned above. Our extensive evaluations with more than 42,000 apps demonstrate that apk2vec's app profiles could significantly outperform state-of-the-art techniques in four app analytics tasks namely, malware detection, familial clustering, app clone detection and app recommendation.Comment: International Conference on Data Mining, 201

arXiv.org e-Print Archive

Crossref

DR-NTU (Digital Repository of NTU)

Slicing unconditional jumps with unnecessary control dependencies

Author: Galindo-Jiménez Carlos Santiago
Pérez-Rubio Sergio
Silva Josep
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/02/2021
Field of study

[EN] Program slicing is an analysis technique that has a wide range of applications, ranging from compilers to clone detection software, and that has been applied to practically all programming languages. Most program slicing techniques are based on a widely extended program representation, the System Dependence Graph (SDG). However, in the presence of unconditional jumps, there exist some situations where most SDG-based slicing techniques are not as accurate as possible, including more code than strictly necessary. In this paper, we identify one of these scenarios, pointing out the cause of the inaccuracy, and describing the initial solution to the problem proposed in the literature, together with an extension, which solves the problem completely. These solutions modify both the SDG generation and the slicing algorithm. Additionally, we propose an alternative solution, that solves the problem by modifying only the SDG generation, leaving the slicing algorithm untouched.This work has been partially supported by the EU (FEDER) and the Spanish MCI/AEI under grants TIN2016-76843-C4-1-R and PID2019-104735RB-C41, by the Generalitat Valenciana under grant Prometeo/2019/098 (DeepTrust), and by TAILOR, a project funded by EU Horizon 2020 research and innovation programme under GA No 952215Galindo-Jiménez, CS.; Pérez-Rubio, S.; Silva, J. (2021). Slicing unconditional jumps with unnecessary control dependencies. Lecture Notes in Computer Science. 12561:293-308. https://doi.org/10.1007/978-3-030-68446-4_15S2933081256

RiuNet

コード　クローン　二　フクマレル　メソッド　ヨビダシ　ノ　ヘンコウ　ドアイ　ノ　チョウサ

Author: Date Hironori
Inoue Katsuro
Ishio Takashi
Kudo Ryosuke
イシオタカシ
イノウエカツロウ
クドウリョウスケ
ダテヒロノリ
井上克郎
伊達浩典
工藤良介
石尾隆
Publication venue: 情報処理学会
Publication date: 04/03/2013
Field of study

Osaka University Knowledge Archive

Scalable detection of semantic clones

Author: GABEL Mark
JIANG Lingxiao
SU Zhendong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008
Field of study

Several techniques have been developed for identifying similar code fragments in programs. These similar fragments, referred to as code clones, can be used to identify redundant code, locate bugs, or gain insight into program design. Existing scalable approaches to clone detection are limited to finding program fragments that are similar only in their contiguous syntax. Other, semantics-based approaches are more resilient to differences in syntax, such as reordered statements, related statements interleaved with other unrelated statements, or the use of semantically equivalent control structures. However, none of these techniques have scaled to real world code bases. These approaches capture semantic information from Program Dependence Graphs (PDGs), program representations that encode data and control dependencies between statements and predicates. Our definition of a code clone is also based on this representation: we consider program fragments with isomorphic PDGs to be clones. In this paper, we present the first scalable clone detection algorithm based on this definition of semantic clones. Our insight is the reduction of the difficult graph similarity problem to a simpler tree similarity problem by mapping carefully selected PDG subgraphs to their related structured syntax. We efficiently solve the tree similarity problem to create a scalable analysis. We have implemented this algorithm in a practical tool and performed evaluations on several million-line open source projects, including the Linux kernel. Compared with previous approaches, our tool locates significantly more clones, which are often more semantically interesting than simple copied and pasted code fragments

CiteSeerX

Crossref

Institutional Knowledge at Singapore Management University

Information Flow Control with System Dependence Graphs - Improving Modularity, Scalability and Precision for Object Oriented Languages

Author: Graf Jürgen
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2016
Field of study

Die vorliegende Arbeit befasst sich mit dem Gebiet der statischen Programmanalyse — insbesondere betrachten wir Analysen, deren Ziel es ist, bestimmte Sicherheitseigenschaften, wie etwa Integrität und Vertraulichkeit, für Programme zu garantieren. Hierfür verwenden wir sogenannte Abhängigkeitsgraphen, welche das potentielle Verhalten des Programms sowie den Informationsfluss zwischen einzelnen Programmpunkten abbilden. Mit Hilfe dieser Technik können wir sicherstellen, dass z.B. ein Programm keinerlei Information über ein geheimes Passwort preisgibt. Im Speziellen liegt der Fokus dieser Arbeit auf Techniken, die das Erstellen des Abhängigkeitsgraphen verbessern, da dieser die Grundlage für viele weiterführende Sicherheitsanalysen bildet. Die vorgestellten Algorithmen und Verbesserungen wurden in unser Analysetool Joana integriert und als Open-Source öffentlich verfügbar gemacht. Zahlreiche Kooperationen und Veröffentlichungen belegen, dass die Verbesserungen an Joana auch in der Forschungspraxis relevant sind. Diese Arbeit besteht im Wesentlichen aus drei Teilen. Teil 1 befasst sich mit Verbesserungen bei der Berechnung des Abhängigkeitsgraphen, Teil 2 stellt einen neuen Ansatz zur Analyse von unvollständigen Programmen vor und Teil 3 zeigt aktuelle Verwendungsmöglichkeiten von Joana an konkreten Beispielen. Im ersten Teil gehen wir detailliert auf die Algorithmen zum Erstellen eines Abhängigkeitsgraphen ein, dabei legen wir besonderes Augenmerk auf die Probleme und Herausforderung bei der Analyse von Objektorientierten Sprachen wie Java. So stellen wir z.B. eine Analyse vor, die den durch Exceptions ausgelösten Kontrollfluss präzise behandeln kann. Hauptsächlich befassen wir uns mit der Modellierung von Seiteneffekten, die bei der Kommunikation über Methodengrenzen hinweg entstehen können. Bei Abhängigkeitsgraphen werden Seiteneffekte, also Speicherstellen, die von einer Methode gelesen oder verändert werden, in Form von zusätzlichen Knoten dargestellt. Dabei zeigen wir, dass die Art und Weise der Darstellung, das sogenannte Parametermodel, enormen Einfluss sowohl auf die Präzision als auch auf die Laufzeit der gesamten Analyse hat. Wir erklären die Schwächen des alten Parametermodels, das auf Objektbäumen basiert, und präsentieren unsere Verbesserungen in Form eines neuen Modells mit Objektgraphen. Durch das gezielte Zusammenfassen von redundanten Informationen können wir die Anzahl der berechneten Parameterknoten deutlich reduzieren und zudem beschleunigen, ohne dabei die Präzision des resultierenden Abhängigkeitsgraphen zu verschlechtern. Bereits bei kleineren Programmen im Bereich von wenigen tausend Codezeilen erreichen wir eine im Schnitt 8-fach bessere Laufzeit — während die Präzision des Ergebnisses in der Regel verbessert wird. Bei größeren Programmen ist der Unterschied sogar noch deutlicher, was dazu führt, dass einige unserer Testfälle und alle von uns getesteten Programme ab einer Größe von 20000 Codezeilen nur noch mit Objektgraphen berechenbar sind. Dank dieser Verbesserungen kann Joana mit erhöhter Präzision und bei wesentlich größeren Programmen eingesetzt werden. Im zweiten Teil befassen wir uns mit dem Problem, dass bisherige, auf Abhängigkeitsgraphen basierende Sicherheitsanalysen nur vollständige Programme analysieren konnten. So war es z.B. unmöglich, Bibliothekscode ohne Kenntnis aller Verwendungsstellen zu betrachten oder vorzuverarbeiten. Wir entdeckten bei der bestehenden Analyse eine Monotonie-Eigenschaft, welche es uns erlaubt, Analyseergebnisse von Programmteilen auf beliebige Verwendungsstellen zu übertragen. So lassen sich zum einen Programmteile vorverarbeiten und zum anderen auch generelle Aussagen über die Sicherheitseigenschaften von Programmteilen treffen, ohne deren konkrete Verwendungsstellen zu kennen. Wir definieren die Monotonie-Eigenschaft im Detail und skizzieren einen Beweis für deren Korrektheit. Darauf aufbauend entwickeln wir eine Methode zur Vorverarbeitung von Programmteilen, die es uns ermöglicht, modulare Abhängigkeitsgraphen zu erstellen. Diese Graphen können zu einem späteren Zeitpunkt der jeweiligen Verwendungsstelle angepasst werden. Da die präzise Erstellung eines modularen Abhängigkeitsgraphen sehr aufwendig werden kann, entwickeln wir einen Algorithmus basierend auf sogenannten Zugriffspfaden, der die Skalierbarkeit verbessert. Zuletzt skizzieren wir einen Beweis, der zeigt, dass dieser Algorithmus tatsächlich immer eine konservative Approximation des modularen Graphen berechnet und deshalb die Ergebnisse darauf aufbauender Sicherheitsanalysen weiterhin gültig sind. Im dritten Teil präsentieren wir einige erfolgreiche Anwendungen von Joana, die im Rahmen einer Kooperation mit Ralf Küsters von der Universität Trier entstanden sind. Hier erklären wir zum einen, wie man unser Sicherheitswerkzeug Joana generell verwenden kann. Zum anderen zeigen wir, wie in Kombination mit weiteren Werkzeugen und Techniken kryptographische Sicherheit für ein Programm garantiert werden kann - eine Aufgabe, die bisher für auf Informationsfluss basierende Analysen nicht möglich war. In diesen Anwendungen wird insbesondere deutlich, wie die im Rahmen dieser Arbeit vereinfachte Bedienung die Verwendung von Joana erleichtert und unsere Verbesserungen der Präzision des Ergebnisses die erfolgreiche Analyse erst ermöglichen

KITopen

Identifying and exploiting concurrency in object-based real-time systems

Author: Yu Guohui
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/1996
Field of study

The use of object-based mechanisms, i.e., abstract data types (ADTs), for constructing software systems can help to decrease development costs, increase understandability and increase maintainability. However, execution efficiency may be sacrificed due to the large number of procedure calls, and due to contention for shared ADTs in concurrent systems. Such inefficiencies are a concern in real-time applications that have stringent timing requirements. To address these issues, the potentially inefficient procedure calls are turned into a source of concurrency via asynchronous procedure calls (ARPCs), and contention for shared ADTS is reduced via ADT cloning. A framework for concurrency analysis in object-based systems is developed, and compiler techniques for identifying potential concurrency via ARPCs and cloning are introduced. Exploitation of the parallelizing compiler techniques is illustrated in the context of an incremental schedule construction algorithm that enhances concurrency incrementally so that feasible real-time schedules can be constructed. Experimental results show large speedup gains with these techniques. Additionally, experiments show that the concurrency enhancement techniques are often useful in constructing feasible schedules for hard real-time systems

Digital Commons @ New Jersey Institute of Technology (NJIT)

DeepWukong: Statically Detecting Software Vulnerabilities Using Deep Graph Neural Network

Author: Cheng X
Hua J
Sui Y
Wang H
Xu G
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 12/04/2022
Field of study

Static bug detection has shown its effectiveness in detecting well-defined memory errors, e.g., memory leaks, buffer overflows, and null dereference. However, modern software systems have a wide variety of vulnerabilities. These vulnerabilities are extremely complicated with sophisticated programming logic, and these bugs are often caused by different bad programming practices, challenging existing bug detection solutions. It is hard and labor-intensive to develop precise and efficient static analysis solutions for different types of vulnerabilities, particularly for those that may not have a clear specification as the traditional well-defined vulnerabilities. This article presents DeepWukong, a new deep-learning-based embedding approach to static detection of software vulnerabilities for C/C++ programs. Our approach makes a new attempt by leveraging advanced recent graph neural networks to embed code fragments in a compact and low-dimensional representation, producing a new code representation that preserves high-level programming logic (in the form of control-and data-flows) together with the natural language information of a program. Our evaluation studies the top 10 most common C/C++ vulnerabilities during the past 3 years. We have conducted our experiments using 105,428 real-world programs by comparing our approach with four well-known traditional static vulnerability detectors and three state-of-the-art deep-learning-based approaches. The experimental results demonstrate the effectiveness of our research and have shed light on the promising direction of combining program analysis with deep learning techniques to address the general static code analysis challenges

OPUS - University of Technology Sydney

Pattern-Based Vulnerability Discovery

Author: Yamaguchi Fabian
Publication venue
Publication date: 30/10/2015
Field of study

Georg-August-University Göttingen