60 research outputs found

    PROGRAM INSPECTION AND TESTING TECHNIQUES FOR CODE CLONES AND REFACTORINGS IN EVOLVING SOFTWARE

    Get PDF
    Developers often perform copy-and-paste activities. This practice causes the similar code fragment (aka code clones) to be scattered throughout a code base. Refactoring for clone removal is beneficial, preventing clones from having negative effects on software quality, such as hidden bug propagation and unintentional inconsistent changes. However, recent research has provided evidence that factoring out clones does not always reduce the risk of introducing defects, and it is often difficult or impossible to remove clones using standard refactoring techniques. To investigate which or how clones can be refactored, developers typically spend a significant amount of their time managing individual clone instances or clone groups scattered across a large code base. To address the problem, this research proposes two techniques to inspect and validate refactoring changes. First, we propose a technique for managing clone refactorings, Pattern-based clone Refactoring Inspection (PRI), using refactoring pattern templates. By matching the refactoring pattern templates against a code base, it summarizes refactoring changes of clones, and detects the clone instances not consistently factored out as potential anomalies. Second, we propose Refactoring Investigation and Testing technique, called RIT. RIT improves the testing efficiency for validating refactoring changes. RIT uses PRI to identify refactorings by analyzing original and edited versions of a program. It then uses the semantic impact of a set of identified refactoring changes to detect tests whose behavior may have been affected and modified by refactoring edits. Given each failed asserts, RIT helps developers focus their attention on logically related program statements by applying program slicing for minimizing each test. For debugging purposes, RIT determines specific failure-inducing refactoring edits, separating from other changes that only affect other asserts or tests

    apk2vec: Semi-supervised multi-view representation learning for profiling Android applications

    Full text link
    Building behavior profiles of Android applications (apps) with holistic, rich and multi-view information (e.g., incorporating several semantic views of an app such as API sequences, system calls, etc.) would help catering downstream analytics tasks such as app categorization, recommendation and malware analysis significantly better. Towards this goal, we design a semi-supervised Representation Learning (RL) framework named apk2vec to automatically generate a compact representation (aka profile/embedding) for a given app. More specifically, apk2vec has the three following unique characteristics which make it an excellent choice for largescale app profiling: (1) it encompasses information from multiple semantic views such as API sequences, permissions, etc., (2) being a semi-supervised embedding technique, it can make use of labels associated with apps (e.g., malware family or app category labels) to build high quality app profiles, and (3) it combines RL and feature hashing which allows it to efficiently build profiles of apps that stream over time (i.e., online learning). The resulting semi-supervised multi-view hash embeddings of apps could then be used for a wide variety of downstream tasks such as the ones mentioned above. Our extensive evaluations with more than 42,000 apps demonstrate that apk2vec's app profiles could significantly outperform state-of-the-art techniques in four app analytics tasks namely, malware detection, familial clustering, app clone detection and app recommendation.Comment: International Conference on Data Mining, 201

    Slicing unconditional jumps with unnecessary control dependencies

    Full text link
    [EN] Program slicing is an analysis technique that has a wide range of applications, ranging from compilers to clone detection software, and that has been applied to practically all programming languages. Most program slicing techniques are based on a widely extended program representation, the System Dependence Graph (SDG). However, in the presence of unconditional jumps, there exist some situations where most SDG-based slicing techniques are not as accurate as possible, including more code than strictly necessary. In this paper, we identify one of these scenarios, pointing out the cause of the inaccuracy, and describing the initial solution to the problem proposed in the literature, together with an extension, which solves the problem completely. These solutions modify both the SDG generation and the slicing algorithm. Additionally, we propose an alternative solution, that solves the problem by modifying only the SDG generation, leaving the slicing algorithm untouched.This work has been partially supported by the EU (FEDER) and the Spanish MCI/AEI under grants TIN2016-76843-C4-1-R and PID2019-104735RB-C41, by the Generalitat Valenciana under grant Prometeo/2019/098 (DeepTrust), and by TAILOR, a project funded by EU Horizon 2020 research and innovation programme under GA No 952215Galindo-Jiménez, CS.; Pérez-Rubio, S.; Silva, J. (2021). Slicing unconditional jumps with unnecessary control dependencies. Lecture Notes in Computer Science. 12561:293-308. https://doi.org/10.1007/978-3-030-68446-4_15S2933081256

    Scalable detection of semantic clones

    Get PDF
    Several techniques have been developed for identifying similar code fragments in programs. These similar fragments, referred to as code clones, can be used to identify redundant code, locate bugs, or gain insight into program design. Existing scalable approaches to clone detection are limited to finding program fragments that are similar only in their contiguous syntax. Other, semantics-based approaches are more resilient to differences in syntax, such as reordered statements, related statements interleaved with other unrelated statements, or the use of semantically equivalent control structures. However, none of these techniques have scaled to real world code bases. These approaches capture semantic information from Program Dependence Graphs (PDGs), program representations that encode data and control dependencies between statements and predicates. Our definition of a code clone is also based on this representation: we consider program fragments with isomorphic PDGs to be clones. In this paper, we present the first scalable clone detection algorithm based on this definition of semantic clones. Our insight is the reduction of the difficult graph similarity problem to a simpler tree similarity problem by mapping carefully selected PDG subgraphs to their related structured syntax. We efficiently solve the tree similarity problem to create a scalable analysis. We have implemented this algorithm in a practical tool and performed evaluations on several million-line open source projects, including the Linux kernel. Compared with previous approaches, our tool locates significantly more clones, which are often more semantically interesting than simple copied and pasted code fragments

    Information Flow Control with System Dependence Graphs - Improving Modularity, Scalability and Precision for Object Oriented Languages

    Get PDF
    Die vorliegende Arbeit befasst sich mit dem Gebiet der statischen Programmanalyse — insbesondere betrachten wir Analysen, deren Ziel es ist, bestimmte Sicherheitseigenschaften, wie etwa IntegritĂ€t und Vertraulichkeit, fĂŒr Programme zu garantieren. HierfĂŒr verwenden wir sogenannte AbhĂ€ngigkeitsgraphen, welche das potentielle Verhalten des Programms sowie den Informationsfluss zwischen einzelnen Programmpunkten abbilden. Mit Hilfe dieser Technik können wir sicherstellen, dass z.B. ein Programm keinerlei Information ĂŒber ein geheimes Passwort preisgibt. Im Speziellen liegt der Fokus dieser Arbeit auf Techniken, die das Erstellen des AbhĂ€ngigkeitsgraphen verbessern, da dieser die Grundlage fĂŒr viele weiterfĂŒhrende Sicherheitsanalysen bildet. Die vorgestellten Algorithmen und Verbesserungen wurden in unser Analysetool Joana integriert und als Open-Source öffentlich verfĂŒgbar gemacht. Zahlreiche Kooperationen und Veröffentlichungen belegen, dass die Verbesserungen an Joana auch in der Forschungspraxis relevant sind. Diese Arbeit besteht im Wesentlichen aus drei Teilen. Teil 1 befasst sich mit Verbesserungen bei der Berechnung des AbhĂ€ngigkeitsgraphen, Teil 2 stellt einen neuen Ansatz zur Analyse von unvollstĂ€ndigen Programmen vor und Teil 3 zeigt aktuelle Verwendungsmöglichkeiten von Joana an konkreten Beispielen. Im ersten Teil gehen wir detailliert auf die Algorithmen zum Erstellen eines AbhĂ€ngigkeitsgraphen ein, dabei legen wir besonderes Augenmerk auf die Probleme und Herausforderung bei der Analyse von Objektorientierten Sprachen wie Java. So stellen wir z.B. eine Analyse vor, die den durch Exceptions ausgelösten Kontrollfluss prĂ€zise behandeln kann. HauptsĂ€chlich befassen wir uns mit der Modellierung von Seiteneffekten, die bei der Kommunikation ĂŒber Methodengrenzen hinweg entstehen können. Bei AbhĂ€ngigkeitsgraphen werden Seiteneffekte, also Speicherstellen, die von einer Methode gelesen oder verĂ€ndert werden, in Form von zusĂ€tzlichen Knoten dargestellt. Dabei zeigen wir, dass die Art und Weise der Darstellung, das sogenannte Parametermodel, enormen Einfluss sowohl auf die PrĂ€zision als auch auf die Laufzeit der gesamten Analyse hat. Wir erklĂ€ren die SchwĂ€chen des alten Parametermodels, das auf ObjektbĂ€umen basiert, und prĂ€sentieren unsere Verbesserungen in Form eines neuen Modells mit Objektgraphen. Durch das gezielte Zusammenfassen von redundanten Informationen können wir die Anzahl der berechneten Parameterknoten deutlich reduzieren und zudem beschleunigen, ohne dabei die PrĂ€zision des resultierenden AbhĂ€ngigkeitsgraphen zu verschlechtern. Bereits bei kleineren Programmen im Bereich von wenigen tausend Codezeilen erreichen wir eine im Schnitt 8-fach bessere Laufzeit — wĂ€hrend die PrĂ€zision des Ergebnisses in der Regel verbessert wird. Bei grĂ¶ĂŸeren Programmen ist der Unterschied sogar noch deutlicher, was dazu fĂŒhrt, dass einige unserer TestfĂ€lle und alle von uns getesteten Programme ab einer GrĂ¶ĂŸe von 20000 Codezeilen nur noch mit Objektgraphen berechenbar sind. Dank dieser Verbesserungen kann Joana mit erhöhter PrĂ€zision und bei wesentlich grĂ¶ĂŸeren Programmen eingesetzt werden. Im zweiten Teil befassen wir uns mit dem Problem, dass bisherige, auf AbhĂ€ngigkeitsgraphen basierende Sicherheitsanalysen nur vollstĂ€ndige Programme analysieren konnten. So war es z.B. unmöglich, Bibliothekscode ohne Kenntnis aller Verwendungsstellen zu betrachten oder vorzuverarbeiten. Wir entdeckten bei der bestehenden Analyse eine Monotonie-Eigenschaft, welche es uns erlaubt, Analyseergebnisse von Programmteilen auf beliebige Verwendungsstellen zu ĂŒbertragen. So lassen sich zum einen Programmteile vorverarbeiten und zum anderen auch generelle Aussagen ĂŒber die Sicherheitseigenschaften von Programmteilen treffen, ohne deren konkrete Verwendungsstellen zu kennen. Wir definieren die Monotonie-Eigenschaft im Detail und skizzieren einen Beweis fĂŒr deren Korrektheit. Darauf aufbauend entwickeln wir eine Methode zur Vorverarbeitung von Programmteilen, die es uns ermöglicht, modulare AbhĂ€ngigkeitsgraphen zu erstellen. Diese Graphen können zu einem spĂ€teren Zeitpunkt der jeweiligen Verwendungsstelle angepasst werden. Da die prĂ€zise Erstellung eines modularen AbhĂ€ngigkeitsgraphen sehr aufwendig werden kann, entwickeln wir einen Algorithmus basierend auf sogenannten Zugriffspfaden, der die Skalierbarkeit verbessert. Zuletzt skizzieren wir einen Beweis, der zeigt, dass dieser Algorithmus tatsĂ€chlich immer eine konservative Approximation des modularen Graphen berechnet und deshalb die Ergebnisse darauf aufbauender Sicherheitsanalysen weiterhin gĂŒltig sind. Im dritten Teil prĂ€sentieren wir einige erfolgreiche Anwendungen von Joana, die im Rahmen einer Kooperation mit Ralf KĂŒsters von der UniversitĂ€t Trier entstanden sind. Hier erklĂ€ren wir zum einen, wie man unser Sicherheitswerkzeug Joana generell verwenden kann. Zum anderen zeigen wir, wie in Kombination mit weiteren Werkzeugen und Techniken kryptographische Sicherheit fĂŒr ein Programm garantiert werden kann - eine Aufgabe, die bisher fĂŒr auf Informationsfluss basierende Analysen nicht möglich war. In diesen Anwendungen wird insbesondere deutlich, wie die im Rahmen dieser Arbeit vereinfachte Bedienung die Verwendung von Joana erleichtert und unsere Verbesserungen der PrĂ€zision des Ergebnisses die erfolgreiche Analyse erst ermöglichen

    Identifying and exploiting concurrency in object-based real-time systems

    Get PDF
    The use of object-based mechanisms, i.e., abstract data types (ADTs), for constructing software systems can help to decrease development costs, increase understandability and increase maintainability. However, execution efficiency may be sacrificed due to the large number of procedure calls, and due to contention for shared ADTs in concurrent systems. Such inefficiencies are a concern in real-time applications that have stringent timing requirements. To address these issues, the potentially inefficient procedure calls are turned into a source of concurrency via asynchronous procedure calls (ARPCs), and contention for shared ADTS is reduced via ADT cloning. A framework for concurrency analysis in object-based systems is developed, and compiler techniques for identifying potential concurrency via ARPCs and cloning are introduced. Exploitation of the parallelizing compiler techniques is illustrated in the context of an incremental schedule construction algorithm that enhances concurrency incrementally so that feasible real-time schedules can be constructed. Experimental results show large speedup gains with these techniques. Additionally, experiments show that the concurrency enhancement techniques are often useful in constructing feasible schedules for hard real-time systems

    DeepWukong: Statically Detecting Software Vulnerabilities Using Deep Graph Neural Network

    Full text link
    Static bug detection has shown its effectiveness in detecting well-defined memory errors, e.g., memory leaks, buffer overflows, and null dereference. However, modern software systems have a wide variety of vulnerabilities. These vulnerabilities are extremely complicated with sophisticated programming logic, and these bugs are often caused by different bad programming practices, challenging existing bug detection solutions. It is hard and labor-intensive to develop precise and efficient static analysis solutions for different types of vulnerabilities, particularly for those that may not have a clear specification as the traditional well-defined vulnerabilities. This article presents DeepWukong, a new deep-learning-based embedding approach to static detection of software vulnerabilities for C/C++ programs. Our approach makes a new attempt by leveraging advanced recent graph neural networks to embed code fragments in a compact and low-dimensional representation, producing a new code representation that preserves high-level programming logic (in the form of control-and data-flows) together with the natural language information of a program. Our evaluation studies the top 10 most common C/C++ vulnerabilities during the past 3 years. We have conducted our experiments using 105,428 real-world programs by comparing our approach with four well-known traditional static vulnerability detectors and three state-of-the-art deep-learning-based approaches. The experimental results demonstrate the effectiveness of our research and have shed light on the promising direction of combining program analysis with deep learning techniques to address the general static code analysis challenges
    • 

    corecore