10 research outputs found

    Heap Abstractions for Static Analysis

    Full text link
    Heap data is potentially unbounded and seemingly arbitrary. As a consequence, unlike stack and static memory, heap memory cannot be abstracted directly in terms of a fixed set of source variable names appearing in the program being analysed. This makes it an interesting topic of study and there is an abundance of literature employing heap abstractions. Although most studies have addressed similar concerns, their formulations and formalisms often seem dissimilar and some times even unrelated. Thus, the insights gained in one description of heap abstraction may not directly carry over to some other description. This survey is a result of our quest for a unifying theme in the existing descriptions of heap abstractions. In particular, our interest lies in the abstractions and not in the algorithms that construct them. In our search of a unified theme, we view a heap abstraction as consisting of two features: a heap model to represent the heap memory and a summarization technique for bounding the heap representation. We classify the models as storeless, store based, and hybrid. We describe various summarization techniques based on k-limiting, allocation sites, patterns, variables, other generic instrumentation predicates, and higher-order logics. This approach allows us to compare the insights of a large number of seemingly dissimilar heap abstractions and also paves way for creating new abstractions by mix-and-match of models and summarization techniques.Comment: 49 pages, 20 figure

    Template-based verification of heap-manipulating programs

    Get PDF
    We propose a shape analysis suitable for analysis engines that perform automatic invariant inference using an SMT solver. The proposed solution includes an abstract template domain that encodes the shape of a program heap based on logical formulae over bit-vectors. It is based on a points-to relation between pointers and symbolic addresses of abstract memory objects. Our abstract heap domain can be combined with value domains in a straight-forward manner, which particularly allows us to reason about shapes and contents of heap structures at the same time. The information obtained from the analysis can be used to prove reachability and memory safety properties of programs manipulating dynamic data structures, mainly linked lists. The solution has been implemented in 2LS and compared against state-of-the-art tools that perform the best in heap-related categories of the well-known Software Verification Competition (SV-COMP). Results show that 2LS outperforms these tools on benchmarks requiring combined reasoning about unbounded data structures and their numerical contents

    Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers

    Full text link
    Flow- and context-sensitive points-to analysis is difficult to scale; for top-down approaches, the problem centers on repeated analysis of the same procedure; for bottom-up approaches, the abstractions used to represent procedure summaries have not scaled while preserving precision. We propose a novel abstraction called the Generalized Points-to Graph (GPG) which views points-to relations as memory updates and generalizes them using the counts of indirection levels leaving the unknown pointees implicit. This allows us to construct GPGs as compact representations of bottom-up procedure summaries in terms of memory updates and control flow between them. Their compactness is ensured by the following optimizations: strength reduction reduces the indirection levels, redundancy elimination removes redundant memory updates and minimizes control flow (without over-approximating data dependence between memory updates), and call inlining enhances the opportunities of these optimizations. We devise novel operations and data flow analyses for these optimizations. Our quest for scalability of points-to analysis leads to the following insight: The real killer of scalability in program analysis is not the amount of data but the amount of control flow that it may be subjected to in search of precision. The effectiveness of GPGs lies in the fact that they discard as much control flow as possible without losing precision (i.e., by preserving data dependence without over-approximation). This is the reason why the GPGs are very small even for main procedures that contain the effect of the entire program. This allows our implementation to scale to 158kLoC for C programs

    A Sound Flow-Sensitive Heap Abstraction for the Static Analysis of Android Applications

    Get PDF
    The present paper proposes the first static analysis for Android applications which is both flow-sensitive on the heap abstraction and provably sound with respect to a rich formal model of the Android platform. We formulate the analysis as a set of Horn clauses defining a sound over-approximation of the semantics of the Android application to analyse, borrowing ideas from recency abstraction and extending them to our concurrent setting. Moreover, we implement the analysis in HornDroid, a state-of-the-art information flow analyser for Android applications. Our extension allows HornDroid to perform strong updates on heap-allocated data structures, thus significantly increasing its precision, without sacrificing its soundness guarantees. We test our implementation on DroidBench, a popular benchmark of Android applications developed by the research community, and we show that our changes to HornDroid lead to an improvement in the precision of the tool, while having only a moderate cost in terms of efficiency. Finally, we assess the scalability of our tool to the analysis of real applications

    Precision-guided context sensitivity for pointer analysis

    Get PDF
    Context sensitivity is an essential technique for ensuring high precision in Java pointer analyses. It has been observed that applying context sensitivity partially, only on a select subset of the methods, can improve the balance between analysis precision and speed. However, existing techniques are based on heuristics that do not provide much insight into what characterizes this method subset. In this work, we present a more principled approach for identifying precision-critical methods, based on general patterns of value flows that explain where most of the imprecision arises in context-insensitive pointer analysis. Accordingly, we provide an efficient algorithm to recognize these flow patterns in a given program and exploit them to yield good tradeoffs between analysis precision and speed. Our experimental results on standard benchmark and real-world programs show that a pointer analysis that applies context sensitivity partially, only on the identified precision-critical methods, preserves effectively all (98.8%) of the precision of a highly-precise conventional context-sensitive pointer analysis (2-object-sensitive with a context-sensitive heap), with a substantial speedup (on average 3.4X, and up to 9.2X)

    Security analyses for detecting deserialisation vulnerabilities : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North, New Zealand

    Get PDF
    An important task in software security is to identify potential vulnerabilities. Attackers exploit security vulnerabilities in systems to obtain confidential information, to breach system integrity, and to make systems unavailable to legitimate users. In recent years, particularly 2012, there has been a rise in reported Java vulnerabilities. One type of vulnerability involves (de)serialisation, a commonly used feature to store objects or data structures to an external format and restore them. In 2015, a deserialisation vulnerability was reported involving Apache Commons Collections, a popular Java library, which affected numerous Java applications. Another major deserialisation-related vulnerability that affected 55\% of Android devices was reported in 2015. Both of these vulnerabilities allowed arbitrary code execution on vulnerable systems by malicious users, a serious risk, and this came as a call for the Java community to issue patches to fix serialisation related vulnerabilities in both the Java Development Kit and libraries. Despite attention to coding guidelines and defensive strategies, deserialisation remains a risky feature and a potential weakness in object-oriented applications. In fact, deserialisation related vulnerabilities (both denial-of-service and remote code execution) continue to be reported for Java applications. Further, deserialisation is a case of parsing where external data is parsed from their external representation to a program's internal data structures and hence, potentially similar vulnerabilities can be present in parsers for file formats and serialisation languages. The problem is, given a software package, to detect either injection or denial-of-service vulnerabilities and propose strategies to prevent attacks that exploit them. The research reported in this thesis casts detecting deserialisation related vulnerabilities as a program analysis task. The goal is to automatically discover this class of vulnerabilities using program analysis techniques, and to experimentally evaluate the efficiency and effectiveness of the proposed methods on real-world software. We use multiple techniques to detect reachability to sensitive methods and taint analysis to detect if untrusted user-input can result in security violations. Challenges in using program analysis for detecting deserialisation vulnerabilities include addressing soundness issues in analysing dynamic features in Java (e.g., native code). Another hurdle is that available techniques mostly target the analysis of applications rather than library code. In this thesis, we develop techniques to address soundness issues related to analysing Java code that uses serialisation, and we adapt dynamic techniques such as fuzzing to address precision issues in the results of our analysis. We also use the results from our analysis to study libraries in other languages, and check if they are vulnerable to deserialisation-type attacks. We then provide a discussion on mitigation measures for engineers to protect their software against such vulnerabilities. In our experiments, we show that we can find unreported vulnerabilities in Java code; and how these vulnerabilities are also present in widely-used serialisers for popular languages such as JavaScript, PHP and Rust. In our study, we discovered previously unknown denial-of-service security bugs in applications/libraries that parse external data formats such as YAML, PDF and SVG

    Recovering Structural Information for Better Static Analysis

    Get PDF
    Η στατική ανάλυση στοχεύει στην κατανόηση της συμπεριφοράς του προγράμματος, μέσω αυτοματοποιημένων τεχνικών συμπερασμού που βασίζονται καθαρά στον πηγαίο κώδικα του προγράμματος, αλλά δεν προϋποθέτουν την εκτέλεσή του. Για να πετύχουν αυτές οι τεχνικές μία ευρεία κατανόηση του κώδικα, καταφεύγουν στη δημιουργία ενός αφηρημένου μοντέλου της μνήμης, το οποίο καλύπτει όλες τις πιθανές εκτελέσεις. Αφηρημένα μοντέλα τέτοιου τύπου μπορεί γρήγορα να εκφυλιστούν, αν χάσουν σημαντική δομική πληροφορία των αντικειμένων στη μνήμη που περιγράφουν. Αυτό συνήθως συμβαίνει λόγω χρήσης συγκεκριμένων προγραμματιστικών ιδιωμάτων και χαρακτηριστικών της γλώσσας προγραμματισμού, ή λόγω πρακτικών περιορισμών της ανάλυσης. Σε αρκετές περιπτώσεις, ένα σημαντικό μέρος της χαμένης αυτής δομικής πληροφορίας μπορεί να ανακτηθεί μέσω σύνθετης λογικής, η οποία παρακολουθεί την έμμεση χρήση τύπων, και να χρησιμοποιηθεί προς όφελος της στατικής ανάλυσης του προγράμματος. Στη διατριβή αυτή παρουσιάζουμε διάφορους τρόπους ανάκτησης δομικής πληροφορίας, πρώτα (1) σε προγράμματα C/C++, κι έπειτα, σε προγράμματα γλωσσών υψηλότερου επιπέδου που δεν προσφέρουν άμεση πρόσβαση μνήμης, όπως η Java, όπου αναγνωρίζουμε δύο βασικές πηγές απώλειας δομικής πληροφορίας: (2) χρήση ανάκλασης και (3) ανάλυση μερικών προγραμμάτων. Δείχνουμε πως, σε όλες τις παραπάνω περιπτώσεις, η ανάκτηση τέτοιας δομικής πληροφορίας βελτιώνει άμεσα τη στατική ανάλυση του προγράμματος. Παρουσιάζουμε μία ανάλυση δεικτών για C/C++, η οποία βελτιώνει το επίπεδο της αφαίρεσης, βασιζόμενη σε πληροφορία τύπου που ανακαλύπτει κατά τη διάρκεια της ανάλυσης. Παρέχουμε μία υλοποίηση της ανάλυσης αυτής, στο cclyzer, ένα εργαλείο στατικής ανάλυσης για LLVM bitcode. Έπειτα, παρουσιάζουμε επεκτάσεις σε ανάλυση δεικτών για Java, κτίζοντας πάνω σε σύγχρονες τεχνικές χειρισμού μηχανισμών ανάκλασης. Η βασική αρχή είναι παραπλήσια με την περίπτωση της C/C++: καταγράφουμε τη χρήση των ανακλαστικών αντικειμένων, κατά τη διάρκεια της ανάλυσης δεικτών, ώστε να ανακαλύψουμε βασικά δομικά τους στοιχεία, τα οποία μπορούμε να χρησιμοποιήσουμε έπειτα για να βελτιώσουμε τον χειρισμό των εντολών ανάκλασης στην τρέχουσα ανάλυση, με αμοιβαία αναδρομικό τρόπο. Τέλος, ως προς την ανάλυση μερικών προγραμμάτων Java, ορίζουμε το γενικό πρόβλημα της ((συμπλήρωσης προγράμματος)): δοθέντος ενός μερικού προγράμματος, πως να εφεύρουμε ένα υποκατάστατο του κώδικα που λείπει, έτσι ώστε αυτό να ικανοποιεί τους περιορισμούς των στατικών και δυναμικών τύπων που υπονοούνται από τον υπάρχοντα κώδικα. Ή διαφορετικά, πως να ανακτήσουμε τη δομή των τύπων που λείπουν. Πέραν της ανακάλυψης των μελών (πεδίων και μεθόδων) των κλάσεων που λείπουν, η ικανοποίηση των περιορισμών υποτυπισμού μας οδηγεί στον ορισμό ενός πρωτότυπου αλγοριθμικού προβλήματος: τη συμπλήρωση ιεραρχίας τύπων. Παρέχουμε αλγορίθμους που λύνουν το πρόβλημα αυτό σε διάφορα είδη κληρονομικότητας (μονής, πολλαπλής, μεικτής) και τους υλοποιούμε στο JPhantom, ένα νέο εργαλείο συμπλήρωσης Java bytecode κώδικα.Static analysis aims to achieve an understanding of program behavior, by means of automatic reasoning that requires only the program’s source code and not any actual execution. To reach a truly broad level of program understanding, static analysis techniques need to create an abstraction of memory that covers all possible executions. Such abstract models may quickly degenerate after losing essential structural information about the memory objects they describe, due to the use of specific programming idioms and language features, or because of practical analysis limitations. In many cases, some of the lost memory structure may be retrieved, though it requires complex inference that takes advantage of indirect uses of types. Such recovered structural information may, then, greatly benefit static analysis. This dissertation shows how we can recover structural information, first (i) in the context of C/C++, and next, in the context of higher-level languages without direct memory access, like Java, where we identify two primary causes of losing memory structure: (ii) the use of reflection, and (iii) analysis of partial programs. We show that, in all cases, the recovered structural information greatly benefits static analysis on the program. For C/C++, we introduce a structure-sensitive pointer analysis that refines its abstraction based on type information that it discovers on-they-fly. This analysis is implemented in cclyzer, a static analysis tool for LLVM bitcode. Next, we present techniques that extend a standard Java pointer analysis by building on top of state-of-the-art handling of reflection. The principle is similar to that of our structure-sensitive analysis for C/C++: track the use of reflective objects, during pointer analysis, to gain important insights on their structure, which can be used to “patch” the handling of reflective operations on the running analysis, in a mutually recursive fashion. Finally, to address the challenge of analyzing partial Java programs in full generality, we define the problem of “program complementation”: given a partial program we seek to provide definitions for its missing parts so that the “complement” satisfies all static and dynamic typing requirements induced by the code under analysis. Essentially, complementation aims to recover the structure of phantom types. Apart from discovering missing class members (i.e., fields and methods), satisfying the subtyping constraints leads to the formulation of a novel typing problem in the OO context, regarding type hierarchy complementation. We offer algorithms to solve this problem in various inheritance settings, and implement them in JPhantom, a practical tool for Java bytecode complementation

    Heap Abstractions for Static Analysis

    No full text
    Heap data is potentially unbounded and seemingly arbitrary. Hence, unlike stack and static data, heap data cannot be abstracted in terms of a fixed set of program variables. This makes it an interesting topic of study and there is an abundance of literature employing heap abstractions. Although most studies have addressed similar concerns, insights gained in one description of heap abstraction may not directly carry over to some other description. In our search of a unified theme, we view heap abstraction as consisting of two steps: (a) heap modelling, which is the process of representing a heap memory (i.e., an unbounded set of concrete locations) as a heap model (i.e., an unbounded set of abstract locations), and (b) summarization, which is the process of bounding the heap model by merging multiple abstract locations into summary locations. We classify the heap models as storeless, store based, and hybrid. We describe various summarization techniques based on k-limiting, allocation sites, patterns, variables, other generic instrumentation predicates, and higher-order logics. This approach allows us to compare the insights of a large number of seemingly dissimilar heap abstractions and also paves the way for creating new abstractions by mix and match of models and summarization techniques
    corecore