
    Heap Abstractions for Static Analysis

    Heap data is potentially unbounded and seemingly arbitrary. As a consequence, unlike stack and static memory, heap memory cannot be abstracted directly in terms of a fixed set of source variable names appearing in the program being analysed. This makes it an interesting topic of study, and there is an abundance of literature employing heap abstractions. Although most studies have addressed similar concerns, their formulations and formalisms often seem dissimilar and sometimes even unrelated. Thus, the insights gained in one description of heap abstraction may not directly carry over to some other description. This survey is the result of our quest for a unifying theme in the existing descriptions of heap abstractions. In particular, our interest lies in the abstractions and not in the algorithms that construct them. In our search for a unified theme, we view a heap abstraction as consisting of two features: a heap model to represent the heap memory and a summarization technique for bounding the heap representation. We classify the models as storeless, store based, and hybrid. We describe various summarization techniques based on k-limiting, allocation sites, patterns, variables, other generic instrumentation predicates, and higher-order logics. This approach allows us to compare the insights of a large number of seemingly dissimilar heap abstractions and also paves the way for creating new abstractions by mix-and-match of models and summarization techniques. (Comment: 49 pages, 20 figures)
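    To make the allocation-site summarization technique concrete, the following is a minimal C sketch (not taken from the survey): every node created at the same malloc call site is mapped to one abstract heap object, so the unbounded list built by the loop below is represented by a single summary node with a next edge back to itself.

    #include <stdio.h>
    #include <stdlib.h>

    struct node {
        int data;
        struct node *next;
    };

    /* Builds a list of n nodes. Concretely the heap contains n distinct
     * objects, but an allocation-site-based abstraction represents all of
     * them by a single summary node, because each one is created at the
     * same malloc call site. */
    struct node *build_list(int n) {
        struct node *head = NULL;
        for (int i = 0; i < n; i++) {
            struct node *fresh = malloc(sizeof *fresh);   /* allocation site A1 */
            if (fresh == NULL)
                break;
            fresh->data = i;
            fresh->next = head;    /* abstract points-to edge: A1 --next--> A1 */
            head = fresh;
        }
        return head;
    }

    int main(void) {
        struct node *list = build_list(5);
        for (struct node *p = list; p != NULL; p = p->next)
            printf("%d ", p->data);
        while (list != NULL) {
            struct node *next = list->next;
            free(list);
            list = next;
        }
        return 0;
    }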

    Automated Verification of Complete Specification with Shape Inference

    Ph.D. thesis (Doctor of Philosophy)

    Trusting Computations: a Mechanized Proof from Partial Differential Equations to Actual Program

    Computer programs may go wrong due to exceptional behaviors, out-of-bounds array accesses, or simply coding errors. Thus, they cannot be blindly trusted. Scientific computing programs are no exception in that respect, and they even bring specific accuracy issues due to their massive use of floating-point computations. Yet, it is uncommon to guarantee their correctness. Indeed, to verify an existing numerical analysis program we had to extend existing methods and tools for proving the correct behavior of programs. This C program implements the second-order centered finite difference explicit scheme for solving the 1D wave equation. In fact, we have gone much further: we have mechanically verified the convergence of the numerical scheme in order to obtain a complete formal proof covering all aspects, from partial differential equations to actual numerical results. To the best of our knowledge, this is the first time such a comprehensive proof has been achieved. (Comment: N° RR-8197, 2012; arXiv admin note: text overlap with arXiv:1112.179)
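    As a rough illustration of the kind of program involved, the following is a minimal C sketch of a second-order centered finite-difference explicit scheme for the 1D wave equation u_tt = c^2 u_xx. It is not the verified program from the report; the grid size, time step, initial data, and boundary conditions are placeholder choices.

    #include <math.h>
    #include <stdio.h>

    #define NX 101    /* number of grid points (placeholder) */
    #define NT 200    /* number of time steps (placeholder) */

    int main(void) {
        const double pi = 3.14159265358979323846;
        double c = 1.0, dx = 1.0 / (NX - 1), dt = 0.5 * dx;   /* CFL-safe: c*dt/dx <= 1 */
        double a2 = (c * dt / dx) * (c * dt / dx);
        double u_prev[NX], u_cur[NX], u_next[NX];

        for (int i = 0; i < NX; i++) {          /* initial condition: one sine bump, */
            u_prev[i] = sin(pi * i * dx);       /* zero initial velocity (first-order start) */
            u_cur[i] = u_prev[i];
        }
        for (int n = 0; n < NT; n++) {
            u_next[0] = 0.0;                    /* fixed (Dirichlet) ends */
            u_next[NX - 1] = 0.0;
            for (int i = 1; i < NX - 1; i++)    /* centered second differences in space and time */
                u_next[i] = 2.0 * u_cur[i] - u_prev[i]
                          + a2 * (u_cur[i + 1] - 2.0 * u_cur[i] + u_cur[i - 1]);
            for (int i = 0; i < NX; i++) {
                u_prev[i] = u_cur[i];
                u_cur[i] = u_next[i];
            }
        }
        printf("u at the midpoint after %d steps: %f\n", NT, u_cur[NX / 2]);
        return 0;
    }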

    Empirically-Grounded Construction of Bug Prediction and Detection Tools

    There is an increasing demand for high-quality software, as software bugs have an economic impact not only on software projects but also on national economies in general. Software quality is achieved via the main quality assurance activities of testing and code reviewing. However, these activities are expensive, so they need to be carried out efficiently. Auxiliary software quality tools such as bug detection and bug prediction tools help developers focus their testing and reviewing activities on the parts of software that are more likely to contain bugs. However, these tools are far from being adopted as mainstream development tools. Previous research points to their inability to adapt to the peculiarities of projects and their high rate of false positives as the main obstacles to their adoption. We propose empirically-grounded analysis to improve the adaptability and efficiency of bug detection and prediction tools. For a bug detector to be efficient, it needs to detect bugs that are conspicuous, frequent, and specific to a software project. We empirically show that null-related bugs fulfill these criteria and are worth building detectors for. We analyze the null dereferencing problem and find that its root cause lies in methods that return null. We propose an empirical solution to this problem that relies on the wisdom of the crowd. For each API method, we extract a nullability measure that expresses how often the return value of this method is checked against null in the ecosystem of the API. We use nullability to annotate API methods with nullness annotations and to warn developers about missing and excessive null checks. For a bug predictor to be efficient, it needs to be optimized both as a machine learning model and as a software quality tool. We empirically show how feature selection and hyperparameter optimization improve prediction accuracy. Then we optimize bug prediction to locate the maximum number of bugs in the minimum amount of code by finding the most cost-effective combination of bug prediction configurations, i.e., dependent variables, machine learning model, and response variable. We show that using both source code and change metrics as dependent variables, applying feature selection to them, and then using an optimized Random Forest to predict the number of bugs results in the most cost-effective bug predictor. Throughout this thesis, we show how empirically-grounded analysis helps us build efficient bug prediction and detection tools and adapt them to the characteristics of each software project.
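    The thesis draws its nullability data from API ecosystems, but the underlying idea can be illustrated with a familiar C API (an illustrative sketch, not code from the thesis): fopen returns NULL on failure and is null-checked at the vast majority of its call sites, so a nullability-based detector would warn about the unchecked use below.

    #include <stdio.h>

    /* High-nullability API: the return value of fopen is almost always
     * checked against NULL by callers. A nullability-based detector would
     * therefore flag read_first_line(), which dereferences it unchecked. */
    int read_first_line(const char *path, char *buf, size_t len) {
        FILE *f = fopen(path, "r");
        return fgets(buf, (int)len, f) != NULL;   /* missing null check: f may be NULL */
    }

    /* Fixed version: checks the result before use, as most call sites do. */
    int read_first_line_checked(const char *path, char *buf, size_t len) {
        FILE *f = fopen(path, "r");
        if (f == NULL)
            return 0;
        int ok = fgets(buf, (int)len, f) != NULL;
        fclose(f);
        return ok;
    }

    int main(void) {
        char line[128];
        if (read_first_line_checked("/etc/hostname", line, sizeof line))
            printf("%s", line);
        return 0;
    }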

    A Formal Verification Environment for Use in the Certification of Safety-Related C Programs

    In this thesis the design of an environment for the formal verification of functional properties of safety-related software written in the programming language C is described. The focus lies on the verification of (primarily) geometric computations. We give an overview of the applicable regulations for safety-related software systems. We define a combination of higher-order logic, as formalised in the theorem prover Isabelle, and a specification language syntactically based on C expressions. The language retains the mathematical character of higher-level specifications in code-level specifications. A memory model for C is formalised which is appropriate for modelling low-level memory operations while keeping the entailed verification overhead within tolerable bounds. Finally, a Hoare-style proof calculus is devised so that correctness proofs can be performed in one integrated framework. The applicability of the approach is demonstrated by describing its use in an industrial project.
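    To give a flavour of what a code-level specification for a geometric computation can look like, here is a hypothetical Hoare-style pre/postcondition written as structured comments on a small C function. The notation is illustrative only and is not the Isabelle-based specification language defined in the thesis.

    #include <stdio.h>

    /* requires: n >= 0, and a and b point to arrays of at least n doubles
     * ensures:  result is the floating-point evaluation of
     *           the sum over i in [0, n) of a[i] * b[i]
     * (hypothetical notation, for illustration only) */
    double dot_product(const double *a, const double *b, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++)   /* invariant: s is the partial sum over indices 0..i-1 */
            s += a[i] * b[i];
        return s;
    }

    int main(void) {
        double a[3] = {1.0, 2.0, 3.0}, b[3] = {4.0, 5.0, 6.0};
        printf("%f\n", dot_product(a, b, 3));   /* prints 32.000000 */
        return 0;
    }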

    Memory Leak Analysis by Contradiction

    We present a novel leak detection algorithm. To prove the absence of a memory leak, the algorithm assumes its presence and runs a backward heap analysis to disprove this assumption. We have implemented this approach in a memory leak analysis tool and used it to analyze several routines that manipulate linked lists and trees. Because of the reverse nature of the algorithm, the analysis can reason locally about the absence of memory leaks. We have also used the tool as a scalable but unsound leak detector for C programs. The tool has found several bugs in larger programs from the SPEC2000 benchmark suite.
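    A hypothetical example of the kind of list-manipulating routine such a tool targets (not code from the paper): the first version leaks the removed node, so the assumed leak cannot be disproved; the second version frees it, and the backward analysis can establish leak freedom.

    #include <stdlib.h>

    struct node { int data; struct node *next; };

    /* The removed node is never freed and becomes unreachable: a leak. */
    void remove_first_leaky(struct node **head) {
        if (*head != NULL)
            *head = (*head)->next;   /* old head node is lost here */
    }

    /* Leak-free version: the node is freed before it becomes unreachable. */
    void remove_first(struct node **head) {
        if (*head != NULL) {
            struct node *old = *head;
            *head = old->next;
            free(old);
        }
    }

    int main(void) {
        struct node *head = malloc(sizeof *head);
        if (head == NULL)
            return 1;
        head->data = 1;
        head->next = NULL;
        remove_first(&head);   /* head is now NULL and the node has been freed */
        return 0;
    }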

    Verification of Pointer-Based Programs with Partial Information

    The proliferation of software across all aspects of people's lives means that software failure can bring catastrophic results. It is therefore highly desirable to be able to develop software that is verified to meet its expected specification. This has also been identified as a key objective in one of the UK Grand Challenges (GC6) (Jones et al., 2006; Woodcock, 2006). However, many difficult problems remain in achieving this objective, partially due to the wide use of (recursive) shared mutable data structures, which are hard to keep track of statically in a precise and concise way. This thesis aims at building a verification system for both memory safety and functional correctness of programs manipulating pointer-based data structures, which can deal with two scenarios where only partial information about the program is available: the verifier may be supplied with only a partial program specification, or with a full specification but only part of the program code.

    For the first scenario, previous state-of-the-art works (Nguyen et al., 2007; Chin et al., 2007; Nguyen and Chin, 2008; Chin et al., 2010) generally require users to provide full specifications for each method of the program to be verified. This demands considerable intellectual effort from users, who are moreover liable to make mistakes when writing such specifications. This thesis proposes a new approach to program verification that allows users to provide only partial specifications for methods. Our approach then refines the given annotations into more complete specifications by discovering missing constraints. The discovered constraints may involve both numerical and multiset properties that can later be confirmed or revised by users. We further augment our approach by requiring partial specifications to be given only for the primary methods of a program; specifications for loops and auxiliary methods can then be systematically discovered by our augmented mechanism, with the help of information propagated from the primary methods. This work aims at verifying beyond shape properties, with the eventual goal of analysing both memory safety and functional properties of pointer-based data structures. Initial experiments have confirmed that we can automatically refine partial specifications with non-trivial constraints, thus making it easier for users to handle specifications with richer properties.

    For the second scenario, many programs contain invocations of unknown components, and hence only part of the program code is available to the verifier. As previous works generally require the whole program code to be present, we target the verification of memory safety and functional correctness of programs manipulating pointer-based data structures where the program code is only partially available due to invocations of unknown components. Provided with a Hoare-style specification {Pre} prog {Post}, where the program prog contains calls to some unknown procedure unknown, we infer a specification mspec_u for the unknown part from the calling contexts, such that the problem of verifying prog can be safely reduced to the problem of proving that the unknown procedure (once its code is available) meets the derived specification mspec_u. The expected specification mspec_u is automatically calculated using an abduction-based shape analysis specifically designed for a combined abstract domain. We have implemented a system to validate the viability of our approach, with encouraging experimental results.
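    A small C sketch of the second scenario (illustrative only; the notation in the comment is informal and not the specification syntax used in the thesis): the verifier sees traverse but not the code of process_node, so a specification for the callee is abduced from its calling context.

    #include <stddef.h>

    struct node { int data; struct node *next; };

    /* The code of this procedure is not available to the verifier. */
    extern void process_node(struct node *p);

    /* To prove traverse memory safe for every well-formed acyclic list, the
     * calling context forces a specification onto the unknown callee, roughly:
     *   pre:  p points to a valid node
     *   post: p still points to a valid node and p->next is unchanged
     * Verifying traverse then reduces to checking process_node against this
     * derived specification once its code becomes available. */
    void traverse(struct node *head) {
        for (struct node *p = head; p != NULL; p = p->next)
            process_node(p);
    }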

    Program analysis of temporal memory mismanagement

    In C/C++ programs, the performance benefits of flexible low-level memory access and management come at the cost of language-level support for memory safety and garbage collection. Memory-related programming mistakes are introduced as a result, rendering C/C++ programs prone to memory errors. A common category of such mistakes is the misplacement of deallocation operations, also known as temporal memory mismanagement, which can generate two types of bugs: (1) use-after-free (UAF) bugs and (2) memory leaks. The former are severe security vulnerabilities that expose programs to both data and control-flow exploits, while the latter are critical performance bugs that compromise software availability and reliability. In the case of UAF bugs, existing solutions rely almost exclusively on dynamic analysis and suffer from limitations including low code coverage, binary incompatibility, and high overheads. In the case of memory leaks, detection techniques are abundant; however, fixing techniques have been poorly investigated. In this thesis, we present three novel program analysis frameworks to address temporal memory mismanagement in C/C++. First, we introduce Tac, the first static UAF detection framework to combine typestate analysis with machine learning. Tac identifies representative features to train a Support Vector Machine to classify likely true/false UAF candidates, thereby providing guidance for the typestate analysis used to locate bugs precisely. We then present CRed, a pointer-analysis-based framework for UAF detection with a novel context-reduction technique and a new demand-driven, path-sensitive pointer analysis to boost scalability and precision. A major advantage of CRed is its ability to substantially and soundly reduce the search space without losing bug-finding ability; this is achieved by utilizing must-not-alias information to truncate unnecessary segments of calling contexts. Finally, we propose AutoFix, an automated memory leak fixing framework based on value-flow analysis and static instrumentation that can fix all leaks reported by any front-end detector safely, precisely, and with negligible overhead. AutoFix tolerates false leaks with a shadow memory data structure carefully designed to keep track of the allocation and deallocation of potentially leaked memory objects. The contribution of this thesis is threefold. First, we advance existing state-of-the-art solutions to detecting memory leaks by proposing a series of novel program analysis techniques to address temporal memory mismanagement. Second, the corresponding prototype tools are fully implemented in the LLVM compiler framework. Third, an extensive evaluation on open-source C/C++ benchmarks is conducted to validate the effectiveness of the proposed techniques.
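    The two bug classes can be illustrated with minimal C examples (illustrative sketches, not code from the thesis):

    #include <stdlib.h>
    #include <string.h>

    /* Use-after-free: the deallocation happens too early, so the later
     * write goes through a dangling pointer. */
    void uaf_example(void) {
        char *buf = malloc(16);
        if (buf == NULL)
            return;
        free(buf);                 /* premature deallocation */
        strcpy(buf, "dangling");   /* UAF: buf has already been freed */
    }

    /* Memory leak: the deallocation is missing on the early-return path,
     * so the block becomes unreachable when the function returns. */
    int leak_example(int fail) {
        char *buf = malloc(16);
        if (buf == NULL)
            return -1;
        if (fail)
            return -1;             /* leak: buf is never freed on this path */
        free(buf);
        return 0;
    }

    int main(void) {
        /* Calling uaf_example() would trigger undefined behavior; both
         * functions exist only to exhibit the bug patterns. */
        return leak_example(0);
    }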