
    Analyzing Array Manipulating Programs by Program Transformation

    We explore a transformational approach to the problem of verifying simple array-manipulating programs. Traditionally, verification of such programs requires intricate analysis machinery to reason with universally quantified statements about symbolic array segments, such as "every data item stored in the segment A[i] to A[j] is equal to the corresponding item stored in the segment B[i] to B[j]." We define a simple abstract machine which allows for set-valued variables, and we show how to translate programs with array operations to array-free code for this machine. For the purpose of program analysis, the translated program remains faithful to the semantics of array manipulation. Based on our implementation in LLVM, we evaluate the approach with respect to its ability to extract useful invariants and the cost in terms of code size.
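
    To make the idea concrete, the C sketch below contrasts a loop that establishes a segment property with a hand-written, array-free counterpart in which the processed segments are replaced by scalar summary variables. The names and the translation scheme are illustrative assumptions, not the paper's abstract machine or its actual translation rules.

        /* Illustrative sketch only: the flavor of turning an array loop into
         * array-free code over scalar "summary" variables; names and scheme
         * are hypothetical, not the paper's abstract machine.                */
        #include <assert.h>

        #define N 8

        /* Concrete program: establishes "every item in A[0..N-1] equals the
         * corresponding item in B[0..N-1]".                                  */
        static void copy_concrete(const int A[N], int B[N]) {
            for (int k = 0; k < N; k++)
                B[k] = A[k];
        }

        /* Array-free counterpart: the processed segments of A and B are each
         * represented by one scalar standing for an arbitrary element.  An
         * analyzer that proves seg_a == seg_b invariant has proved the
         * universally quantified segment property above.                     */
        static void copy_abstract(int a_elem) {
            int seg_a = a_elem;   /* arbitrary element of A's processed segment */
            int seg_b = a_elem;   /* corresponding element written into B       */
            assert(seg_a == seg_b);
        }

        int main(void) {
            int A[N] = {1, 2, 3, 4, 5, 6, 7, 8}, B[N];
            copy_concrete(A, B);
            for (int k = 0; k < N; k++)
                assert(A[k] == B[k]);
            copy_abstract(A[3]);  /* A[3] plays the role of the arbitrary element */
            return 0;
        }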

    A scalable mixed-level approach to dynamic analysis of C and C++ programs

    Thesis (M. Eng.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 107-112). This thesis addresses the difficult task of constructing robust and scalable dynamic program analysis tools for programs written in memory-unsafe languages such as C and C++, especially those that are interested in observing the contents of data structures at run time. In this thesis, I first introduce my novel mixed-level approach to dynamic analysis, which combines the advantages of both source- and binary-based approaches. Second, I present a tool framework that embodies the mixed-level approach. This framework provides memory safety guarantees, allows tools built upon it to access rich source- and binary-level information simultaneously at run time, and enables tools to scale to large, real-world C and C++ programs on the order of millions of lines of code. Third, I present two dynamic analysis tools built upon my framework - one for performing value profiling and the other for performing dynamic inference of abstract types - and describe how they far surpass previous analyses in terms of scalability, robustness, and applicability. Lastly, I present several case studies demonstrating how these tools aid both humans and automated tools in several program analysis tasks: improving human understanding of unfamiliar code, invariant detection, and data structure repair. By Philip Jia Guo. M.Eng.
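
    As a rough illustration of what such tools observe, the hypothetical C program below contains a heap-shaped data structure whose runtime contents a value-profiling tool could record at each call to sum(), using source-level type information to interpret the memory it sees at the binary level. The example is made up and is not code from the thesis or its framework.

        /* Hypothetical illustration: the kind of runtime observation a
         * mixed-level dynamic analysis enables.                              */
        #include <stdio.h>

        struct node { int value; struct node *next; };

        static int sum(const struct node *head) {      /* observation point */
            int total = 0;
            for (const struct node *n = head; n != NULL; n = n->next)
                total += n->value;                     /* values 1, 2, 3 would be profiled */
            return total;
        }

        int main(void) {
            struct node c = {3, NULL}, b = {2, &c}, a = {1, &b};
            printf("%d\n", sum(&a));
            return 0;
        }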

    Termination, correctness and relative correctness

    Over the last decade, research in verification and formal methods has seen increased interest, driven by the need for more secure and dependable software. At the heart of software dependability is the concept of software fault, defined in the literature as the adjudged or hypothesized cause of an error. This definition, which lacks precision, presents at least two challenges with regard to using formal methods: (1) adjudging and hypothesizing are highly subjective human endeavors; (2) the concept of error is itself insufficiently defined, since it depends on a detailed characterization of correct system states at each stage of a computation (which is usually unavailable). In the process of defining what a software fault is, the concept of relative correctness, the property of a program to be more-correct than another with respect to a given specification, is discussed. On this basis, a feature of a program is a fault (for a given specification) only because there exists an alternative to it that would make the program more-correct with respect to the specification. Furthermore, the implications and applications of relative correctness in various software engineering activities are explored. It is then illustrated that in many situations of software testing, fault removal and program repair, testing for relative correctness rather than absolute correctness leads to clearer conclusions and better outcomes. In particular, debugging without testing is introduced: a technique whereby a fault can be removed from a program and the new program proven to be more-correct than the original, all without any testing (and its associated uncertainties and imperfections). Given that there are orders of magnitude more incorrect programs than correct programs in use nowadays, this has the potential to expand the scope of proving methods significantly. Another technique, programming without refining, is also introduced. The most important advantage of program derivation by correctness enhancement is that it captures not only program construction from scratch, but also virtually all activities of software evolution. Given that nowadays most software is developed by evolving existing assets rather than producing new assets from scratch, the paradigm of software evolution by correctness enhancement stands to yield significant gains if we can make it practical.
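
    For readers unfamiliar with the notion, one relational formalization of relative correctness found in the literature can be sketched as follows; the assumptions (specification and programs modeled as binary relations on a common state space) are an illustration and not necessarily the exact definitions used in the thesis.

        % Sketch of a relational formalization of relative correctness.
        % Assumptions: the specification R and the programs P and P' are
        % binary relations on a common state space.
        \[
          P' \text{ is more-correct than } P \text{ with respect to } R
          \quad\iff\quad
          \operatorname{dom}(R \cap P') \supseteq \operatorname{dom}(R \cap P).
        \]
        % dom(R \cap P), the "competence domain" of P, is the set of inputs on
        % which P behaves as R prescribes; absolute correctness is the special
        % case dom(R \cap P) = dom(R).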

    Automatic detection of safety and security vulnerabilities in open source software

    Growing software quality requirements have raised the stakes on software safety and security. Building secure software focuses on techniques and methodologies of design and implementation in order to avoid exploitable vulnerabilities. Unfortunately, coding errors have become common with the inexorable growth of software size and complexity. According to the US National Institute of Standards and Technology (NIST), these coding errors lead to vulnerabilities that cost the US economy $60 billion each year. Therefore, tracking security and safety errors is considered a fundamental cornerstone of delivering software that is free from severe vulnerabilities. The main objective of this thesis is the elaboration of efficient, rigorous, and practical techniques for the safety and security evaluation of source code. To tackle safety errors related to the misuse of type and memory operations, we present a novel type and effect discipline that extends the standard C type system with safety annotations and static safety checks. We define an inter-procedural, flow-sensitive, and alias-sensitive inference algorithm that automatically propagates type annotations and applies safety checks to programs without programmer interaction. Moreover, we present a dynamic semantics of our C core language that is compliant with the ANSI C standard. We prove the consistency of the static semantics with respect to the dynamic semantics. We show the soundness of our static analysis in detecting our targeted set of safety errors. To tackle system-specific security properties, we present a security verification framework that combines static analysis and model checking. We base our approach on the GCC compiler and its GIMPLE representation of source code to extract model-checkable abstractions of programs. For the verification process, we use an off-the-shelf pushdown system model checker and turn it into a fully fledged security verification framework. We also allow programmers to define a wide range of security properties using an automata-based specification approach. To demonstrate the efficiency and scalability of our approach, we conduct extensive experiments and case studies on large-scale open-source software to verify their compliance with a representative set of the CERT secure coding rules.
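
    For illustration, the deliberately unsafe C fragment below shows the kinds of type and memory misuse (an unchecked copy and an undersized allocation reused at a larger type) that such static safety checks and CERT-oriented verification aim to flag. It is a made-up example, not code from the thesis.

        /* Deliberately unsafe, illustrative-only code; do not reuse.         */
        #include <stdlib.h>
        #include <string.h>

        void bad_copy(const char *src) {
            char buf[16];
            strcpy(buf, src);        /* unchecked copy: possible buffer overflow    */
        }

        int bad_cast(void) {
            void *p = malloc(sizeof(short));
            int *q = (int *)p;       /* allocation too small for the type in use    */
            *q = 42;                 /* out-of-bounds write through misused pointer */
            int v = *q;
            free(p);
            return v;
        }

        int main(void) {
            bad_copy("fits here");   /* safe at run time, yet the unchecked strcpy is still an error */
            return bad_cast() == 42 ? 0 : 1;
        }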

    Runtime Enforcement of Memory Safety for the C Programming Language

    Memory access violations are a leading source of unreliability in C programs. Although the low-level features of the C programming language, like unchecked pointer arithmetic and explicit memory management, make it a desirable language for many programming tasks, their use often results in hard-to-detect memory errors. As evidence of this problem, a variety of methods exist for retrofitting C with software checks to detect memory errors at runtime. However, these techniques generally suffer from one or more practical drawbacks that have thus far limited their adoption. These weaknesses include the inability to detect all spatial and temporal violations, the use of incompatible metadata, the need for manual code modifications, and the tremendous runtime cost of providing complete safety. This dissertation introduces MemSafe, a compiler analysis and transformation for ensuring the memory safety of C programs at runtime while avoiding the above drawbacks. MemSafe makes several novel contributions that improve upon previous work and lower the runtime cost of achieving memory safety. These include (1) a method for modeling temporal errors as spatial errors, (2) a hybrid metadata representation that combines the most salient features of both object- and pointer-based approaches, and (3) a data-flow representation that simplifies optimizations for removing unneeded checks and unused metadata. Experimental results indicate that MemSafe is capable of detecting memory safety violations in real-world programs with lower runtime overhead than previous methods. Results show that MemSafe detects all known memory errors in multiple versions of two large and widely-used open source applications as well as six programs from a benchmark suite specifically designed for the evaluation of error detection tools. MemSafe enforces complete safety with an average overhead of 88% on 30 widely-used performance evaluation benchmarks. In comparison with previous work, MemSafe's average runtime overhead for one common benchmark suite (29%) is a fraction of that associated with the previous technique (133%) that, until now, had the lowest overhead among all existing complete and automatic methods that are capable of detecting both spatial and temporal violations.
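
    The sketch below illustrates the idea behind contribution (1), modeling temporal errors as spatial errors, under assumed names and data structures that are not MemSafe's actual implementation: each pointer carries base/bound metadata, and deallocation shrinks the bounds to an empty region, so a later use-after-free fails the same bounds check that catches out-of-bounds accesses.

        /* Hypothetical sketch (not MemSafe's implementation) of modeling
         * temporal errors as spatial errors via per-pointer bounds metadata. */
        #include <assert.h>
        #include <stdlib.h>

        typedef struct {
            char *ptr;    /* the raw pointer            */
            char *base;   /* start of the valid region  */
            char *bound;  /* one past the valid region  */
        } checked_ptr;

        static char deref(checked_ptr p, size_t i) {
            assert(p.base + i < p.bound);   /* single spatial bounds check */
            return p.ptr[i];
        }

        static void checked_free(checked_ptr *p) {
            free(p->ptr);
            p->bound = p->base;             /* empty region: later uses fail the check */
        }

        int main(void) {
            char *raw = malloc(8);
            if (raw == NULL) return 1;
            checked_ptr p = { raw, raw, raw + 8 };
            raw[0] = 'x';
            (void)deref(p, 0);              /* in bounds: passes */
            checked_free(&p);
            /* (void)deref(p, 0);              now a temporal error, caught spatially */
            return 0;
        }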

    Safe and automatic live update

    Tanenbaum, A.S. [Promotor]

    Quantitative Static Analysis of Communication Protocols using Abstract Markov Chains

    In this paper we present a static analysis of probabilistic programs to quantify their performance properties, taking into account both the stochastic aspects of the language and those related to the execution environment. In particular, we are interested in the analysis of communication protocols in lossy networks, and we aim at statically inferring parametric bounds on important metrics such as the expected throughput or energy consumption. Our analysis is formalized within the theory of abstract interpretation and soundly takes all possible executions into account. We model the concrete executions as a set of Markov chains and we introduce a novel notion of abstract Markov chains that provides a finite and symbolic representation to over-approximate the (possibly unbounded) set of concrete behaviors. We show that our proposed formalism is expressive enough to handle both probabilistic and pure non-deterministic choices within the same semantics. Our analysis operates in two steps. The first step is a classic abstract interpretation of the source code, using stock numerical abstract domains and a specific automata domain, in order to extract the abstract Markov chain of the program. The second step extracts from this chain particular invariants about the stationary distribution and computes its symbolic bounds using a parametric Fourier-Motzkin elimination algorithm. We present a prototype implementation of the analysis and discuss some preliminary experiments on a number of communication protocols. We compare our prototype to the state-of-the-art probabilistic model checker PRISM and highlight the advantages and shortcomings of both approaches.
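
    As an illustration of the kind of program such an analysis targets, the small C simulation below models a sender retransmitting over a lossy channel; with loss probability p, the expected number of transmissions per frame is 1 / (1 - p), the sort of parametric bound the analysis aims to infer statically. The channel model and constants are assumptions made for the example, not taken from the paper.

        /* Made-up example of a probabilistic protocol loop: retransmit a
         * frame over a lossy channel until it gets through.                  */
        #include <stdio.h>
        #include <stdlib.h>

        #define P_LOSS 0.2   /* assumed per-transmission loss probability */

        static int channel_delivers(void) {
            return (double)rand() / RAND_MAX >= P_LOSS;   /* probabilistic choice */
        }

        int main(void) {
            const int frames = 100000;
            long transmissions = 0;
            srand(1);
            for (int f = 0; f < frames; f++) {
                do {
                    transmissions++;                      /* send (or resend) the frame */
                } while (!channel_delivers());            /* retry until delivered      */
            }
            printf("average transmissions per frame: %.3f (analytic: %.3f)\n",
                   (double)transmissions / frames, 1.0 / (1.0 - P_LOSS));
            return 0;
        }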

    Program analysis of temporal memory mismanagement

    In C/C++ programs, the performance benefits of flexible low-level memory access and management come at the cost of language-level support for memory safety and garbage collection. Memory-related programming mistakes are introduced as a result, rendering C/C++ programs prone to memory errors. A common category of programming mistakes is the misplacement of deallocation operations, also known as temporal memory mismanagement, which can generate two types of bugs: (1) use-after-free (UAF) bugs and (2) memory leaks. The former are severe security vulnerabilities that expose programs to both data and control-flow exploits, while the latter are critical performance bugs that compromise software availability and reliability. For UAF bugs, existing solutions rely almost exclusively on dynamic analysis and suffer from limitations including low code coverage, binary incompatibility, and high overheads. For memory leaks, detection techniques are abundant, but fixing techniques have received little attention. In this thesis, we present three novel program analysis frameworks to address temporal memory mismanagement in C/C++. First, we introduce Tac, the first static UAF detection framework to combine typestate analysis with machine learning. Tac identifies representative features to train a Support Vector Machine to classify likely true/false UAF candidates, thereby providing guidance for the typestate analysis used to locate bugs precisely. We then present CRed, a pointer analysis-based framework for UAF detection with a novel context-reduction technique and a new demand-driven, path-sensitive pointer analysis to boost scalability and precision. A major advantage of CRed is its ability to substantially and soundly reduce the search space without losing bug-finding ability, achieved by using must-not-alias information to truncate unnecessary segments of calling contexts. Finally, we propose AutoFix, an automated memory-leak fixing framework based on value-flow analysis and static instrumentation that can safely and precisely fix all leaks reported by any front-end detector with negligible overhead. AutoFix tolerates false leaks with a shadow memory data structure carefully designed to keep track of the allocation and deallocation of potentially leaked memory objects. The contribution of this thesis is threefold. First, we advance existing state-of-the-art solutions to detecting memory leaks by proposing a series of novel program analysis techniques to address temporal memory mismanagement. Second, the corresponding prototype tools are fully implemented in the LLVM compiler framework. Third, an extensive evaluation on open-source C/C++ benchmarks is conducted to validate the effectiveness of the proposed techniques.
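
    For readers unfamiliar with the two bug classes, the deliberately buggy C fragment below shows a use-after-free caused by a deallocation placed too early and a memory leak caused by a missing deallocation on an early-return path. It is an illustrative example, not code from the thesis or its benchmarks.

        /* Deliberately buggy, illustrative-only code showing both bug classes. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        void use_after_free(void) {
            char *buf = malloc(16);
            if (buf == NULL) return;
            strcpy(buf, "hello");
            free(buf);               /* deallocation placed too early...          */
            printf("%s\n", buf);     /* ...so this read is a use-after-free (UAF) */
        }

        void memory_leak(int fail) {
            char *buf = malloc(16);
            if (fail)
                return;              /* early return skips free(): memory leak    */
            free(buf);
        }

        int main(void) {
            memory_leak(1);          /* leaks 16 bytes   */
            memory_leak(0);          /* properly freed   */
            /* use_after_free();        not called: executing it is undefined behavior */
            return 0;
        }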

    Program Model Checking: A Practitioner's Guide

    Program model checking is a verification technology that uses state-space exploration to evaluate large numbers of potential program executions. Program model checking provides improved coverage over testing by systematically evaluating all possible test inputs and all possible interleavings of threads in a multithreaded system. Model-checking algorithms use several classes of optimizations to reduce the time and memory requirements for analysis, as well as heuristics for meaningful analysis of partial areas of the state space. Our goal in this guidebook is to assemble, distill, and demonstrate emerging best practices for applying program model checking. We offer it as a starting point and introduction for those who want to apply model checking to software verification and validation. The guidebook will not discuss any specific tool in great detail, but we provide references for specific tools.
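
    A classic illustration of why interleavings matter is the two-thread counter below: the assertion holds under most schedules but fails when the threads' reads and writes interleave, which is exactly the kind of schedule a program model checker enumerates systematically. The example is generic (build with cc -pthread) and is not drawn from the guidebook.

        /* Generic illustration: an assertion whose outcome depends on the
         * thread interleaving; plain testing rarely hits the failing schedule. */
        #include <assert.h>
        #include <pthread.h>

        static int counter = 0;

        static void *increment(void *arg) {
            int tmp = counter;       /* read...                                */
            counter = tmp + 1;       /* ...and write, not atomic with the read */
            return arg;
        }

        int main(void) {
            pthread_t t1, t2;
            pthread_create(&t1, NULL, increment, NULL);
            pthread_create(&t2, NULL, increment, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            assert(counter == 2);    /* fails under a read-read-write-write schedule */
            return 0;
        }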