12 research outputs found

    Dynamically diagnosing type errors in unsafe code

    Get PDF
    Existing approaches for detecting type errors in unsafe languages are limited. Static analysis methods are imprecise, and often require source-level changes, while most dynamic methods check only memory properties (bounds, liveness, etc.), owing to a lack of run-time type information. This paper describes libcrunch, a system for binary-compatible run-time type checking of unmodified unsafe code, currently focusing on C. Practical experience shows that our prototype implementation is easily applicable to many real codebases without source-level modification, correctly flags programmer errors with a very low rate of false positives, offers a very low run-time overhead, and covers classes of error caught by no previously existing tool

    Runtime Instrumentation for Precise Flow-Sensitive Type Analysis

    Full text link

    Runtime Instrumentation for Precise Flow-Sensitive Type Analysis

    Get PDF
    We describe a combination of runtime information and static analysis for checking properties of complex and configurable systems. The basic idea of our approach is to 1) let the program execute and thereby read the important dynamic configuration data, then 2) invoke static analysis from this runtime state to detect possible errors that can happen in the continued execution. This approach improves analysis precision, particularly with respect to types of global variables and nested data structures. It also enables the resolution of modules that are loaded based on dynamically computed information. We describe an implementation of this approach in a tool that statically computes possible types of variables in PHP applications, including detailed types of nested maps (arrays). PHP is a dynamically typed language; PHP programs extensively use nested value maps, as well as ’include’ directives whose arguments are dynamically computed file names. We have applied our analysis tool to over 50’000 lines of PHP code, including the popular DokuWiki software, which has a plug-in architecture. The analysis identified 200 problems in the code and in the type hints of the original source code base. Some of these problems can cause exploits, infinite loops, and crashes. Our experiments show that dynamic information simplifies the development of the analysis and decreases the number of false alarms compared to a purely static analysis approach

    On Using Static Analysis to Detect Type Errors in PHP Applications

    Get PDF
    We describe our experience in using abstract interpretation to analyze applications written in PHP. Our work focuses on reconstructing type information from mostly unannotated code. We present the abstract domain of our analysis, focusing on the features that improve analysis precision. We have implemented our approach as a tool that supports the full specification of PHP 5. We describe several bugs that we were able to find in deployed web applications

    Runtime Enforcement of Memory Safety for the C Programming Language

    Get PDF
    Memory access violations are a leading source of unreliability in C programs. Although the low-level features of the C programming language, like unchecked pointer arithmetic and explicit memory management, make it a desirable language for many programming tasks, their use often results in hard-to-detect memory errors. As evidence of this problem, a variety of methods exist for retrofitting C with software checks to detect memory errors at runtime. However, these techniques generally suffer from one or more practical drawbacks that have thus far limited their adoption. These weaknesses include the inability to detect all spatial and temporal violations, the use of incompatible metadata, the need for manual code modifications, and the tremendous runtime cost of providing complete safety. This dissertation introduces MemSafe, a compiler analysis and transformation for ensuring the memory safety of C programs at runtime while avoiding the above drawbacks. MemSafe makes several novel contributions that improve upon previous work and lower the runtime cost of achieving memory safety. These include (1) a method for modeling temporal errors as spatial errors, (2) a hybrid metadata representation that combines the most salient features of both object- and pointer-based approaches, and (3) a data-flow representation that simplifies optimizations for removing unneeded checks and unused metadata. Experimental results indicate that MemSafe is capable of detecting memory safety violations in real-world programs with lower runtime overhead than previous methods. Results show that MemSafe detects all known memory errors in multiple versions of two large and widely-used open source applications as well as six programs from a benchmark suite specifically designed for the evaluation of error detection tools. MemSafe enforces complete safety with an average overhead of 88% on 30 widely-used performance evaluation benchmarks. In comparison with previous work, MemSafe's average runtime overhead for one common benchmark suite (29%) is a fraction of that associated with the previous technique (133%) that, until now, had the lowest overhead among all existing complete and automatic methods that are capable of detecting both spatial and temporal violations

    Abstraction-level functional programming

    Get PDF

    Jiapi: A Type Checker Generator for Statically Typed Languages

    Full text link
    Type systems are a key characteristic in the context of the study of programming languages. They frequently offer a simple, intuitive way of expressing and testing the fundamental structure of programs. This is especially true when types are used to provide formal, machine-checked documentation for an implementation. For example, the absence of type errors in code prior to execution is what type systems for static programming languages are designed to assure, and in the literature, type systems that satisfy this requirement are referred to as sound type systems. Types also define module interfaces, making them essential for achieving and maintaining consistency in large software systems. On these accounts, type systems can enable early detection of program errors and vastly improve the process of understanding unfamiliar code. However, even for verification professionals, creating mechanized type soundness proofs using the tools and procedures readily available today is a difficult undertaking. Given this result, it is only logical to wonder if we can capture and check more of the program structure, so the next two questions concern type checking. Once a type system has been specified, how can we implement a type checker for the language? Another common question concerns the operational semantics, that is, once the type system is specified, how can we take advantage of this specification to obtain a type checker for the language? In each of these cases, type systems are often too complex to allow for a practical and reasonable approach to specifying many common language features in a natural fashion; therefore, the crux here is: What is the bare minimum set of constructs required to write a type checker specification for any domain-specific language (DSL)? We believe that sets (including tests for memberships, subsets, size, and other properties) as well as first-order logic and an expression grammar can be used to do this. In addition, we will need the ability to apply some form of filter on sets, as well as possibly a reduction and map. The primary goal of this thesis is to raise the degree of automation for type checking, so we present Jiapi, an automatic type checker generation tool. The tool generates a type checker based on a description of a language’s type system expressed in set notation and first-order logic, allowing types to be interpreted as propositions and values to be interpreted as proofs of these propositions, which we can then reduce to a small number of easily-reviewable predicates. Consequently, we can have a completely spelled-out model of a language’s type system that is free of inconsistencies or ambiguities, and we can mathematically demonstrate features of the language and programs written in it. We use two languages as case studies: a somewhat large and more general, and a more domain-specific. The first one is a true subset of Java (an object-oriented language) called Espresso, and the second one is a pedagogical language like C/C++ called C-Minor. Also, there is a declarative type system description extension for the first language, and parts of the specification for the second one. Finally, our approach is evaluated to demonstrate that the generated type checker can check automatically the full functional correctness of an Espresso and C-Minor program

    SOUND, PRECISE AND EFFICIENT STATIC RACE DETECTION FOR MULTI-THREADED PROGRAMS

    Get PDF
    Multithreaded programming is increasingly relevant due to the growing prevalence of multi-core processors. Unfortunately, the non-determinism in parallel processing makes it easy to make mistakes but difficult to detect them, so that writing concurrent programs is considered very difficult. A data race, which happens when two threads access the same memory location without synchronization is a common concurrency error, with potentially disastrous consequences. This dissertation presents Locksmith, a tool for automatically finding data races in multi-threaded C programs by analyzing their source code. Locksmith uses a collection of static analysis techniques to reason about program properties, including a novel effect system to compute memory locations that are shared between threads, a system for inferring ``guarded-by'' correlations between locks and memory locations, and a novel analysis of data structures using existential types. We present the main analyses in detail and give formal proofs to support their soundness. We discuss the implementation of each analysis in Locksmith, present the problems that arose when extending it to the full C programming language, and discuss some alternative solutions. We provide extensive measurements for the precision and performance of each analysis and compare alternative techniques to find the best combination
    corecore