4 research outputs found

    Differentially Testing Soundness and Precision of Program Analyzers

    Full text link
    In the last decades, numerous program analyzers have been developed both in academia and industry. Despite their abundance, however, there is currently no systematic way of comparing the effectiveness of different analyzers on arbitrary code. In this paper, we present the first automated technique for differentially testing soundness and precision of program analyzers. We used our technique to compare six mature, state-of-the-art analyzers on tens of thousands of automatically generated benchmarks. Our technique detected soundness and precision issues in most analyzers, and we evaluated the implications of these issues for both designers and users of program analyzers.
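    A minimal sketch of the differential-testing idea, assuming a hypothetical benchmark generator that seeds each program with a known ground truth and hypothetical analyzer wrappers that return a "safe"/"unsafe" verdict; this illustrates the general setup rather than the authors' tool:

        # Hypothetical sketch: compare analyzer verdicts against seeded ground truth.
        from dataclasses import dataclass
        from typing import Callable, Dict, List

        @dataclass
        class Benchmark:
            source: str              # automatically generated program text
            assertion_holds: bool    # ground truth seeded by the generator

        Analyzer = Callable[[str], str]  # returns "safe" or "unsafe"

        def classify(verdict: str, bench: Benchmark) -> str:
            if bench.assertion_holds and verdict == "unsafe":
                return "precision issue"   # false alarm on a correct program
            if not bench.assertion_holds and verdict == "safe":
                return "soundness issue"   # missed a genuine assertion violation
            return "ok"

        def differential_test(analyzers: Dict[str, Analyzer], benchmarks: List[Benchmark]) -> None:
            for bench in benchmarks:
                for name, analyze in analyzers.items():
                    issue = classify(analyze(bench.source), bench)
                    if issue != "ok":
                        print(f"{name}: {issue}")

        # Example with trivial stand-in analyzers:
        differential_test(
            {"optimist": lambda src: "safe", "pessimist": lambda src: "unsafe"},
            [Benchmark("assert x >= 0", True), Benchmark("assert x > 0", False)],
        )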

    Higher-Order, Data-Parallel Structured Deduction

    Full text link
    State-of-the-art Datalog engines include expressive features such as ADTs (structured heap values), stratified aggregation and negation, various primitive operations, and the opportunity for further extension using FFIs. Current parallelization approaches for state-of-the-art Datalogs target shared-memory locking data structures using conventional multi-threading, or use the map-reduce model for distributed computing. Furthermore, current state-of-the-art approaches cannot scale to formal systems which pervasively manipulate structured data, due to their lack of indexing for structured data stored in the heap. In this paper, we describe a new approach to data-parallel structured deduction that involves a key semantic extension of Datalog to permit first-class facts and higher-order relations via defunctionalization, and an implementation approach that enables parallelism uniformly both across sets of disjoint facts and over individual facts with nested structure. We detail a core language, DL_s, whose key invariant (subfact closure) ensures that each subfact is materialized as a first-class fact. We extend DL_s to Slog, a fully-featured language whose forms facilitate leveraging subfact closure to rapidly implement expressive, high-performance formal systems. We demonstrate Slog by systematically building a family of control-flow analyses from abstract machines, along with several implementations of classical type systems (such as STLC and LF). We performed experiments on EC2, Azure, and ALCF's Theta at up to 1000 threads, showing orders-of-magnitude scalability improvements versus competing state-of-the-art systems.
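    A rough illustration, in Python rather than Slog, of the subfact-closure invariant described above: inserting a structured fact also materializes every nested subfact as a first-class fact, so rules can index and join on nested data directly. The encoding of facts as nested tuples and the helper names are made up for this example:

        # Illustrative only: facts are nested tuples whose head is a relation name.
        def subterms(fact):
            """Yield a fact and all of its nested tuple subfacts."""
            yield fact
            for arg in fact[1:]:
                if isinstance(arg, tuple):
                    yield from subterms(arg)

        def insert(db, fact):
            """Insert a fact, closing the database under its subfacts."""
            for f in subterms(fact):
                db.add(f)

        db = set()
        insert(db, ("eval", ("app", ("lam", "x", ("ref", "x")), ("num", 1))))
        # db now also contains ("app", ...), ("lam", ...), ("ref", "x"), and
        # ("num", 1) as facts in their own right, available for joins and indexing.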

    Deconstructing Datalog

    Get PDF
    The deductive query language Datalog has found a wide array of uses, including static analysis (Smaragdakis and Bravenboer, 2010), business analytics (Aref et al., 2015), and distributed programming (Alvaro et al., 2010, 2011). Datalog is high-level and declarative, but simple and well-studied enough to admit efficient implementation strategies. For example, Whaley et al. found they could replace a hand-tuned C implementation of context-sensitive pointer analysis with a comparably performing Datalog program that was 100x smaller (Whaley and Lam, 2004; Whaley et al., 2005). However, Datalog’s semantics are not stable under extensions. For instance, adding arithmetic operations breaks Datalog’s termination guarantee. Despite this, nearly all practical implementations extend Datalog beyond its theoretical core to add niceties such as arithmetic, datatypes, aggregations, and so on. Moreover, pure Datalog cannot abstract over repeated code: one may express a static analysis over a particular program, but to express the same analysis over multiple programs, one must duplicate the analysis code for each program analyzed. This thesis deconstructs Datalog from a categorical and type-theoretic perspective to determine what makes it tick. Datalog’s semantic guarantees are provided by brute syntactic restrictions, such as stratification and the absence of function symbols. In place of these, we find compositional semantic properties such as monotonicity, which we capture using types. We show that this permits integrating Datalog’s features with those of typed functional languages, such as algebraic data types and higher-order functions. In particular, this thesis makes the following contributions:
    1. We define and expound the semantics and metatheory of Datafun, a pure and total higher-order typed functional language capturing the essence of Datalog. Where Datalog has predicates defined by a restricted class of Horn clauses, Datafun has finite sets and set comprehensions; Datalog’s bottom-up recursive queries become iterative fixed points; and Datalog’s stratification condition becomes a matter of tracking monotonicity with types.
    2. We show how to generalize seminaïve evaluation to handle higher-order functions. Seminaïve evaluation is a technique from the Datalog literature which improves the performance of Datalog’s most distinctive feature: recursive queries. These are computed iteratively, and under a naïve evaluation strategy, each iteration recomputes all previous values. Seminaïve evaluation computes a safe approximation of the difference between iterations. This can asymptotically improve the performance of Datalog queries. Seminaïve evaluation is defined partly as a program transformation and partly as a modified iteration strategy, and takes advantage of the first-order nature of Datalog. We extend this transformation to handle higher-order programs written in Datafun (see the sketch below).
    3. In the process of generalizing seminaïve evaluation, we uncover a theory of incremental, monotone, higher-order computation, in which values change over time by growing larger, and programs respond incrementally to these increases.
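    A small first-order sketch of seminaïve evaluation, the optimisation that contribution 2 generalises to higher-order Datafun programs, using transitive closure as the standard example; each iteration joins only the newly derived facts (the delta) against the edge relation instead of recomputing from the full set:

        # Seminaive fixed point for: path(x, z) :- edge(x, z).
        #                            path(x, z) :- path(x, y), edge(y, z).
        def transitive_closure(edges):
            path = set(edges)
            delta = set(edges)
            while delta:
                # Join only the delta with edges; naive evaluation would rejoin all of path.
                new = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
                delta = new - path          # keep only genuinely new facts
                path |= delta
            return path

        print(transitive_closure({(1, 2), (2, 3), (3, 4)}))
        # expected contents (order may vary): (1,2) (2,3) (3,4) (1,3) (2,4) (1,4)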

    Safe and sound program analysis with Flix

    No full text