4 research outputs found
Differentially Testing Soundness and Precision of Program Analyzers
In the last decades, numerous program analyzers have been developed both by
academia and industry. Despite their abundance however, there is currently no
systematic way of comparing the effectiveness of different analyzers on
arbitrary code. In this paper, we present the first automated technique for
differentially testing soundness and precision of program analyzers. We used
our technique to compare six mature, state-of-the art analyzers on tens of
thousands of automatically generated benchmarks. Our technique detected
soundness and precision issues in most analyzers, and we evaluated the
implications of these issues to both designers and users of program analyzers
Higher-Order, Data-Parallel Structured Deduction
State-of-the-art Datalog engines include expressive features such as ADTs
(structured heap values), stratified aggregation and negation, various
primitive operations, and the opportunity for further extension using FFIs.
Current parallelization approaches for state-of-art Datalogs target
shared-memory locking data-structures using conventional multi-threading, or
use the map-reduce model for distributed computing. Furthermore, current
state-of-art approaches cannot scale to formal systems which pervasively
manipulate structured data due to their lack of indexing for structured data
stored in the heap.
In this paper, we describe a new approach to data-parallel structured
deduction that involves a key semantic extension of Datalog to permit
first-class facts and higher-order relations via defunctionalization, an
implementation approach that enables parallelism uniformly both across sets of
disjoint facts and over individual facts with nested structure. We detail a
core language, , whose key invariant (subfact closure) ensures that each
subfact is materialized as a top-class fact. We extend to Slog, a
fully-featured language whose forms facilitate leveraging subfact closure to
rapidly implement expressive, high-performance formal systems. We demonstrate
Slog by building a family of control-flow analyses from abstract machines,
systematically, along with several implementations of classical type systems
(such as STLC and LF). We performed experiments on EC2, Azure, and ALCF's Theta
at up to 1000 threads, showing orders-of-magnitude scalability improvements
versus competing state-of-art systems
Deconstructing Datalog
The deductive query language Datalog has found a wide array of uses, including static analy- sis (Smaragdakis and Bravenboer, 2010), business analytics (Aref et al., 2015), and distributed programming (Alvaro et al., 2010, 2011). Datalog is high-level and declarative, but simple and well-studied enough to admit efficient implementation strategies. For example, Whaley et al. found they could replace a hand-tuned C implementation of context-sensitive pointer analysis with a comparably-performing Datalog program that was 100x smaller (Whaley and Lam, 2004; Whaley et al., 2005).
However, Datalog’s semantics are not stable under extensions. For instance, adding arithmetic operations breaks Datalog’s termination guarantee. Despite this, nearly all practical implementations extend Datalog beyond its theoretical core to add niceties such as arithmetic, datatypes, aggregations, and so on. Moreover, pure Datalog cannot abstract over repeated code: one may express a static analysis over a particular program, but to express the same analysis over multiple programs, one must duplicate the analysis code for each program analyzed.
This thesis deconstructs Datalog from a categorical and type theoretic perspective to determine what makes it tick. Datalog’s semantic guarantees are provided by brute syntactic restrictions, such as stratification and the absence of function symbols. In place of these, we find compositional semantic properties such as monotonicity, which we capture using types. We show that this permits integrating Datalog’s features with those of typed functional languages, such as algebraic data types and higher order functions. In particular, this thesis makes the following contributions:
1. We define and expound the semantics and metatheory of Datafun, a pure and total higher-order typed functional language capturing the essence of Datalog. Where Data- log has predicates defined by a restricted class of Horn clauses, Datafun has finite sets and set comprehensions; Datalog’s bottom-up recursive queries become iterative fixed points; and Datalog’s stratification condition becomes a matter of tracking monotonicity with types.
2. We show how to generalize seminaïve evaluation to handle higher-order functions. Seminaïve evaluation is a technique from the Datalog literature which improves the performance of Datalog’s most distinctive feature: recursive queries. These are com- puted iteratively, and under a naïve evaluation strategy, each iteration recomputes all previous values. Seminaïve evaluation computes a safe approximation of the difference between iterations. This can asymptotically improve the performance of Datalog queries. Seminaïve evaluation is defined partly as a program transformation and partly as a modified iteration strategy, and takes advantage of the first-order nature of Datalog. We extend this transformation to handle higher-order programs written in Datafun.
3. In the process of generalizing seminaĂŻve evaluation, we uncover a theory of incremental, monotone, higher-order computation, in which values change over time by growing larger, and programs respond incrementally to these increases