26 research outputs found

    On the Practice and Application of Context-Free Language Reachability

    Get PDF
    The Context-Free Language Reachability (CFL-R) formalism relates to some of the most important computational problems facing researchers and industry practitioners. CFL-R is a generalisation of graph reachability and language recognition, such that pairs in a labelled graph are reachable if and only if there is a path between them whose labels, joined together in the order they were encountered, spell a word in a given context-free language. The formalism finds particular use as a vehicle for phrasing and reasoning about program analysis, since complex relationships within the data, logic or structure of computer programs are easily expressed and discovered in CFL-R. Unfortunately, The potential of CFL-R can not be met by state of the art solvers. Current algorithms have scalability and expressibility issues that prevent them from being used on large graph instances or complex grammars. This work outlines our efforts in understanding the practical concerns surrounding CFL-R, and applying this knowledge to improve the performance of CFL-R applications. We examine the major difficulties with solving CFL-R-based analyses at-scale, via a case-study of points-to analysis as a CFL-R problem. Points-to analysis is fundamentally important to many modern research and industry efforts, and is relevant to optimisation, bug-checking and security technologies. Our understanding of the scalability challenge motivates work in developing practical CFL-R techniques. We present improved evaluation algorithms and declarative optimisation techniques for CFL-R, capitalising on the simplicity of CFL-R to creating fully automatic methodologies. The culmination of our work is a general-purpose and high-performance tool called Cauliflower, a solver-generator for CFL-R problems. We describe Cauliflower and evaluate its performance experimentally, showing significant improvement over alternative general techniques

    Optimal Dyck reachability for data-dependence and Alias analysis

    Get PDF
    A fundamental algorithmic problem at the heart of static analysis is Dyck reachability. The input is a graph where the edges are labeled with different types of opening and closing parentheses, and the reachability information is computed via paths whose parentheses are properly matched. We present new results for Dyck reachability problems with applications to alias analysis and data-dependence analysis. Our main contributions, that include improved upper bounds as well as lower bounds that establish optimality guarantees, are as follows: First, we consider Dyck reachability on bidirected graphs, which is the standard way of performing field-sensitive points-to analysis. Given a bidirected graph with n nodes and m edges, we present: (i) an algorithm with worst-case running time O(m + n 路 伪(n)), where 伪(n) is the inverse Ackermann function, improving the previously known O(n2) time bound; (ii) a matching lower bound that shows that our algorithm is optimal wrt to worst-case complexity; and (iii) an optimal average-case upper bound of O(m) time, improving the previously known O(m 路 logn) bound. Second, we consider the problem of context-sensitive data-dependence analysis, where the task is to obtain analysis summaries of library code in the presence of callbacks. Our algorithm preprocesses libraries in almost linear time, after which the contribution of the library in the complexity of the client analysis is only linear, and only wrt the number of call sites. Third, we prove that combinatorial algorithms for Dyck reachability on general graphs with truly sub-cubic bounds cannot be obtained without obtaining sub-cubic combinatorial algorithms for Boolean Matrix Multiplication, which is a long-standing open problem. Thus we establish that the existing combinatorial algorithms for Dyck reachability are (conditionally) optimal for general graphs. We also show that the same hardness holds for graphs of constant treewidth. Finally, we provide a prototype implementation of our algorithms for both alias analysis and data-dependence analysis. Our experimental evaluation demonstrates that the new algorithms significantly outperform all existing methods on the two problems, over real-world benchmarks

    IST Austria Technical Report

    Get PDF
    A fundamental algorithmic problem at the heart of static analysis is Dyck reachability. The input is a graphwhere the edges are labeled with different types of opening and closing parentheses, and the reachabilityinformation is computed via paths whose parentheses are properly matched. We present new results for Dyckreachability problems with applications to alias analysis and data-dependence analysis. Our main contributions,that include improved upper bounds as well as lower bounds that establish optimality guarantees, are asfollows:First, we consider Dyck reachability on bidirected graphs, which is the standard way of performing field-sensitive points-to analysis. Given a bidirected graph withnnodes andmedges, we present: (i) an algorithmwith worst-case running timeO(m+n路伪(n)), where伪(n)is the inverse Ackermann function, improving thepreviously knownO(n2)time bound; (ii) a matching lower bound that shows that our algorithm is optimalwrt to worst-case complexity; and (iii) an optimal average-case upper bound ofO(m)time, improving thepreviously knownO(m路logn)bound.Second, we consider the problem of context-sensitive data-dependence analysis, where the task is to obtainanalysis summaries of library code in the presence of callbacks. Our algorithm preprocesses libraries in almostlinear time, after which the contribution of the library in the complexity of the client analysis is only linear,and only wrt the number of call sites.Third, we prove that combinatorial algorithms for Dyck reachability on general graphs with truly sub-cubic bounds cannot be obtained without obtaining sub-cubic combinatorial algorithms for Boolean MatrixMultiplication, which is a long-standing open problem. Thus we establish that the existing combinatorialalgorithms for Dyck reachability are (conditionally) optimal for general graphs. We also show that the samehardness holds for graphs of constant treewidth.Finally, we provide a prototype implementation of our algorithms for both alias analysis and data-dependenceanalysis. Our experimental evaluation demonstrates that the new algorithms significantly outperform allexisting methods on the two problems, over real-world benchmarks

    Introspective Pushdown Analysis of Higher-Order Programs

    Full text link
    In the static analysis of functional programs, pushdown flow analysis and abstract garbage collection skirt just inside the boundaries of soundness and decidability. Alone, each method reduces analysis times and boosts precision by orders of magnitude. This work illuminates and conquers the theoretical challenges that stand in the way of combining the power of these techniques. The challenge in marrying these techniques is not subtle: computing the reachable control states of a pushdown system relies on limiting access during transition to the top of the stack; abstract garbage collection, on the other hand, needs full access to the entire stack to compute a root set, just as concrete collection does. \emph{Introspective} pushdown systems resolve this conflict. Introspective pushdown systems provide enough access to the stack to allow abstract garbage collection, but they remain restricted enough to compute control-state reachability, thereby enabling the sound and precise product of pushdown analysis and abstract garbage collection. Experiments reveal synergistic interplay between the techniques, and the fusion demonstrates "better-than-both-worlds" precision.Comment: Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming, 2012, AC

    IST Austria Thesis

    Get PDF
    This dissertation focuses on algorithmic aspects of program verification, and presents modeling and complexity advances on several problems related to the static analysis of programs, the stateless model checking of concurrent programs, and the competitive analysis of real-time scheduling algorithms. Our contributions can be broadly grouped into five categories. Our first contribution is a set of new algorithms and data structures for the quantitative and data-flow analysis of programs, based on the graph-theoretic notion of treewidth. It has been observed that the control-flow graphs of typical programs have special structure, and are characterized as graphs of small treewidth. We utilize this structural property to provide faster algorithms for the quantitative and data-flow analysis of recursive and concurrent programs. In most cases we make an algebraic treatment of the considered problem, where several interesting analyses, such as the reachability, shortest path, and certain kind of data-flow analysis problems follow as special cases. We exploit the constant-treewidth property to obtain algorithmic improvements for on-demand versions of the problems, and provide data structures with various tradeoffs between the resources spent in the preprocessing and querying phase. We also improve on the algorithmic complexity of quantitative problems outside the algebraic path framework, namely of the minimum mean-payoff, minimum ratio, and minimum initial credit for energy problems. Our second contribution is a set of algorithms for Dyck reachability with applications to data-dependence analysis and alias analysis. In particular, we develop an optimal algorithm for Dyck reachability on bidirected graphs, which are ubiquitous in context-insensitive, field-sensitive points-to analysis. Additionally, we develop an efficient algorithm for context-sensitive data-dependence analysis via Dyck reachability, where the task is to obtain analysis summaries of library code in the presence of callbacks. Our algorithm preprocesses libraries in almost linear time, after which the contribution of the library in the complexity of the client analysis is (i)~linear in the number of call sites and (ii)~only logarithmic in the size of the whole library, as opposed to linear in the size of the whole library. Finally, we prove that Dyck reachability is Boolean Matrix Multiplication-hard in general, and the hardness also holds for graphs of constant treewidth. This hardness result strongly indicates that there exist no combinatorial algorithms for Dyck reachability with truly subcubic complexity. Our third contribution is the formalization and algorithmic treatment of the Quantitative Interprocedural Analysis framework. In this framework, the transitions of a recursive program are annotated as good, bad or neutral, and receive a weight which measures the magnitude of their respective effect. The Quantitative Interprocedural Analysis problem asks to determine whether there exists an infinite run of the program where the long-run ratio of the bad weights over the good weights is above a given threshold. We illustrate how several quantitative problems related to static analysis of recursive programs can be instantiated in this framework, and present some case studies to this direction. Our fourth contribution is a new dynamic partial-order reduction for the stateless model checking of concurrent programs. Traditional approaches rely on the standard Mazurkiewicz equivalence between traces, by means of partitioning the trace space into equivalence classes, and attempting to explore a few representatives from each class. We present a new dynamic partial-order reduction method called the Data-centric Partial Order Reduction (DC-DPOR). Our algorithm is based on a new equivalence between traces, called the observation equivalence. DC-DPOR explores a coarser partitioning of the trace space than any exploration method based on the standard Mazurkiewicz equivalence. Depending on the program, the new partitioning can be even exponentially coarser. Additionally, DC-DPOR spends only polynomial time in each explored class. Our fifth contribution is the use of automata and game-theoretic verification techniques in the competitive analysis and synthesis of real-time scheduling algorithms for firm-deadline tasks. On the analysis side, we leverage automata on infinite words to compute the competitive ratio of real-time schedulers subject to various environmental constraints. On the synthesis side, we introduce a new instance of two-player mean-payoff partial-information games, and show how the synthesis of an optimal real-time scheduler can be reduced to computing winning strategies in this new type of games

    Doctor of Philosophy

    Get PDF
    dissertationIn the static analysis of functional programs, control- ow analysis (k-CFA) is a classic method of approximating program behavior as a infinite state automata. CFA2 and abstract garbage collection are two recent, yet orthogonal improvements, on k-CFA. CFA2 approximates program behavior as a pushdown system, using summarization for the stack. CFA2 can accurately approximate arbitrarily-deep recursive function calls, whereas k-CFA cannot. Abstract garbage collection removes unreachable values from the store/heap. If unreachable values are not removed from a static analysis, they can become reachable again, which pollutes the final analysis and makes it less precise. Unfortunately, as these two techniques were originally formulated, they are incompatible. CFA2's summarization technique for managing the stack obscures the stack such that abstract garbage collection is unable to examine the stack for reachable values. This dissertation presents introspective pushdown control-flow analysis, which manages the stack explicitly through stack changes (pushes and pops). Because this analysis is able to examine the stack by how it has changed, abstract garbage collection is able to examine the stack for reachable values. Thus, introspective pushdown control-flow analysis merges successfully the benefits of CFA2 and abstract garbage collection to create a more precise static analysis. Additionally, the high-performance computing community has viewed functional programming techniques and tools as lacking the efficiency necessary for their applications. Nebo is a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena. For efficient execution, Nebo exploits a version of expression templates, based on the C++ template system, which is a type-less, completely-pure, Turing-complete functional language with burdensome syntax. Nebo's declarative syntax supports functional tools, such as point-wise lifting of complex expressions and functional composition of stencil operators. Nebo's primary abstraction is mathematical assignment, which separates what a calculation does from how that calculation is executed. Currently Nebo supports single-core execution, multicore (thread-based) parallel execution, and GPU execution. With single-core execution, Nebo performs on par with the loops and code that it replaces in Wasatch, a pre-existing high-performance simulation project. With multicore (thread-based) execution, Nebo can linearly scale (with roughly 90% efficiency) up to 6 processors, compared to its single-core execution. Moreover, Nebo's GPU execution can be up to 37x faster than its single-core execution. Finally, Wasatch (the pre-existing high-performance simulation project which uses Nebo) can scale up to 262K cores
    corecore