564 research outputs found
Faster Algorithms for Weighted Recursive State Machines
Pushdown systems (PDSs) and recursive state machines (RSMs), which are
linearly equivalent, are standard models for interprocedural analysis. Yet RSMs
are more convenient as they (a) explicitly model function calls and returns,
and (b) specify many natural parameters for algorithmic analysis, e.g., the
number of entries and exits. We consider a general framework where RSM
transitions are labeled from a semiring and path properties are algebraic with
semiring operations, which can model, e.g., interprocedural reachability and
dataflow analysis problems.
Our main contributions are new algorithms for several fundamental problems.
As compared to a direct translation of RSMs to PDSs and the best-known existing
bounds of PDSs, our analysis algorithm improves the complexity for
finite-height semirings (that subsumes reachability and standard dataflow
properties). We further consider the problem of extracting distance values from
the representation structures computed by our algorithm, and give efficient
algorithms that distinguish the complexity of a one-time preprocessing from the
complexity of each individual query. Another advantage of our algorithm is that
our improvements carry over to the concurrent setting, where we improve the
best-known complexity for the context-bounded analysis of concurrent RSMs.
Finally, we provide a prototype implementation that gives a significant
speed-up on several benchmarks from the SLAM/SDV project
Heap Abstractions for Static Analysis
Heap data is potentially unbounded and seemingly arbitrary. As a consequence,
unlike stack and static memory, heap memory cannot be abstracted directly in
terms of a fixed set of source variable names appearing in the program being
analysed. This makes it an interesting topic of study and there is an abundance
of literature employing heap abstractions. Although most studies have addressed
similar concerns, their formulations and formalisms often seem dissimilar and
some times even unrelated. Thus, the insights gained in one description of heap
abstraction may not directly carry over to some other description. This survey
is a result of our quest for a unifying theme in the existing descriptions of
heap abstractions. In particular, our interest lies in the abstractions and not
in the algorithms that construct them.
In our search of a unified theme, we view a heap abstraction as consisting of
two features: a heap model to represent the heap memory and a summarization
technique for bounding the heap representation. We classify the models as
storeless, store based, and hybrid. We describe various summarization
techniques based on k-limiting, allocation sites, patterns, variables, other
generic instrumentation predicates, and higher-order logics. This approach
allows us to compare the insights of a large number of seemingly dissimilar
heap abstractions and also paves way for creating new abstractions by
mix-and-match of models and summarization techniques.Comment: 49 pages, 20 figure
Slicing of Concurrent Programs and its Application to Information Flow Control
This thesis presents a practical technique for information flow control for concurrent programs with threads and shared-memory communication. The technique guarantees confidentiality of information with respect to a reasonable attacker model and utilizes program dependence
graphs (PDGs), a language-independent representation of information flow in a program
IST Austria Thesis
This dissertation focuses on algorithmic aspects of program verification, and presents modeling and complexity advances on several problems related to the
static analysis of programs, the stateless model checking of concurrent programs, and the competitive analysis of real-time scheduling algorithms.
Our contributions can be broadly grouped into five categories.
Our first contribution is a set of new algorithms and data structures for the quantitative and data-flow analysis of programs, based on the graph-theoretic notion of treewidth.
It has been observed that the control-flow graphs of typical programs have special structure, and are characterized as graphs of small treewidth.
We utilize this structural property to provide faster algorithms for the quantitative and data-flow analysis of recursive and concurrent programs.
In most cases we make an algebraic treatment of the considered problem,
where several interesting analyses, such as the reachability, shortest path, and certain kind of data-flow analysis problems follow as special cases.
We exploit the constant-treewidth property to obtain algorithmic improvements for on-demand versions of the problems,
and provide data structures with various tradeoffs between the resources spent in the preprocessing and querying phase.
We also improve on the algorithmic complexity of quantitative problems outside the algebraic path framework,
namely of the minimum mean-payoff, minimum ratio, and minimum initial credit for energy problems.
Our second contribution is a set of algorithms for Dyck reachability with applications to data-dependence analysis and alias analysis.
In particular, we develop an optimal algorithm for Dyck reachability on bidirected graphs, which are ubiquitous in context-insensitive, field-sensitive points-to analysis.
Additionally, we develop an efficient algorithm for context-sensitive data-dependence analysis via Dyck reachability,
where the task is to obtain analysis summaries of library code in the presence of callbacks.
Our algorithm preprocesses libraries in almost linear time, after which the contribution of the library in the complexity of the client analysis is (i)~linear in the number of call sites and (ii)~only logarithmic in the size of the whole library, as opposed to linear in the size of the whole library.
Finally, we prove that Dyck reachability is Boolean Matrix Multiplication-hard in general, and the hardness also holds for graphs of constant treewidth.
This hardness result strongly indicates that there exist no combinatorial algorithms for Dyck reachability with truly subcubic complexity.
Our third contribution is the formalization and algorithmic treatment of the Quantitative Interprocedural Analysis framework.
In this framework, the transitions of a recursive program are annotated as good, bad or neutral, and receive a weight which measures
the magnitude of their respective effect.
The Quantitative Interprocedural Analysis problem asks to determine whether there exists an infinite run of the program where the long-run ratio of the bad weights over the good weights is above a given threshold.
We illustrate how several quantitative problems related to static analysis of recursive programs can be instantiated in this framework,
and present some case studies to this direction.
Our fourth contribution is a new dynamic partial-order reduction for the stateless model checking of concurrent programs. Traditional approaches rely on the standard Mazurkiewicz equivalence between traces, by means of partitioning the trace space into equivalence classes, and attempting to explore a few representatives from each class.
We present a new dynamic partial-order reduction method called the Data-centric Partial Order Reduction (DC-DPOR).
Our algorithm is based on a new equivalence between traces, called the observation equivalence.
DC-DPOR explores a coarser partitioning of the trace space than any exploration method based on the standard Mazurkiewicz equivalence.
Depending on the program, the new partitioning can be even exponentially coarser.
Additionally, DC-DPOR spends only polynomial time in each explored class.
Our fifth contribution is the use of automata and game-theoretic verification techniques in the competitive analysis and synthesis of real-time scheduling algorithms for firm-deadline tasks.
On the analysis side, we leverage automata on infinite words to compute the competitive ratio of real-time schedulers subject to various environmental constraints.
On the synthesis side, we introduce a new instance of two-player mean-payoff partial-information games, and show
how the synthesis of an optimal real-time scheduler can be reduced to computing winning strategies in this new type of games
A combined representation for the maintenance of C programs
A programmer wishing to make a change to a piece of code must first gain a full understanding of the behaviours and functionality involved. This process of program comprehension is difficult and time consuming, and often hindered by the absence of useful program documentation. Where documentation is absent, static analysis techniques are often employed to gather programming level information in the form of data and control flow relationships, directly from the source code itself. Software maintenance environments are created by grouping together a number of different static analysis tools such as program sheers, call graph builders and data flow analysis tools, providing a maintainer with a selection of 'views' of the subject code. However, each analysis tool often requires its own intermediate program representation (IPR). For example, an environment comprising five tools may require five different IPRs, giving repetition of information and inefficient use of storage space. A solution to this problem is to develop a single combined representation which contains all the program relationships required to present a maintainer with each required code view. The research presented in this thesis describes the Combined C Graph (CCG), a dependence-based representation for C programs from which a maintainer is able to construct data and control dependence views, interprocedural control flow views, program slices and ripple analyses. The CCG extends earlier dependence-based program representations, introducing language features such as expressions with embedded side effects and control flows, value returning functions, pointer variables, pointer parameters, array variables and structure variables. Algorithms for the construction of the CCG are described and the feasibility of the CCG demonstrated by means of a C/Prolog based prototype implementation
BCFA: Bespoke Control Flow Analysis for CFA at Scale
Many data-driven software engineering tasks such as discovering programming
patterns, mining API specifications, etc., perform source code analysis over
control flow graphs (CFGs) at scale. Analyzing millions of CFGs can be
expensive and performance of the analysis heavily depends on the underlying CFG
traversal strategy. State-of-the-art analysis frameworks use a fixed traversal
strategy. We argue that a single traversal strategy does not fit all kinds of
analyses and CFGs and propose bespoke control flow analysis (BCFA). Given a
control flow analysis (CFA) and a large number of CFGs, BCFA selects the most
efficient traversal strategy for each CFG. BCFA extracts a set of properties of
the CFA by analyzing the code of the CFA and combines it with properties of the
CFG, such as branching factor and cyclicity, for selecting the optimal
traversal strategy. We have implemented BCFA in Boa, and evaluated BCFA using a
set of representative static analyses that mainly involve traversing CFGs and
two large datasets containing 287 thousand and 162 million CFGs. Our results
show that BCFA can speedup the large scale analyses by 1%-28%. Further, BCFA
has low overheads; less than 0.2%, and low misprediction rate; less than 0.01%.Comment: 12 page
- …