676 research outputs found
Correctness Witness Validation by Abstract Interpretation
Witnesses record automated program analysis results and make them
exchangeable. To validate correctness witnesses through abstract
interpretation, we introduce a novel abstract operation unassume. This operator
incorporates witness invariants into the abstract program state. Given suitable
invariants, the unassume operation can accelerate fixpoint convergence and
yield more precise results. We demonstrate the feasibility of this approach by
augmenting an abstract interpreter with unassume operators and evaluating the
impact of incorporating witnesses on performance and precision. Using manually
crafted witnesses, we can confirm verification results for multi-threaded
programs with a reduction in effort ranging from 7% to 47% in CPU time. More
intriguingly, we discover that using witnesses from model checkers can guide
our analyzer to verify program properties that it could not verify on its own.Comment: 29 pages, 4 figures, 2 tables, extended version of the paper which is
to appear at VMCAI 202
Decidability and Synthesis of Abstract Inductive Invariants
Decidability and synthesis of inductive invariants ranging in a given domain
play an important role in many software and hardware verification systems. We
consider here inductive invariants belonging to an abstract domain as
defined in abstract interpretation, namely, ensuring the existence of the best
approximation in of any system property. In this setting, we study the
decidability of the existence of abstract inductive invariants in of
transition systems and their corresponding algorithmic synthesis. Our model
relies on some general results which relate the existence of abstract inductive
invariants with least fixed points of best correct approximations in of the
transfer functions of transition systems and their completeness properties.
This approach allows us to derive decidability and synthesis results for
abstract inductive invariants which are applied to the well-known Kildall's
constant propagation and Karr's affine equalities abstract domains. Moreover,
we show that a recent general algorithm for synthesizing inductive invariants
in domains of logical formulae can be systematically derived from our results
and generalized to a range of algorithms for computing abstract inductive
invariants
Approximations in Learning & Program Analysis
In this work we compare and contrast the approximations made in the problems of Data Compression, Program Analysis and Supervised Machine Learning. G\uf6del\u2019s Incompleteness Theorem mandates that any formal system rich enough to include integers will have unprovable truths. Thus non computable problems abound, including, but not limited to, Program Analysis, Data Compression and Machine Learning. Indeed, it can be shown that there are more non-computable functions than computable. Due to non- computability, precise solutions for these problems are not feasible, and only approximate solutions may be computed. Presently, each of these problems of Data Compression, Machine Learning and Program Analysis is studied independently. Each problem has it\u2019s own multitude of abstractions, algorithms and notions of tradeoffs among the various parameters. It would be interesting to have a unified framework, across disciplines, that makes explicit the abstraction specifications and ensuing tradeoffs. Such a framework would promote inter-disciplinary research and develop a unified body of knowledge to tackle non-computable problems. As a small step to that larger goal, we propose an Information Oriented Model of Computation that allows comparing the approximations used in Data Compression, Program Analysis and Machine Learning. To the best of our knowledge, this is the first work to propose a method for systematic comparison of approximations across disciplines. The model describes computation as set reconstruction. Non-computability is then presented as inability to perfectly reconstruct sets. In an effort to compare and contrast the approximations, select algorithms for Data Compression, Machine Learning and Program Analysis are analyzed using our model. We were able to relate the problems of Data Compression, Machine Learning and Program Analysis as specific instances of the general problem of approximate set reconstruction. We demonstrate the use of abstract interpreters in compression schemes. We then compare and contrast the approximations in Program Analysis and Supervised Machine Learning. We demonstrate the use of ordered structures, fixpoint equations and least fixpoint approximation computations, all characteristic of Abstract Interpretation (Program Analysis) in Machine Learning algorithms. We also present the idea that widening, like regression, is an inductive learner. Regression generalizes known states to a hypothesis. Widening generalizes abstract states on a iteration chain to a fixpoint. While Regression usually aims to minimize the total error (sum of false positives and false negatives), Widening aims for soundness and hence errs on the side of false positives to have zero false negatives. We use this duality to derive a generic widening operator from regression on the set of abstract states. The results of the dissertation are the first steps towards a unified approach to approximate computation. Consequently, our preliminary results lead to a lot more interesting questions, some of which we have tried to discuss in the concluding chapter
Graph-Based Shape Analysis Beyond Context-Freeness
We develop a shape analysis for reasoning about relational properties of data
structures. Both the concrete and the abstract domain are represented by
hypergraphs. The analysis is parameterized by user-supplied indexed graph
grammars to guide concretization and abstraction. This novel extension of
context-free graph grammars is powerful enough to model complex data structures
such as balanced binary trees with parent pointers, while preserving most
desirable properties of context-free graph grammars. One strength of our
analysis is that no artifacts apart from grammars are required from the user;
it thus offers a high degree of automation. We implemented our analysis and
successfully applied it to various programs manipulating AVL trees,
(doubly-linked) lists, and combinations of both
Mapping programs to equations
Extracting the function of a program from a static analysis of its source code is a valuable capability in software engineering; at a time when there is increasing talk of using AI (Artificial Intelligence) to generate software from natural language specifications, it becomes increasingly important to determine the exact function of software as written, to figure out what AI has understood the natural language specification to mean. For all its criticality, the ability to derive the domain-to-range function of a program has proved to be an elusive goal, due primarily to the difficulty of deriving the function of iterative statements. Several automated tools obviate this difficulty by unrolling the loops; but this is clearly an imperfect solution, especially in light of the fact that loops capture most of the computing power of a program, are the locus of most of its complexity, and the source of most of its faults. This dissertation investigates a three-step process to map a program written in a C-like language into a function from inputs to outputs, or from initial states to final states. The semantics of iterative statements are captured (while loops, repeat loops, for loops), including nested iterative statements, by means of the concept of invariant relation; an invariant relation is a reflexive transitive relation that links program states separated by an arbitrary number of iterations.
But the function derived for large and complex programs may be too unwieldy to be useful, not unlike drinking from a fire hose. In order to enable the user to query the program at scale, four functions are proposed. We propose four functions: Assume(), which enables the user to make assumptions about program states or program parts; Capture(), which enables the user to capture the state of the program at some label of the function of some program part; Verify(), which enables the user to verify a unary assertion about the state of the program at some label, or a binary assertion about a program part; and Establish(), which is envisioned to use program repair techniques to modify the program so as to make a Verify() query return true
A Review of Formal Methods applied to Machine Learning
We review state-of-the-art formal methods applied to the emerging field of
the verification of machine learning systems. Formal methods can provide
rigorous correctness guarantees on hardware and software systems. Thanks to the
availability of mature tools, their use is well established in the industry,
and in particular to check safety-critical applications as they undergo a
stringent certification process. As machine learning is becoming more popular,
machine-learned components are now considered for inclusion in critical
systems. This raises the question of their safety and their verification. Yet,
established formal methods are limited to classic, i.e. non machine-learned
software. Applying formal methods to verify systems that include machine
learning has only been considered recently and poses novel challenges in
soundness, precision, and scalability.
We first recall established formal methods and their current use in an
exemplar safety-critical field, avionic software, with a focus on abstract
interpretation based techniques as they provide a high level of scalability.
This provides a golden standard and sets high expectations for machine learning
verification. We then provide a comprehensive and detailed review of the formal
methods developed so far for machine learning, highlighting their strengths and
limitations. The large majority of them verify trained neural networks and
employ either SMT, optimization, or abstract interpretation techniques. We also
discuss methods for support vector machines and decision tree ensembles, as
well as methods targeting training and data preparation, which are critical but
often neglected aspects of machine learning. Finally, we offer perspectives for
future research directions towards the formal verification of machine learning
systems
Transfer Function Synthesis without Quantifier Elimination
Traditionally, transfer functions have been designed manually for each
operation in a program, instruction by instruction. In such a setting, a
transfer function describes the semantics of a single instruction, detailing
how a given abstract input state is mapped to an abstract output state. The net
effect of a sequence of instructions, a basic block, can then be calculated by
composing the transfer functions of the constituent instructions. However,
precision can be improved by applying a single transfer function that captures
the semantics of the block as a whole. Since blocks are program-dependent, this
approach necessitates automation. There has thus been growing interest in
computing transfer functions automatically, most notably using techniques based
on quantifier elimination. Although conceptually elegant, quantifier
elimination inevitably induces a computational bottleneck, which limits the
applicability of these methods to small blocks. This paper contributes a method
for calculating transfer functions that finesses quantifier elimination
altogether, and can thus be seen as a response to this problem. The
practicality of the method is demonstrated by generating transfer functions for
input and output states that are described by linear template constraints,
which include intervals and octagons.Comment: 37 pages, extended version of ESOP 2011 pape
Static Analysis of Run-Time Errors in Embedded Real-Time Parallel C Programs
We present a static analysis by Abstract Interpretation to check for run-time
errors in parallel and multi-threaded C programs. Following our work on
Astr\'ee, we focus on embedded critical programs without recursion nor dynamic
memory allocation, but extend the analysis to a static set of threads
communicating implicitly through a shared memory and explicitly using a finite
set of mutual exclusion locks, and scheduled according to a real-time
scheduling policy and fixed priorities. Our method is thread-modular. It is
based on a slightly modified non-parallel analysis that, when analyzing a
thread, applies and enriches an abstract set of thread interferences. An
iterator then re-analyzes each thread in turn until interferences stabilize. We
prove the soundness of our method with respect to the sequential consistency
semantics, but also with respect to a reasonable weakly consistent memory
semantics. We also show how to take into account mutual exclusion and thread
priorities through a partitioning over an abstraction of the scheduler state.
We present preliminary experimental results analyzing an industrial program
with our prototype, Th\'es\'ee, and demonstrate the scalability of our
approach
- âŠ