1,305 research outputs found
Enhancing Symbolic Execution of Heap-based Programs with Separation Logic for Test Input Generation
Symbolic execution is a well established method for test input generation.
Despite of having achieved tremendous success over numerical domains, existing
symbolic execution techniques for heap-based programs are limited due to the
lack of a succinct and precise description for symbolic values over unbounded
heaps. In this work, we present a new symbolic execution method for heap-based
programs based on separation logic. The essence of our proposal is
context-sensitive lazy initialization, a novel approach for efficient test
input generation. Our approach differs from existing approaches in two ways.
Firstly, our approach is based on separation logic, which allows us to
precisely capture preconditions of heap-based programs so that we avoid
generating invalid test inputs. Secondly, we generate only fully initialized
test inputs, which are more useful in practice compared to those partially
initialized test inputs generated by the state-of-the-art tools. We have
implemented our approach as a tool, called Java StarFinder, and evaluated it on
a set of programs with complex heap inputs. The results show that our approach
significantly reduces the number of invalid test inputs and improves the test
coverage
A Survey of Symbolic Execution Techniques
Many security and software testing applications require checking whether
certain properties of a program hold for any possible usage scenario. For
instance, a tool for identifying software vulnerabilities may need to rule out
the existence of any backdoor to bypass a program's authentication. One
approach would be to test the program using different, possibly random inputs.
As the backdoor may only be hit for very specific program workloads, automated
exploration of the space of possible inputs is of the essence. Symbolic
execution provides an elegant solution to the problem, by systematically
exploring many possible execution paths at the same time without necessarily
requiring concrete inputs. Rather than taking on fully specified input values,
the technique abstractly represents them as symbols, resorting to constraint
solvers to construct actual instances that would cause property violations.
Symbolic execution has been incubated in dozens of tools developed over the
last four decades, leading to major practical breakthroughs in a number of
prominent software reliability applications. The goal of this survey is to
provide an overview of the main ideas, challenges, and solutions developed in
the area, distilling them for a broad audience.
The present survey has been accepted for publication at ACM Computing
Surveys. If you are considering citing this survey, we would appreciate if you
could use the following BibTeX entry: http://goo.gl/Hf5FvcComment: This is the authors pre-print copy. If you are considering citing
this survey, we would appreciate if you could use the following BibTeX entry:
http://goo.gl/Hf5Fv
Concolic Testing Heap-Manipulating Programs
Concolic testing is a test generation technique which works effectively by
integrating random testing generation and symbolic execution. Existing concolic
testing engines focus on numeric programs. Heap-manipulating programs make
extensive use of complex heap objects like trees and lists. Testing such
programs is challenging due to multiple reasons. Firstly, test inputs for such
program are required to satisfy non-trivial constraints which must be specified
precisely. Secondly, precisely encoding and solving path conditions in such
programs are challenging and often expensive. In this work, we propose the
first concolic testing engine called CSF for heap-manipulating programs based
on separation logic. CSF effectively combines specification-based testing and
concolic execution for test input generation. It is evaluated on a set of
challenging heap-manipulating programs. The results show that CSF generates
valid test inputs with high coverage efficiently. Furthermore, we show that CSF
can be potentially used in combination with precondition inference tools to
reduce the user effort
S2TD: a Separation Logic Verifier that Supports Reasoning of the Absence and Presence of Bugs
Heap-manipulating programs are known to be challenging to reason about. We
present a novel verifier for heap-manipulating programs called S2TD, which
encodes programs systematically in the form of Constrained Horn Clauses (CHC)
using a novel extension of separation logic (SL) with recursive predicates and
dangling predicates. S2TD actively explores cyclic proofs to address the path
explosion problem. S2TD differentiates itself from existing CHC-based verifiers
by focusing on heap-manipulating programs and employing cyclic proof to
efficiently verify or falsify them with counterexamples. Compared with existing
SL-based verifiers, S2TD precisely specifies the heaps of de-allocated pointers
to avoid false positives in reasoning about the presence of bugs. S2TD has been
evaluated using a comprehensive set of benchmark programs from the SV-COMP
repository. The results show that S2TD is more effective than state-of-art
program verifiers and is more efficient than most of them.Comment: 24 page
Verification of Pointer-Based Programs with Partial Information
The proliferation of software across all aspects of people's life means that software failure can bring catastrophic result. It is therefore highly desirable to be able to develop software that is verified to meet its expected specification. This has also been identified as a key objective in one of the UK Grand Challenges (GC6) (Jones et al., 2006; Woodcock, 2006). However, many difficult problems still remain in achieving this objective, partially due to the wide use of (recursive) shared mutable data structures which are hard to keep track of statically in a precise and concise way.
This thesis aims at building a verification system for both memory safety and functional correctness of programs manipulating pointer-based data structures, which can deal with two scenarios where only partial information about the program is available. For instance the verifier may be supplied with only partial program specification, or with full specification but only part of the program code. For the first scenario, previous state-of-the-art works (Nguyen et al., 2007; Chin et al., 2007; Nguyen and Chin, 2008; Chin et al, 2010) generally require users to provide full specifications for each method of the program to be verified. Their approach seeks much intellectual effort from users, and meanwhile users are liable to make mistakes in writing such specifications. This thesis proposes a new approach to program verification that allows users to provide only partial specification to methods. Our approach will then refine the given annotation into a more complete specification by discovering missing constraints. The discovered constraints may involve both numerical and multiset properties that could be later confirmed or revised by users. Meanwhile, we further augment our approach by requiring only partial specification to be given for primary methods of a program. Specifications for loops and auxiliary methods can then be systematically discovered by our augmented mechanism, with the help of information propagated from the primary methods. This work is aimed at verifying beyond shape properties, with the eventual goal of analysing both memory safety and functional properties for pointer-based data structures. Initial experiments have confirmed that we can automatically refine partial specifications with non-trivial constraints, thus making it easier for users to handle specifications with richer properties.
For the second scenario, many programs contain invocations to unknown components and hence only part of the program code is available to the verifier. As previous works generally require the whole of program code be present, we target at the verification of memory safety and functional correctness of programs manipulating pointer-based data structures, where the program code is only partially available due to invocations to unknown components. Provided with a Hoare-style specification ({Pre} prog {Post}) where program (prog) contains calls to some unknown procedure (unknown), we infer a specification (mspecu) for the unknown part (unknown) from the calling contexts, such that the problem of verifying program (prog) can be safely reduced to the problem of proving that the unknown procedure (unknown) (once its code is available) meets the derived specification (mspecu). The expected specification (mspecu) is automatically calculated using an abduction-based shape analysis specifically designed for a combined abstract domain. We have implemented a system to validate the viability of our approach, with encouraging experimental results
Towards General Loop Invariant Generation via Coordinating Symbolic Execution and Large Language Models
Loop invariants, essential for program verification, are challenging to
auto-generate especially for programs incorporating complex memory
manipulations. Existing approaches for generating loop invariants rely on fixed
sets or templates, hampering adaptability to real-world programs. Recent
efforts have explored machine learning for loop invariant generation, but the
lack of labeled data and the need for efficient generation are still
troublesome. We consider the advent of the large language model (LLM) presents
a promising solution, which can analyze the separation logic assertions after
symbolic execution to infer loop invariants. To overcome the data scarcity
issue, we propose a self-supervised learning paradigm to fine-tune LLM, using
the split-and-reassembly of predicates to create an auxiliary task and generate
rich synthetic data for offline training. Meanwhile, the proposed interactive
system between LLM and traditional verification tools provides an efficient
online querying process for unseen programs. Our framework can readily extend
to new data structures or multi-loop programs since our framework only needs
the definitions of different separation logic predicates, aiming to bridge the
gap between existing capabilities and requirements of loop invariant generation
in practical scenarios. Experiments across diverse memory-manipulated programs
have demonstrated the performance of our proposed method compared to the
baselines with respect to efficiency and effectiveness.Comment: Preprint, under revie
Compositional Verification of Heap-Manipulating Programs through Property-Guided Learning
Analyzing and verifying heap-manipulating programs automatically is
challenging. A key for fighting the complexity is to develop compositional
methods. For instance, many existing verifiers for heap-manipulating programs
require user-provided specification for each function in the program in order
to decompose the verification problem. The requirement, however, often hinders
the users from applying such tools. To overcome the issue, we propose to
automatically learn heap-related program invariants in a property-guided way
for each function call. The invariants are learned based on the memory graphs
observed during test execution and improved through memory graph mutation. We
implemented a prototype of our approach and integrated it with two existing
program verifiers. The experimental results show that our approach enhances
existing verifiers effectively in automatically verifying complex
heap-manipulating programs with multiple function calls
Loop invariant synthesis in a combined abstract domain
Automated verification of memory safety and functional correctness for heap-manipulating programs has been a challenging task, especially when dealing with complex data structures with strong invariants involving both shape and numerical properties. Existing verification systems usually rely on users to supply annotations to guide the verification, which can be cumbersome and error-prone by hand and can significantly restrict the usability of the verification system. In this paper, we reduce the need for some user annotations by automatically inferring loop invariants over an abstract domain with both shape and numerical information. Our loop invariant synthesis is conducted automatically by a fixed-point iteration process, equipped with newly designed abstraction mechanism, together with join and widening operators over the combined domain. We have also proven the soundness and termination of our approach. Initial experiments confirm that we can synthesise loop invariants with non-trivial constraints
Program Analysis in A Combined Abstract Domain
Automated verification of heap-manipulating programs is a challenging task due to the complexity of aliasing and mutability of data structures used in these programs. The properties of a number of important data structures do not only relate to one domain, but to combined multiple domains, such as sorted list, priority queues, height-balanced trees and so on. The safety and sometimes efficiency of programs do rely on the properties of those data structures. This
thesis focuses on developing a verification system for both functional correctness and memory safety of such programs which involve heap-based data structures.
Two automated inference mechanisms are presented for heap-manipulating programs in this thesis. Firstly, an abstract interpretation based approach is proposed to synthesise program invariants in a combined pure and shape domain. Newly designed abstraction, join and widening
operators have been defined for the combined domain. Furthermore, a compositional analysis approach is described to discover both pre-/post-conditions of programs with a bi-abduction technique in the combined domain.
As results of my thesis, both inference approaches have been
implemented and the obtained results validate the feasibility and precision of proposed approaches. The outcomes of the thesis confirm that it is possible and practical to analyse heap-manipulating programs automatically and precisely by using abstract interpretation
in a sophisticated combined domain
- …