5 research outputs found
Analysis of Software Patches Using Numerical Abstract Interpretation
International audienceWe present a static analysis for software patches. Given two syntactically close versions of a program, our analysis can infer a semantic difference, and prove that both programs compute the same outputs when run on the same inputs. Our method is based on abstract interpretation, and parametric in the choice of an abstract domain. We focus on numeric properties only. Our method is able to deal with unbounded executions of infinite-state programs, reading from infinite input streams. Yet, it is limited to comparing terminating executions, ignoring non terminating ones.We first present a novel concrete collecting semantics, expressing the behaviors of both programs at the same time. Then, we propose an abstraction of infinite input streams able to prove that programs that read from the same stream compute equal output values. We then show how to leverage classic numeric abstract domains, such as polyhedra or octagons, to build an effective static analysis. We also introduce a novel numeric domain to bound differences between the values of the variables in the two programs, which has linear cost, and the right amount of relationality to express useful properties of software patches.We implemented a prototype and experimented on a few small examples from the literature. Our prototype operates on a toy language, and assumes a joint syntactic representation of two versions of a program given, which distinguishes between common and distinctive parts
Putting the Semantics into Semantic Versioning
The long-standing aspiration for software reuse has made astonishing strides
in the past few years. Many modern software development ecosystems now come
with rich sets of publicly-available components contributed by the community.
Downstream developers can leverage these upstream components, boosting their
productivity.
However, components evolve at their own pace. This imposes obligations on and
yields benefits for downstream developers, especially since changes can be
breaking, requiring additional downstream work to adapt to. Upgrading too late
leaves downstream vulnerable to security issues and missing out on useful
improvements; upgrading too early results in excess work. Semantic versioning
has been proposed as an elegant mechanism to communicate levels of
compatibility, enabling downstream developers to automate dependency upgrades.
While it is questionable whether a version number can adequately characterize
version compatibility in general, we argue that developers would greatly
benefit from tools such as semantic version calculators to help them upgrade
safely. The time is now for the research community to develop such tools: large
component ecosystems exist and are accessible, component interactions have
become observable through automated builds, and recent advances in program
analysis make the development of relevant tools feasible. In particular,
contracts (both traditional and lightweight) are a promising input to semantic
versioning calculators, which can suggest whether an upgrade is likely to be
safe.Comment: to be published as Onward! Essays 202
Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules
We target the problem of automatically synthesizing proofs of semantic
equivalence between two programs made of sequences of statements. We represent
programs using abstract syntax trees (AST), where a given set of
semantics-preserving rewrite rules can be applied on a specific AST pattern to
generate a transformed and semantically equivalent program. In our system, two
programs are equivalent if there exists a sequence of application of these
rewrite rules that leads to rewriting one program into the other. We propose a
neural network architecture based on a transformer model to generate proofs of
equivalence between program pairs. The system outputs a sequence of rewrites,
and the validity of the sequence is simply checked by verifying it can be
applied. If no valid sequence is produced by the neural network, the system
reports the programs as non-equivalent, ensuring by design no programs may be
incorrectly reported as equivalent. Our system is fully implemented for a given
grammar which can represent straight-line programs with function calls and
multiple types. To efficiently train the system to generate such sequences, we
develop an original incremental training technique, named self-supervised
sample selection. We extensively study the effectiveness of this novel training
approach on proofs of increasing complexity and length. Our system, S4Eq,
achieves 97% proof success on a curated dataset of 10,000 pairs of equivalent
programsComment: 30 pages including appendi