Learning the language of software errors

Abstract

We propose to use algorithms for learning deterministic finite automata (DFA), such as Angluin’s L ∗ algorithm, for learning a DFA that describes the possible scenarios under which a given program error occurs. The alphabet of this automaton is given by the user (for instance, a subset of the function call sites or branches), and hence the automaton describes a user-defined abstraction of those scenarios. More generally, the same technique can be used for visualising the behavior of a program or parts thereof. It can also be used for visually comparing different versions of a program (by presenting an automaton for the behavior in the symmetric difference between them), and for assisting in merging several development branches. We present experiments that demonstrate the power of an abstract visual representation of errors and of program segments, accessible via the project’s web page. In addition, our experiments in this paper demonstrate that such automata can be learned efficiently over real-world programs. We also present lazy learning, which is a method for reducing the number of membership queries while using L∗, and demonstrate its effectiveness on standard benchmarks

    Similar works