Machine Learning for Instance Selection in SMT Solving

Abstract

International audienceSMT solvers are among the most suited tools for quantifier-free first-order problems, and their support for quantified formulas has been improving in recent years. To instantiate quantifiers, they rely on heuristic techniques that generate thousands of instances, most of them useless. We propose to apply state-of-the-art machine learning techniques as classifiers for instances on top of the instantiation process. We show that such techniques can indeed decrease the number of generated useless instances. We envision that this could lead to more efficient SMT solving for quantified problems. Satisfiability-modulo-theories (SMT) solvers are among the best backends for verification tools and "hammers" in proof assistants. When proof obligations contain quantified formulas, SMT solvers rely on instantiation, replacing quantified subformulas by sets of ground instances. Three main techniques have been designed: enumerative [11], trigger-based [4], and conflict-based [12] instantiation. Among these, only conflict-based instantiation computes instances that are guaranteed to be relevant, but it is incomplete and is normally used in combination with other techniques. Enumerative and trigger-based techniques are highly heuristic and generate a large number of instances, most of them useless. As a result, the search space of the solver explodes. Reducing the number of instances could improve the solver's efficiency and success rate within a given time limit. We propose to use a state-of-the-art machine learning algorithm as a predictor over the generated set of instances to filter out irrelevant instances, and thus decrease the number of instances given to the ground solver. The predictor is invoked after each instantiation round to rate the potential usefulness of each generated instance. Several strategies are then used to build a subset of potentially relevant instances that are immediately added to the ground solver. Adding the other instances is postponed. We conducted our experiment in veriT [2], an SMT solver that implements all three in-stantiation techniques described above. We chose as predictor the XGBoost gradient boosting toolkit [3] with the binary classification objective. This configuration had already been used successfully in the context of theorem proving [6, 10]. Choosing a suitable set of features is crucial for effective machine learning. The features determine how precise the representation of the problem is. Previous works already investigate features for theorem proving [1, 5, 6, 8-10]. Our features are more specifically inspired by ENIGMA [6] and RLCoP [7]. They are basically term symbols and term walks with symbol sequences projected to features using Vowpal Wabbit hashing. Term variables and Skolem constants are translated analogously to constants. The model is further enriched with abstract features such as term size, term depth, and the number of instances. To encode our problem into sparse vectors, we use three kinds of information available to the solver: the ground part of the formula (set of literals l 1 ,. .. , l m), the quantified formul

    Similar works