37 research outputs found

    GRUNGE: A Grand Unified ATP Challenge

    Full text link
    This paper describes a large set of related theorem proving problems obtained by translating theorems from the HOL4 standard library into multiple logical formalisms. The formalisms are in higher-order logic (with and without type variables) and first-order logic (possibly with multiple types, and possibly with type variables). The resultant problem sets allow us to run automated theorem provers that support different logical formats on corresponding problems, and compare their performances. This also results in a new "grand unified" large theory benchmark that emulates the ITP/ATP hammer setting, where systems and metasystems can use multiple ATP formalisms in complementary ways, and jointly learn from the accumulated knowledge.Comment: CADE 27 -- 27th International Conference on Automated Deductio

    Disproving in First-Order Logic with Definitions, Arithmetic and Finite Domains

    Get PDF
    This thesis explores several methods which enable a first-order reasoner to conclude satisfiability of a formula modulo an arithmetic theory. The most general method requires restricting certain quantifiers to range over finite sets; such assumptions are common in the software verification setting. In addition, the use of first-order reasoning allows for an implicit representation of those finite sets, which can avoid scalability problems that affect other quantified reasoning methods. These new techniques form a useful complement to existing methods that are primarily aimed at proving validity. The Superposition calculus for hierarchic theory combinations provides a basis for reasoning modulo theories in a first-order setting. The recent account of ‘weak abstraction’ and related improvements make an mplementation of the calculus practical. Also, for several logical theories of interest Superposition is an effective decision procedure for the quantifier free fragment. The first contribution is an implementation of that calculus (Beagle), including an optimized implementation of Cooper’s algorithm for quantifier elimination in the theory of linear integer arithmetic. This includes a novel means of extracting values for quantified variables in satisfiable integer problems. Beagle won an efficiency award at CADE Automated theorem prover System Competition (CASC)-J7, and won the arithmetic non-theorem category at CASC-25. This implementation is the start point for solving the ‘disproving with theories’ problem. Some hypotheses can be disproved by showing that, together with axioms the hypothesis is unsatisfiable. Often this is relative to other axioms that enrich a base theory by defining new functions. In that case, the disproof is contingent on the satisfiability of the enrichment. Satisfiability in this context is undecidable. Instead, general characterizations of definition formulas, which do not alter the satisfiability status of the main axioms, are given. These general criteria apply to recursive definitions, definitions over lists, and to arrays. This allows proving some non-theorems which are otherwise intractable, and justifies similar disproofs of non-linear arithmetic formulas. When the hypothesis is contingently true, disproof requires proving existence of a model. If the Superposition calculus saturates a clause set, then a model exists, but only when the clause set satisfies a completeness criterion. This requires each instance of an uninterpreted, theory-sorted term to have a definition in terms of theory symbols. The second contribution is a procedure that creates such definitions, given that a subset of quantifiers range over finite sets. Definitions are produced in a counter-example driven way via a sequence of over and under approximations to the clause set. Two descriptions of the method are given: the first uses the component solver modularly, but has an inefficient counter-example heuristic. The second is more general, correcting many of the inefficiencies of the first, yet it requires tracking clauses through a proof. This latter method is shown to apply also to lists and to problems with unbounded quantifiers. Together, these tools give new ways for applying successful first-order reasoning methods to problems involving interpreted theories

    Automated Theorem Proving with Extensions of First-Order Logic

    Get PDF
    Automated theorem provers are computer programs that check whether a logical conjecture follows from a set of logical statements. The conjecture and the statements are expressed in the language of some formal logic, such as first-order logic. Theorem provers for first-order logic have been used for automation in proof assistants, verification of programs, static analysis of networks, and other purposes. However, the efficient usage of these provers remains challenging. One of the challenges is the complexity of translating domain problems to first-order logic. Not only can such translation be cumbersome due to semantic differences between the domain and the logic, but it might inadvertently result in problems that provers cannot easily handle.The work presented in the thesis addresses this challenge by developing an extension of first-order logic named FOOL. FOOL contains syntactical features of programming languages and more expressive logics, is friendly for translation of problems from various domains, and can be efficiently supported by existing theorem provers. We describe the syntax and semantics of FOOL and present a simple translation from FOOL to plain first-order logic. We describe an efficient clausal normal form transformation algorithm for FOOL and based on it implement a support for FOOL in the Vampire theorem prover. We illustrate the efficient use of FOOL for program verification by describing a concise encoding of next state relations of imperative programs in FOOL. We show a usage of features of FOOL in problems of static analysis of networks. We demonstrate the efficiency of automated theorem proving in FOOL with an extensive set of experiments. In these experiments we compare the performance of Vampire on a large collection of problems from various sources translated to FOOL and ordinary first-order logic. Finally, we fix the syntax for FOOL in TPTP, the standard language of first-order theorem provers

    Analyzing the selection of the herbrand base process for building a Smart Semantic Tree Theorem Prover

    Get PDF
    Traditionally, semantic trees have played an important role in proof theory for validating the unsatisfiability of sets of clauses. More recently, they have also been used to implement more practical tools for verifying the unsatisfiability of clause sets in first-order predicate logic. The method ultimately relies on the Herbrand Base, a set used in building the semantic tree. The Herbrand Base is used together with the Herbrand Universe, which stems from the initial clause set in a particular theorem. When searching for a closed semantic tree, the selection of suitable atoms from the Herbrand Base is very important and should be carried out carefully by educated guesses in order to avoid building a tree using atoms which are irrelevant for the proof. In an effort to circumvent the creation of irrelevant ground instances, a novel approach is investigated in this dissertation. As opposed to creating the ground instances of the clauses in S in a strict syntactic order, the values will be established through calculations which are based on relevance for the problem at hand. This idea has been applied and accordingly tested with the use of the Smart Semantic Tree Theorem Prover (SSTTP), which provides an algorithm for choos- ing prominent atoms from the Herbrand Base for utilisation in the generation of closed semantic trees. Part of this study is an empirical investigation of this prover performance on first-order problems without equality, as well as whether or not it is able to compete with modern theorem provers in certain niches. The results of the SSTTP are promising in terms of finding proofs in less time than some of the state-of-the-art provers. However, it can not compete with them in terms of the total number of the solved problems
    corecore