5 research outputs found

    Learning programs by learning from failures

    We describe an inductive logic programming (ILP) approach called learning from failures. In this approach, an ILP system (the learner) decomposes the learning problem into three separate stages: generate, test, and constrain. In the generate stage, the learner generates a hypothesis (a logic program) that satisfies a set of hypothesis constraints (constraints on the syntactic form of hypotheses). In the test stage, the learner tests the hypothesis against training examples. A hypothesis fails when it does not entail all the positive examples or entails a negative example. If a hypothesis fails, then, in the constrain stage, the learner learns constraints from the failed hypothesis to prune the hypothesis space, i.e. to constrain subsequent hypothesis generation. For instance, if a hypothesis is too general (entails a negative example), the constraints prune generalisations of the hypothesis. If a hypothesis is too specific (does not entail all the positive examples), the constraints prune specialisations of the hypothesis. This loop repeats until either (i) the learner finds a hypothesis that entails all the positive and none of the negative examples, or (ii) there are no more hypotheses to test. We introduce Popper, an ILP system that implements this approach by combining answer set programming and Prolog. Popper supports infinite problem domains, reasoning about lists and numbers, learning textually minimal programs, and learning recursive programs. Our experimental results on three domains (toy game problems, robot strategies, and list transformations) show that (i) constraints drastically improve learning performance, and (ii) Popper can outperform existing ILP systems, both in terms of predictive accuracies and learning times.
    Comment: Accepted for the Machine Learning journal.
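    The generate-test-constrain loop described above can be made concrete with a small sketch. The following Python fragment is purely illustrative: it assumes a toy hypothesis space of integer intervals rather than logic programs, with interval containment standing in for the generalisation ordering on programs. Popper itself generates hypotheses with answer set programming and tests them with Prolog.

```python
# A minimal, self-contained sketch of the learning-from-failures loop on a
# toy domain. Hypotheses, their ordering, and the bound are illustrative
# assumptions, not Popper's actual representation.

# A hypothesis h = (lo, hi) "entails" x when lo <= x <= hi.
def entails(h, x):
    lo, hi = h
    return lo <= x <= hi

def is_generalisation(g, h):
    # g is a generalisation of h if g covers everything h covers.
    return g[0] <= h[0] and g[1] >= h[1]

def is_specialisation(s, h):
    # s is a specialisation of h if h covers everything s covers.
    return is_generalisation(h, s)

def learn(pos, neg, bound=10):
    # Generate stage: all interval hypotheses within the bound.
    space = [(lo, hi) for lo in range(bound) for hi in range(lo, bound)]
    constraints = []  # learned predicates that prune hypotheses
    while space:
        h = space[0]
        # Test stage: a hypothesis fails if it misses a positive example
        # or entails a negative one.
        too_specific = any(not entails(h, x) for x in pos)
        too_general = any(entails(h, x) for x in neg)
        if not too_specific and not too_general:
            return h  # complete and consistent
        # Constrain stage: prune the hypotheses that must fail for the
        # same reason as h.
        if too_general:
            constraints.append(lambda g, h=h: is_generalisation(g, h))
        if too_specific:
            constraints.append(lambda s, h=h: is_specialisation(s, h))
        space = [h2 for h2 in space[1:]
                 if not any(c(h2) for c in constraints)]
    return None  # no hypotheses left to test

print(learn(pos=[3, 4, 5], neg=[1, 8]))  # -> (2, 5)
```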

    Inductive logic programming as satisfiability modulo theories

    At the intersection of machine learning, program synthesis and automated reasoning lies the field of Inductive Logic Programming (ILP). The aim of ILP is to automatically learn relational programs from input/output examples, typically through logic-based techniques. Inspired by Karl Popper's falsification perspective on science, this dissertation sets out a new approach to ILP: Learning From Failures (LFF). In science, starting from a huge set of a priori viable hypotheses, we select a hypothesis to test. This hypothesis typically gets falsified by failing in some specific way. By examining the failure we learn that an entire space of related hypotheses is now ruled out. Having thus reduced our set of viable hypotheses, we subsequently select from just those that remain. LFF applies this methodology to program induction, codifying it as a three-stage loop. The generate stage maintains a formula whose satisfying assignments correspond to the set of viable hypotheses. The test stage takes a satisfying assignment, interprets it as a logic program and tests it against training examples; an imperfect fit is considered a failure. The constrain stage turns a failure into constraints to add to the generate stage's formula, thereby eliminating a class of hypotheses that will fail for the same reason.
    The thesis of this dissertation is threefold. The first claim is that LFF frames the ILP problem as one of Satisfiability Modulo Theories (SMT). With the search for viable hypotheses handed off to a SAT solver, the test stage can be regarded as a theory solver collaborating with the SAT solver, effectively turning ILP's notion of background knowledge into a (Horn) background theory. The second claim is that LFF's three-stage loop is an effective basis for falsification-based program induction. Chapter 4 develops the above correspondence into a feature-rich and flexible three-stage ILP system. Experimental evidence is provided for this system going beyond the state of the art in ILP, e.g., by supporting large hypothesis spaces and large domains. The third claim is that the LFF-as-SMT perspective helps apply satisfiability solving techniques to ILP, in particular to reduce hypothesis space exploration. In Chapter 5, we show that SMT-based techniques for explaining conflicts have a natural analogue in explaining which parts of a hypothesis are responsible for its failure. In Chapter 6, we incorporate other theory solvers alongside the test stage to reason about the (satisfiability of) over-approximating properties of hypotheses. We show that both of these techniques can significantly reduce the number of iterations of the three-stage loop.
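    The generate-as-SAT framing can be sketched in a few lines of Python. The fragment below is a hedged illustration, not the dissertation's implementation: hypotheses are subsets of invented propositional "rules" (variables r0-r3), the generate stage is a deliberately naive enumeration standing in for a SAT solver, and the test stage plays the role of the theory solver. Failures are translated into blocking clauses, mirroring the constrain stage described above.

```python
from itertools import product

# Toy "rules": each rule covers a set of integers. A hypothesis is a subset
# of rules, and it entails x when any selected rule covers x. All names and
# data here are invented for illustration.
RULES = {
    "r0": {1, 2},
    "r1": {3, 4},
    "r2": {5, 6},
    "r3": {4, 9},  # covers the negative example below
}
NAMES = sorted(RULES)

def sat_enumerate(cnf):
    # Generate stage: a deliberately naive SAT search. Each clause is a list
    # of (variable, polarity) literals; yield assignments satisfying all.
    for bits in product([False, True], repeat=len(NAMES)):
        assign = dict(zip(NAMES, bits))
        if all(any(assign[v] == pol for v, pol in clause) for clause in cnf):
            yield assign

def learn(pos, neg):
    cnf = [[(v, True) for v in NAMES]]  # require a non-empty hypothesis
    while True:
        model = next(sat_enumerate(cnf), None)
        if model is None:
            return None  # hypothesis space exhausted
        hyp = {v for v in NAMES if model[v]}
        covered = set().union(*(RULES[v] for v in hyp))
        # Test stage ("theory solver"): check the model against examples.
        if pos <= covered and not (neg & covered):
            return hyp
        # Constrain stage: translate the failure into blocking clauses.
        if neg & covered:
            # Too general: every superset of hyp also covers the negative.
            cnf.append([(v, False) for v in hyp])
        if not (pos <= covered):
            # Too specific: every subset of hyp also misses a positive.
            cnf.append([(v, True) for v in NAMES if v not in hyp])

print(learn(pos={1, 3, 5}, neg={9}))  # -> {'r0', 'r1', 'r2'}
```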

    A framework for dynamic heterogeneous information networks change discovery based on knowledge engineering and data mining methods

    Information networks are collections of data structures used to model interactions in social and living phenomena. They can be homogeneous or heterogeneous, and static or dynamic, depending on the type and nature of the relations between network entities. Static homogeneous and heterogeneous networks have been widely studied in data mining, but there has recently been renewed interest in the analysis of dynamic heterogeneous information networks (DHINs), because rich temporal, structural and semantic information is hidden in this kind of network. The heterogeneity and dynamicity of real-world networks offer many opportunities, as well as many challenges, for data mining. Substantial research has been undertaken on exploring entities and identifying their links in heterogeneous networks. However, work on the formal construction and change mining of heterogeneous information networks is still in its infancy because of their complex structure and rich semantics. Past studies have used cluster-based methods and frequent-pattern-mining techniques for change discovery in dynamic heterogeneous networks. These methods work only on small datasets and fail to support fast, parallel processing of big data; moreover, cluster-based approaches capture only structural changes, while pattern mining captures only the semantic characteristics of changes in a dynamic network. Another interesting but challenging problem, not considered by past studies, is extracting knowledge from these semantically rich networks under user-specific constraints.
    This study develops a new change mining system, ChaMining, to investigate dynamic heterogeneous network data, combining knowledge engineering based on semantic web technologies with data mining to overcome the problems of previous techniques. Such a system is important in academia as well as in real-life applications that support decision-making based on temporal network data patterns. The research designs a novel framework, ChaMining, to (i) find relational patterns in dynamic networks locally and globally by employing domain ontologies; (ii) extract knowledge from these semantically rich networks based on user-specific (meta-path) constraints; (iii) cluster the relational data patterns based on the structural properties of nodes in the dynamic network; and (iv) detect changes in dynamic heterogeneous networks through a hybrid approach combining knowledge engineering, temporal rule mining and clustering. A toy illustration of meta-path-constrained change discovery is sketched below.
    The evidence presented in this research shows that the proposed framework and methods work efficiently on benchmark big dynamic heterogeneous datasets. The empirical results contribute to a better understanding of the rich semantics of DHINs and of how to mine them using the proposed hybrid approach. The framework has been evaluated against six previous change detection algorithms and frameworks and performs well in detecting both microscopic and macroscopic human-understandable changes. It extracts more change patterns than previous approaches, which helps to reduce information loss.
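    The sketch below makes the meta-path-constrained change discovery idea concrete. It is a minimal, hypothetical Python fragment: the schema (authors, papers, venues), the relation names and both snapshots are invented for illustration, and ChaMining itself additionally employs domain ontologies, temporal rule mining and clustering, none of which this toy reproduces.

```python
from collections import Counter

# Two snapshots of a toy heterogeneous network as typed edges
# (source, relation, target). All data is invented for illustration.
SNAPSHOT_T1 = [
    ("alice", "writes", "p1"), ("bob", "writes", "p2"),
    ("p1", "published_in", "KDD"), ("p2", "published_in", "KDD"),
]
SNAPSHOT_T2 = [
    ("alice", "writes", "p3"), ("alice", "writes", "p4"),
    ("p3", "published_in", "ICML"), ("p4", "published_in", "KDD"),
    ("bob", "writes", "p5"), ("p5", "published_in", "KDD"),
]

def metapath_instances(edges, relations):
    # Walk the graph following the given relation sequence (the meta-path),
    # e.g. Author -writes-> Paper -published_in-> Venue, and count the
    # (start, end) pairs it connects.
    paths = [(s, t) for s, r, t in edges if r == relations[0]]
    for rel in relations[1:]:
        step = {(s, t) for s, r, t in edges if r == rel}
        paths = [(start, t) for start, end in paths
                 for s, t in step if s == end]
    return Counter(paths)

def changes(old, new):
    # Report meta-path instance counts that differ between snapshots:
    # a simple structural-plus-semantic notion of change.
    return {pair: (old.get(pair, 0), new.get(pair, 0))
            for pair in set(old) | set(new)
            if old.get(pair, 0) != new.get(pair, 0)}

mp = ["writes", "published_in"]  # user-specified meta-path constraint
t1 = metapath_instances(SNAPSHOT_T1, mp)
t2 = metapath_instances(SNAPSHOT_T2, mp)
for (author, venue), (before, after) in sorted(changes(t1, t2).items()):
    print(f"{author} -> {venue}: {before} instance(s) at t1, {after} at t2")
# -> alice -> ICML: 0 instance(s) at t1, 1 at t2
```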

    A Refinement Operator for Theories
