9 research outputs found

    Efficient Learning and Evaluation of Complex Concepts in Inductive Logic Programming

    Inductive Logic Programming (ILP) is a subfield of Machine Learning with foundations in logic programming. In ILP, logic programming, a subset of first-order logic, is used as a uniform representation language for the problem specification and the induced theories. ILP has been successfully applied to many real-world problems, especially in the biological domain (e.g. drug design, protein structure prediction), where relational information is of particular importance. The expressiveness of logic programs grants flexibility in specifying the learning task and understandability to the induced theories. However, this flexibility comes at a high computational cost, constraining the applicability of ILP systems. Constructing and evaluating complex concepts remain two of the main issues that prevent ILP systems from tackling many learning problems. These learning problems are interesting both from a research perspective, as they raise the standards for ILP systems, and from an application perspective, as such target concepts occur naturally in many real-world applications. Such complex concepts cannot be constructed or evaluated by parallelizing existing top-down ILP systems or improving the underlying Prolog engine; novel search strategies and cover algorithms are needed. The main focus of this thesis is on how to efficiently construct and evaluate complex hypotheses in an ILP setting. To construct such hypotheses we investigate two approaches. The first, the Top Directed Hypothesis Derivation framework, implemented in the ILP system TopLog, uses a top theory to constrain the hypothesis space. In the second approach we revisit the bottom-up search strategy of Golem, lifting its restriction to determinate clauses, which had rendered Golem inapplicable to many key areas. These developments led to the bottom-up ILP system ProGolem. A challenge that arises with a bottom-up approach is the coverage computation of long, non-determinate clauses, for which Prolog's SLD-resolution is no longer adequate. We developed a new, Prolog-based theta-subsumption engine which is significantly more efficient than SLD-resolution at computing the coverage of such complex clauses. We provide evidence that ProGolem achieves the goal of learning complex concepts by presenting a protein-hexose binding prediction application. The theory ProGolem induced has statistically significantly better predictive accuracy than that of other learners. More importantly, the biological insights ProGolem's theory provided were judged by domain experts to be relevant and, in some cases, novel.
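    To make the coverage-testing idea concrete, here is a minimal sketch of naive theta-subsumption in Prolog. It is only an illustration of the relation such an engine decides, not the engine developed in the thesis: a clause C theta-subsumes a clause D (both represented as lists of literals) if some substitution maps every literal of C onto a literal of D.

        % theta_subsumes(+C, +D): succeeds if clause C theta-subsumes clause D.
        theta_subsumes(C, D) :-
            copy_term(C-D, C1-D1),      % work on fresh copies of both clauses
            numbervars(D1, 0, _),       % skolemise D so only C's variables can bind
            subsumes_literals(C1, D1).

        subsumes_literals([], _).
        subsumes_literals([L|Ls], D) :-
            member(L, D),               % unify L with some literal of D...
            subsumes_literals(Ls, D).   % ...and extend the substitution to the rest

    For example, theta_subsumes([p(X,Y)], [p(a,b), q(b)]) succeeds with X = a, Y = b. Deciding theta-subsumption is NP-complete, and the naive member/2 backtracking above blows up on long clauses; a practical engine adds indexing and pruning, which is exactly the efficiency gap the thesis addresses.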

    Efficient bottom-up inductive logic programming

    Inductive logic programming (ILP) is a subfield of machine learning that uses logic programming as its input and output language. While the language of logic programming places ILP among the most expressive approaches to machine learning, it also causes the space of candidate solutions to be potentially infinite. ILP systems therefore need to search a possibly infinite space efficiently, often imposing limits on the hypothesis language in order to handle large problems. We address two problems in the domain of bottom-up ILP systems: their inability to use negation, and their efficiency. Bottom-up approaches to ILP rely on the concept of bottom clauses of examples. The bottom clause of a given example includes all known positive facts about it in the background knowledge, so a bottom-up ILP system is unable to reason with negation. One approach that enables such systems to use negation is closed world specialisation (CWS). The method attempts to learn rules that hold for incorrectly covered negative examples and then adds the negated rule to the hypothesis body; in this manner the use of negation is enabled using only positive facts. Existing applications of CWS use it to further specialise the output theory, which consists of the best-scoring clauses containing only positive literals. We show that such an application of CWS is prone to producing suboptimal solutions, and we propose two alternative uses of CWS inside the hypothesis generation process. We implemented the two approaches as the ProGolemNot and ProGolemNRNot ILP systems, both based on the ProGolem system. We show that the two proposed systems perform at least as well, in terms of achieved accuracy, as the base ProGolem system and its variant that uses CWS to further specialise the output hypothesis. Experimental comparison of the two systems also shows that they are equivalent in terms of the quality of their outputs, while ProGolemNRNot needs less time to derive the solution. ILP systems tend to spend most of their time computing the coverage of candidate hypotheses. In bottom-up systems the number of candidate hypotheses to be tested also depends on the number of literals in the bottom clause of a randomly chosen example, which forms the lower bound of the search space. In the thesis we define the concept of pairwise saturations. Pairwise saturations allow us to safely remove literals from a given bottom clause under the assumption that the final hypothesis also covers some other randomly chosen example. Safe removal of these literals does not require explicit coverage testing and can be performed quickly. We implemented pairwise saturations, along with their generalisation to n-wise saturations, in the ProParGolem system. Experiments show that the speedups obtained from pairwise saturations depend strongly on the structure of the background knowledge; we observed speedups of up to a factor of 1.44 without loss of accuracy. Finally, we combine ProGolemNRNot with ProParGolem in ProParGolemNRNot, an ILP system that uses both pairwise saturations and CWS. We use ProParGolemNRNot to learn simple geometric concepts from data obtained from simulated depth sensors. In the devised experiment the system can use previously learned concepts to describe new ones. The solutions found by the system are intuitively correct and achieve high accuracy on test data.
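    A hedged Prolog sketch of the pruning intuition behind pairwise saturations follows; it is a simplification, not the thesis algorithm. If the final hypothesis must also cover a second example, then a literal in the first example's bottom clause with no potential counterpart (no unifiable literal) in the second example's bottom clause cannot survive into the hypothesis, so it can be dropped without any coverage test. The real method must additionally respect variable sharing across literals, which this per-literal check ignores.

        % prune_bottom(+Bot1, +Bot2, -Pruned): keep only those literals of
        % Bot1 that have a potential counterpart in Bot2 (SWI-Prolog; uses
        % the built-in unifiable/3).
        prune_bottom([], _, []).
        prune_bottom([L|Ls], Bot2, [L|Rest]) :-
            \+ \+ (member(M, Bot2), unifiable(L, M, _)),  % counterpart exists: keep L
            !,
            prune_bottom(Ls, Bot2, Rest).
        prune_bottom([_|Ls], Bot2, Rest) :-               % no counterpart: drop L safely
            prune_bottom(Ls, Bot2, Rest).

    The double negation (\+ \+) makes the counterpart check a pure test, so no bindings leak into the kept literals. Shrinking the bottom clause this way shrinks the lower bound of the search space, which is where the reported speedups come from.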

    Learning probabilistic logic models from probabilistic examples

    We revisit an application developed originally using abductive Inductive Logic Programming (ILP) for modeling inhibition in metabolic networks. The example data were derived from studies of the effects of toxins on rats, using Nuclear Magnetic Resonance (NMR) time-trace analysis of their biofluids together with background knowledge representing a subset of the Kyoto Encyclopedia of Genes and Genomes (KEGG). We now apply two Probabilistic ILP (PILP) approaches, abductive Stochastic Logic Programs (SLPs) and PRogramming In Statistical modeling (PRISM), to the application. Both approaches support abductive learning and probability predictions. Abductive SLPs are a PILP framework that provides possible-worlds semantics to SLPs through abduction. Instead of learning logic models from non-probabilistic examples, as done in ILP, the PILP approach applied in this paper is based on a general technique for introducing probability labels within a standard scientific experimental setting involving control and treated data. Our results demonstrate that the PILP approach provides a way of learning probabilistic logic models from probabilistic examples, and that the PILP models learned from probabilistic examples lead to a significant decrease in error, accompanied by improved insight from the learned results, compared with PILP models learned from non-probabilistic examples.
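    As an illustration of how probability labels could be attached to examples in a control-versus-treated setting, here is a hypothetical SWI-Prolog sketch. The predicates control_mean/2 and treated_level/2 and the 0.5 deviation threshold are invented for illustration and are not from the paper; the label is simply the fraction of treated observations that deviate from the control mean.

        :- use_module(library(apply)).   % include/3
        :- use_module(library(yall)).    % [X]>>Goal lambda notation

        % probabilistic_example(+Metabolite, -P): P is the empirical
        % probability label in [0,1] derived from treated observations.
        probabilistic_example(Metabolite, P) :-
            control_mean(Metabolite, Mu),
            findall(X, treated_level(Metabolite, X), Levels),
            length(Levels, N), N > 0,
            include([X]>>(abs(X - Mu) > 0.5), Levels, Deviating),
            length(Deviating, K),
            P is K / N.

    A labelled example such as probabilistic_example(citrate, 0.8) can then serve as a probabilistic training example for a PILP learner, in contrast to the hard positive/negative labels of standard ILP.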

    Learning Probabilistic Logic Models from Probabilistic Examples (Extended Abstract)
