
    Improving the Efficiency of Inductive Logic Programming Through the Use of Query Packs

    Inductive logic programming, or relational learning, is a powerful paradigm for machine learning or data mining. However, for ILP to become practically useful, the efficiency of ILP systems must improve substantially. To this end, the notion of a query pack is introduced: it structures sets of similar queries. Furthermore, a mechanism is described for executing such query packs. A complexity analysis shows that considerable efficiency improvements can be achieved through the use of this query pack execution mechanism. This claim is supported by empirical results obtained by incorporating support for query pack execution in two existing learning systems.
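    The abstract does not describe how query packs are executed. The sketch below is a simplified illustration only. It assumes that queries are conjunctions of relational literals tested against the ground facts of one example. Queries that share a prefix share a path in a trie, so the shared prefix is matched only once. A branch is skipped as soon as every query below it has succeeded. The names PackNode, build_pack and run_pack are hypothetical, and this is not the authors' Prolog-level mechanism.

```python
from typing import Dict, List, Optional, Set, Tuple

Literal = Tuple[str, ...]  # (predicate, arg1, arg2, ...)
Bindings = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumption: Prolog-style convention, variables start with an uppercase letter.
    return term[:1].isupper()


class PackNode:
    """One node of a hypothetical query-pack trie."""

    def __init__(self) -> None:
        self.children: Dict[Literal, "PackNode"] = {}
        self.query_ids: List[int] = []      # queries that end exactly here
        self.subtree_ids: Set[int] = set()  # queries at or below this node


def build_pack(queries: List[List[Literal]]) -> PackNode:
    """Insert each query into a trie so that common prefixes share one path."""
    root = PackNode()
    for qid, query in enumerate(queries):
        node = root
        node.subtree_ids.add(qid)
        for lit in query:
            node = node.children.setdefault(lit, PackNode())
            node.subtree_ids.add(qid)
        node.query_ids.append(qid)
    return root


def match(lit: Literal, fact: Literal, bindings: Bindings) -> Optional[Bindings]:
    """Match one literal against a ground fact, extending the bindings."""
    if lit[0] != fact[0] or len(lit) != len(fact):
        return None
    new = dict(bindings)
    for term, value in zip(lit[1:], fact[1:]):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to a different constant
        elif term != value:
            return None
    return new


def run_pack(root: PackNode, facts: List[Literal]) -> Set[int]:
    """Return the ids of the queries that succeed on one example."""
    succeeded: Set[int] = set()

    def visit(node: PackNode, bindings: Bindings) -> None:
        succeeded.update(node.query_ids)
        for lit, child in node.children.items():
            if child.subtree_ids <= succeeded:
                continue  # every query below already succeeded: skip branch
            for fact in facts:
                new = match(lit, fact, bindings)
                if new is not None:
                    visit(child, new)
                    if child.subtree_ids <= succeeded:
                        break  # no need to look for more solutions here

    visit(root, {})
    return succeeded


facts = [("atom", "a1", "c"), ("atom", "a2", "o"), ("bond", "a1", "a2")]
queries = [
    [("atom", "X", "c")],
    [("atom", "X", "c"), ("bond", "X", "Y")],
    [("atom", "X", "c"), ("bond", "X", "Y"), ("atom", "Y", "n")],
    [("atom", "X", "o"), ("bond", "Y", "X")],
]
print(sorted(run_pack(build_pack(queries), facts)))  # [0, 1, 3]
```

    The first three queries share the literal atom(X, c), so it is matched once for all of them instead of three times. This sharing is the kind of redundancy that a query pack is meant to remove.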

    From sequential to parallel Inductive Logic Programming

    Inductive Logic Programming (ILP) has achieved considerable success in a wide range of domains. It is recognized, however, that efficiency is a major obstacle to the use of ILP systems in applications requiring large amounts of data. In this paper we address the problem of efficiency in ILP in three steps: i) we survey speedup techniques proposed for sequential execution of ILP systems; ii) we survey different ways of parallelizing an ILP system; and iii) we adapt and combine the sequential execution speedup techniques in the parallel implementations of an ILP system. We also propose a novel technique to partition the search space into independent sub-spaces that may be adequately searched in parallel.
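    The abstract does not say how the search space is partitioned. One simple way to obtain independent sub-spaces is to assign each clause to the sub-space of its first body literal under a fixed literal ordering. This scheme is assumed here purely for illustration. The sketch uses propositional features, not first-order literals, and a plain "positives minus negatives" score. All function names are hypothetical. With CPython threads, CPU-bound work does not run truly in parallel, so a process pool would be the practical choice. The threads here only show the structure of the computation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import List, Set, Tuple

Clause = Tuple[str, ...]


def score(clause: Clause, pos: List[Set[str]], neg: List[Set[str]]) -> int:
    """Positives covered minus negatives covered (a simple coverage score)."""
    def covers(example: Set[str]) -> bool:
        return all(lit in example for lit in clause)
    return sum(covers(e) for e in pos) - sum(covers(e) for e in neg)


def search_subspace(i: int, literals: List[str], pos: List[Set[str]],
                    neg: List[Set[str]], max_len: int) -> Tuple[Clause, int]:
    """Exhaustively search clauses whose lowest-index literal is literals[i].

    These sub-spaces are disjoint and together contain every clause of up to
    max_len literals, so each one can be searched independently."""
    head = literals[i]
    best: Tuple[Clause, int] = ((head,), score((head,), pos, neg))
    for k in range(1, max_len):
        for tail in combinations(literals[i + 1:], k):
            clause = (head,) + tail
            s = score(clause, pos, neg)
            if s > best[1]:
                best = (clause, s)
    return best


def parallel_search(literals: List[str], pos: List[Set[str]], neg: List[Set[str]],
                    max_len: int = 2, workers: int = 4) -> Tuple[Clause, int]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda i: search_subspace(i, literals, pos, neg, max_len),
            range(len(literals))))
    # Combine step: the global best is the best of the per-sub-space bests.
    return max(results, key=lambda r: r[1])


literals = ["a", "b", "c"]
pos = [{"a", "b"}, {"a", "b", "c"}]
neg = [{"a"}, {"b", "c"}]
print(parallel_search(literals, pos, neg))  # (('a', 'b'), 2)
```

    The sub-spaces are disjoint, so the workers never need to communicate. Only the final "best of bests" step combines their results.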

    A workbench to develop ILP systems

    Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 201

    As lazy as it can be

    Inductive Logic Programming (ILP) is a promising technology for knowledge extraction applications. ILP has produced intelligible solutions for a wide variety of domains where it has been applied. ILP's lack of efficiency is, however, a major impediment to its scalability to applications requiring large amounts of data. In this paper we address important issues that must be solved to make ILP scalable to knowledge extraction applications on large amounts of data. These issues include efficiency and storage requirements. We propose and evaluate a set of techniques, collectively called lazy evaluation of examples, to improve the efficiency of ILP systems. Lazy evaluation is essentially a way to avoid or postpone the evaluation of the generated hypotheses (coverage tests). To reduce storage requirements, a representation scheme called interval trees is proposed and evaluated. All the techniques were evaluated using the IndLog ILP system and a set of ILP datasets referenced in the literature. The proposals lead to substantial efficiency improvements and memory savings and are generally applicable to any ILP system.
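    The abstract names these techniques but gives no details. The sketch below is a minimal illustration built on two assumed ideas. First, a specialization can only cover examples that its parent clause covered, so only those examples need to be tested. Second, testing against the negative examples can stop as soon as more than the allowed number of negatives (noise) are covered. Coverage sets are stored as sorted, inclusive integer intervals of example indices. This flat interval list stands in for the paper's interval trees and is not IndLog's data structure. All function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, Tuple, TypeVar

E = TypeVar("E")
Interval = Tuple[int, int]  # inclusive range [lo, hi] of example indices


def to_intervals(ids: Iterable[int]) -> List[Interval]:
    """Compress example indices into sorted, disjoint, inclusive intervals."""
    out: List[Interval] = []
    for i in sorted(set(ids)):
        if out and i == out[-1][1] + 1:
            out[-1] = (out[-1][0], i)  # extend the current run
        else:
            out.append((i, i))         # start a new run
    return out


def expand(intervals: List[Interval]) -> Iterator[int]:
    """Yield every example index contained in the intervals."""
    for lo, hi in intervals:
        yield from range(lo, hi + 1)


def lazy_negative_check(covers: Callable[[E], bool], negatives: Sequence[E],
                        parent_cover: List[Interval],
                        noise: int) -> Optional[List[Interval]]:
    """Test only negatives covered by the parent clause.

    Returns the covered negatives as intervals, or None as soon as more than
    `noise` of them are covered (the remaining tests are never run)."""
    covered: List[int] = []
    for i in expand(parent_cover):
        if covers(negatives[i]):
            covered.append(i)
            if len(covered) > noise:
                return None  # clause already too inconsistent: stop early
    return to_intervals(covered)


negatives = list(range(20))
print(to_intervals([1, 2, 3, 7, 8, 10]))  # [(1, 3), (7, 8), (10, 10)]
print(lazy_negative_check(lambda x: x < 5, negatives, [(0, 9)], noise=10))  # [(0, 4)]
print(lazy_negative_check(lambda x: x < 5, negatives, [(0, 9)], noise=2))   # None
```

    In the last call, the test stops after the third covered negative. The remaining seven candidates from the parent's cover are never evaluated.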

    Data mining the yeast genome in a lazy functional language


    Efficient Learning and Evaluation of Complex Concepts in Inductive Logic Programming

    Inductive Logic Programming (ILP) is a subfield of Machine Learning with foundations in logic programming. In ILP, logic programming, a subset of first-order logic, is used as a uniform representation language for the problem specification and the induced theories. ILP has been successfully applied to many real-world problems, especially in the biological domain (e.g. drug design, protein structure prediction), where relational information is of particular importance. The expressiveness of logic programs grants flexibility in specifying the learning task and understandability to the induced theories. However, this flexibility comes at a high computational cost, constraining the applicability of ILP systems. Constructing and evaluating complex concepts remain two of the main issues that prevent ILP systems from tackling many learning problems. These learning problems are interesting both from a research perspective, as they raise the standards for ILP systems, and from an application perspective, since such target concepts naturally occur in many real-world applications. Such complex concepts cannot be constructed or evaluated by parallelizing existing top-down ILP systems or by improving the underlying Prolog engine; novel search strategies and cover algorithms are needed. The main focus of this thesis is on how to efficiently construct and evaluate complex hypotheses in an ILP setting. To construct such hypotheses, we investigate two approaches. The first, the Top Directed Hypothesis Derivation framework, implemented in the ILP system TopLog, uses a top theory to constrain the hypothesis space. In the second approach, we revisit the bottom-up search strategy of Golem, lifting its restriction to determinate clauses, which had rendered Golem inapplicable to many key areas. These developments led to the bottom-up ILP system ProGolem. A challenge that arises with a bottom-up approach is computing the coverage of long, non-determinate clauses, for which Prolog’s SLD-resolution is no longer adequate. We developed a new, Prolog-based theta-subsumption engine that is significantly more efficient than SLD-resolution in computing the coverage of such complex clauses. We provide evidence that ProGolem achieves the goal of learning complex concepts by presenting a protein-hexose binding prediction application. The theory induced by ProGolem has statistically significantly better predictive accuracy than those of other learners. More importantly, domain experts judged the biological insights provided by ProGolem’s theory to be relevant and, in some cases, novel.
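    To make "coverage by theta-subsumption" concrete, the sketch below implements a generic backtracking subsumption test. A clause body theta-subsumes a set of ground facts if some substitution theta maps every literal of the clause onto a fact. At each step, the search expands the literal with the fewest remaining matches (fail-first ordering), not the left-to-right order of SLD-resolution. This is a simplified Python illustration under that assumption. It is not the thesis's Prolog-based engine, and it ignores the binding of the clause head to the example. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, ...]  # (predicate, arg1, arg2, ...)
Subst = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumption: variables start with an uppercase letter, as in Prolog.
    return term[:1].isupper()


def unify(lit: Atom, fact: Atom, theta: Subst) -> Optional[Subst]:
    """Extend theta so that lit under theta equals the ground fact, if possible."""
    if lit[0] != fact[0] or len(lit) != len(fact):
        return None
    new = dict(theta)
    for term, value in zip(lit[1:], fact[1:]):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def subsumes(clause: List[Atom], facts: Set[Atom]) -> Optional[Subst]:
    """Return a substitution mapping every clause literal into facts, or None."""
    def solve(remaining: List[Atom], theta: Subst) -> Optional[Subst]:
        if not remaining:
            return theta
        # Fail-first: find the literal with the fewest matches under theta.
        options = []
        for idx, lit in enumerate(remaining):
            matches = [m for f in facts if (m := unify(lit, f, theta)) is not None]
            if not matches:
                return None  # this literal can no longer be satisfied
            options.append((len(matches), idx, matches))
        _, idx, matches = min(options)  # idx is unique, so lists are never compared
        rest = remaining[:idx] + remaining[idx + 1:]
        for m in matches:
            result = solve(rest, m)
            if result is not None:
                return result
        return None

    return solve(list(clause), {})


facts = {("atom", "a1", "c"), ("bond", "a1", "a2"),
         ("bond", "a2", "a3"), ("atom", "a3", "o")}
chain = [("atom", "X", "c"), ("bond", "X", "Y"), ("bond", "Y", "Z"), ("atom", "Z", "o")]
print(subsumes(chain, facts))  # {'X': 'a1', 'Y': 'a2', 'Z': 'a3'}
print(subsumes([("atom", "X", "c"), ("bond", "X", "Y"), ("atom", "Y", "o")], facts))  # None
```

    Fail-first ordering detects dead ends early, for example a literal with no remaining match. This is one reason dedicated subsumption procedures can outperform naive left-to-right resolution on long, non-determinate clauses.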