341 research outputs found
Improving the Efficiency of Inductive Logic Programming Through the Use of Query Packs
Inductive logic programming, or relational learning, is a powerful paradigm
for machine learning or data mining. However, in order for ILP to become
practically useful, the efficiency of ILP systems must improve substantially.
To this end, the notion of a query pack is introduced: it structures sets of
similar queries. Furthermore, a mechanism is described for executing such query
packs. A complexity analysis shows that considerable efficiency improvements
can be achieved through the use of this query pack execution mechanism. This
claim is supported by empirical results obtained by incorporating support for
query pack execution in two existing learning systems
From sequential to parallel Inductive Logic Programming
Inductive Logic Programming (ILP) has achieved considerablesuccess in a wide range of domains. It is recognized however thateciency is a major obstacle to the use of ILP systems in applicationsrequiring large amounts of data. In this paper we address the problem ofeciency in ILP in three steps: i) we survey speedup techniques proposedfor sequential execution of ILP systems; ii) we survey dierent ways ofparallelizing an ILP system and; ii) adapt and combine the sequentialexecution speedup techniques in the parallel implementations of an ILPsystem. We also propose a novel technique to partition the search spaceinto independent sub-spaces that may be adequately searched in parallel
A workbench to develop ILP systems
Tese de mestrado integrado. Engenharia Informática e Computação. Faculdade de Engenharia. Universidade do Porto. 201
As lazy as it can be
Inductive Logic Programming (ILP) is a promising technology for knowledgeextraction applications. ILP has produced intelligible solutions for a wide variety of domains where it has been applied. The ILP lack of efficiency is, however, a major impediment for its scalability to applications requiring large amounts of data. In this paper we address important issues that must be solved to make ILP scalable to applicationsof knowledge extraction in large amounts of data. The issues include: efficiency and storage requirements.We propose and evaluate a set of techniques, globally called lazy evaluation of examples, to improve the efficiency of ILP systems. Lazy evaluation is essentially a way to avoid or postpone the evaluation of the generated hypotheses (coverage tests). To reduce the storage amount a representation schema called interval trees is proposed and evaluated.All the techniques were evaluated using the IndLog ILP system and a set of ILPdatasets referenced in the literature. The proposals lead to substantial efficiency improvements and memory savings and are generally applicable to any ILP system
Efficient Learning and Evaluation of Complex Concepts in Inductive Logic Programming
Inductive Logic Programming (ILP) is a subfield of Machine Learning with foundations in logic
programming. In ILP, logic programming, a subset of first-order logic, is used as a uniform
representation language for the problem specification and induced theories. ILP has been
successfully applied to many real-world problems, especially in the biological domain (e.g. drug
design, protein structure prediction), where relational information is of particular importance.
The expressiveness of logic programs grants flexibility in specifying the learning task and understandability
to the induced theories. However, this flexibility comes at a high computational
cost, constraining the applicability of ILP systems. Constructing and evaluating complex concepts
remain two of the main issues that prevent ILP systems from tackling many learning
problems. These learning problems are interesting both from a research perspective, as they
raise the standards for ILP systems, and from an application perspective, where these target
concepts naturally occur in many real-world applications. Such complex concepts cannot
be constructed or evaluated by parallelizing existing top-down ILP systems or improving the
underlying Prolog engine. Novel search strategies and cover algorithms are needed.
The main focus of this thesis is on how to efficiently construct and evaluate complex hypotheses
in an ILP setting. In order to construct such hypotheses we investigate two approaches.
The first, the Top Directed Hypothesis Derivation framework, implemented in the ILP system
TopLog, involves the use of a top theory to constrain the hypothesis space. In the second approach
we revisit the bottom-up search strategy of Golem, lifting its restriction on determinate
clauses which had rendered Golem inapplicable to many key areas. These developments led to
the bottom-up ILP system ProGolem. A challenge that arises with a bottom-up approach is the
coverage computation of long, non-determinate, clauses. Prolog’s SLD-resolution is no longer
adequate. We developed a new, Prolog-based, theta-subsumption engine which is significantly
more efficient than SLD-resolution in computing the coverage of such complex clauses.
We provide evidence that ProGolem achieves the goal of learning complex concepts by presenting
a protein-hexose binding prediction application. The theory ProGolem induced has
a statistically significant better predictive accuracy than that of other learners. More importantly,
the biological insights ProGolem’s theory provided were judged by domain experts to
be relevant and, in some cases, novel
- …