8 research outputs found

    ILP - Just trie it

    Despite the considerable success of Inductive Logic Programming (ILP), deployed ILP systems still have efficiency problems when applied to complex problems. Several techniques have been proposed to address this issue, including query transformations, query packs, lazy evaluation and parallel execution of ILP systems, to mention just a few. We propose a novel technique that avoids having to deduce each example in order to evaluate each constructed clause. The technique takes advantage of the two-stage procedure of Mode Directed Inverse Entailment (MDIE) systems. In the first stage of an MDIE system, where the bottom clause is constructed, we store not only the bottom clause but also valuable additional information. The information stored is sufficient to evaluate the clauses constructed in the second stage without the need for a theorem prover. We use a trie data structure to efficiently store all bottom clauses produced using all examples (positive and negative) as seeds. The technique was implemented and evaluated on two well-known data sets from the ILP literature. The results are promising in terms of both execution time and accuracy.
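The idea of scoring clauses by lookup rather than by theorem proving can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each bottom clause is reduced to an ordered list of opaque literal strings, that a candidate clause is a prefix of such a list, and that the "additional information" is just a pair of counters per node. The names `BottomClauseTrie`, `insert` and `coverage` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Child nodes keyed by literal (an opaque string in this sketch).
    children: dict = field(default_factory=dict)
    pos: int = 0  # positive examples whose bottom clause passes through this node
    neg: int = 0  # negative examples whose bottom clause passes through this node


class BottomClauseTrie:
    """Hypothetical trie of bottom clauses, one per seed example."""

    def __init__(self):
        self.root = TrieNode()

    def _count(self, node, positive):
        if positive:
            node.pos += 1
        else:
            node.neg += 1

    def insert(self, literals, positive):
        # Stage 1: store the bottom clause and update counters along its path.
        node = self.root
        self._count(node, positive)
        for lit in literals:
            node = node.children.setdefault(lit, TrieNode())
            self._count(node, positive)

    def coverage(self, clause):
        # Stage 2: evaluate a candidate clause (assumed to be a prefix)
        # by walking the trie instead of calling a theorem prover.
        node = self.root
        for lit in clause:
            node = node.children.get(lit)
            if node is None:
                return (0, 0)
        return (node.pos, node.neg)


trie = BottomClauseTrie()
trie.insert(["a", "b", "c"], positive=True)
trie.insert(["a", "b"], positive=True)
trie.insert(["a", "d"], positive=False)
```

In this toy setting `trie.coverage(["a", "b"])` reads the positive and negative counts straight off a node. Real bottom clauses are first-order and contain variables, which this sketch deliberately ignores.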

    MR-Radix: a multi-relational data mining algorithm

    Background: The multi-relational approach has emerged as an alternative for analyzing structured data such as relational databases, since it allows data mining to be applied to multiple tables directly, avoiding expensive join operations and semantic losses. This work proposes an algorithm that follows the multi-relational approach. Methods: To compare the performance of the traditional and multi-relational approaches to mining association rules, this paper presents an empirical study of PatriciaMine, a traditional algorithm, and its proposed multi-relational counterpart, MR-Radix. Results: The study showed performance advantages for the multi-relational approach over several tables, since it avoids the high cost of join operations across multiple tables and the associated semantic losses. MR-Radix ran faster than PatriciaMine, despite handling complex multi-relational patterns. Memory usage grew more conservatively for MR-Radix than for PatriciaMine: an increased demand for frequent items in MR-Radix does not cause the significant growth in memory usage seen in PatriciaMine. Conclusion: The comparative study between PatriciaMine and MR-Radix confirmed the efficacy of the multi-relational approach in the data mining process, in terms of both execution time and memory usage. In addition, the proposed multi-relational algorithm, unlike other algorithms of this approach, is efficient for use in large relational databases. This project was financed by CAPES. We thank David R. M. Mercer for English language review and translation.

    Efficient bottom-up inductive logic programming

    Inductive logic programming (ILP) is a subfield of machine learning that uses logic programming as its input and output language. While the language of logic programming places ILP among the most expressive approaches to machine learning, it also causes the space of candidate solutions to be potentially infinite. ILP systems therefore need to be able to search efficiently through a possibly infinite space, often imposing limits on the hypothesis language in order to handle large problems. We address two problems in the domain of bottom-up ILP systems: their inability to use negation and their efficiency. Bottom-up approaches to ILP rely on the concept of bottom clauses of examples. The bottom clause of a given example includes all known positive facts about it in the background knowledge, which leaves a bottom-up ILP system unable to reason with negation. One approach that enables such systems to use negation is closed world specialisation (CWS). The method attempts to learn rules that hold for incorrectly covered negative examples, and then adds the negated rule to the hypothesis body. In this manner the use of negation is enabled using only positive facts. Existing applications of CWS use it to further specialise the output theory, which consists of the best-scoring clauses containing only positive literals. We show that such application of CWS is prone to lead to suboptimal solutions and provide two alternative uses of CWS inside the hypothesis generation process. We implemented the two approaches as the ProGolemNot and ProGolemNRNot ILP systems, both based on the ProGolem system. We show that both proposed systems perform at least as well in terms of achieved accuracy as the base ProGolem system or its variant that uses CWS to further specialise the output hypothesis. Experimental comparison of the two systems also shows that they are equivalent in terms of the quality of their outputs, while ProGolemNRNot needs less time to derive the solution. ILP systems tend to spend most of their time computing the coverage of candidate hypotheses. In bottom-up systems the number of candidate hypotheses to be tested also depends on the number of literals in the bottom clause of a randomly chosen example that forms the lower bound of the search space. In the thesis we define the concept of pairwise saturations. Pairwise saturations allow us to safely remove literals from a given bottom clause under the assumption that the final hypothesis also covers some other randomly chosen example. Safe removal of these literals does not require explicit coverage testing and can be performed faster. We implemented pairwise saturations, along with their generalisation to n-wise saturations, in the ProParGolem system. Experiments show that the speedups obtained from using pairwise saturations are highly dependent on the structure of the background knowledge. We observed speedups of up to a factor of 1.44 without loss of accuracy. We combine ProGolemNRNot with ProParGolem in ProParGolemNRNot, an ILP system that uses pairwise saturations and CWS. We use ProParGolemNRNot to learn simple geometric concepts using data obtained from simulated depth sensors. In the devised experiment the system can use previously learned concepts to describe new ones. The solutions found by the system are intuitively correct and achieve high accuracy on test data.
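The intuition behind removing literals without coverage testing can be sketched in a heavily simplified, propositional form. This is an assumption-laden illustration, not the thesis's definition of pairwise or n-wise saturations: literals are treated as opaque strings with no variables, and the hypothetical functions `pairwise_saturation` and `nwise_saturation` simply drop literals that cannot hold for the other example(s).

```python
def pairwise_saturation(bottom_a, bottom_b):
    """Keep only the literals of bottom_a that also occur in bottom_b.

    Propositional simplification (an assumption of this sketch): if the
    final hypothesis must also cover the example behind bottom_b, any
    literal absent from bottom_b cannot appear in that hypothesis, so it
    can be dropped without an explicit coverage test.
    """
    other = set(bottom_b)
    return [lit for lit in bottom_a if lit in other]


def nwise_saturation(bottom_a, *others):
    """Apply the same reduction against several other bottom clauses."""
    reduced = list(bottom_a)
    for bottom in others:
        reduced = pairwise_saturation(reduced, bottom)
    return reduced


bottom_a = ["edge", "red", "square", "large"]
bottom_b = ["edge", "square", "small"]
bottom_c = ["square", "edge", "blue"]

reduced_pair = pairwise_saturation(bottom_a, bottom_b)
reduced_n = nwise_saturation(bottom_a, bottom_b, bottom_c)
```

A shorter bottom clause means fewer candidate hypotheses to test. In the first-order setting, matching literals across examples also involves variable bindings, which this sketch leaves out.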

    Techniques for mining structured data and for knowledge discovery in graph theory

    Improving frequent subgraph mining in the presence of symmetry -- Using background knowledge to improve structured data mining -- Automated generation of conjectures on forbidden subgraph characterization

    Vertex unique labelled subgraph mining

    This thesis proposes the novel concept of Vertex Unique Labelled Subgraph (VULS) mining with respect to the field of graph-based knowledge discovery (or graph mining). The objective of the research is to investigate the benefits that the concept of VULS can offer in the context of vertex classification. A VULS is a subgraph with a particular structure and edge labelling that has a unique vertex labelling associated with it within a given (set of) host graph(s). VULS can describe highly discriminative and significant local geometries, each with a particular associated vertex label pattern. This knowledge can then be used to predict vertex labels in "unseen" graphs (graphs with edge labels, but without vertex labels). Thus this research is directed at identifying (mining) VULS, of various forms, that "best" serve both to capture graph information effectively and to allow for the generation of effective vertex label predictors (classifiers). To this end, four VULS classifiers are proposed, directed at mining four different kinds of VULS: (i) complete, (ii) minimal, (iii) frequent and (iv) minimal frequent. The thesis describes and discusses each of these in detail including, in each case, the theoretical definition and algorithms with respect to VULS identification and prediction. A full evaluation of each of the VULS categories is also presented. VULS has wide applicability in areas where the domain of interest can be represented in the form of a graph. The evaluation was primarily directed at predicting a form of deformation, known as springback, that occurs in the Asymmetric Incremental Sheet Forming (AISF) manufacturing process. For the evaluation two flat-topped, square-based pyramid shapes were used. Each pyramid had been manufactured twice using steel and twice using titanium. The utilisation of VULS was also explored by applying the VULS concept to the field of satellite image interpretation. Satellite data describing two villages located in a rural part of the Ethiopian hinterland were used for this purpose. In each case the ground surface was represented in a similar manner to the way that AISF sheet metal surfaces were represented, with the z dimension describing the grey scale value. The idea here was to predict vertex labels describing ground type. As will become apparent from the work presented in this thesis, the VULS concept is well suited to the task of 3D surface classification with respect to AISF and satellite imagery. The thesis demonstrates that the use of frequent VULS (rather than the other forms of VULS considered) produces more efficient results in the AISF sheet metal forming application domain, whilst the use of minimal VULS provided promising results in the context of the satellite image interpretation domain. The reported evaluation also indicates that a sound foundation has been established for future work on more general VULS-based vertex classification.
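The defining property of a VULS, an edge-labelled structure that always carries the same vertex labelling, can be illustrated on the smallest possible case. The sketch below is an assumption, not one of the thesis's algorithms: it considers only one-edge subgraphs of an undirected host graph, and the function name `single_edge_vuls` and the triple format are hypothetical.

```python
from collections import defaultdict


def single_edge_vuls(edges):
    """Find edge labels whose endpoint labelling is unique in the host graph.

    edges: iterable of (u_label, edge_label, v_label) triples. Edges are
    treated as undirected, so the endpoint labels are stored as a sorted pair.
    Returns a dict mapping each qualifying edge label to its endpoint labels.
    """
    labellings = defaultdict(set)
    for u_label, edge_label, v_label in edges:
        labellings[edge_label].add(tuple(sorted((u_label, v_label))))
    # Keep only edge labels that occur with exactly one vertex labelling.
    return {
        edge_label: next(iter(pairs))
        for edge_label, pairs in labellings.items()
        if len(pairs) == 1
    }


host_edges = [
    ("A", "x", "B"),
    ("B", "x", "A"),  # same labelling as above once direction is ignored
    ("A", "y", "B"),
    ("A", "y", "C"),  # "y" occurs with two different labellings
]
found = single_edge_vuls(host_edges)
```

Here an "unseen" edge labelled `x` could be assigned the endpoint labels found in the host graph, whereas `y` offers no unique prediction. Larger subgraphs would need subgraph isomorphism checks, which are beyond this sketch.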