8 research outputs found

    Apprentissage de Concept à partir d'Exemples (très) Ambigus [Concept Learning from (Very) Ambiguous Examples]

    No full text
    In this article we explore data incompleteness in the setting of propositional concept learning. We follow H. Hirsh's idea of extending the version space paradigm: in this extension, a hypothesis must be compatible (in a sense to be defined case by case) with all the information available about the examples. We propose a representation of this information that accounts not only for situations where data are missing, but also for more general situations of ambiguity in which the example is hidden within a set of virtual instances. We present a new algorithm, LEa, which learns an existential (monotone) DNF concept from a set of ambiguous examples. We compare LEa to J48 and Naive Bayes on standard problems made incomplete to varying degrees.
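
    The existential coverage semantics described above can be made concrete with a small sketch. The following toy Python code (an illustration under assumed propositional notation, not the LEa algorithm itself) treats an ambiguous example as the set of virtual instances it may hide, and deems a monotone DNF compatible with a positive example when at least one hidden instance satisfies at least one term:

# Minimal sketch of existential coverage for ambiguous examples.
# An instance is a frozenset of the attributes that are true;
# a monotone DNF is a list of terms, each term a set of attributes.

def term_covers(term, instance):
    """A monotone term covers an instance if all its attributes hold."""
    return term <= instance

def dnf_covers_instance(dnf, instance):
    return any(term_covers(term, instance) for term in dnf)

def covers_ambiguous(dnf, virtual_instances):
    """Existential semantics: some hidden instance is covered."""
    return any(dnf_covers_instance(dnf, inst) for inst in virtual_instances)

# Example: attribute 'b' is unobserved, so the example hides two
# virtual instances.
ambiguous_pos = [frozenset({'a'}), frozenset({'a', 'b'})]
hypothesis = [{'a', 'b'}, {'c'}]                    # (a AND b) OR c
print(covers_ambiguous(hypothesis, ambiguous_pos))  # True: {'a','b'} is covered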

    Schema Independent Relational Learning

    Full text link
    Learning novel concepts and relations from relational databases is an important problem with many applications in database systems and machine learning. Relational learning algorithms learn the definition of a new relation in terms of existing relations in the database. Nevertheless, the same data set may be represented under different schemas for various reasons, such as efficiency, data quality, and usability. Unfortunately, the output of current relational learning algorithms tends to vary quite substantially over the choice of schema, both in terms of learning accuracy and efficiency. This variation complicates their off-the-shelf application. In this paper, we introduce and formalize the property of schema independence of relational learning algorithms, and study both the theoretical and empirical dependence of existing algorithms on the common class of (de)composition schema transformations. We study both sample-based learning algorithms, which learn from sets of labeled examples, and query-based algorithms, which learn by asking queries to an oracle. We prove that current relational learning algorithms are generally not schema independent. For query-based learning algorithms we show that the (de)composition transformations influence their query complexity. We propose Castor, a sample-based relational learning algorithm that achieves schema independence by leveraging data dependencies. We support the theoretical results with an empirical study that demonstrates the schema dependence/independence of several algorithms on existing benchmark and real-world datasets under (de)compositions.
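
    As a concrete illustration of the (de)composition transformations studied here, the following small Python sketch (hypothetical data and relation names, not the Castor system) stores the same facts under a composed schema and a vertically decomposed one; the natural join on the shared key shows the decomposition is lossless, while a learner whose hypotheses are built from relation names sees two different vocabularies:

# Hypothetical (de)composition example: one relation vs. two relations
# sharing a key. The data content is identical in both schemas.

composed = {                       # movie(id, title, year)
    1: ("Alien", 1979),
    2: ("Blade Runner", 1982),
}

decomposed_title = {1: "Alien", 2: "Blade Runner"}   # movieTitle(id, title)
decomposed_year = {1: 1979, 2: 1982}                 # movieYear(id, year)

def recompose(titles, years):
    """Natural join on the shared key: the decomposition is lossless."""
    return {k: (titles[k], years[k]) for k in titles if k in years}

# Same facts, different schema: a schema-dependent learner may output
# different definitions over {movie} than over {movieTitle, movieYear}.
assert recompose(decomposed_title, decomposed_year) == composed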

    Logical settings for concept learning from incomplete examples in First Order Logic

    Full text link
    We investigate here concept learning from incomplete examples. Our first purpose is to discuss to what extent logical learning settings have to be modified in order to cope with data incompleteness. More precisely, we are interested in extending the learning from interpretations setting introduced by L. De Raedt, which extends the classical propositional (or attribute-value) concept learning from examples framework to relational representations. We are inspired here by ideas presented by H. Hirsh in a work extending the version space inductive paradigm to incomplete data. H. Hirsh proposes to slightly modify the notion of solution when dealing with incomplete examples: a solution has to be a hypothesis compatible with all pieces of information concerning the examples. We identify two main classes of incompleteness. First, uncertainty deals with our state of knowledge concerning an example. Second, generalization (or abstraction) deals with what part of the description of the example is sufficient for the learning purpose. These two main sources of incompleteness can be mixed when only part of the useful information is known. We discuss a general learning setting, referred to as "learning from possibilities", that formalizes these ideas, and then present a more specific learning setting, referred to as "assumption-based learning", that copes with examples whose uncertainty can be reduced by considering contextual information outside of the description of the examples proper. Assumption-based learning is illustrated on a recent work concerning the prediction of a consensus secondary structure common to a set of RNA sequences.
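
    The compatibility notion borrowed from H. Hirsh can be sketched in a propositional toy setting (an assumed simplification; the paper itself works in first-order logic): an incomplete example fixes the truth of some atoms and leaves others unknown, and a hypothesis counts as compatible with the example if at least one completion of the unknown atoms satisfies it:

# Toy propositional sketch of Hirsh-style compatibility with
# incomplete examples. Not the paper's first-order setting.

from itertools import product

def completions(partial, atoms):
    """Enumerate all total assignments extending a partial one."""
    unknown = [a for a in atoms if a not in partial]
    for values in product([False, True], repeat=len(unknown)):
        yield {**partial, **dict(zip(unknown, values))}

def compatible(hypothesis, partial, atoms):
    """hypothesis: a predicate on a total truth assignment."""
    return any(hypothesis(c) for c in completions(partial, atoms))

atoms = ['p', 'q', 'r']
example = {'p': True}                 # q and r are unobserved
h = lambda m: m['p'] and m['q']       # hypothesis: p AND q
print(compatible(h, example, atoms))  # True: the completion with q=True works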

    Efficient bottom-up inductive logic programming

    Get PDF
    Inductive logic programming (ILP) is a subfield of machine learning that uses logic programming as its input and output language. While the language of logic programming makes ILP one of the most expressive approaches to machine learning, it also causes the space of candidate solutions to be potentially infinite. ILP systems therefore need to search efficiently through a possibly infinite space, often imposing limits on the hypothesis language in order to handle large problems. We address two problems in the domain of bottom-up ILP systems: their inability to use negation, and their efficiency.

    Bottom-up approaches to ILP rely on the concept of bottom clauses of examples. The bottom clause of a given example includes all positive facts known about it in the background knowledge, so a bottom-up ILP system cannot reason with negation. One approach that enables such systems to use negation is closed world specialisation (CWS). The method attempts to learn rules that hold for incorrectly covered negative examples, and then adds the negated rule to the hypothesis body; in this manner negation is enabled using only positive facts. Existing applications of CWS use it to further specialise the output theory, which consists of the best-scoring clauses containing only positive literals. We show that this use of CWS is prone to suboptimal solutions, and provide two alternative uses of CWS inside the hypothesis generation process. We implemented the two approaches as the ProGolemNot and ProGolemNRNot ILP systems, both based on the ProGolem system. We show that the two proposed systems perform at least as well, in terms of achieved accuracies, as the base ProGolem system or its variant that uses CWS to further specialise the output hypothesis. Experimental comparison also shows that the two systems are equivalent in terms of output quality, while ProGolemNRNot needs less time to derive the solution.

    ILP systems tend to spend most of their time computing the coverage of candidate hypotheses. In bottom-up systems, the number of candidate hypotheses to be tested also depends on the number of literals in the bottom clause of a randomly chosen example, which forms the lower bound of the search space. In the thesis we define the concept of pairwise saturations. Pairwise saturations allow us to safely remove literals from a given bottom clause under the assumption that the final hypothesis also covers some other randomly chosen example. Safe removal of these literals does not require explicit coverage testing and can be performed faster. We implemented pairwise saturations, along with their generalisation to n-wise saturations, in the ProParGolem system. Experiments show that the speedups obtained from pairwise saturations depend strongly on the structure of the background knowledge; we observed speedups of up to a factor of 1.44 without loss of accuracy. Finally, we combine ProGolemNRNot with ProParGolem in ProParGolemNRNot, an ILP system that uses both pairwise saturations and CWS. We use ProParGolemNRNot to learn simple geometric concepts from data obtained with simulated depth sensors; in the devised experiment the system can use previously learned concepts to describe new ones. The solutions found by the system are intuitively correct and achieve high accuracy on test data.
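
    The pairwise-saturation idea admits a deliberately simplified sketch (the thesis works with first-order clauses and variable mappings, which this toy Python version ignores): if the final hypothesis must also cover a second example, any literal in the first example's bottom clause whose predicate never occurs in the second example's bottom clause can be dropped without any coverage test:

# Toy sketch of bottom clauses and pairwise-saturation pruning.
# Facts are (predicate, args) pairs; real pairwise saturations match
# literals up to variable mappings rather than by predicate name alone.

def bottom_clause(example, background):
    """All known facts about the example in the background knowledge."""
    return {fact for fact in background if example in fact[1]}

def pairwise_saturation(bottom1, bottom2):
    """Keep only literals whose predicate also occurs for the other example."""
    preds2 = {pred for pred, _ in bottom2}
    return {(pred, args) for pred, args in bottom1 if pred in preds2}

bk = [("has_wheels", ("car1",)), ("has_engine", ("car1",)),
      ("red", ("car1",)), ("has_wheels", ("car2",)), ("has_engine", ("car2",))]
b1 = bottom_clause("car1", bk)
b2 = bottom_clause("car2", bk)
print(pairwise_saturation(b1, b2))  # 'red' pruned: car2 has no red fact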

    Learning Horn Expressions with LOGAN-H

    No full text
    The paper introduces LOGAN-H, a system for learning first-order function-free Horn expressions from interpretations. The system is based on an algorithm that learns by asking questions and that was proved correct in previous work. The current paper shows how the algorithm can be implemented in a practical system, and introduces a new algorithm based on it that avoids interaction and learns from examples only. The LOGAN-H system implements these algorithms and adds several facilities and optimizations that allow efficient application to a wide range of problems. As one of the important ingredients, the system includes several fast procedures for solving the subsumption problem, an NP-complete problem that must be solved many times during the learning process. We describe qualitative and quantitative experiments in several domains. The experiments demonstrate that the system can deal with varied problems and large amounts of data, and that it achieves good classification accuracy.
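
    The subsumption problem mentioned above is theta-subsumption: clause C subsumes clause D if some substitution maps every literal of C into D. A naive brute-force checker (for illustration only; LOGAN-H's specialised procedures are much faster) can be sketched in Python as follows:

# Brute-force theta-subsumption: enumerate all substitutions of C's
# variables by terms of D and test literal-set inclusion.

from itertools import product

def variables(clause):
    """Strings starting with an underscore denote variables."""
    return sorted({a for _, args in clause for a in args if a.startswith('_')})

def subsumes(c, d):
    """True if some substitution theta maps every literal of c into d."""
    vs = variables(c)
    terms = sorted({a for _, args in d for a in args})
    for values in product(terms, repeat=len(vs)):
        theta = dict(zip(vs, values))
        mapped = {(p, tuple(theta.get(a, a) for a in args)) for p, args in c}
        if mapped <= set(d):
            return True
    return False

# c: parent(_x, _y)  subsumes  d: parent(ann, bob), parent(bob, eve)
c = [("parent", ("_x", "_y"))]
d = [("parent", ("ann", "bob")), ("parent", ("bob", "eve"))]
print(subsumes(c, d))  # True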