    Fast relational learning using bottom clause propositionalization with artificial neural networks

    Relational learning can be described as the task of learning first-order logic rules from examples. It has enabled a number of new machine learning applications, e.g. graph mining and link analysis. Inductive Logic Programming (ILP) performs relational learning either directly, by manipulating first-order rules, or through propositionalization, which translates the relational task into an attribute-value learning task by representing subsets of relations as features. In this paper, we introduce a fast method and system for relational learning based on a novel propositionalization called Bottom Clause Propositionalization (BCP). Bottom clauses are boundaries in the hypothesis search space used by the ILP systems Progol and Aleph. Bottom clauses carry semantic meaning and can be mapped directly onto numerical vectors, simplifying the feature extraction process. We have integrated BCP with a well-known neural-symbolic system, C-IL2P, to perform learning from numerical vectors. C-IL2P uses background knowledge in the form of propositional logic programs to build a neural network. The integrated system, which we call CILP++, handles first-order logic knowledge and is available for download from SourceForge. We have evaluated CILP++ on seven ILP datasets, comparing results with Aleph and a well-known propositionalization method, RSD. The results show that CILP++ can achieve accuracy comparable to Aleph while being generally faster. BCP achieved a statistically significant improvement in accuracy over RSD when running with a neural network, but BCP and RSD perform similarly when running with C4.5. We have also extended CILP++ to include a statistical feature selection method, mRMR, with preliminary results indicating that a reduction of more than 90% of features can be achieved with a small loss of accuracy.
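
    The mapping from bottom clauses to feature vectors can be pictured with a short sketch. The Python below is a minimal illustration under our own assumptions, not the CILP++ implementation: each example's bottom clause is taken as already computed and given as a list of body literals, and each example becomes a binary vector over the union of literals observed in the data.

        # Minimal sketch of the BCP idea: one binary feature per distinct
        # bottom-clause body literal. Literals and examples are invented.
        def bcp_vectors(bottom_clauses):
            """Map bottom clauses (lists of body literals) to binary vectors."""
            features = sorted({lit for clause in bottom_clauses for lit in clause})
            index = {lit: i for i, lit in enumerate(features)}
            vectors = []
            for clause in bottom_clauses:
                v = [0] * len(features)
                for lit in clause:
                    v[index[lit]] = 1      # literal occurs in this bottom clause
                vectors.append(v)
            return features, vectors

        # Toy bottom clauses for two examples of a hypothetical target father(A,B):
        examples = [
            ["parent(A,B)", "male(A)"],
            ["parent(A,B)", "female(A)"],
        ]
        feats, vecs = bcp_vectors(examples)
        print(feats)   # ['female(A)', 'male(A)', 'parent(A,B)']
        print(vecs)    # [[0, 1, 1], [1, 0, 1]]

    Vectors of this form can then be handed to any propositional learner, such as the neural network used by CILP++.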

    kLog: A Language for Logical and Relational Learning with Kernels

    We introduce kLog, a novel approach to statistical relational learning. Unlike standard approaches, kLog does not represent a probability distribution directly. It is rather a language for performing kernel-based learning on expressive logical and relational representations. kLog allows users to specify learning problems declaratively. It builds on simple but powerful concepts: learning from interpretations, entity/relationship data modeling, logic programming, and deductive databases. Access by the kernel to the rich representation is mediated by a technique we call graphicalization: the relational representation is first transformed into a graph, in particular a grounded entity/relationship diagram. Subsequently, a choice of graph kernel defines the feature space. kLog supports mixed numerical and symbolic data, as well as background knowledge in the form of Prolog or Datalog programs, as in inductive logic programming systems. The kLog framework can be applied to tackle the same range of tasks that has made statistical relational learning so popular, including classification, regression, multitask learning, and collective classification. We also report on empirical comparisons, showing that kLog can be either more accurate, or much faster at the same level of accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at http://klog.dinfo.unifi.it along with tutorials.
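
    Graphicalization can be illustrated with a small sketch. The Python below is a toy rendering under our own simplifying assumptions, not the kLog system: an interpretation is a set of entity tuples and relationship tuples, grounded into a bipartite graph with one node per entity and one node per ground relationship, and a trivial label-histogram kernel stands in for the graph kernels kLog actually offers.

        # Toy graphicalization: ground an entity/relationship interpretation
        # into a labeled graph; all names and data are hypothetical.
        from collections import Counter

        def graphicalize(entities, relationships):
            nodes, edges = {}, []
            for eid, label in entities:              # one node per entity
                nodes[eid] = label
            for i, (label, args) in enumerate(relationships):
                rid = f"rel{i}"                      # one node per ground relation
                nodes[rid] = label
                edges += [(rid, a) for a in args]    # link relation to its entities
            return nodes, edges

        def label_kernel(g1, g2):
            """Stand-in kernel: dot product of node-label histograms."""
            c1, c2 = Counter(g1[0].values()), Counter(g2[0].values())
            return sum(c1[label] * c2[label] for label in c1)

        # Hypothetical interpretation: two people linked by a 'friend' relation.
        g = graphicalize([("p1", "person"), ("p2", "person")],
                         [("friend", ("p1", "p2"))])
        print(label_kernel(g, g))  # kernel of the graph with itself: 5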

    Relational Knowledge Extraction from Attribute-Value Learners

    Bottom Clause Propositionalization (BCP) is a recent propositionalization method which allows fast relational learning. Propositional learners can use BCP to obtain accuracy results comparable with Inductive Logic Programming (ILP) learners. However, unlike with ILP learners, what has been learned cannot normally be represented in first-order logic. In this paper, we propose an approach and introduce a novel algorithm for the extraction of first-order rules from propositional rule learners when dealing with data propositionalized with BCP. A theorem then shows that the extracted first-order rules are consistent with their propositional version. The algorithm was evaluated using the rule learner RIPPER, although it can be applied to any propositional rule learner. Initial results show that the accuracies of both RIPPER and the extracted first-order rules can be comparable to those obtained by Aleph (a traditional ILP system), while our approach is considerably faster (obtaining speed-ups of over an order of magnitude) and generates a compact rule set with at least the same representation power as standard ILP learners.
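
    The extraction step can be pictured as inverting the BCP feature table. The Python below is a toy sketch under our own assumptions, not the paper's algorithm: each propositional feature is taken to correspond to exactly one bottom-clause body literal, so a conjunction of features in a learned rule translates directly into a clause body; RIPPER itself is not invoked.

        # Hypothetical feature table recorded at propositionalization time.
        feature_to_literal = {
            "f0": "parent(A,B)",
            "f1": "male(A)",
            "f2": "female(A)",
        }

        def to_first_order(head, feature_conjunction):
            """Translate a propositional rule body into a first-order clause."""
            body = ", ".join(feature_to_literal[f] for f in feature_conjunction)
            return f"{head} :- {body}."

        # A rule a propositional learner such as RIPPER might output: f0 and f1.
        print(to_first_order("father(A,B)", ["f0", "f1"]))
        # father(A,B) :- parent(A,B), male(A).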

    Using ILP to Identify Pathway Activation Patterns in Systems Biology

    We show a logical aggregation method that, combined with propositionalization methods, can construct novel structured biological features from gene expression data. We do this to gain understanding of pathway mechanisms, for instance those associated with a particular disease. We illustrate this method on the task of distinguishing between two types of lung cancer: Squamous Cell Carcinoma (SCC) and Adenocarcinoma (AC). We identify pathway activation patterns in pathways previously implicated in the development of cancers. Our method identified a model with predictive performance comparable to the winning algorithm of a recent challenge, while providing biologically relevant explanations that may be useful to a biologist.
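
    The flavor of such pathway-level feature construction can be conveyed with a small sketch. The Python below is purely illustrative and is not the paper's method: it assumes pathway gene sets are known in advance and derives one boolean "pathway activated" feature per pathway from thresholded expression values; pathways, genes, and thresholds are all invented.

        # Illustrative aggregation: a pathway counts as activated when a
        # sufficient fraction of its genes are over-expressed.
        PATHWAYS = {"wnt": ["g1", "g2", "g3"], "p53": ["g2", "g4"]}

        def pathway_features(expression, up_threshold=1.0, frac=0.6):
            feats = {}
            for name, genes in PATHWAYS.items():
                up = sum(expression.get(g, 0.0) > up_threshold for g in genes)
                feats[f"{name}_activated"] = up / len(genes) >= frac
            return feats

        sample = {"g1": 1.8, "g2": 2.1, "g3": 0.2, "g4": -0.5}
        print(pathway_features(sample))
        # {'wnt_activated': True, 'p53_activated': False}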

    Propositionalisation of multiple sequence alignments using probabilistic models

    Multiple sequence alignments play a central role in Bioinformatics. Most alignment representations are designed to facilitate knowledge extraction by human experts. Additionally, statistical models such as Profile Hidden Markov Models are used as representations; they offer the advantage of providing sound, probabilistic scores. The basic idea we present in this paper is to use the structure of a Profile Hidden Markov Model for propositionalisation. This way we get a simple, extendable representation of multiple sequence alignments which facilitates further analysis by machine learning algorithms.
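
    The idea admits a compact sketch. The Python below works under our own simplifying assumptions rather than reproducing the paper: only match states are modeled (a real profile HMM also has insert and delete states), each aligned sequence contributes exactly one residue per match column, and the emission table is invented. Each match column then yields a symbolic attribute (the aligned residue) plus a numeric emission log-odds score.

        import math

        # Hypothetical match-state emission probabilities for a 3-column model.
        MATCH_EMISSIONS = [
            {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
            {"A": 0.1, "C": 0.6, "G": 0.2, "T": 0.1},
            {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
        ]
        BACKGROUND = 0.25  # uniform background frequency

        def propositionalise(aligned_seq):
            """One symbolic and one numeric feature per match column."""
            row = {}
            for i, residue in enumerate(aligned_seq):
                row[f"col{i}_residue"] = residue
                p = MATCH_EMISSIONS[i].get(residue, 1e-6)
                row[f"col{i}_logodds"] = math.log(p / BACKGROUND)
            return row

        print(propositionalise("ACG"))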

    ACORA: Distribution-Based Aggregation for Relational Learning from Identifier Attributes

    Feature construction through aggregation plays an essential role in modeling relational domains with one-to-many relationships between tables. One-to-many relationships lead to bags (multisets) of related entities, from which predictive information must be captured. This paper focuses on aggregation from categorical attributes that can take many values (e.g., object identifiers). We present a novel aggregation method, as part of the relational learning system ACORA, that combines the use of vector distance and meta-data about the class-conditional distributions of attribute values. We provide a theoretical foundation for this approach, deriving a "relational fixed-effect" model within a Bayesian framework, and discuss the implications of identifier aggregation on the expressive power of the induced model. One advantage of using identifier attributes is the circumvention of limitations caused either by missing/unobserved object properties or by independence assumptions. Finally, we show empirically that the novel aggregators can generalize in the presence of identifier (and other high-dimensional) attributes, and also explore the limitations of the applicability of the methods.
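
    The aggregation idea can be sketched compactly. The Python below uses invented data and is not the ACORA implementation: class-conditional distributions over a high-cardinality identifier attribute are estimated from training bags, and each new case's bag of identifiers is reduced to two numeric features, its cosine similarity to the positive-class and negative-class distributions.

        from collections import Counter
        import math

        def distribution(bags):
            """Pool bags of identifier values into a normalized distribution."""
            counts = Counter(v for bag in bags for v in bag)
            total = sum(counts.values())
            return {v: c / total for v, c in counts.items()}

        def cosine(d1, d2):
            dot = sum(d1[v] * d2.get(v, 0.0) for v in d1)
            n1 = math.sqrt(sum(x * x for x in d1.values()))
            n2 = math.sqrt(sum(x * x for x in d2.values()))
            return dot / (n1 * n2) if n1 and n2 else 0.0

        # Hypothetical training bags (e.g., merchant identifiers per account).
        pos = distribution([["m1", "m2"], ["m2", "m3"]])
        neg = distribution([["m4"], ["m4", "m5"]])

        new_case = distribution([["m2", "m3"]])   # bag for an unseen account
        print(cosine(new_case, pos), cosine(new_case, neg))  # ~0.87 vs 0.0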