Schema Independent Relational Learning
Learning novel concepts and relations from relational databases is an
important problem with many applications in database systems and machine
learning. Relational learning algorithms learn the definition of a new relation
in terms of existing relations in the database. Nevertheless, the same data set
may be represented under different schemas for various reasons, such as
efficiency, data quality, and usability. Unfortunately, the output of current
relational learning algorithms tends to vary quite substantially over the
choice of schema, both in terms of learning accuracy and efficiency. This
variation complicates their off-the-shelf application. In this paper, we
introduce and formalize the property of schema independence of relational
learning algorithms, and study both the theoretical and empirical dependence of
existing algorithms on the common class of (de)composition schema
transformations. We study both sample-based learning algorithms, which learn
from sets of labeled examples, and query-based algorithms, which learn by
asking queries to an oracle. We prove that current relational learning
algorithms are generally not schema independent. For query-based learning
algorithms, we show that the (de)composition transformations influence their
query complexity. We propose Castor, a sample-based relational learning
algorithm that achieves schema independence by leveraging data dependencies. We
support the theoretical results with an empirical study that demonstrates the
schema dependence/independence of several algorithms on existing benchmark and
real-world datasets under (de)compositions.
Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation
Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD), despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD using inductive logic programming to learn theories from first-order logic representations, which allows corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. It is important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources.
Analysing the behaviour of robot teams through relational sequential pattern mining
This report outlines the use of a relational representation in a Multi-Agent
domain to model the behaviour of the whole system. A desired property in these
systems is the ability of the team members to work together to achieve a common
goal in a cooperative manner. The aim is to define a systematic method to
verify the effective collaboration among the members of a team and to compare
the different multi-agent behaviours. Using external observations of a
Multi-Agent System to analyse, model, and recognize agent behaviour could be very
useful to direct team actions. In particular, this report focuses on the
challenge of autonomous unsupervised sequential learning of the team's
behaviour from observations. Our approach learns a symbolic sequence
(a relational representation) to translate raw multi-agent, multi-variate
observations of a dynamic, complex environment, into a set of sequential
behaviours that are characteristic of the team in question, represented by a
set of sequences expressed in first-order logic atoms. We propose to use a
relational learning algorithm to mine meaningful frequent patterns among the
relational sequences to characterise team behaviours. We compared the
performance of two teams in the RoboCup four-legged league environment, which
have very different approaches to the game. One uses a Case-Based Reasoning
approach; the other uses purely reactive behaviour. Comment: 25 pages
Improving numerical reasoning capabilities of inductive logic programming systems
Inductive Logic Programming (ILP) systems have been widely applied to classification problems with considerable success. The use of ILP systems in problems requiring numerical reasoning capabilities has been far less successful. Current systems have very limited numerical reasoning capabilities, which limits the range of domains where the ILP paradigm may be applied. This paper proposes improvements in the numerical reasoning capabilities of ILP systems. It proposes the use of statistical-based techniques like Model Validation and Model Selection to improve noise handling, and it introduces a new search stopping criterion based on the PAG method to evaluate learning performance. We have found these extensions essential to improve on results over statistical-based algorithms for time series forecasting used in the empirical evaluation study.
Inductive learning spatial attention
This paper investigates the automatic induction of spatial attention
from the visual observation of objects manipulated
on a table top. In this work, space is represented in terms of
a novel observer-object relative reference system, named Local
Cardinal System, defined upon the local neighbourhood
of objects on the table. We present results of applying the
proposed methodology on five distinct scenarios involving
the construction of spatial patterns of coloured blocks.
Statistical relational learning with soft quantifiers
Quantification in statistical relational learning (SRL) is either existential or universal; however, humans might be more inclined to express knowledge using soft quantifiers, such as "most" and "a few". In this paper, we define the syntax and semantics of PSL^Q, a new SRL framework that supports reasoning with soft quantifiers, and present its most probable explanation (MPE) inference algorithm. To the best of our knowledge, PSL^Q is the first SRL framework that combines soft quantifiers with first-order logic rules for modelling uncertain relational data. Our experimental results for link prediction in social trust networks demonstrate that the use of soft quantifiers not only allows for a natural and intuitive formulation of domain knowledge, but also improves the accuracy of inferred results.
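As a rough illustration of what a soft quantifier such as "most" can mean over graded truth values, the sketch below maps the proportion of satisfied instances through a piecewise-linear function. The thresholds 0.25 and 0.75, the function names, and the treatment of an empty set are assumptions made for this example; they are not taken from the PSL^Q definition.

```python
def soft_quantifier(lower, upper):
    """Return a piecewise-linear quantifier: 0 below `lower`, 1 above `upper`."""
    def q(proportion):
        if proportion <= lower:
            return 0.0
        if proportion >= upper:
            return 1.0
        # Linear ramp between the two thresholds.
        return (proportion - lower) / (upper - lower)
    return q


# Hypothetical thresholds for "most"; a real system would choose or tune these.
most = soft_quantifier(0.25, 0.75)


def quantified_truth(quantifier, truth_values):
    """Truth of 'Q of the instances hold', given graded truth values in [0, 1]."""
    if not truth_values:
        return 0.0  # assumption: an empty domain gives no support
    proportion = sum(truth_values) / len(truth_values)
    return quantifier(proportion)


# "Most friends of A trust B", with one truth value per friend.
strong_support = quantified_truth(most, [1.0, 1.0, 1.0, 0.0])
partial_support = quantified_truth(most, [0.5, 0.5])
```

Here three fully trusting friends out of four reach the upper threshold, while two half-trusting friends land in the middle of the ramp.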
QuickSpec: Guessing Formal Specifications using Testing
We present QuickSpec, a tool that automatically generates algebraic specifications for sets of pure functions. The tool is based on testing, rather than static analysis or theorem proving. The main challenge QuickSpec faces is to keep the number of generated equations to a minimum while maintaining completeness. We demonstrate how QuickSpec can improve one’s understanding of a program module by exploring the laws that are generated using two case studies: a heap library for Haskell and a fixed-point arithmetic library for Erlang.
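The idea of guessing equations by testing can be sketched in a few lines: evaluate candidate terms on random inputs and conjecture that terms which always agree are equal. This is a toy Python illustration only, not QuickSpec itself; the term table, the function name `guess_equations`, and the input range are assumptions, and it does none of the pruning that keeps the equation set small.

```python
import random


def guess_equations(terms, num_vars=2, trials=100, seed=0):
    """Conjecture equations between terms that agree on every random test."""
    rng = random.Random(seed)
    tests = [tuple(rng.randint(-50, 50) for _ in range(num_vars))
             for _ in range(trials)]
    # Group terms by the tuple of results they produce on all tests.
    classes = {}
    for name, fn in terms.items():
        signature = tuple(fn(*args) for args in tests)
        classes.setdefault(signature, []).append(name)
    # Within each group, equate the first term with every other member.
    equations = []
    for names in classes.values():
        representative = names[0]
        equations.extend((representative, other) for other in names[1:])
    return equations


# Hypothetical candidate terms over two integer variables x and y.
terms = {
    "x + y": lambda x, y: x + y,
    "y + x": lambda x, y: y + x,
    "x + 0": lambda x, y: x + 0,
    "x": lambda x, y: x,
    "x * y": lambda x, y: x * y,
    "y * x": lambda x, y: y * x,
}

equations = guess_equations(terms)
for left, right in equations:
    print(f"{left} == {right}")
```

The conjectured laws are only as trustworthy as the tests: two terms that differ on inputs never sampled would still be reported as equal.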
Data mining via ILP: The application of progol to a
As far as this author is aware, this is the first paper to describe the application of Progol to enantioseparations. A scheme is proposed for data mining a relational database of published enantioseparations using Progol. The application of the scheme is described and a preliminary assessment of the usefulness of the resulting generalisations is made using their accuracy, size, ease of interpretation and chemical justification.
Fast relational learning using bottom clause propositionalization with artificial neural networks
Relational learning can be described as the task of learning first-order logic rules from examples. It has enabled a number of new machine learning applications, e.g. graph mining and link analysis. Inductive Logic Programming (ILP) performs relational learning either directly by manipulating first-order rules or through propositionalization, which translates the relational task into an attribute-value learning task by representing subsets of relations as features. In this paper, we introduce a fast method and system for relational learning based on a novel propositionalization called Bottom Clause Propositionalization (BCP). Bottom clauses are boundaries in the hypothesis search space used by the ILP systems Progol and Aleph. Bottom clauses carry semantic meaning and can be mapped directly onto numerical vectors, simplifying the feature extraction process. We have integrated BCP with a well-known neural-symbolic system, C-IL2P, to perform learning from numerical vectors. C-IL2P uses background knowledge in the form of propositional logic programs to build a neural network. The integrated system, which we call CILP++, handles first-order logic knowledge and is available for download from Sourceforge. We have evaluated CILP++ on seven ILP datasets, comparing results with Aleph and a well-known propositionalization method, RSD. The results show that CILP++ can achieve accuracy comparable to Aleph, while being generally faster. BCP achieved a statistically significant improvement in accuracy in comparison with RSD when running with a neural network, but BCP and RSD perform similarly when running with C4.5. We have also extended CILP++ to include a statistical feature selection method, mRMR, with preliminary results indicating that a reduction of more than 90% of features can be achieved with a small loss of accuracy.
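The mapping from bottom clauses to numerical vectors can be pictured with a minimal Python sketch, assuming each bottom clause has already been generated and is given as a list of body-literal strings. The function name `bcp_vectors` and the example literals are hypothetical, and the sketch leaves out details such as how the bottom clauses are built or how variables are named.

```python
def bcp_vectors(bottom_clauses):
    """Map each bottom clause (a list of body literals) to a binary vector."""
    # Every distinct body literal becomes one feature, in first-seen order.
    features = []
    index = {}
    for clause in bottom_clauses:
        for literal in clause:
            if literal not in index:
                index[literal] = len(features)
                features.append(literal)
    # An example's vector has a 1 wherever its bottom clause contains the literal.
    vectors = []
    for clause in bottom_clauses:
        vector = [0] * len(features)
        for literal in clause:
            vector[index[literal]] = 1
        vectors.append(vector)
    return features, vectors


# Hypothetical bottom-clause bodies for two examples of a target relation.
clauses = [
    ["mother(A,C)", "wife(C,B)"],
    ["mother(A,C)", "brother(C,B)"],
]
features, vectors = bcp_vectors(clauses)
print(features)
print(vectors)
```

The resulting vectors could then be passed to any attribute-value learner, such as a neural network or a decision tree.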
