An information theoretic approach to rule induction from databases
The knowledge acquisition bottleneck in obtaining
rules directly from an expert is well known. Hence, the problem
of automated rule acquisition from data is a well-motivated one,
particularly for domains where a database of sample data exists.
In this paper we introduce a novel algorithm for the induction
of rules from examples. The algorithm is novel in the sense
that it not only learns rules for a given concept (classification),
but it simultaneously learns rules relating multiple concepts.
This type of learning, known as generalized rule induction, is
considerably more general than existing algorithms which tend
to be classification oriented. Initially we focus on the problem of
determining a quantitative, well-defined rule preference measure.
In particular, we propose a quantity called the J-measure as
an information theoretic alternative to existing approaches. The
J-measure quantifies the information content of a rule or a
hypothesis. We will outline the information theoretic origins
of this measure and examine its plausibility as a hypothesis
preference measure. We then define the ITRULE algorithm which
uses the newly proposed measure to learn a set of optimal rules
from a set of data samples, and we conclude the paper with an
analysis of experimental results on real-world data.
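The abstract above does not state the J-measure formula. As a minimal sketch, assuming the definition commonly given in the literature for a rule "IF Y=y THEN X=x", the measure is the probability of the antecedent times the relative entropy between the posterior and prior distributions of the consequent. The function name `j_measure` and its parameter names are illustrative, not taken from the paper.

```python
import math


def j_measure(p_y: float, p_x: float, p_x_given_y: float) -> float:
    """Information content (in bits) of the rule IF Y=y THEN X=x.

    p_y         -- probability that the antecedent Y=y holds
    p_x         -- prior probability of the consequent X=x (assumed 0 < p_x < 1)
    p_x_given_y -- probability of the consequent given the antecedent
    """

    def term(posterior: float, prior: float) -> float:
        # Convention: 0 * log(0 / q) is taken to be 0.
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    # Relative entropy between posterior and prior of the binary event X=x.
    divergence = term(p_x_given_y, p_x) + term(1.0 - p_x_given_y, 1.0 - p_x)
    # Weight by how often the rule fires.
    return p_y * divergence
```

Under this definition a rule scores zero when the antecedent tells us nothing new about the consequent, and scores higher when it fires often and shifts the consequent's probability far from its prior.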
Computationally efficient induction of classification rules with the PMCRI and J-PMCRI frameworks
In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale and thus databases are increasing at a fast pace. This leads to the utilisation of parallel computing technologies in order to cope with large amounts of data. In the area of classification rule induction, parallelisation of classification rules has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer, which has only recently been in the focus of parallelisation. This work introduces and evaluates empirically a framework for the parallel induction of classification rules, generated by members of the Prism family of algorithms. All members of the Prism family of algorithms follow the separate and conquer approach.
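The separate and conquer approach mentioned above can be illustrated with a small serial sketch. This is a simplified illustration in the spirit of Prism, not the PMCRI or J-PMCRI framework, and it contains no parallelism. The function name `learn_rules_for_class`, the `label_key` parameter, and the assumption that every row is a dict of string-valued categorical attributes are all hypothetical choices made for this example.

```python
def learn_rules_for_class(rows, target, label_key="label"):
    """Induce rules (dicts of attribute -> value) that cover class `target`.

    rows is assumed to be a list of dicts with string-valued categorical
    attributes plus a class label stored under `label_key`.
    """
    remaining = list(rows)
    rules = []
    # Keep inducing rules while uncovered instances of the target class exist.
    while any(r[label_key] == target for r in remaining):
        covered = remaining
        rule = {}
        # Conquer: specialise the rule until it covers only the target class
        # or no unused attribute is left.
        while any(r[label_key] != target for r in covered):
            candidates = {
                (attr, r[attr])
                for r in covered
                for attr in r
                if attr != label_key and attr not in rule
            }
            best, best_score = None, -1.0
            for attr, val in sorted(candidates):
                subset = [r for r in covered if r[attr] == val]
                # Fraction of target-class instances among those matched.
                score = sum(r[label_key] == target for r in subset) / len(subset)
                if score > best_score:
                    best, best_score = (attr, val), score
            if best is None:
                break  # attributes exhausted; accept an impure rule
            rule[best[0]] = best[1]
            covered = [r for r in covered if r[best[0]] == best[1]]
        rules.append(rule)
        # Separate: remove the instances covered by the new rule.
        covered_ids = {id(r) for r in covered}
        remaining = [r for r in remaining if id(r) not in covered_ids]
    return rules
```

Each rule is grown one attribute-value term at a time, and the instances it covers are removed before the next rule is induced. This is why the resulting rules are modular and need not share a common root attribute, as a decision tree would.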
Programming in logic without logic programming
In previous work, we proposed a logic-based framework in which computation is
the execution of actions in an attempt to make reactive rules of the form if
antecedent then consequent true in a canonical model of a logic program
determined by an initial state, sequence of events, and the resulting sequence
of subsequent states. In this model-theoretic semantics, reactive rules are the
driving force, and logic programs play only a supporting role.
In the canonical model, states, actions and other events are represented with
timestamps. But in the operational semantics, for the sake of efficiency,
timestamps are omitted and only the current state is maintained. State
transitions are performed reactively by executing actions to make the
consequents of rules true whenever the antecedents become true. This
operational semantics is sound, but incomplete. It cannot make reactive rules
true by preventing their antecedents from becoming true, or by proactively
making their consequents true before their antecedents become true.
In this paper, we characterize the notion of reactive model, and prove that
the operational semantics can generate all and only such models. In order to
focus on the main issues, we omit the logic programming component of the
framework. Comment: Under consideration in Theory and Practice of Logic Programming
(TPLP)
Random Prism: An Alternative to Random Forests.
Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose combined classification results are used in order to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve a comparable and sometimes higher classification accuracy compared with decision tree classifiers, if the data is noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice ensemble techniques tend to reduce the overfitting, however there exists no ensemble learner for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms in order to enhance Prism's classification accuracy by reducing overfitting.
Super Logic Programs
The Autoepistemic Logic of Knowledge and Belief (AELB) is a powerful
nonmonotonic formalism introduced by Teodor Przymusinski in 1994. In this paper,
we specialize it to a class of theories called `super logic programs'. We argue
that these programs form a natural generalization of standard logic programs.
In particular, they allow disjunctions and default negation of arbitrary
positive objective formulas.
Our main results are two new and powerful characterizations of the static
semantics of these programs, one syntactic, and one model-theoretic. The
syntactic fixed point characterization is much simpler than the fixed point
construction of the static semantics for arbitrary AELB theories. The
model-theoretic characterization via Kripke models allows one to construct
finite representations of the inherently infinite static expansions.
Both characterizations can be used as the basis of algorithms for query
answering under the static semantics. We describe a query-answering interpreter
for super programs which we developed based on the model-theoretic
characterization and which is available on the web. Comment: 47 pages, revised version of the paper submitted 10/200