Object-oriented data mining
EThOS - Electronic Theses Online Service (GB, United Kingdom)
Better Rulesets by Removing Redundant Specialisations and Generalisations in Association Rule Mining
Association rule mining is a fundamental task in many data mining and analysis applications, both for knowledge extraction and as part of other processes (for example, building associative classifiers). It is well known that the number of associations identified by many association rule mining algorithms can be so large as to present a barrier to their interpretability and practical use. A typical solution to this problem involves removing redundant rules. This paper proposes a novel definition of redundancy, which is used to identify only the most interesting associations. Compared to existing redundancy-based approaches, our method is both more robust to noise and produces fewer overall rules for a given dataset (improving clarity). A rule can be considered redundant if the knowledge it describes is already contained in other rules. Given an association rule, most existing approaches consider rules to be redundant if they add additional variables without increasing quality according to some measure of interestingness. We claim that complex interactions between variables can confound many interestingness measures. This can lead to existing approaches being overly aggressive in removing redundant associations. Most existing approaches also fail to take into account situations where more general rules (those with fewer attributes) can be considered redundant with respect to their specialisations. We examine this problem and provide concrete examples of such errors using artificial data. An alternative definition of redundancy that addresses these issues is proposed. Our approach is shown to identify interesting associations missed by comparable methods on multiple real and synthetic datasets. When combined with the removal of redundant generalisations, our approach is often able to generate smaller overall rule sets, while leaving average rule quality unaffected or slightly improved.
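As a rough illustration of the conventional redundancy test that the abstract attributes to most existing approaches (not the alternative definition the paper proposes), the sketch below drops a rule when a strictly more general rule with the same consequent has confidence at least as high. The toy transactions, the choice of confidence as the interestingness measure, and all function names are assumptions made for illustration only.

```python
# Hypothetical toy transactions; not data from the paper.
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(1 for t in covered if consequent in t) / len(covered)


def prune_redundant_specialisations(rules, transactions):
    """Keep a rule unless a strictly more general rule in `rules`
    (proper-subset antecedent, same consequent) has confidence at least as high."""
    kept = []
    for ante, cons in rules:
        conf = confidence(ante, cons, transactions)
        dominated = any(
            other_cons == cons
            and other_ante < ante  # proper subset: a genuine generalisation
            and confidence(other_ante, cons, transactions) >= conf
            for other_ante, other_cons in rules
        )
        if not dominated:
            kept.append((ante, cons))
    return kept


# Candidate rules as (antecedent, consequent) pairs; assumed for illustration.
RULES = [
    (frozenset({"bread"}), "butter"),
    (frozenset({"bread", "milk"}), "butter"),
    (frozenset({"bread", "jam"}), "butter"),
]
KEPT = prune_redundant_specialisations(RULES, TRANSACTIONS)
```

Here the rule with antecedent {bread, milk} is dropped because adding "milk" does not raise confidence over {bread} alone, while {bread, jam} survives. The abstract's argument is that a test of this shape can be too aggressive when variables interact, and that it never considers whether the general rule is itself redundant with respect to its specialisations.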
Combined decision procedures for nonlinear arithmetics, real and complex
We describe contributions to algorithmic proof techniques for deciding the satisfiability
of boolean combinations of many-variable nonlinear polynomial equations and
inequalities over the real and complex numbers.
In the first half, we present an abstract theory of Gröbner basis construction algorithms
for algebraically closed fields of characteristic zero and use it to introduce
and prove the correctness of Gröbner basis methods tailored to the needs of modern
satisfiability modulo theories (SMT) solvers. In the process, we use the technique of
proof orders to derive a generalisation of S-polynomial superfluousness in terms of
transfinite induction along an ordinal parameterised by a monomial order. We use this
generalisation to prove the abstract (“strategy-independent”) admissibility of a number
of superfluous S-polynomial criteria important for efficient basis construction. Finally,
we consider local notions of proof minimality for weak Nullstellensatz proofs and give
ideal-theoretic methods for computing complex “unsatisfiable cores” which contribute
to efficient SMT solving in the context of nonlinear complex arithmetic.
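For orientation, the standard textbook notions behind the terms used above can be written as follows. This is the usual definition of an S-polynomial, Buchberger's criterion, and the weak Nullstellensatz, not the generalised notion of superfluousness developed in the thesis; the symbols $f$, $g$, $G$, $h_i$ and $k$ are notation assumed here for illustration.

```latex
% S-polynomial of nonzero f, g with respect to a fixed monomial order,
% where LM is the leading monomial and LT the leading term:
\[
  S(f, g) \;=\; \frac{L}{\mathrm{LT}(f)}\, f \;-\; \frac{L}{\mathrm{LT}(g)}\, g,
  \qquad L = \operatorname{lcm}\bigl(\mathrm{LM}(f), \mathrm{LM}(g)\bigr).
\]
% Buchberger's criterion: G = \{g_1, \dots, g_s\} is a Gr\"obner basis
% if and only if every S(g_i, g_j) reduces to 0 modulo G.

% Weak Nullstellensatz over an algebraically closed field k:
\[
  f_1 = \dots = f_m = 0 \text{ has no solution in } k^n
  \iff
  \exists\, h_1, \dots, h_m \in k[x_1, \dots, x_n] :\;
  \sum_{i=1}^{m} h_i f_i = 1 .
\]
```

A criterion that marks an S-polynomial as superfluous lets a basis construction skip reducing that pair, and a certificate $\sum_i h_i f_i = 1$ is the kind of proof of unsatisfiability for which the abstract considers minimality.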
In the second half, we consider the problem of effectively combining a heterogeneous
collection of decision techniques for fragments of the existential theory of real
closed fields. We propose and investigate a number of novel combined decision methods
and implement them in our proof tool RAHD (Real Algebra in High Dimensions).
We build a hierarchy of increasingly powerful combined decision methods, culminating
in a generalisation of partial cylindrical algebraic decomposition (CAD) which we
call Abstract Partial CAD. This generalisation incorporates the use of arbitrary sound
but possibly incomplete proof procedures for the existential theory of real closed fields
as first-class functional parameters for “short-circuiting” expensive computations during
the lifting phase of CAD. Identifying these proof procedure parameters formally
with RAHD proof strategies, we implement the method in RAHD for the case of
full-dimensional cell decompositions and investigate its efficacy with respect to the
Brown-McCallum projection operator.
We end with some wishes for the future.
Studies related to the process of program development
The submitted work consists of a collection of publications arising from research carried out at Rhodes University (1970-1980) and at Heriot-Watt University (1980-1992). The theme of this research is the process of program development, i.e. the process of creating a computer program to solve some particular problem. The papers presented cover a number of different topics which relate to this process, viz. (a) Programming methodology: aspects of structured programming. (b) Properties of programming languages. (c) Formal specification of programming languages. (d) Compiler techniques. (e) Declarative programming languages. (f) Program development aids. (g) Automatic program generation. (h) Databases. (i) Algorithms and applications.