9 research outputs found
Experiences in building a tool for navigating association rule result sets
Practical knowledge discovery is an iterative process.
First, the experiences gained from one mining run
are used to inform the parameter setting and the
dataset and attribute selection for subsequent runs.
Second, additional data, either incremental additions
to existing datasets or the inclusion of additional attributes,
means that the mining process is reinvoked,
perhaps numerous times. Reducing the number of
iterations, improving the accuracy of parameter setting
and making the results of the mining run more
clearly understandable can thus significantly speed up
the discovery process.
In this paper we discuss our experiences in this
area and present a system that helps the user to
navigate through association rule result sets in a
way that makes it easier to find useful results from a
large result set. We present several techniques that
experience has shown us to be useful. We discuss the
prototype system, IRSetNav, which has
capabilities in redundant rule reduction, subjective
interestingness evaluation, item and itemset pruning,
related information searching, text-based itemset
and rule visualisation, hierarchy based searching
and tracking changes between data sets using a
knowledge base. Techniques also discussed in the
paper, but not yet accommodated into IRSetNav,
include input schema selection, longitudinal ruleset
analysis and graphical visualisation techniques.
Adelaide, S
Closed-set-based discovery of representative association rules revisited
The output of an association rule miner is often huge in practice. This is why several concise lossless representations have been proposed, such as the “essential” or “representative” rules. We revisit the algorithm given by Kryszkiewicz (Int. Symp. Intelligent Data Analysis 2001, Springer-Verlag LNCS 2189, 350–359) for mining representative rules. We show that its output is sometimes incomplete, due to an oversight in its mathematical validation, and we propose an alternative complete generator with only slightly longer running times.
Postprint (author’s final draft)
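To make the idea of a representative rule concrete, here is a minimal Python sketch. It is not the algorithm from the paper. It assumes one common syntactic redundancy test from this literature: a rule X0 → Y0 is redundant with respect to a different rule X1 → Y1 when X1 ⊆ X0 and X0 ∪ Y0 ⊆ X1 ∪ Y1, in which case the first rule has at least the support and confidence of the second in every dataset. The function names and the sample rules are hypothetical, and antecedents and consequents are assumed to be disjoint.

```python
def is_redundant(rule, other):
    """Return True if `rule` is made redundant by a different rule `other`.

    Rules are (antecedent, consequent) pairs of disjoint frozensets.
    Assumed test: other's antecedent is contained in rule's antecedent,
    and rule's items are all contained in other's items.
    """
    (x0, y0), (x1, y1) = rule, other
    return rule != other and x1 <= x0 and (x0 | y0) <= (x1 | y1)


def representative(rules):
    """Keep only the rules that no other rule in the list makes redundant."""
    return [r for r in rules if not any(is_redundant(r, o) for o in rules)]


# Hypothetical rule set: a -> bc, a -> b, ab -> c, d -> e
rules = [
    (frozenset("a"), frozenset("bc")),
    (frozenset("a"), frozenset("b")),
    (frozenset("ab"), frozenset("c")),
    (frozenset("d"), frozenset("e")),
]
kept = representative(rules)
```

In this toy set, a → b and ab → c both follow from a → bc, so only a → bc and d → e are kept. A real miner would apply the test only among rules that pass the chosen support and confidence thresholds.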
Closed-set-based discovery of representative association rules
The output of an association rule miner is often huge in practice. This is why several concise lossless representations have been proposed, such as the “essential” or “representative” rules. A previously known algorithm for mining representative rules relies on an incorrect mathematical claim, and can be seen to miss part of its intended output; in previous work, two of the authors of the present paper have offered a complete but, often, somewhat slower alternative. Here, we extend this alternative to the case of closure-based redundancy. The empirical validation shows that, in this way, we can improve on the original time efficiency, without sacrificing completeness.
Peer Reviewed. Postprint (author's final draft)
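Closure-based approaches rest on the closure of an itemset: the set of all items shared by every transaction that contains it. The following Python sketch shows only that operator, not the mining algorithm of the paper. The function name and the sample transactions are hypothetical, and the treatment of an itemset that occurs in no transaction (it closes to the set of all items) is a convention assumed here.

```python
def closure(itemset, transactions):
    """Return the closure of `itemset` with respect to `transactions`.

    The closure is the intersection of all transactions containing the
    itemset. Assumed convention: if no transaction contains it, the
    closure is the set of all items appearing in the data.
    """
    itemset = frozenset(itemset)
    covering = [frozenset(t) for t in transactions if itemset <= frozenset(t)]
    if not covering:
        return frozenset().union(*transactions)
    return frozenset.intersection(*covering)


# Hypothetical dataset of four transactions
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c"}]
closed_a = closure({"a"}, transactions)
```

Here every transaction containing a also contains b, so the closure of {a} is {a, b}. An itemset is called closed when it equals its own closure, as {c} does in this data.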
Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules
Association rules are among the most widely employed data analysis methods in
the field of Data Mining. An association rule is a form of partial implication
between two sets of binary variables. In the most common approach, association
rules are parameterized by a lower bound on their confidence, which is the
empirical conditional probability of their consequent given the antecedent,
and/or by some other parameter bounds such as "support" or deviation from
independence. We study here notions of redundancy among association rules from
a fundamental perspective. We see each transaction in a dataset as an
interpretation (or model) in the propositional logic sense, and consider
existing notions of redundancy, that is, of logical entailment, among
association rules, of the form "any dataset in which this first rule holds must
obey also that second rule, therefore the second is redundant". We discuss
several existing alternative definitions of redundancy between association
rules and provide new characterizations and relationships among them. We show
that the main alternatives we discuss correspond actually to just two variants,
which differ in the treatment of full-confidence implications. For each of
these two notions of redundancy, we provide a sound and complete deduction
calculus, and we show how to construct complete bases (that is,
axiomatizations) of absolutely minimum size in terms of the number of rules. We
explore finally an approach to redundancy with respect to several association
rules, and fully characterize its simplest case of two partial premises.
Comment: LMCS accepted paper
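The two parameters named in this abstract can be illustrated with a short Python sketch. It assumes transactions are given as sets of items. The function names and the sample data are hypothetical. Support is computed as the fraction of transactions containing an itemset, and confidence as the empirical conditional probability of the consequent given the antecedent.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= frozenset(t))


def support(itemset, transactions):
    """Fraction of transactions containing `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Empirical conditional probability of `consequent` given `antecedent`."""
    n_antecedent = count(antecedent, transactions)
    if n_antecedent == 0:
        raise ValueError("antecedent does not occur in any transaction")
    both = frozenset(antecedent) | frozenset(consequent)
    return count(both, transactions) / n_antecedent


# Hypothetical dataset of four transactions
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}, {"c"}]
supp_a = support({"a"}, transactions)             # 3 of 4 transactions
conf_a_b = confidence({"a"}, {"b"}, transactions) # 2 of the 3 containing "a"
```

A rule a → b is then reported when its confidence, and usually also its support, meets user-chosen lower bounds.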
Horn axiomatizations for sequential data
We propose a notion of deterministic association rules for ordered data. We prove that our proposed rules can be formally justified by a purely logical characterization, namely, a natural notion of empirical Horn approximation for ordered data which involves background Horn conditions; these ensure the consistency of the propositional theory obtained with the ordered context. The whole framework resorts to concept lattice models from Formal Concept Analysis, but adapted to ordered contexts. We also discuss a general method to mine these rules that can be easily incorporated into any algorithm for mining closed sequences, of which there are already some in the literature.
A COMPREHENSIVE GEOSPATIAL KNOWLEDGE DISCOVERY FRAMEWORK FOR SPATIAL ASSOCIATION RULE MINING
Continuous advances in modern data collection techniques help spatial scientists gain access to massive and high-resolution spatial and spatio-temporal data. Thus there is an urgent need to develop effective and efficient methods seeking to find unknown and useful information embedded in big-data datasets of unprecedentedly large size (e.g., millions of observations), high dimensionality (e.g., hundreds of variables), and complexity (e.g., heterogeneous data sources, space–time dynamics, multivariate connections, explicit and implicit spatial relations and interactions). Responding to this line of development, this research focuses on the utilization of the association rule (AR) mining technique for a geospatial knowledge discovery process.
Prior attempts have sidestepped the complexity of the spatial dependence structure embedded in the studied phenomenon. Thus, adopting association rule mining in spatial analysis is rather problematic. Interestingly, a very similar predicament afflicts spatial regression analysis with a spatial weight matrix that would be assigned a priori, without validation on the specific domain of application. Besides, a dependable geospatial knowledge discovery process necessitates algorithms supporting automatic and robust but accurate procedures for the evaluation of mined results. Surprisingly, this has received little attention in the context of spatial association rule mining.
To remedy the deficiencies mentioned above, the foremost goal of this research is to construct a comprehensive geospatial knowledge discovery framework using spatial association rule mining for the detection of spatial patterns embedded in geospatial databases and to demonstrate its application within the domain of crime analysis. It is the first attempt at delivering a complete geospatial knowledge discovery framework using spatial association rule mining.