96 research outputs found
Statistical strategies for pruning all the uninteresting association rules
We propose a general framework to describe formally the
problem of capturing the intensity of implication for
association rules through statistical metrics.
In this framework we present properties that influence the
interestingness of a rule, analyze the conditions that
lead a measure to perform a perfect prune at a time,
and define a final proper order to sort the surviving
rules. We will discuss why none of the currently employed
measures can capture objective interestingness, and
just the combination of some of them, in a multi-step fashion,
can be reliable. In contrast, we propose a new simple modification
of the Pearson coefficient that will meet all the necessary
requirements. We statistically infer the convenient cut-off
threshold for this new metric by empirically describing its
distribution function through simulation. Final experiments
serve to show the ability of our proposal.Postprint (published version
Analysis of monotonicity properties of some rule interestingness measures
One of the crucial problems in the field of knowledge discovery is development of good interestingness measures for evaluation of the discovered patterns. In this paper, we consider quantitative, objective interestingness measures for "if..., then... " association rules. We focus on three popular interestingness measures, namely rule interest function of Piatetsky-Shapiro, gain measure of Fukuda et al., and dependency factor used by Pawlak. We verify whether they satisfy the valuable property M of monotonic dependency on the number of objects satisfying or not the premise or the conclusion of a rule, and property of hypothesis symmetry (HS). Moreover, analytically and through experiments we show an interesting relationship between those measures and two other commonly used measures of rule support and anti-support
Post-processing of association rules.
In this paper, we situate and motivate the need for a post-processing phase to the association rule mining algorithm when plugged into the knowledge discovery in databases process. Major research effort has already been devoted to optimising the initially proposed mining algorithms. When it comes to effectively extrapolating the most interesting knowledge nuggets from the standard output of these algorithms, one is faced with an extreme challenge, since it is not uncommon to be confronted with a vast amount of association rules after running the algorithms. The sheer multitude of generated rules often clouds the perception of the interpreters. Rightful assessment of the usefulness of the generated output introduces the need to effectively deal with different forms of data redundancy and data being plainly uninteresting. In order to do so, we will give a tentative overview of some of the main post-processing tasks, taking into account the efforts that have already been reported in the literature.
New probabilistic interest measures for association rules
Mining association rules is an important technique for discovering meaningful
patterns in transaction databases. Many different measures of interestingness
have been proposed for association rules. However, these measures fail to take
the probabilistic properties of the mined data into account. In this paper, we
start with presenting a simple probabilistic framework for transaction data
which can be used to simulate transaction data when no associations are
present. We use such data and a real-world database from a grocery outlet to
explore the behavior of confidence and lift, two popular interest measures used
for rule mining. The results show that confidence is systematically influenced
by the frequency of the items in the left hand side of rules and that lift
performs poorly to filter random noise in transaction data. Based on the
probabilistic framework we develop two new interest measures, hyper-lift and
hyper-confidence, which can be used to filter or order mined association rules.
The new measures show significantly better performance than lift for
applications where spurious rules are problematic
- …