833 research outputs found
Towards a semantic and statistical selection of association rules
The increasing growth of databases raises an urgent need for more accurate
methods to better understand the stored data. In this scope, association rules
were extensively used for the analysis and the comprehension of huge amounts of
data. However, the number of generated rules is too large to be efficiently
analyzed and explored in any further process. Association rule selection is a
classical way to address this issue, yet new, innovative approaches are
required to help decision makers. Hence, many interestingness
measures have been defined to statistically evaluate and filter
association rules. However, these measures present two major problems. On the
one hand, they do not allow irrelevant rules to be eliminated; on the other hand,
their abundance leads to heterogeneous evaluation results, which
causes confusion in decision making. In this paper, we propose a two-winged
approach to select statistically interesting and semantically incomparable
rules. Our statistical selection helps discover interesting association
rules without favoring or excluding any measure. The semantic comparability
helps to decide whether the considered association rules are semantically related,
i.e., comparable. The outcomes of our experiments on real datasets show promising
results in terms of reduction in the number of rules.
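One common way to select rules "without favoring or excluding any measure" is Pareto dominance: keep a rule unless some other rule is at least as good on every measure and strictly better on one. The abstract does not say that this is the authors' procedure, so the sketch below is only an illustration under that assumption. The rule names and scores are invented, and higher is assumed to be better for every measure.

```python
def dominates(a, b):
    """True if score vector a is >= b on every measure and > b on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def undominated(scores):
    """Return the names of rules that no other rule dominates.

    scores maps a rule name to a tuple of measure values, one per measure,
    in the same order for every rule.
    """
    return {
        name
        for name, vec in scores.items()
        if not any(dominates(other_vec, vec)
                   for other, other_vec in scores.items() if other != name)
    }


# Hypothetical scores: (confidence, lift, support)
scores = {
    "r1": (0.9, 1.2, 0.1),
    "r2": (0.8, 1.5, 0.1),
    "r3": (0.7, 1.1, 0.1),
}
selected = undominated(scores)
```

Here r3 is dropped because r1 matches or beats it on every measure, while r1 and r2 both survive because each wins on a different measure.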
Knowledge-based Systems and Interestingness Measures: Analysis with Clinical Datasets
Knowledge mined from clinical data can be used for medical diagnosis and prognosis. By improving the quality of the knowledge base, the prediction efficiency of a knowledge-based system can be enhanced. Designing accurate and precise clinical decision support systems, which use the mined knowledge, is still a broad area of research. This work analyses the variation in classification accuracy for such knowledge-based systems using different rule lists. The purpose of this work is not to improve the prediction accuracy of a decision support system, but to analyze the factors that influence the efficiency and design of the knowledge base in a rule-based decision support system. Three benchmark medical datasets are used. Rules are extracted using a supervised machine learning algorithm (PART). Each rule in the ruleset is validated using nine frequently used rule interestingness measures. After calculating the measure values, the rule lists are used for performance evaluation. Experimental results show variation in classification accuracy for different rule lists. Confidence and Laplace measures yield relatively superior accuracy: 81.188% for the heart disease dataset and 78.255% for the diabetes dataset. The accuracy of the knowledge-based prediction system is predominantly dependent on the organization of the ruleset. Rule length needs to be considered when deciding the rule ordering. A subset of a rule, or a combination of rule elements, may form a new rule and sometimes be a member of the rule list. Redundant rules should be eliminated. Prior knowledge about the domain will enable knowledge engineers to design a better knowledge base.
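To make the role of the measures concrete, here is a minimal sketch of two of the measures the abstract names, confidence and Laplace, and of how they can order the same rules differently. It uses the standard textbook definitions (confidence = correct / covered; Laplace = (correct + 1) / (covered + number of classes)); the `RuleStats` type, the rule names, and the counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleStats:
    name: str
    covered: int  # records that match the rule's conditions
    correct: int  # covered records that also have the rule's predicted class


def confidence(rule: RuleStats) -> float:
    """Fraction of covered records that the rule classifies correctly."""
    return rule.correct / rule.covered if rule.covered else 0.0


def laplace(rule: RuleStats, n_classes: int = 2) -> float:
    """Laplace-corrected accuracy; penalises rules that cover few records."""
    return (rule.correct + 1) / (rule.covered + n_classes)


def order_rules(rules, measure):
    """Sort rules from highest to lowest value of the given measure."""
    return sorted(rules, key=measure, reverse=True)


# Hypothetical rules: one broad and slightly imperfect, one narrow and perfect.
rules = [
    RuleStats("broad_rule", covered=100, correct=95),
    RuleStats("narrow_rule", covered=4, correct=4),
]
by_confidence = [r.name for r in order_rules(rules, confidence)]
by_laplace = [r.name for r in order_rules(rules, laplace)]
```

Confidence ranks the narrow rule first (1.0 against 0.95), whereas Laplace ranks the broad rule first (96/102 against 5/6), which is one way different measures can produce different rule lists.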
Knowledge representation of large medical data using XML
SOMA uses longitudinal data collected from the Ophthalmology Clinic of the Royal Liverpool University Hospital. Using trend mining (an extension of association rule mining), SOMA links attributes from the data. However, the large volume of output makes the results difficult for experts to explore. This paper presents an extension of the SOMA framework that aims to improve the post-processing of the results by experts, using a visualisation tool that parses and visualises the results, which are stored in XML-structured files.
Experiences in building a tool for navigating association rule result sets
Practical knowledge discovery is an iterative process.
First, the experiences gained from one mining run
are used to inform the parameter setting and the
dataset and attribute selection for subsequent runs.
Second, additional data, either incremental additions
to existing datasets or the inclusion of additional attributes,
means that the mining process is reinvoked,
perhaps numerous times. Reducing the number of
iterations, improving the accuracy of parameter setting
and making the results of the mining run more
clearly understandable can thus significantly speed up
the discovery process.
In this paper we discuss our experiences in this
area and present a system that helps the user to
navigate through association rule result sets in a
way that makes it easier to find useful results from a
large result set. We present several techniques that
experience has shown us to be useful. The prototype
system – IRSetNav – is discussed, which has
capabilities in redundant rule reduction, subjective
interestingness evaluation, item and itemset pruning,
related information searching, text-based itemset
and rule visualisation, hierarchy based searching
and tracking changes between data sets using a
knowledge base. Techniques also discussed in the
paper, but not yet accommodated into IRSetNav,
include input schema selection, longitudinal ruleset
analysis and graphical visualisation techniques.
Interactive visual exploration of association rules with rule-focusing methodology
On account of the enormous amounts of rules that can be produced by data mining algorithms, knowledge post-processing is a difficult stage in an association rule discovery process. In order to find relevant knowledge for decision making, the user (a decision maker specialized in the data studied) needs to rummage through the rules. To assist him/her in this task, we here propose the rule-focusing methodology, an interactive methodology for the visual post-processing of association rules. It allows the user to explore large sets of rules freely by focusing his/her attention on limited subsets. This new approach relies on rule interestingness measures, on a visual representation, and on interactive navigation among the rules. We have implemented the rule-focusing methodology in a prototype system called ARVis. It exploits the user's focus to guide the generation of the rules by means of a specific constraint-based rule-mining algorithm.
Modeling interestingness of streaming association rules as a benefit-maximizing classification problem
In a typical application of association rule learning from market basket data, a set of transactions for a fixed period of time is used as input to rule learning algorithms. For example, the well-known Apriori algorithm can be applied to learn a set of association rules from such a transaction set. However, learning association rules from a set of transactions is not a one-time process. For example, a market manager may perform the association rule learning process once every month over the set of transactions collected through the last month. For this reason, we will consider the problem where transaction sets are input to the system as a stream of packages. The sets of transactions may come in varying sizes and in varying periods. Once a set of transactions arrives, the association rule learning algorithm is executed on the last set of transactions, resulting in new association rules. Therefore, the set of association rules learned will accumulate and increase in number over time, making the mining of interesting ones out of this enlarging set of association rules impractical for human experts. We refer to this sequence of rules as "association rule set stream" or "streaming association rules" and the main motivation behind this research is to develop a technique to overcome the interesting rule selection problem. A successful association rule mining system should select and present only the interesting rules to the domain experts. However, the definition of interestingness of association rules on a given domain usually differs from one expert to another and also over time for a given expert. This paper proposes a post-processing method to learn a subjective model for the interestingness concept description of the streaming association rules.
The uniqueness of the proposed method is its ability to formulate the interestingness issue of association rules as a benefit-maximizing classification problem and obtain a different interestingness model for each user. In this new classification scheme, the determining features are the selective objective interestingness factors related to the interestingness of the association rules, and the target feature is the interestingness label of those rules. The proposed method works incrementally and employs user interactivity at a certain level. It is evaluated on a real market dataset. The results show that the model can successfully select the interesting ones. (C) 2008 Elsevier B.V. All rights reserved
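The idea of benefit-maximizing classification can be illustrated with a small sketch: instead of predicting the most probable label, predict the label with the highest expected benefit. This shows the general technique only, not the method of the paper, whose classifier and benefit values are not given in the abstract. The function name, the probabilities, and the benefit table below are all hypothetical.

```python
def best_label(probs, benefit):
    """Pick the predicted label with the highest expected benefit.

    probs maps each true label to its estimated probability.
    benefit maps (predicted_label, true_label) to a payoff for the user.
    """
    def expected(predicted):
        return sum(p * benefit[(predicted, true)] for true, p in probs.items())

    return max(probs, key=expected)


# Hypothetical user model: missing an interesting rule costs more than
# being shown an uninteresting one.
benefit = {
    ("interesting", "interesting"): 5.0,
    ("interesting", "uninteresting"): -1.0,
    ("uninteresting", "interesting"): -5.0,
    ("uninteresting", "uninteresting"): 1.0,
}
probs = {"interesting": 0.3, "uninteresting": 0.7}
choice = best_label(probs, benefit)
```

With these numbers, the rule is labelled interesting even though that label is less probable, because the expected benefit of showing it (0.3 × 5 − 0.7 × 1 = 0.8) exceeds that of hiding it (−0.3 × 5 + 0.7 × 1 = −0.8). A different benefit table per user would yield a different interestingness model per user.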