652 research outputs found
Categorization of interestingness measures for knowledge extraction
Finding interesting association rules is an important and active research
field in data mining. The algorithms of the Apriori family are based on two
rule extraction measures, support and confidence. Although these two measures
have the virtue of being algorithmically fast, they generate a prohibitive
number of rules most of which are redundant and irrelevant. It is therefore
necessary to use further measures which filter uninteresting rules. Many
synthesis studies were then realized on the interestingness measures according
to several points of view. Different reported studies have been carried out to
identify "good" properties of rule extraction measures and these properties
have been assessed on 61 measures. The purpose of this paper is twofold. First
to extend the number of the measures and properties to be studied, in addition
to the formalization of the properties proposed in the literature. Second, in
the light of this formal study, to categorize the studied measures. This paper
leads then to identify categories of measures in order to help the users to
efficiently select an appropriate measure by choosing one or more measure(s)
during the knowledge extraction process. The properties evaluation on the 61
measures has enabled us to identify 7 classes of measures, classes that we
obtained using two different clustering techniques.Comment: 34 pages, 4 figure
On Objective Measures of Rule Surprisingness
Most of the literature argues that surprisingness is an inherently subjective aspect of the discovered knowledge, which cannot be measured in objective terms. This paper departs from this view, and it has a twofold goal: (1) showing that it is indeed possible to define objective (rather than subjective) measures of discovered rule surprisingness; (2) proposing new ideas and methods for defining objective rule surprisingness measures
Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns
Understanding customer buying patterns is of great interest to the retail
industry and has shown to benefit a wide variety of goals ranging from managing
stocks to implementing loyalty programs. Association rule mining is a common
technique for extracting correlations such as "people in the South of France
buy ros\'e wine" or "customers who buy pat\'e also buy salted butter and sour
bread." Unfortunately, sifting through a high number of buying patterns is not
useful in practice, because of the predominance of popular products in the top
rules. As a result, a number of "interestingness" measures (over 30) have been
proposed to rank rules. However, there is no agreement on which measures are
more appropriate for retail data. Moreover, since pattern mining algorithms
output thousands of association rules for each product, the ability for an
analyst to rely on ranking measures to identify the most interesting ones is
crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a
framework that provides analysts with the ability to compare the outcome of
interestingness measures applied to buying patterns in the retail industry. We
report on how we used CAPA to compare 34 measures applied to over 1,800 stores
of Intermarch\'e, one of the largest food retailers in France
Interactive visual exploration of association rules with rule-focusing methodology
International audienceOn account of the enormous amounts of rules that can be produced by data mining algorithms, knowledge post-processing is a difficult stage in an association rule discovery process. In order to find relevant knowledge for decision making, the user (a decision maker specialized in the data studied) needs to rummage through the rules. To assist him/her in this task, we here propose the rule-focusing methodology, an interactive methodology for the visual post-processing of association rules. It allows the user to explore large sets of rules freely by focusing his/her attention on limited subsets. This new approach relies on rule interestingness measures, on a visual representation, and on interactive navigation among the rules. We have implemented the rule-focusing methodology in a prototype system called ARVis. It exploits the user's focus to guide the generation of the rules by means of a specific constraint-based rule-mining algorithm
Quantitative Redundancy in Partial Implications
We survey the different properties of an intuitive notion of redundancy, as a
function of the precise semantics given to the notion of partial implication.
The final version of this survey will appear in the Proceedings of the Int.
Conf. Formal Concept Analysis, 2015.Comment: Int. Conf. Formal Concept Analysis, 201
Measuring Interestingness – Perspectives on Anomaly Detection
We live in a data deluge. Our ability to gather, distribute, and store information has grown immensely over the past two decades. With this overabundance of data, the core knowledge discovery problem is no longer in the gathering of this data, but rather in the retrieving of relevant data efficiently. While the most common approach is to use rule interestingness to filter results of the association rule generation process, study of literature suggests that interestingness is difficult to define quantitatively and is best summarized as, “a record or pattern is interesting if it suggests a change in an established model.” In this paper we elaborate on the term interestingness, and the surrounding taxonomy of interestingness measures, anomalies, novelty and surprisingness. We review and summarize the current state of literature surrounding interestingness and associated approaches. Keywords: Interestingness, anomaly detection, rare-class mining, Interestingness measures, outliers, surprisingness, novelt
- …