652 research outputs found

    BruteSuppression: a size reduction method for Apriori rule sets

    Get PDF

    Categorization of interestingness measures for knowledge extraction

    Full text link
    Finding interesting association rules is an important and active research field in data mining. The algorithms of the Apriori family are based on two rule extraction measures, support and confidence. Although these two measures have the virtue of being algorithmically fast, they generate a prohibitive number of rules most of which are redundant and irrelevant. It is therefore necessary to use further measures which filter uninteresting rules. Many synthesis studies were then realized on the interestingness measures according to several points of view. Different reported studies have been carried out to identify "good" properties of rule extraction measures and these properties have been assessed on 61 measures. The purpose of this paper is twofold. First to extend the number of the measures and properties to be studied, in addition to the formalization of the properties proposed in the literature. Second, in the light of this formal study, to categorize the studied measures. This paper leads then to identify categories of measures in order to help the users to efficiently select an appropriate measure by choosing one or more measure(s) during the knowledge extraction process. The properties evaluation on the 61 measures has enabled us to identify 7 classes of measures, classes that we obtained using two different clustering techniques.Comment: 34 pages, 4 figure

    On Objective Measures of Rule Surprisingness

    Get PDF
    Most of the literature argues that surprisingness is an inherently subjective aspect of the discovered knowledge, which cannot be measured in objective terms. This paper departs from this view, and it has a twofold goal: (1) showing that it is indeed possible to define objective (rather than subjective) measures of discovered rule surprisingness; (2) proposing new ideas and methods for defining objective rule surprisingness measures

    Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns

    Full text link
    Understanding customer buying patterns is of great interest to the retail industry and has shown to benefit a wide variety of goals ranging from managing stocks to implementing loyalty programs. Association rule mining is a common technique for extracting correlations such as "people in the South of France buy ros\'e wine" or "customers who buy pat\'e also buy salted butter and sour bread." Unfortunately, sifting through a high number of buying patterns is not useful in practice, because of the predominance of popular products in the top rules. As a result, a number of "interestingness" measures (over 30) have been proposed to rank rules. However, there is no agreement on which measures are more appropriate for retail data. Moreover, since pattern mining algorithms output thousands of association rules for each product, the ability for an analyst to rely on ranking measures to identify the most interesting ones is crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a framework that provides analysts with the ability to compare the outcome of interestingness measures applied to buying patterns in the retail industry. We report on how we used CAPA to compare 34 measures applied to over 1,800 stores of Intermarch\'e, one of the largest food retailers in France

    Interactive visual exploration of association rules with rule-focusing methodology

    Get PDF
    International audienceOn account of the enormous amounts of rules that can be produced by data mining algorithms, knowledge post-processing is a difficult stage in an association rule discovery process. In order to find relevant knowledge for decision making, the user (a decision maker specialized in the data studied) needs to rummage through the rules. To assist him/her in this task, we here propose the rule-focusing methodology, an interactive methodology for the visual post-processing of association rules. It allows the user to explore large sets of rules freely by focusing his/her attention on limited subsets. This new approach relies on rule interestingness measures, on a visual representation, and on interactive navigation among the rules. We have implemented the rule-focusing methodology in a prototype system called ARVis. It exploits the user's focus to guide the generation of the rules by means of a specific constraint-based rule-mining algorithm

    Quantitative Redundancy in Partial Implications

    Get PDF
    We survey the different properties of an intuitive notion of redundancy, as a function of the precise semantics given to the notion of partial implication. The final version of this survey will appear in the Proceedings of the Int. Conf. Formal Concept Analysis, 2015.Comment: Int. Conf. Formal Concept Analysis, 201

    Measuring Interestingness – Perspectives on Anomaly Detection

    Get PDF
    We live in a data deluge. Our ability to gather, distribute, and store information has grown immensely over the past two decades. With this overabundance of data, the core knowledge discovery problem is no longer in the gathering of this data, but rather in the retrieving of relevant data efficiently. While the most common approach is to use rule interestingness to filter results of the association rule generation process, study of literature suggests that interestingness is difficult to define quantitatively and is best summarized as, “a record or pattern is interesting if it suggests a change in an established model.” In this paper we elaborate on the term interestingness, and the surrounding taxonomy of interestingness measures, anomalies, novelty and surprisingness. We review and summarize the current state of literature surrounding interestingness and associated approaches. Keywords: Interestingness, anomaly detection, rare-class mining, Interestingness measures, outliers, surprisingness, novelt
    • …
    corecore