2,122 research outputs found
Combining Clustering techniques and Formal Concept Analysis to characterize Interestingness Measures
Formal Concept Analysis "FCA" is a data analysis method which enables to
discover hidden knowledge existing in data. A kind of hidden knowledge
extracted from data is association rules. Different quality measures were
reported in the literature to extract only relevant association rules. Given a
dataset, the choice of a good quality measure remains a challenging task for a
user. Given a quality measures evaluation matrix according to semantic
properties, this paper describes how FCA can highlight quality measures with
similar behavior in order to help the user during his choice. The aim of this
article is the discovery of Interestingness Measures "IM" clusters, able to
validate those found due to the hierarchical and partitioning clustering
methods "AHC" and "k-means". Then, based on the theoretical study of sixty one
interestingness measures according to nineteen properties, proposed in a recent
study, "FCA" describes several groups of measures.Comment: 13 pages, 2 figure
Categorization of interestingness measures for knowledge extraction
Finding interesting association rules is an important and active research
field in data mining. The algorithms of the Apriori family are based on two
rule extraction measures, support and confidence. Although these two measures
have the virtue of being algorithmically fast, they generate a prohibitive
number of rules most of which are redundant and irrelevant. It is therefore
necessary to use further measures which filter uninteresting rules. Many
synthesis studies were then realized on the interestingness measures according
to several points of view. Different reported studies have been carried out to
identify "good" properties of rule extraction measures and these properties
have been assessed on 61 measures. The purpose of this paper is twofold. First
to extend the number of the measures and properties to be studied, in addition
to the formalization of the properties proposed in the literature. Second, in
the light of this formal study, to categorize the studied measures. This paper
leads then to identify categories of measures in order to help the users to
efficiently select an appropriate measure by choosing one or more measure(s)
during the knowledge extraction process. The properties evaluation on the 61
measures has enabled us to identify 7 classes of measures, classes that we
obtained using two different clustering techniques.Comment: 34 pages, 4 figure
Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns
Understanding customer buying patterns is of great interest to the retail
industry and has shown to benefit a wide variety of goals ranging from managing
stocks to implementing loyalty programs. Association rule mining is a common
technique for extracting correlations such as "people in the South of France
buy ros\'e wine" or "customers who buy pat\'e also buy salted butter and sour
bread." Unfortunately, sifting through a high number of buying patterns is not
useful in practice, because of the predominance of popular products in the top
rules. As a result, a number of "interestingness" measures (over 30) have been
proposed to rank rules. However, there is no agreement on which measures are
more appropriate for retail data. Moreover, since pattern mining algorithms
output thousands of association rules for each product, the ability for an
analyst to rely on ranking measures to identify the most interesting ones is
crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a
framework that provides analysts with the ability to compare the outcome of
interestingness measures applied to buying patterns in the retail industry. We
report on how we used CAPA to compare 34 measures applied to over 1,800 stores
of Intermarch\'e, one of the largest food retailers in France
Towards a semantic and statistical selection of association rules
The increasing growth of databases raises an urgent need for more accurate
methods to better understand the stored data. In this scope, association rules
were extensively used for the analysis and the comprehension of huge amounts of
data. However, the number of generated rules is too large to be efficiently
analyzed and explored in any further process. Association rules selection is a
classical topic to address this issue, yet, new innovated approaches are
required in order to provide help to decision makers. Hence, many interesting-
ness measures have been defined to statistically evaluate and filter the
association rules. However, these measures present two major problems. On the
one hand, they do not allow eliminating irrelevant rules, on the other hand,
their abun- dance leads to the heterogeneity of the evaluation results which
leads to confusion in decision making. In this paper, we propose a two-winged
approach to select statistically in- teresting and semantically incomparable
rules. Our statis- tical selection helps discovering interesting association
rules without favoring or excluding any measure. The semantic comparability
helps to decide if the considered association rules are semantically related
i.e comparable. The outcomes of our experiments on real datasets show promising
results in terms of reduction in the number of rules
A New Algorithm for Exploratory Projection Pursuit
In this paper, we propose a new algorithm for exploratory projection pursuit.
The basis of the algorithm is the insight that previous approaches used fairly
narrow definitions of interestingness / non interestingness. We argue that
allowing these definitions to depend on the problem / data at hand is a more
natural approach in an exploratory technique. This also allows our technique
much greater applicability than the approaches extant in the literature.
Complementing this insight, we propose a class of projection indices based on
the spatial distribution function that can make use of such information.
Finally, with the help of real datasets, we demonstrate how a range of
multivariate exploratory tasks can be addressed with our algorithm. The
examples further demonstrate that the proposed indices are quite capable of
focussing on the interesting structure in the data, even when this structure is
otherwise hard to detect or arises from very subtle patterns.Comment: 29 pages, 8 figure
Guided Interaction Exploration in Artifact-centric Process Models
Artifact-centric process models aim to describe complex processes as a
collection of interacting artifacts. Recent development in process mining allow
for the discovery of such models. However, the focus is often on the
representation of the individual artifacts rather than their interactions.
Based on event data we can automatically discover composite state machines
representing artifact-centric processes. Moreover, we provide ways of
visualizing and quantifying interactions among different artifacts. For
example, we are able to highlight strongly correlated behaviours in different
artifacts. The approach has been fully implemented as a ProM plug-in; the CSM
Miner provides an interactive artifact-centric process discovery tool focussing
on interactions. The approach has been evaluated using real life data sets,
including the personal loan and overdraft process of a Dutch financial
institution.Comment: 10 pages, 4 figures, to be published in proceedings of the 19th IEEE
Conference on Business Informatics, CBI 201
- …