2,638 research outputs found
Categorization of interestingness measures for knowledge extraction
Finding interesting association rules is an important and active research
field in data mining. The algorithms of the Apriori family are based on two
rule extraction measures, support and confidence. Although these two measures
have the virtue of being algorithmically fast, they generate a prohibitive
number of rules most of which are redundant and irrelevant. It is therefore
necessary to use further measures which filter uninteresting rules. Many
synthesis studies were then realized on the interestingness measures according
to several points of view. Different reported studies have been carried out to
identify "good" properties of rule extraction measures and these properties
have been assessed on 61 measures. The purpose of this paper is twofold. First
to extend the number of the measures and properties to be studied, in addition
to the formalization of the properties proposed in the literature. Second, in
the light of this formal study, to categorize the studied measures. This paper
leads then to identify categories of measures in order to help the users to
efficiently select an appropriate measure by choosing one or more measure(s)
during the knowledge extraction process. The properties evaluation on the 61
measures has enabled us to identify 7 classes of measures, classes that we
obtained using two different clustering techniques.Comment: 34 pages, 4 figure
Towards a semantic and statistical selection of association rules
The increasing growth of databases raises an urgent need for more accurate
methods to better understand the stored data. In this scope, association rules
were extensively used for the analysis and the comprehension of huge amounts of
data. However, the number of generated rules is too large to be efficiently
analyzed and explored in any further process. Association rules selection is a
classical topic to address this issue, yet, new innovated approaches are
required in order to provide help to decision makers. Hence, many interesting-
ness measures have been defined to statistically evaluate and filter the
association rules. However, these measures present two major problems. On the
one hand, they do not allow eliminating irrelevant rules, on the other hand,
their abun- dance leads to the heterogeneity of the evaluation results which
leads to confusion in decision making. In this paper, we propose a two-winged
approach to select statistically in- teresting and semantically incomparable
rules. Our statis- tical selection helps discovering interesting association
rules without favoring or excluding any measure. The semantic comparability
helps to decide if the considered association rules are semantically related
i.e comparable. The outcomes of our experiments on real datasets show promising
results in terms of reduction in the number of rules
Social Information Processing in Social News Aggregation
The rise of the social media sites, such as blogs, wikis, Digg and Flickr
among others, underscores the transformation of the Web to a participatory
medium in which users are collaboratively creating, evaluating and distributing
information. The innovations introduced by social media has lead to a new
paradigm for interacting with information, what we call 'social information
processing'. In this paper, we study how social news aggregator Digg exploits
social information processing to solve the problems of document recommendation
and rating. First, we show, by tracking stories over time, that social networks
play an important role in document recommendation. The second contribution of
this paper consists of two mathematical models. The first model describes how
collaborative rating and promotion of stories emerges from the independent
decisions made by many users. The second model describes how a user's
influence, the number of promoted stories and the user's social network,
changes in time. We find qualitative agreement between predictions of the model
and user data gathered from Digg.Comment: Extended version of the paper submitted to IEEE Internet Computing's
special issue on Social Searc
Interestingness of traces in declarative process mining: The janus LTLPf Approach
Declarative process mining is the set of techniques aimed at extracting behavioural constraints from event logs. These constraints are inherently of a reactive nature, in that their activation restricts the occurrence of other activities. In this way, they are prone to the principle of ex falso quod libet: they can be satisfied even when not activated. As a consequence, constraints can be mined that are hardly interesting to users or even potentially misleading. In this paper, we build on the observation that users typically read and write temporal constraints as if-statements with an explicit indication of the activation condition. Our approach is called Janus, because it permits the specification and verification of reactive constraints that, upon activation, look forward into the future and backwards into the past of a trace. Reactive constraints are expressed using Linear-time Temporal Logic with Past on Finite Traces (LTLp f). To mine them out of event logs, we devise a time bi-directional valuation technique based on triplets of automata operating in an on-line fashion. Our solution proves efficient, being at most quadratic w.r.t. trace length, and effective in recognising interestingness of discovered constraints
New probabilistic interest measures for association rules
Mining association rules is an important technique for discovering meaningful
patterns in transaction databases. Many different measures of interestingness
have been proposed for association rules. However, these measures fail to take
the probabilistic properties of the mined data into account. In this paper, we
start with presenting a simple probabilistic framework for transaction data
which can be used to simulate transaction data when no associations are
present. We use such data and a real-world database from a grocery outlet to
explore the behavior of confidence and lift, two popular interest measures used
for rule mining. The results show that confidence is systematically influenced
by the frequency of the items in the left hand side of rules and that lift
performs poorly to filter random noise in transaction data. Based on the
probabilistic framework we develop two new interest measures, hyper-lift and
hyper-confidence, which can be used to filter or order mined association rules.
The new measures show significantly better performance than lift for
applications where spurious rules are problematic
- …