Specious rules: an efficient and effective unifying method for removing misleading and uninformative patterns in association rule mining
We present theoretical analysis and a suite of tests and procedures for
addressing a broad class of redundant and misleading association rules we call
specious rules. Specious dependencies, also known as spurious,
apparent, or illusory associations, refer to a well-known
phenomenon where marginal dependencies are merely products of interactions with
other variables and disappear when conditioned on those variables.
The most extreme example is Yule-Simpson's paradox where two variables
present positive dependence in the marginal contingency table but negative in
all partial tables defined by different levels of a confounding factor. It is
accepted wisdom that in data of any nontrivial dimensionality it is infeasible
to control for all of the exponentially many possible confounds of this nature.
In this paper, we consider the problem of specious dependencies in the context
of statistical association rule mining. We define specious rules and show they
offer a unifying framework which covers many types of previously proposed
redundant or misleading association rules. After theoretical analysis, we
introduce practical algorithms for detecting and pruning out specious
association rules efficiently under many key goodness measures, including
mutual information and exact hypergeometric probabilities. We demonstrate that
the procedure greatly reduces the number of associations discovered, providing
an elegant and effective solution to the problem of association mining
discovering large numbers of misleading and redundant rules. Comment: Note: This is a corrected version of the paper published in SDM'17.
In the equation on page 4, the range of the sum has been corrected
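As a minimal illustration of the Yule-Simpson reversal described in this abstract, the Python sketch below computes leverage, P(XY) - P(X)P(Y), on a marginal 2x2 table and on the two partial tables defined by a binary confounder. The counts, the stratum names `c0` and `c1`, and the helper `leverage` are illustrative assumptions; this is not the paper's test for specious rules, only a demonstration of the sign flip it refers to.

```python
def leverage(n11, n10, n01, n00):
    """Leverage P(XY) - P(X)P(Y) of a 2x2 table.

    n11: X and Y, n10: X without Y, n01: Y without X, n00: neither.
    Positive values indicate positive dependence, negative values negative.
    """
    n = n11 + n10 + n01 + n00
    p_xy = n11 / n
    p_x = (n11 + n10) / n
    p_y = (n11 + n01) / n
    return p_xy - p_x * p_y


# Illustrative counts (n11, n10, n01, n00) for each level of a confounder C.
strata = {
    "c0": (234, 36, 81, 6),
    "c1": (55, 25, 192, 71),
}

# The marginal table is the cell-wise sum over the confounder levels.
marginal = tuple(sum(cells) for cells in zip(*strata.values()))

marginal_leverage = leverage(*marginal)
partial_leverages = {level: leverage(*cells) for level, cells in strata.items()}

print("marginal:", round(marginal_leverage, 4))
for level, value in partial_leverages.items():
    print(level, round(value, 4))
```

With these counts the marginal leverage is positive while both partial leverages are negative, so a rule X -> Y that looks good marginally would not survive conditioning on C.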
arules - A Computational Environment for Mining Association Rules and Frequent Item Sets
Mining frequent itemsets and association rules is a popular and well researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.
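To make the level-wise idea behind Apriori concrete, here is a short pure-Python sketch of frequent itemset mining. It does not use or mirror the arules API or Borgelt's C implementations; the function name `frequent_itemsets`, the `min_support` parameter, and the toy transactions are assumptions made for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: relative support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        frequent = {}
        for candidate in current:
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                frequent[candidate] = count / n
        result.update(frequent)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    return result


# Hypothetical market-basket data.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
itemsets = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The prune step is what keeps the search tractable: a candidate is counted only if every smaller subset has already been found frequent.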
An agile business process and practice meta-model
Business Process Management (BPM) encompasses the discovery, modelling, monitoring, analysis and improvement of business processes. Limitations of traditional BPM approaches in addressing changes in business requirements have resulted in a number of agile BPM approaches that seek to accelerate the redesign of business process models. Meta-models are a key BPM feature that reduce the ambiguity of business process models. This paper describes a meta-model supporting the agile version of the Business Process and Practice Alignment Methodology (BPPAM) for business process improvement, which captures process information from actual work practices. The ability of the meta-model to achieve business process agility is discussed and compared with other agile meta-models, based on definitions of business process flexibility and agility found in the literature. (C) 2017 The Authors. Published by Elsevier B.V
Ad hoc categories
People construct ad hoc categories to achieve goals. For example, constructing the category of "things to sell at a garage sale" can be instrumental to achieving the goal of selling unwanted possessions. These categories differ from common categories (e.g., "fruit," "furniture") in that ad hoc categories violate the correlational structure of the environment and are not well established in memory. Regarding the latter property, the category concepts, concept-to-instance associations, and instance-to-concept associations structuring ad hoc categories are shown to be much less established in memory than those of common categories. Regardless of these differences, however, ad hoc categories possess graded structures (i.e., typicality gradients) as salient as those structuring common categories. This appears to be the result of a similarity comparison process that imposes graded structure on any category regardless of type.
Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns
Understanding customer buying patterns is of great interest to the retail
industry and has been shown to benefit a wide variety of goals ranging from managing
stocks to implementing loyalty programs. Association rule mining is a common
technique for extracting correlations such as "people in the South of France
buy rosé wine" or "customers who buy pâté also buy salted butter and sour
bread." Unfortunately, sifting through a high number of buying patterns is not
useful in practice, because of the predominance of popular products in the top
rules. As a result, a number of "interestingness" measures (over 30) have been
proposed to rank rules. However, there is no agreement on which measures are
more appropriate for retail data. Moreover, since pattern mining algorithms
output thousands of association rules for each product, the ability for an
analyst to rely on ranking measures to identify the most interesting ones is
crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a
framework that provides analysts with the ability to compare the outcome of
interestingness measures applied to buying patterns in the retail industry. We
report on how we used CAPA to compare 34 measures applied to over 1,800 stores
of Intermarché, one of the largest food retailers in France.
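For readers unfamiliar with what an interestingness measure computes, the Python sketch below evaluates three standard ones (support, confidence, and lift) for a single rule. It is not CAPA and does not reproduce the 34 measures compared in the paper; the function name `rule_measures` and the baskets are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    n_a = sum(1 for t in baskets if a <= t)         # baskets containing the antecedent
    n_c = sum(1 for t in baskets if c <= t)         # baskets containing the consequent
    n_ac = sum(1 for t in baskets if (a | c) <= t)  # baskets containing both
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)  # values above 1 suggest positive association
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical baskets, loosely echoing the abstract's example.
baskets = [
    {"pate", "butter", "bread"},
    {"pate", "butter"},
    {"butter", "bread"},
    {"wine"},
]
measures = rule_measures(baskets, {"pate"}, {"butter"})
print(measures)
```

Confidence alone tends to favor rules whose consequent is a popular product, which is the ranking problem the abstract describes; lift corrects for this by dividing by the consequent's overall frequency.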