
    Specious rules: an efficient and effective unifying method for removing misleading and uninformative patterns in association rule mining

    We present theoretical analysis and a suite of tests and procedures for addressing a broad class of redundant and misleading association rules that we call specious rules. Specious dependencies, also known as spurious, apparent, or illusory associations, refer to a well-known phenomenon in which marginal dependencies are merely products of interactions with other variables and disappear when conditioned on those variables. The most extreme example is the Yule-Simpson paradox, in which two variables show positive dependence in the marginal contingency table but negative dependence in all partial tables defined by different levels of a confounding factor. It is accepted wisdom that in data of any nontrivial dimensionality it is infeasible to control for all of the exponentially many possible confounders of this nature. In this paper, we consider the problem of specious dependencies in the context of statistical association rule mining. We define specious rules and show that they offer a unifying framework which covers many types of previously proposed redundant or misleading association rules. After theoretical analysis, we introduce practical algorithms for detecting and pruning specious association rules efficiently under many key goodness measures, including mutual information and exact hypergeometric probabilities. We demonstrate that the procedure greatly reduces the number of associations discovered, providing an elegant and effective solution to the problem of association mining discovering large numbers of misleading and redundant rules. Comment: Note: This is a corrected version of the paper published in SDM'17. In the equation on page 4, the range of the sum has been correcte
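    The sign reversal described in this abstract can be sketched numerically. The snippet below is a minimal illustration, not the paper's algorithm: the counts, the stratum labels, and the helper names (`rate`, `marginal_rate`) are assumptions chosen for the example. It shows a binary variable (group "B" versus "A") whose success rate is higher in the pooled table but lower within every level of a confounding factor.

```python
# Illustrative counts only: counts[stratum][group] = (successes, total).
# The strata play the role of the confounding factor.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, total):
    """Success rate inside a single cell."""
    return successes / total


def marginal_rate(table, group):
    """Success rate for a group after pooling all strata."""
    successes = sum(table[z][group][0] for z in table)
    total = sum(table[z][group][1] for z in table)
    return successes / total


# Marginal table: difference in success rate, B minus A.
marginal_diff = marginal_rate(counts, "B") - marginal_rate(counts, "A")

# Partial tables: the same difference inside each stratum.
partial_diffs = {
    z: rate(*counts[z]["B"]) - rate(*counts[z]["A"]) for z in counts
}

print(f"marginal difference (B - A): {marginal_diff:+.3f}")
for stratum, diff in partial_diffs.items():
    print(f"difference within {stratum!r}: {diff:+.3f}")
```

    With these counts the pooled difference is positive while both within-stratum differences are negative, which is the pattern a rule miner would misreport if it looked only at the marginal table.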

    arules - A Computational Environment for Mining Association Rules and Frequent Item Sets

    Mining frequent itemsets and association rules is a popular and well-researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.

    An agile business process and practice meta-model

    Business Process Management (BPM) encompasses the discovery, modelling, monitoring, analysis and improvement of business processes. Limitations of traditional BPM approaches in addressing changes in business requirements have resulted in a number of agile BPM approaches that seek to accelerate the redesign of business process models. Meta-models are a key BPM feature that reduce the ambiguity of business process models. This paper describes a meta-model supporting the agile version of the Business Process and Practice Alignment Methodology (BPPAM) for business process improvement, which captures process information from actual work practices. The ability of the meta-model to achieve business process agility is discussed and compared with other agile meta-models, based on definitions of business process flexibility and agility found in the literature. (C) 2017 The Authors. Published by Elsevier B.V

    Ad hoc categories

    People construct ad hoc categories to achieve goals. For example, constructing the category of “things to sell at a garage sale” can be instrumental to achieving the goal of selling unwanted possessions. These categories differ from common categories (e.g., “fruit,” “furniture”) in that ad hoc categories violate the correlational structure of the environment and are not well established in memory. Regarding the latter property, the category concepts, concept-to-instance associations, and instance-to-concept associations structuring ad hoc categories are shown to be much less established in memory than those of common categories. Regardless of these differences, however, ad hoc categories possess graded structures (i.e., typicality gradients) as salient as those structuring common categories. This appears to be the result of a similarity comparison process that imposes graded structure on any category regardless of type

    Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns

    Understanding customer buying patterns is of great interest to the retail industry and has been shown to benefit a wide variety of goals ranging from managing stocks to implementing loyalty programs. Association rule mining is a common technique for extracting correlations such as "people in the South of France buy rosé wine" or "customers who buy pâté also buy salted butter and sour bread." Unfortunately, sifting through a high number of buying patterns is not useful in practice, because of the predominance of popular products in the top rules. As a result, a number of "interestingness" measures (over 30) have been proposed to rank rules. However, there is no agreement on which measures are more appropriate for retail data. Moreover, since pattern mining algorithms output thousands of association rules for each product, the ability of an analyst to rely on ranking measures to identify the most interesting ones is crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a framework that provides analysts with the ability to compare the outcome of interestingness measures applied to buying patterns in the retail industry. We report on how we used CAPA to compare 34 measures applied to over 1,800 stores of Intermarché, one of the largest food retailers in France.
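    To make the notion of an interestingness measure concrete, here is a minimal sketch of three standard ones (support, confidence, and lift) computed over a toy basket list. It is not CAPA and does not reproduce the paper's 34 measures; the transactions and function names are hypothetical and exist only for this example.

```python
# Hypothetical toy baskets; each transaction is a set of item names.
transactions = [
    {"pate", "butter", "bread"},
    {"pate", "butter"},
    {"bread", "butter"},
    {"wine", "bread"},
    {"pate", "butter", "bread", "wine"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent).

    Assumes the antecedent occurs in at least one basket.
    """
    joint = support(set(antecedent) | set(consequent), baskets)
    return joint / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's base rate.

    Values above 1 suggest positive association; this is what keeps
    a popular consequent from dominating the ranking on its own.
    """
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


rule = ({"pate"}, {"butter"})
print("support   :", support(rule[0] | rule[1], transactions))
print("confidence:", confidence(*rule, transactions))
print("lift      :", lift(*rule, transactions))
```

    Ranking by confidence alone favors rules whose consequent is a popular product, whereas lift normalizes by how common the consequent is, which is one reason different measures can order the same rules differently.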