Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules
Association rules are among the most widely employed data analysis methods in
the field of Data Mining. An association rule is a form of partial implication
between two sets of binary variables. In the most common approach, association
rules are parameterized by a lower bound on their confidence, which is the
empirical conditional probability of their consequent given the antecedent,
and/or by some other parameter bounds such as "support" or deviation from
independence. We study here notions of redundancy among association rules from
a fundamental perspective. We see each transaction in a dataset as an
interpretation (or model) in the propositional logic sense, and consider
existing notions of redundancy, that is, of logical entailment, among
association rules, of the form "any dataset in which this first rule holds must
obey also that second rule, therefore the second is redundant". We discuss
several existing alternative definitions of redundancy between association
rules and provide new characterizations and relationships among them. We show
that the main alternatives we discuss correspond actually to just two variants,
which differ in the treatment of full-confidence implications. For each of
these two notions of redundancy, we provide a sound and complete deduction
calculus, and we show how to construct complete bases (that is,
axiomatizations) of absolutely minimum size in terms of the number of rules. We
explore finally an approach to redundancy with respect to several association
rules, and fully characterize its simplest case of two partial premises.
Comment: LMCS accepted pape
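The abstract above defines confidence as the empirical conditional probability of the consequent given the antecedent, with support as a second common parameter. A minimal sketch of both quantities follows, assuming transactions are represented as sets of items; the function names and the toy dataset are illustrative and do not come from the paper.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Empirical conditional probability of the consequent given the antecedent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    covered = [t for t in transactions if a <= t]
    if not covered:
        raise ValueError("antecedent does not occur in any transaction")
    return sum(1 for t in covered if both <= t) / len(covered)


# Toy dataset (hypothetical): four transactions over items a, b, c.
transactions = [frozenset(t) for t in (["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"])]
supp_ab = support({"a", "b"}, transactions)
conf_a_b = confidence({"a"}, {"b"}, transactions)
```

Here the rule a -> b holds in two of the three transactions that contain a, so a confidence threshold above 2/3 would reject it.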
Mining frequent itemsets: a perspective from operations research
Many papers on frequent itemsets have been published, and several contests in this field have been held. In the majority of the papers the focus is on speed, and ad hoc algorithms and data structures were introduced. In this paper we put most of the algorithms in one framework, using classical Operations Research paradigms such as backtracking, depth-first and breadth-first search, and branch-and-bound. Moreover, we present experimental results where the different algorithms are implemented under similar designs.
Keywords: data mining; operations research; frequent itemsets
FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking
Maximal frequent patterns superset checking plays an important role in the
efficient mining of complete Maximal Frequent Itemsets (MFI) and maximal search
space pruning. In this paper we present a new indexing approach, FastLMFI, for
local maximal frequent patterns (itemset) propagation and maximal patterns
superset checking. Experimental results on different sparse and dense datasets
show that our approach outperforms the previous well-known progressive focusing
technique. We have also integrated our superset checking approach with an
existing state-of-the-art maximal itemsets algorithm, Mafia, and compared our
results with the current best maximal itemsets algorithms, afopt-max and FP
(zhu)-max. Our results outperform afopt-max and FP (zhu)-max on dense (chess
and mushroom) datasets at almost all support thresholds, which shows the
effectiveness of our approach.
Comment: 8 Pages, In the proceedings of 4th ACS/IEEE International Conference
on Computer Systems and Applications 2006, March 8, 2006, Dubai/Sharjah, UAE,
2006, Page(s) 452-45
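To make the superset-checking step mentioned in this abstract concrete, here is a naive baseline sketch: a new frequent itemset is kept as maximal only if no already-known maximal itemset contains it. This is a plain linear scan and is not the FastLMFI indexing scheme or progressive focusing; the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List


def is_subsumed(candidate: Iterable[str], maximal_sets: List[FrozenSet[str]]) -> bool:
    """True if some known maximal itemset is a superset of `candidate`."""
    c = frozenset(candidate)
    return any(c <= m for m in maximal_sets)


def add_maximal(candidate: Iterable[str],
                maximal_sets: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Return the maximal-set list after considering `candidate`.

    A subsumed candidate leaves the list unchanged; otherwise the candidate is
    added and any existing sets it contains are dropped.
    """
    c = frozenset(candidate)
    if is_subsumed(c, maximal_sets):
        return list(maximal_sets)
    return [m for m in maximal_sets if not m <= c] + [c]


# Toy run (hypothetical itemsets).
mfi: List[FrozenSet[str]] = []
mfi = add_maximal({"a", "b"}, mfi)
mfi = add_maximal({"a"}, mfi)            # subsumed by {a, b}, so ignored
mfi = add_maximal({"a", "b", "c"}, mfi)  # replaces {a, b}
```

Each check costs time linear in the number of stored maximal itemsets, which is the cost that indexing approaches of the kind described in the abstract aim to reduce.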
arules - A Computational Environment for Mining Association Rules and Frequent Item Sets
Mining frequent itemsets and association rules is a popular and well researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.
A Tight Upper Bound on the Number of Candidate Patterns
In the context of mining for frequent patterns using the standard levelwise
algorithm, the following question arises: given the current level and the
current set of frequent patterns, what is the maximal number of candidate
patterns that can be generated on the next level? We answer this question by
providing a tight upper bound, derived from a combinatorial result from the
sixties by Kruskal and Katona. Our result is useful to reduce the number of
database scans
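For context on the question this abstract poses, the sketch below shows the usual candidate-generation step of a levelwise (Apriori-style) miner: from the frequent patterns of size k, it builds the size k+1 sets whose every k-subset is frequent. It illustrates what is being counted, not the paper's upper bound; the function name and the toy input are assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def generate_candidates(frequent_k: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """Join-and-prune candidate generation for the next level.

    Assumes every itemset in `frequent_k` has the same size k.
    """
    frequent = {frozenset(s) for s in frequent_k}
    if not frequent:
        return set()
    k = len(next(iter(frequent)))
    candidates: Set[FrozenSet[str]] = set()
    for a, b in combinations(frequent, 2):
        union = a | b
        # Join: keep only unions that grow by exactly one item.
        if len(union) != k + 1:
            continue
        # Prune: every k-subset of the candidate must itself be frequent.
        if all(frozenset(sub) in frequent for sub in combinations(union, k)):
            candidates.add(union)
    return candidates


# Toy level-2 input (hypothetical): ab, ac, bc, bd are frequent.
level2 = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"b", "d"}]
level3_candidates = generate_candidates(level2)
```

On this input only {a, b, c} survives pruning, because {a, b, d} and {b, c, d} each have an infrequent 2-subset.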
- …