Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules

Author: A Freitas
C C Aggarwal P S Y
D Gunopulos R Khardon, H Mannila, S Sal
Georg Gottlob
J-F Boulicaut A Bykowski, C Rigotti: Fr
J-L Guigues V Duquenne:
José Balcázar
R Agrawal T Imielinski, A Swam
R Dechter J Pearl:
R Khardon D Roth
T Calders B Goethals:
T Eiter G Gottlob
Publication venue: 'Logical Methods in Computer Science e.V.'
Publication date: 01/01/2009

Association rules are among the most widely employed data analysis methods in the field of Data Mining. An association rule is a form of partial implication between two sets of binary variables. In the most common approach, association rules are parameterized by a lower bound on their confidence, which is the empirical conditional probability of their consequent given the antecedent, and/or by some other parameter bounds such as "support" or deviation from independence. We study here notions of redundancy among association rules from a fundamental perspective. We see each transaction in a dataset as an interpretation (or model) in the propositional logic sense, and consider existing notions of redundancy, that is, of logical entailment, among association rules, of the form "any dataset in which this first rule holds must obey also that second rule, therefore the second is redundant". We discuss several existing alternative definitions of redundancy between association rules and provide new characterizations and relationships among them. We show that the main alternatives we discuss correspond actually to just two variants, which differ in the treatment of full-confidence implications. For each of these two notions of redundancy, we provide a sound and complete deduction calculus, and we show how to construct complete bases (that is, axiomatizations) of absolutely minimum size in terms of the number of rules. We explore finally an approach to redundancy with respect to several association rules, and fully characterize its simplest case of two partial premises. Comment: LMCS accepted pape
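
The two parameters named in the abstract above, support and confidence, can be made concrete with a minimal sketch. The function names and the toy dataset are illustrative assumptions, not taken from the paper; transactions are modeled as sets of the items (binary variables) that are true in them.

```python
def support(dataset, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in dataset) / len(dataset)


def confidence(dataset, antecedent, consequent):
    """Empirical conditional probability of the consequent given the antecedent.

    Returns None when no transaction contains the antecedent, since the
    conditional probability is then undefined.
    """
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    covered = sum(antecedent <= t for t in dataset)
    if covered == 0:
        return None
    return sum(both <= t for t in dataset) / covered


# Hypothetical toy dataset: each transaction is the set of items it contains.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
]

# The rule a -> b holds in 2 of the 3 transactions that contain a.
rule_confidence = confidence(transactions, {"a"}, {"b"})
rule_support = support(transactions, {"a", "b"})
```

A rule is then reported when `rule_confidence` (and, if used, `rule_support`) reaches the chosen lower bound.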

arXiv.org e-Print Archive

CiteSeerX

Crossref

Episciences.org

Mining Frequent Item sets in Data Streams

Author: Dass Rajanish

Research Papers in Economics

Mining frequent itemsets a perspective from operations research

Author: Kosters W.A.
Pijls W.H.L.M.

Many papers on frequent itemsets have been published. Besides, some contests in this field were held. In the majority of the papers the focus is on speed. Ad hoc algorithms and data structures were introduced. In this paper we put most of the algorithms in one framework, using classical Operations Research paradigms such as backtracking, depth-first and breadth-first search, and branch-and-bound. Moreover we present experimental results where the different algorithms are implemented under similar designs. Keywords: data mining; operation research; frequent itemsets
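
To illustrate the search paradigms the abstract above mentions, here is a minimal sketch of frequent-itemset enumeration done breadth-first (levelwise) and depth-first (backtracking). It is not the framework of the paper; the function names, the absolute `min_count` threshold, and the toy data are assumptions made for illustration.

```python
def count(dataset, itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in dataset)


def frequent_bfs(dataset, min_count):
    """Breadth-first search: finish all k-itemsets before any (k+1)-itemset."""
    items = sorted(set().union(*dataset))
    level = [frozenset([i]) for i in items
             if count(dataset, frozenset([i])) >= min_count]
    found = set()
    while level:
        found.update(level)
        # Join pairs of frequent k-itemsets whose union has exactly k+1 items.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates if count(dataset, c) >= min_count]
    return found


def frequent_dfs(dataset, min_count):
    """Depth-first backtracking: extend the current itemset with later items."""
    items = sorted(set().union(*dataset))
    found = set()

    def extend(prefix, start):
        for idx in range(start, len(items)):
            candidate = prefix | {items[idx]}
            # Bound step: an infrequent itemset has no frequent superset,
            # so the whole branch below it is skipped.
            if count(dataset, candidate) >= min_count:
                found.add(candidate)
                extend(candidate, idx + 1)

    extend(frozenset(), 0)
    return found


# Hypothetical toy dataset.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
]
bfs_result = frequent_bfs(transactions, min_count=2)
dfs_result = frequent_dfs(transactions, min_count=2)
```

Both traversals visit the same search space and return the same itemsets; they differ only in the order of exploration and in how much intermediate state has to be kept.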

Research Papers in Economics

FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking

Author: Divitiis G
Guagnelli M
Palombi F
Petronzio R
Tantalo N
Publication date: 01/01/2003

Maximal frequent patterns superset checking plays an important role in the efficient mining of complete Maximal Frequent Itemsets (MFI) and maximal search space pruning. In this paper we present a new indexing approach, FastLMFI, for local maximal frequent patterns (itemset) propagation and maximal patterns superset checking. Experimental results on different sparse and dense datasets show that our work is better than the previous well-known progressive focusing technique. We have also integrated our superset checking approach with an existing state-of-the-art maximal itemsets algorithm, Mafia, and compare our results with the current best maximal itemsets algorithms afopt-max and FP (zhu)-max. Our results outperform afopt-max and FP (zhu)-max on dense (chess and mushroom) datasets on almost all support thresholds, which shows the effectiveness of our approach. Comment: 8 Pages, In the proceedings of 4th ACS/IEEE International Conference on Computer Systems and Applications 2006, March 8, 2006, Dubai/Sharjah, UAE, 2006, Page(s) 452-45
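
For orientation, the superset-checking step that the abstract above sets out to speed up can be sketched as a naive linear scan. This is a baseline illustration only, not the FastLMFI index and not progressive focusing; the function names are hypothetical.

```python
def is_subsumed(candidate, maximal_sets):
    """True if some known maximal itemset is a superset of `candidate`."""
    return any(candidate <= m for m in maximal_sets)


def add_maximal(candidate, maximal_sets):
    """Record `candidate` as maximal unless a superset is already known.

    Any stored itemsets that are subsets of the new candidate are dropped,
    so the list always holds pairwise incomparable itemsets.
    Returns True if the candidate was added.
    """
    if is_subsumed(candidate, maximal_sets):
        return False
    maximal_sets[:] = [m for m in maximal_sets if not m <= candidate]
    maximal_sets.append(candidate)
    return True


# Hypothetical usage: frequent itemsets arrive in arbitrary order.
mfi = []
results = [
    add_maximal(frozenset({"a", "b"}), mfi),
    add_maximal(frozenset({"a"}), mfi),
    add_maximal(frozenset({"a", "b", "c"}), mfi),
    add_maximal(frozenset({"d"}), mfi),
]
```

Each check scans the whole list of known maximal itemsets, which is the cost that indexing schemes for superset checking aim to reduce.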

arXiv.org e-Print Archive

Crossref

ART

CERN Document Server

FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking

Author: Baig Abdul Rauf
Bashir Shariq
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

arXiv.org e-Print Archive

Crossref

arules - A Computational Environment for Mining Association Rules and Frequent Item Sets

Author: Bettina Grün
Kurt Hornik
Michael Hahsler

Mining frequent itemsets and association rules is a popular and well-researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.

Research Papers in Economics

A Tight Upper Bound on the Number of Candidate Patterns

Author: Bussche Jan Van den
Geerts Floris
Goethals Bart
Publication date: 01/01/2001

In the context of mining for frequent patterns using the standard levelwise algorithm, the following question arises: given the current level and the current set of frequent patterns, what is the maximal number of candidate patterns that can be generated on the next level? We answer this question by providing a tight upper bound, derived from a combinatorial result from the sixties by Kruskal and Katona. Our result is useful to reduce the number of database scans.
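
The abstract above does not spell out the bound, so the following is a sketch under a stated assumption: that the bound takes the usual Kruskal-Katona form, in which the number m of frequent k-patterns is written as a cascade of binomial coefficients and each term C(n, i) is replaced by C(n, i + 1) to bound the candidates on level k + 1. The function names are hypothetical.

```python
from math import comb


def cascade(m, k):
    """Greedy k-cascade of m: pairs (n_i, i) with m == sum of comb(n_i, i).

    Assumption: this is the canonical binomial representation used in
    Kruskal-Katona style arguments (n_k > n_(k-1) > ... >= 1).
    """
    terms = []
    i = k
    while m > 0 and i >= 1:
        n = i  # comb(i, i) == 1 <= m, so n = i is always feasible
        while comb(n + 1, i) <= m:
            n += 1
        terms.append((n, i))
        m -= comb(n, i)
        i -= 1
    return terms


def candidate_bound(m, k):
    """Upper bound on level-(k+1) candidates given m frequent k-patterns."""
    return sum(comb(n, i + 1) for n, i in cascade(m, k))


# With 5 frequent 2-itemsets, at most 2 candidate 3-itemsets can have all of
# their 2-subsets frequent (for example ab, ac, bc, ad, bd give abc and abd).
bound_for_five_pairs = candidate_bound(5, 2)
```

Under this assumption, a levelwise miner could compare the bound with the available memory to decide whether several levels of candidates can be generated before the next database scan.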

arXiv.org e-Print Archive

CiteSeerX