How to find frequent patterns?

Author: Koster W.A.
Pijls W.H.L.M.

An improved version of DF, the depth-first implementation of Apriori, is presented. Given a database of (e.g., supermarket) transactions, the DF algorithm builds a so-called trie that contains all frequent itemsets, i.e., all itemsets that are contained in at least 'minsup' transactions, with 'minsup' a given threshold value. In the trie, there is a one-to-one correspondence between the paths and the frequent itemsets. The new version, called DF+, differs from DF in that its data structure representing the database is borrowed from the FP-growth algorithm. It thus combines the compact FP-growth data structure with the efficient trie-building method in DF.
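The correspondence between trie paths and frequent itemsets can be illustrated with a small sketch. This is not the DF or DF+ algorithm from the paper: it is a plain depth-first enumeration over an in-memory transaction list, and the names `build_trie` and `paths` are hypothetical helpers introduced only for this illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def build_trie(transactions: Iterable[Iterable[str]], minsup: int) -> Dict:
    """Build a nested-dict trie whose root-to-node paths are the frequent itemsets.

    Assumption: each node is {"support": int, "children": {...}}; the paper's
    actual trie layout and database representation are not reproduced here.
    """
    txs = [frozenset(t) for t in transactions]
    items = sorted({item for t in txs for item in t})

    def grow(candidates: List[str], covering: List[frozenset]) -> Dict:
        node = {}
        for idx, item in enumerate(candidates):
            # Transactions that contain the current path plus this item.
            cover = [t for t in covering if item in t]
            if len(cover) >= minsup:
                node[item] = {
                    "support": len(cover),
                    # Extend only with later items so each itemset appears once.
                    "children": grow(candidates[idx + 1:], cover),
                }
        return node

    return grow(items, txs)


def paths(trie: Dict, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], int]]:
    """Yield (itemset, support) for every path in the trie, depth-first."""
    for item, child in trie.items():
        itemset = prefix + (item,)
        yield itemset, child["support"]
        yield from paths(child["children"], itemset)
```

With five toy transactions over the items a, b, c and `minsup = 3`, every path returned by `paths` is a frequent itemset, and an itemset that misses the threshold (here the full set a, b, c) has no path at all.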

Research Papers in Economics

Efficient Incremental Breadth-Depth XML Event Mining

Author: Boussaïd Omar
Darmont Jérôme
Salem Rashed
Publication date: 01/01/2011

Many applications continuously log a large number of events. Extracting interesting knowledge from logged events is an emerging active research area in data mining. In this context, we propose an approach for mining frequent events and association rules from logged events in XML format. This approach is composed of two main phases: I) constructing a novel tree structure called Frequency XML-based Tree (FXT), which contains the frequency of events to be mined; II) querying the constructed FXT using XQuery to discover frequent itemsets and association rules. The FXT is constructed with a single pass over the logged data. We implement the proposed algorithm and study various performance issues. The performance study shows that the algorithm is efficient, both for constructing the FXT and for discovering association rules.

arXiv.org e-Print Archive

Crossref

HAL

HybridMiner: Mining Maximal Frequent Itemsets Using Hybrid Database Representation Approach

Author: Baig Abdul Rauf
Bashir Shariq
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/04/2009

In this paper we present a novel hybrid (array-based layout and vertical bitmap layout) database representation approach for mining the complete set of Maximal Frequent Itemsets (MFI) on sparse and large datasets. Our work is novel in terms of scalability, item search order, and two projection techniques, horizontal and vertical. We also present a maximal algorithm using this hybrid database representation approach. Experimental results on real and sparse benchmark datasets show that our approach is better than previous state-of-the-art maximal algorithms. Comment: 8 pages. In the proceedings of the 9th IEEE-INMIC 2005, Karachi, Pakistan, 200
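As a rough illustration of what a vertical bitmap layout means, the sketch below stores one bitmap per item, with bit `tid` set when transaction `tid` contains the item, so that the support of an itemset is the population count of the AND of its bitmaps. It does not reproduce the hybrid representation or the maximal-itemset algorithm of the paper; `vertical_bitmaps` and `support` are hypothetical names, and Python integers stand in for real bit vectors.

```python
from typing import Dict, Iterable, Sequence


def vertical_bitmaps(transactions: Sequence[Iterable[str]]) -> Dict[str, int]:
    """Map each item to an integer bitmap; bit `tid` is 1 if transaction tid has the item."""
    bitmaps: Dict[str, int] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset: Iterable[str], bitmaps: Dict[str, int], n_transactions: int) -> int:
    """Count transactions containing every item of `itemset` via bitwise AND."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # unknown items clear every bit
    return bin(mask).count("1")  # population count
```

For three transactions {a, b}, {a, c}, and {a, b, c}, the bitmap of a has all three bits set, and ANDing it with the bitmap of b leaves two bits, so the itemset {a, b} has support 2.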

arXiv.org e-Print Archive

Crossref

A Tight Upper Bound on the Number of Candidate Patterns

Author: Bussche Jan Van den
Geerts Floris
Goethals Bart
Publication date: 01/01/2001

In the context of mining for frequent patterns using the standard levelwise algorithm, the following question arises: given the current level and the current set of frequent patterns, what is the maximal number of candidate patterns that can be generated on the next level? We answer this question by providing a tight upper bound, derived from a combinatorial result from the sixties by Kruskal and Katona. Our result is useful for reducing the number of database scans.
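The abstract does not state the bound itself. The sketch below assumes the usual Kruskal-Katona form: write the number m of frequent k-itemsets in its k-cascade representation, m = C(a_k, k) + C(a_{k-1}, k-1) + ..., and raise the lower index of each binomial by one to bound the number of (k+1)-candidates. The function names `cascade` and `max_candidates` are hypothetical, and the details should be checked against the paper.

```python
from math import comb
from typing import List, Tuple


def cascade(m: int, k: int) -> List[Tuple[int, int]]:
    """Greedy k-cascade representation of m as a list of (a_j, j) pairs.

    The pairs satisfy m = sum of C(a_j, j), with j counting down from k.
    """
    terms = []
    while m > 0 and k > 0:
        a = k
        # Largest a with C(a, k) <= m.
        while comb(a + 1, k) <= m:
            a += 1
        terms.append((a, k))
        m -= comb(a, k)
        k -= 1
    return terms


def max_candidates(m: int, k: int) -> int:
    """Assumed Kruskal-Katona style bound on (k+1)-candidates from m frequent k-itemsets."""
    return sum(comb(a, j + 1) for a, j in cascade(m, k))
```

For example, three frequent 2-itemsets can yield at most one 3-candidate (ab, ac, bc give abc), and six can yield at most four (all pairs over four items give all four triples), which matches what `max_candidates(3, 2)` and `max_candidates(6, 2)` compute.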

arXiv.org e-Print Archive

CiteSeerX

How to find frequent patterns?

Author: Koster W.A.
Pijls W.H.L.M. (Wim)
Publication date: 01/01/2005

An improved version of DF, the depth-first implementation of Apriori, is presented. Given a database of (e.g., supermarket) transactions, the DF algorithm builds a so-called trie that contains all frequent itemsets, i.e., all itemsets that are contained in at least 'minsup' transactions, with 'minsup' a given threshold value. In the trie, there is a one-to-one correspondence between the paths and the frequent itemsets. The new version, called DF+, differs from DF in that its data structure representing the database is borrowed from the FP-growth algorithm. It thus combines the compact FP-growth data structure with the efficient trie-building method in DF.

CiteSeerX

EUR Research Repository

Erasmus University Digital Repository