63,775 research outputs found
A Model-Based Frequency Constraint for Mining Associations from Transaction Data
Mining frequent itemsets is a popular method for finding associated items in
databases. For this method, support, the co-occurrence frequency of the items
which form an association, is used as the primary indicator of the
associations's significance. A single user-specified support threshold is used
to decided if associations should be further investigated. Support has some
known problems with rare items, favors shorter itemsets and sometimes produces
misleading associations.
In this paper we develop a novel model-based frequency constraint as an
alternative to a single, user-specified minimum support. The constraint
utilizes knowledge of the process generating transaction data by applying a
simple stochastic mixture model (the NB model) which allows for transaction
data's typically highly skewed item frequency distribution. A user-specified
precision threshold is used together with the model to find local frequency
thresholds for groups of itemsets. Based on the constraint we develop the
notion of NB-frequent itemsets and adapt a mining algorithm to find all
NB-frequent itemsets in a database. In experiments with publicly available
transaction databases we show that the new constraint provides improvements
over a single minimum support threshold and that the precision threshold is
more robust and easier to set and interpret by the user
Mining Frequent Itemsets Using Genetic Algorithm
In general frequent itemsets are generated from large data sets by applying
association rule mining algorithms like Apriori, Partition, Pincer-Search,
Incremental, Border algorithm etc., which take too much computer time to
compute all the frequent itemsets. By using Genetic Algorithm (GA) we can
improve the scenario. The major advantage of using GA in the discovery of
frequent itemsets is that they perform global search and its time complexity is
less compared to other algorithms as the genetic algorithm is based on the
greedy approach. The main aim of this paper is to find all the frequent
itemsets from given data sets using genetic algorithm
- …