New probabilistic interest measures for association rules
Mining association rules is an important technique for discovering meaningful
patterns in transaction databases. Many different measures of interestingness
have been proposed for association rules. However, these measures fail to take
the probabilistic properties of the mined data into account. In this paper, we
start by presenting a simple probabilistic framework for transaction data
which can be used to simulate transaction data when no associations are
present. We use such data and a real-world database from a grocery outlet to
explore the behavior of confidence and lift, two popular interest measures used
for rule mining. The results show that confidence is systematically influenced
by the frequency of the items in the left hand side of rules and that lift
performs poorly at filtering random noise in transaction data. Based on the
probabilistic framework we develop two new interest measures, hyper-lift and
hyper-confidence, which can be used to filter or order mined association rules.
The new measures show significantly better performance than lift for
applications where spurious rules are problematic.
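
As an illustration of the two measures the abstract examines, the following is a minimal sketch of the standard textbook definitions of support, confidence and lift. The toy transactions and the function names are assumptions made for this example; they are not taken from the paper, and the sketch does not implement hyper-lift or hyper-confidence.

```python
from fractions import Fraction

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs, transactions):
    """support(lhs and rhs) / support(lhs); assumes lhs occurs at least once."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """confidence(lhs -> rhs) / support(rhs); 1 means no deviation from independence."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


if __name__ == "__main__":
    print(confidence({"bread"}, {"milk"}, transactions))  # 3/4
    print(lift({"bread"}, {"milk"}, transactions))        # 15/16
```

In this toy data the rule bread -> milk has a confidence of 3/4 but a lift just below 1, so a high confidence alone does not indicate a positive association.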
Mining of Frequent Optimistic Estimations by Using Measured Techniques
In recent years the sizes of databases have increased rapidly. This has led to a growing interest in the development of tools capable of automatically extracting knowledge from data. The term Data Mining, or Knowledge Discovery in Databases, has been adopted for a field of research dealing with the automatic discovery of implicit information or knowledge within databases. Several efficient algorithms, such as the Apriori algorithm, have been proposed for finding frequent itemsets, from which association rules are then derived. These Apriori-like algorithms suffer from the cost of handling a huge number of candidate sets and scanning the database repeatedly. A frequent pattern tree (FP-tree) structure for storing compressed and critical information about frequent patterns has been developed for finding the complete set of frequent itemsets. This approach avoids the costly generation of a large number of candidate sets and repeated database scans, and is regarded as the most efficient strategy for mining frequent itemsets. Finding infrequent items gives positive feedback to the production manager. In this paper, we find frequent and infrequent itemsets from the opinions of different customers by using a dissimilarity matrix between frequent and infrequent items and a binary variable technique. We also use the AND gate logic function exclusively for finding opinions on frequent and infrequent items. After finding the frequent and infrequent items, we apply Classification Based on Associations (CBA) to them to obtain better classification.
A Model-Based Frequency Constraint for Mining Associations from Transaction Data
Mining frequent itemsets is a popular method for finding associated items in
databases. For this method, support, the co-occurrence frequency of the items
which form an association, is used as the primary indicator of the
association's significance. A single user-specified support threshold is used
to decide if associations should be further investigated. Support has some
known problems with rare items, favors shorter itemsets and sometimes produces
misleading associations.
In this paper we develop a novel model-based frequency constraint as an
alternative to a single, user-specified minimum support. The constraint
utilizes knowledge of the process generating transaction data by applying a
simple stochastic mixture model (the NB model) which allows for transaction
data's typically highly skewed item frequency distribution. A user-specified
precision threshold is used together with the model to find local frequency
thresholds for groups of itemsets. Based on the constraint we develop the
notion of NB-frequent itemsets and adapt a mining algorithm to find all
NB-frequent itemsets in a database. In experiments with publicly available
transaction databases we show that the new constraint provides improvements
over a single minimum support threshold and that the precision threshold is
more robust and easier to set and interpret by the user.
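
To make the baseline concrete, here is a minimal sketch of mining with a single minimum support threshold, the approach the abstract proposes to replace. It is a brute-force enumeration on hypothetical toy data, with the threshold given as an absolute count. The function name and the data are assumptions for this example, and the NB model itself is not implemented.

```python
from itertools import combinations

# Hypothetical toy transactions; "caviar" and "champagne" are rare items.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "caviar"},
    {"bread"},
    {"milk", "caviar", "champagne"},
]


def frequent_itemsets(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least min_count transactions.

    Brute force over all item combinations; suitable for tiny examples only.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_count:
                result[candidate] = count
    return result


if __name__ == "__main__":
    # A single threshold of 3 out of 5 transactions.
    for itemset, count in frequent_itemsets(transactions, 3).items():
        print(itemset, count)
```

With this threshold only bread, milk and the pair (bread, milk) survive. Every itemset that involves a rare item is discarded, even though caviar and champagne appear together whenever champagne appears at all.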
FP-tree and COFI Based Approach for Mining of Multiple Level Association Rules in Large Databases
In recent years, discovery of association rules among itemsets in a large
database has been described as an important database-mining problem. The
problem of discovering association rules has received considerable research
attention and several algorithms for mining frequent itemsets have been
developed. Many algorithms have been proposed to discover rules at a single
concept level. However, mining association rules at multiple concept levels may
lead to the discovery of more specific and concrete knowledge from data. The
discovery of multiple level association rules is very useful in many
applications. In most of the studies on multiple level association rule
mining, the database is scanned repeatedly, which reduces the efficiency of
the mining process. In this research paper, a new method for discovering
multilevel association rules is proposed. It is based on the FP-tree structure
and uses a co-occurrence frequent item (COFI) tree to find frequent items in a
multilevel concept
hierarchy.
Comment: Pages IEEE format, International Journal of Computer Science and
Information Security, IJCSIS, Vol. 7 No. 2, February 2010, USA. ISSN 1947
5500, http://sites.google.com/site/ijcsis