44,266 research outputs found

    New probabilistic interest measures for association rules

    Full text link
    Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules. However, these measures fail to take the probabilistic properties of the mined data into account. In this paper, we start with presenting a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a real-world database from a grocery outlet to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left hand side of rules and that lift performs poorly to filter random noise in transaction data. Based on the probabilistic framework we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic

    Mining of Frequent OptimisticEstimations by Using Measured Techniques

    Get PDF
    Abstract In recent years the sizes of databases has increased rapidly. This has led toa growing interest in the development of tools capable in the automatic extractionof knowledge from data. The term Data Mining, or Knowledge Discovery inDatabases, has been adopted for a field of research dealing with the automaticdiscovery of implicit information or knowledge within databases.Several efficient algorithms have been proposed for finding frequentitemsets and the association rules are derived from the frequent itemsets, such as theApriori algorithm. These Apriori-like algorithms suffer from the coststo handle a huge number of candidate sets and scan the database repeatedly. A frequent pattern tree (FP-tree) structure for storing compressed and criticalinformation about frequent patterns is developed for finding the complete set of frequent itemsets. But this approachavoids the costly generation of a large number of candidate sets and repeated databasescans, which is regarded as the most efficient strategy for mining frequent itemsets.Finding of infrequent items gives the positive feed back to the Production Manager. In this paper, we are finding frequent and infrequent itemsets by taking opinions of different customers by using Dissimilarity Matrix between frequent and infrequent items and also by using Binary Variable technique. We also exclusively use AND Gate Logic function for finding opinions of frequent and infrequent items. After finding frequent and infrequent items the apply Classification Based on Associations (CBA) on them to have better classification

    A Model-Based Frequency Constraint for Mining Associations from Transaction Data

    Full text link
    Mining frequent itemsets is a popular method for finding associated items in databases. For this method, support, the co-occurrence frequency of the items which form an association, is used as the primary indicator of the associations's significance. A single user-specified support threshold is used to decided if associations should be further investigated. Support has some known problems with rare items, favors shorter itemsets and sometimes produces misleading associations. In this paper we develop a novel model-based frequency constraint as an alternative to a single, user-specified minimum support. The constraint utilizes knowledge of the process generating transaction data by applying a simple stochastic mixture model (the NB model) which allows for transaction data's typically highly skewed item frequency distribution. A user-specified precision threshold is used together with the model to find local frequency thresholds for groups of itemsets. Based on the constraint we develop the notion of NB-frequent itemsets and adapt a mining algorithm to find all NB-frequent itemsets in a database. In experiments with publicly available transaction databases we show that the new constraint provides improvements over a single minimum support threshold and that the precision threshold is more robust and easier to set and interpret by the user

    FP-tree and COFI Based Approach for Mining of Multiple Level Association Rules in Large Databases

    Full text link
    In recent years, discovery of association rules among itemsets in a large database has been described as an important database-mining problem. The problem of discovering association rules has received considerable research attention and several algorithms for mining frequent itemsets have been developed. Many algorithms have been proposed to discover rules at single concept level. However, mining association rules at multiple concept levels may lead to the discovery of more specific and concrete knowledge from data. The discovery of multiple level association rules is very much useful in many applications. In most of the studies for multiple level association rule mining, the database is scanned repeatedly which affects the efficiency of mining process. In this research paper, a new method for discovering multilevel association rules is proposed. It is based on FP-tree structure and uses cooccurrence frequent item tree to find frequent items in multilevel concept hierarchy.Comment: Pages IEEE format, International Journal of Computer Science and Information Security, IJCSIS, Vol. 7 No. 2, February 2010, USA. ISSN 1947 5500, http://sites.google.com/site/ijcsis
    • …
    corecore