4 research outputs found

    Geometrical Structure and Analysis of Association Rules

    Association rule mining identifies associations between items in a large transactional data set. It has always been a time-consuming process because the data set must be scanned repeatedly. The Apriori algorithm [1] and the FP-Tree algorithm [2] are two methods for finding associations of items in a large transactional item set. The two algorithms work differently (Apriori follows a bottom-up approach and FP-Tree follows a top-down approach) to obtain the associations. The associations generated by these two algorithms can be represented geometrically; the geometric form of the associations is called a simplicial complex. By exploring the FP-Tree method and using the bit patterns of records, we can quickly identify the longest possible associations in transactional data. The proposed Bitmap method, which builds on the FP-Tree method, is a new approach that uses human aid to find the longest association of items quickly by inspecting the bit patterns of items in a large item set. The underlying logic of the algorithm is to align records with similar bits and to arrange the attributes in descending order of frequency.
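
    The bit-pattern idea in this abstract can be illustrated with a minimal sketch. It is not the paper's Bitmap method: the function names (`encode_by_frequency`, `longest_frequent_pattern`) and the `min_support` parameter are assumptions made for illustration, and the search is simplified to consider only patterns that occur as complete transactions. Items are ordered most frequent first, each record becomes a bit pattern in that order, and the longest pattern contained in at least `min_support` records is reported.

```python
from collections import Counter


def encode_by_frequency(transactions):
    """Order items by descending frequency and encode each record as a bitmask."""
    counts = Counter(item for t in transactions for item in set(t))
    # Most frequent item first; ties broken alphabetically for determinism.
    order = sorted(counts, key=lambda item: (-counts[item], item))
    position = {item: i for i, item in enumerate(order)}
    masks = []
    for t in transactions:
        mask = 0
        for item in set(t):
            mask |= 1 << position[item]
        masks.append(mask)
    return order, masks


def longest_frequent_pattern(transactions, min_support):
    """Return (items, support) for the longest record pattern meeting min_support.

    Simplification (assumption): only patterns that appear as a whole
    transaction are tried as candidates.
    """
    order, masks = encode_by_frequency(transactions)
    best_items, best_support = [], 0
    for pattern in sorted(set(masks)):
        # A record supports the pattern when it contains every bit of it.
        support = sum(1 for m in masks if m & pattern == pattern)
        size = bin(pattern).count("1")
        if support >= min_support and size > len(best_items):
            best_items = [item for i, item in enumerate(order) if (pattern >> i) & 1]
            best_support = support
    return best_items, best_support


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
    print(longest_frequent_pattern(data, min_support=2))
```

    Because identical bit patterns collapse into one candidate, records that share the same items are examined only once.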

    Data Mining and Data Warehouse ------ Maximal Simplex Method

    Association rule mining is a widely used method for finding interesting relationships in large data sets. The challenge is to discover association rules from large data sets swiftly and accurately. To achieve this, this paper will (1) build a data warehouse system that simulates secondary storage and represents a database by bit patterns, and (2) implement a new geometric algorithm for finding association rules, called the Maximal Simplex Algorithm. The data warehouse consists of very long bit columns. Each column is an item or an attribute-value pair, and each row represents a transaction or a tuple in a database. A bit value of 1 in a row means that the transaction contains this item or the tuple contains this value. In the Maximal Simplex Algorithm, we interpret the set of bit columns as a set of independent vertices in a high-dimensional Euclidean space. The main idea is to find, for each vertex, its star neighborhood, that is, all simplexes that contain this vertex. An n-dimensional simplex is called an n-simplex, and an n-simplex represents an association rule of length n+1. Based on the experimental results, the Maximal Simplex method improves the performance of association rule mining. The data warehouse system also makes parallel computing possible.
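
    The bit-column representation and the star neighborhood described above can be sketched as follows. This is a brute-force illustration under stated assumptions, not the paper's Maximal Simplex Algorithm: `build_columns`, `star`, and the `min_support` threshold are hypothetical names, and each bit column is stored as a Python integer whose bit r is 1 when transaction r contains the item.

```python
def build_columns(transactions):
    """Map each item to a bit column; bit r is set if row r contains the item."""
    columns = {}
    for row, t in enumerate(transactions):
        for item in t:
            columns[item] = columns.get(item, 0) | (1 << row)
    return columns


def star(columns, vertex, min_support):
    """List every itemset (simplex) that contains `vertex` and is supported
    by at least `min_support` rows. A tuple of k items is a (k-1)-simplex."""
    others = sorted(item for item in columns if item != vertex)
    found = []

    def extend(itemset, bits, start):
        found.append(tuple(itemset))
        for i in range(start, len(others)):
            # AND-ing columns keeps only the rows that contain all items so far.
            joined = bits & columns[others[i]]
            if bin(joined).count("1") >= min_support:
                extend(itemset + [others[i]], joined, i + 1)

    if bin(columns[vertex]).count("1") >= min_support:
        extend([vertex], columns[vertex], 0)
    return found


if __name__ == "__main__":
    rows = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    print(star(build_columns(rows), "a", min_support=2))
```

    In this toy data the tuple ("a", "b", "c") is a 2-simplex, matching the abstract's statement that an n-simplex corresponds to an association of length n+1.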

    Association Rule Mining -- Geometry and Parallel Computing Approach

    Mining association rules is an important part of data mining. The mining process not only takes a long time but also consumes substantial computing resources. Finding the large itemsets quickly and efficiently is a crucial point in association rule algorithms. This paper focuses on the research and implementation of two algorithms in parallel computing environments: the Bitmap Combination algorithm and the Bitmap FP-Growth algorithm. Compared to the Apriori algorithm, neither Bitmap Combination nor Bitmap FP-Growth needs to generate candidate items, and both avoid costly database scans. Both algorithms translate the original database into bitmap format, analyze the bit distribution to reduce the database size, and apply high-speed bit calculations to improve performance. Divide-and-conquer replaces generate-and-test as the basic strategy. The Bitmap Combination algorithm quickly combines any two, three, four, or more rows and then screens for the qualifying itemsets. The Bitmap FP-Growth algorithm applies special bit calculations to mine association rules recursively. Based on the experimental results in this paper, both algorithms greatly improve the efficiency and performance of mining association rules and, in particular, make it possible to mine association rules in highly parallel computing environments.
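
    The step of combining bitmap rows and screening the result can be illustrated with a short sketch. It is a plain brute-force illustration and should not be read as the paper's Bitmap Combination algorithm: `item_bitmaps`, `frequent_combinations`, and `min_support` are names assumed here, and every combination of the given size is enumerated, which the paper's method may well avoid.

```python
from functools import reduce
from itertools import combinations
from operator import and_


def item_bitmaps(transactions):
    """Translate the database to bitmap form: one integer bitmap per item,
    with bit r set when transaction r contains the item."""
    bitmaps = {}
    for row, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << row)
    return bitmaps


def frequent_combinations(bitmaps, size, min_support):
    """Combine `size` item bitmaps with bitwise AND and keep the combinations
    whose resulting bitmap has at least `min_support` bits set."""
    result = {}
    for combo in combinations(sorted(bitmaps), size):
        shared = reduce(and_, (bitmaps[item] for item in combo))
        support = bin(shared).count("1")  # number of transactions with all items
        if support >= min_support:
            result[combo] = support
    return result


if __name__ == "__main__":
    db = [["x", "y", "z"], ["x", "y"], ["x", "z"], ["y", "z"], ["x", "y", "z"]]
    maps = item_bitmaps(db)
    print(frequent_combinations(maps, size=2, min_support=3))
    print(frequent_combinations(maps, size=3, min_support=2))
```

    Each combination is evaluated independently of the others, so the loop body could in principle be distributed across workers, which is in line with the parallel setting the abstract describes.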