
    Mining Rooted Ordered Trees under Subtree Homeomorphism

    Mining frequent tree patterns has many applications in different areas such as XML data, bioinformatics and the World Wide Web. The crucial step in frequent pattern mining is frequency counting, which involves a matching operator to find occurrences (instances) of a tree pattern in a given collection of trees. A widely used matching operator for tree-structured data is subtree homeomorphism, where an edge in the tree pattern is mapped onto an ancestor-descendant relationship in the given tree. Tree patterns that are frequent under subtree homeomorphism are usually called embedded patterns. In this paper, we present an efficient algorithm for subtree homeomorphism with application to frequent pattern mining. We propose a compact data-structure, called occ, which stores only information about the rightmost paths of occurrences and hence can encode and represent several occurrences of a tree pattern. We then define efficient join operations on the occ data-structure, which help us count occurrences of tree patterns according to occurrences of their proper subtrees. Based on the proposed subtree homeomorphism method, we develop an effective pattern mining algorithm, called TPMiner. We evaluate the efficiency of TPMiner on several real-world and synthetic datasets. Our extensive experiments confirm that TPMiner always outperforms well-known existing algorithms, and in several cases the improvement with respect to existing algorithms is significant.
    Comment: This paper is accepted in the Data Mining and Knowledge Discovery journal (http://www.springer.com/computer/database+management+%26+information+retrieval/journal/10618)

    A numerical method for frequent pattern mining

    Frequent pattern mining is one of the active research themes in data mining. It plays an important role in data mining tasks such as clustering, classification, prediction, and association analysis. Identifying all frequent patterns is the most time-consuming step because of the massive number of patterns generated. A reasonable solution is to identify the maximal frequent patterns, which form the smallest representative set from which all frequent patterns can be generated. In this paper, an efficient numerical method for mining frequent patterns is proposed. The method relies on properties of prime numbers to generate all frequent patterns from the maximal frequent ones. It introduces two new components: a novel tree structure called PC_Tree and the PC_Miner algorithm. The PC_Tree is a simple tree structure, yet it captures all of the transaction information through an efficient data transformation technique based on prime number theory. The PC_Miner algorithm traverses the PC_Tree using an efficient pruning technique. The experimental results verify the compactness of the structure and the mining efficiency of the proposed method.
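
    The prime-number transformation mentioned in this abstract can be illustrated with a minimal sketch. It assumes one common encoding (each item gets a distinct prime, each transaction becomes the product of its items' primes, and containment reduces to divisibility); the abstract does not spell out the exact PC_Tree encoding, so the function names `first_primes`, `encode_transactions` and `support` are hypothetical and the code is not the authors' implementation.

```python
from math import prod

def first_primes(n):
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def encode_transactions(transactions):
    """Assumed encoding: map each item to a prime, each transaction to a product."""
    items = sorted({item for t in transactions for item in t})
    prime_of = dict(zip(items, first_primes(len(items))))
    codes = [prod(prime_of[i] for i in set(t)) for t in transactions]
    return prime_of, codes

def support(itemset, prime_of, codes):
    """Count transactions whose code is divisible by the itemset's code."""
    key = prod(prime_of[i] for i in itemset)
    return sum(1 for c in codes if c % key == 0)

transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}, {"a", "b", "c", "d"}]
prime_of, codes = encode_transactions(transactions)
print(codes)                                # [30, 10, 21, 210]
print(support({"a", "c"}, prime_of, codes))  # 3
```

    Because the primes are distinct, an itemset is contained in a transaction exactly when its product divides the transaction's product, so a subset test becomes a single modulo operation. The products grow quickly with transaction length, which Python's arbitrary-precision integers tolerate but fixed-width integers would not.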

    Compact structure representation in discovering frequent patterns for association rules

    Frequent pattern mining is a key problem in important data mining applications, such as the discovery of association rules, strong rules and episodes. The structures used in typical algorithms for this problem require several database scans and generate a large number of candidates. This paper presents a compact structure representation called Flex-tree for discovering frequent patterns for association rules. The Flex-tree structure is a lexicographic tree that finds frequent patterns using a depth-first search strategy. Mining efficiency is achieved by scanning the database once, instead of the repeated database passes used in other methods, and by avoiding the costly generation of large numbers of candidate sets, which dramatically reduces the search space.

    iWAP: A Single Pass Approach for Web Access Sequential Pattern Mining

    With the explosive growth of data available on the World Wide Web, web usage mining has become essential for improving website design, analyzing system performance and network communications, understanding user reaction and motivation, and building adaptive websites. Web Access Pattern mining (WAP-mine) is a sequential pattern mining technique for discovering frequent web log access sequences. It first stores the frequent part of the original web access sequence database in a prefix tree called the WAP-tree and then mines the frequent sequences from that tree according to a user-given minimum support threshold. Therefore, this method is not applicable to incremental and interactive mining. In this paper, we propose an algorithm, improved Web Access Pattern (iWAP) mining, to find web access patterns from web logs more efficiently than the WAP-mine algorithm. Our proposed approach can discover all web access sequential patterns with a single pass over the web log database. Moreover, it supports interactive and incremental mining, which WAP-mine does not. The experimental and performance studies show that the proposed algorithm is in general an order of magnitude faster than the existing WAP-mine algorithm.
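
    The idea of storing access sequences in a prefix tree can be sketched as follows. This is a generic counted prefix tree under simplifying assumptions, not the WAP-tree or iWAP structure itself: it skips the filtering of infrequent events and any auxiliary links, and the names `build_prefix_tree` and `prefix_count` are hypothetical.

```python
def build_prefix_tree(sequences):
    """Insert every sequence into a counted prefix tree in one pass."""
    root = {"count": 0, "children": {}}
    for seq in sequences:
        node = root
        for event in seq:
            # Create the child on first sight, then count this visit.
            child = node["children"].setdefault(event, {"count": 0, "children": {}})
            child["count"] += 1
            node = child
    return root

def prefix_count(root, prefix):
    """Number of stored sequences that start with the given non-empty prefix."""
    node = root
    for event in prefix:
        node = node["children"].get(event)
        if node is None:
            return 0
    return node["count"]

# Toy web log: each string is one session, each letter one page visit.
sessions = ["abac", "abc", "ab", "ca"]
tree = build_prefix_tree(sessions)
print(prefix_count(tree, "ab"))   # 3
print(prefix_count(tree, "aba"))  # 1
```

    Sequences that share a prefix share a path, which is what keeps such a tree compact. The counts here cover shared prefixes only; finding frequent subsequences that begin anywhere in a session requires the additional mining step the abstract refers to.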

    An algorithm for fast mining top-rank-k frequent patterns based on node-list data structure

    Frequent pattern mining usually requires much run time and memory. In some applications, only the patterns with the top frequency ranks are needed. Because the number of patterns is limited, the quality of the results is even more important than time and memory consumption. A frequent pattern algorithm for mining top-rank-k patterns, FP_TopK, is proposed. It is based on a Node-list data structure extracted from an FTPP-tree. Each node carries one or more triples containing the support and the pre-order and post-order traversal ranks, which are used for candidate pattern generation and top-rank-k frequent pattern mining. FP_TopK uses the minimal support threshold as its pruning strategy to guarantee that each pattern in the top-rank-k table is truly frequent, which further improves efficiency. Experiments are conducted to compare FP_TopK with iNTK and BTK on four datasets. The results show that FP_TopK achieves better performance.

    Frequent Item Set Mining In Data Mining: A survey

    Data mining is the process of extracting useful information from data viewed from different perspectives. Frequent itemset mining is widely used in the financial, retail and telecommunication industries, whose major concern is the fast processing of very large amounts of data. Frequent itemsets are sets of items that occur together frequently, and different types of algorithms can be used to find them. Frequent itemset mining can be performed using the Apriori, FP-tree, Eclat, and RARM algorithms. In this paper, we analyze widely used algorithms for finding frequent patterns, with the purpose of discovering how they can be used to obtain frequent patterns over large transactional databases. The analysis is presented as a comparative study of the following frequent pattern mining algorithms: Apriori, Frequent Pattern (FP) Growth, Rapid Association Rule Mining (RARM) and ECLAT. The study also covers each algorithm's advantages, disadvantages and limitations for finding patterns among large itemsets in database systems.
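
    As a point of reference for the first algorithm this survey names, here is a minimal level-wise sketch of Apriori in its textbook form. The function name `apriori`, the absolute `min_support` count and the toy data are illustrative assumptions, and the sketch recounts supports by rescanning the list, with none of the optimizations a practical implementation would need.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for itemsets in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidate):
        return sum(1 for t in transactions if candidate <= t)

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    level = {c for c in level if count(c) >= min_support}
    frequent = {}
    k = 1
    while level:
        for c in level:
            frequent[c] = count(c)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must already be frequent
        # (downward closure), then confirm support against the data.
        level = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and count(c) >= min_support
        }
    return frequent

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
result = apriori(baskets, min_support=2)
print(result[frozenset({"bread", "milk"})])  # 2
```

    The prune step is what separates Apriori from brute-force enumeration: the candidate {bread, milk, butter} is discarded without a database scan because its subset {milk, butter} is not frequent.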

    Unified Framework for Data Mining using Frequent Model Tree

    Data mining is the science of discovering hidden patterns from data. Over the past years, a plethora of data mining algorithms has been developed to carry out various data mining tasks such as classification, clustering, association mining and regression. All the methods are ad hoc in nature, and there exists no unifying framework which unites all the data mining tasks. This study proposes such a framework, which describes a data modelling technique to model data in a manner that can be used to accomplish all kinds of data mining tasks. The study proposes a novel algorithm known as Frequent Model (FM)-Growth, based on the Frequent Pattern (FP)-Growth algorithm. The algorithm is used to find frequent patterns, or models, from data. These models are then used to carry out various data mining tasks such as classification and clustering. The advantage of these frequent models is that they can be used as is with any data mining task, irrespective of the nature of the task. The algorithm is carried out in two stages. In the first stage, we grow the FM-tree from the data, and in the second stage, we extract the frequent models from the FM-tree. The accuracy of the proposed algorithm is high. However, the algorithm is computationally expensive when searching for frequent models in high-volume and high-dimensional data, because it needs to visit all the nodes of the tree. The study suggests measures to improve the efficiency of the overall process using a dictionary data structure.
    Keywords: Data Mining, Frequent Pattern Recognition Unified Framework, Classification, Clustering, FPGrowth tree

    New approaches to weighted frequent pattern mining

    Researchers have proposed frequent pattern mining algorithms that are more efficient than previous algorithms and generate fewer but more important patterns. Many techniques such as depth first/breadth first search, use of tree/other data structures, top down/bottom up traversal and vertical/horizontal formats for frequent pattern mining have been developed. Most frequent pattern mining algorithms use a support measure to prune the combinatorial search space. However, support-based pruning is not enough when taking into consideration the characteristics of real datasets. Additionally, after mining datasets to obtain the frequent patterns, there is no way to adjust the number of frequent patterns through user feedback, except for changing the minimum support. Alternative measures for mining frequent patterns have been suggested to address these issues. One of the main limitations of the traditional approach for mining frequent patterns is that all items are treated uniformly when, in reality, items have different importance. For this reason, weighted frequent pattern mining algorithms have been suggested that give different weights to items according to their significance. The main focus in weighted frequent pattern mining concerns satisfying the downward closure property. In this research, frequent pattern mining approaches with weight constraints are suggested. Our main approach is to push weight constraints into the pattern growth algorithm while maintaining the downward closure property. 
    We develop WFIM (Weighted Frequent Itemset Mining with a weight range and a minimum weight), WLPMiner (Weighted frequent Pattern Mining with length decreasing constraints), WIP (Weighted Interesting Pattern mining with a strong weight and/or support affinity), WSpan (Weighted Sequential pattern mining with a weight range and a minimum weight) and WIS (Weighted Interesting Sequential pattern mining with a similar level of support and/or weight affinity). The extensive performance analysis shows that the suggested approaches are efficient and scalable in weighted frequent pattern mining.