79,334 research outputs found
Mining Rooted Ordered Trees under Subtree Homeomorphism
Mining frequent tree patterns has many applications in different areas such
as XML data, bioinformatics and World Wide Web. The crucial step in frequent
pattern mining is frequency counting, which involves a matching operator to
find occurrences (instances) of a tree pattern in a given collection of trees.
A widely used matching operator for tree-structured data is subtree
homeomorphism, where an edge in the tree pattern is mapped onto an
ancestor-descendant relationship in the given tree. Tree patterns that are
frequent under subtree homeomorphism are usually called embedded patterns. In
this paper, we present an efficient algorithm for subtree homeomorphism with
application to frequent pattern mining. We propose a compact data-structure,
called occ, which stores only information about the rightmost paths of
occurrences and hence can encode and represent several occurrences of a tree
pattern. We then define efficient join operations on the occ data-structure,
which help us count occurrences of tree patterns according to occurrences of
their proper subtrees. Based on the proposed subtree homeomorphism method, we
develop an effective pattern mining algorithm, called TPMiner. We evaluate the
efficiency of TPMiner on several real-world and synthetic datasets. Our
extensive experiments confirm that TPMiner always outperforms well-known
existing algorithms, and in several cases the improvement with respect to
existing algorithms is significant. Comment: This paper is accepted in the Data Mining and Knowledge Discovery
journal
(http://www.springer.com/computer/database+management+%26+information+retrieval/journal/10618)
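The matching operator described in this abstract can be illustrated with a naive check. The sketch below is not TPMiner and does not use the occ data-structure; it is a brute-force test, under the assumption that trees are given as nested `(label, [children])` tuples, of whether a rooted ordered pattern occurs in a tree when pattern edges may map to ancestor-descendant paths. The names `flatten`, `is_embedded` and `support` are hypothetical.

```python
def flatten(tree):
    """Return (labels, end) in preorder; end[i] is the preorder index of
    the last descendant of node i (or i itself for a leaf)."""
    labels, end = [], []

    def visit(node):
        label, children = node
        idx = len(labels)
        labels.append(label)
        end.append(idx)
        for child in children:
            visit(child)
        end[idx] = len(labels) - 1

    visit(tree)
    return labels, end


def is_embedded(pattern, tree):
    """True if the ordered pattern occurs in tree under subtree homeomorphism."""
    labels, end = flatten(tree)

    def place(children, lo, hi):
        # Map the pattern nodes in `children`, left to right, onto tree nodes
        # with preorder index in [lo, hi] whose subtrees do not overlap.
        if not children:
            return True
        (label, grandchildren), rest = children[0], children[1:]
        for v in range(lo, hi + 1):
            if (labels[v] == label
                    and place(grandchildren, v + 1, end[v])  # proper descendants of v
                    and place(rest, end[v] + 1, hi)):        # strictly right of v's subtree
                return True
        return False

    return place([pattern], 0, len(labels) - 1)


def support(pattern, forest):
    """Number of trees in the collection that contain the pattern."""
    return sum(1 for tree in forest if is_embedded(pattern, tree))


tree = ("a", [("b", [("c", []), ("d", [])]), ("e", [])])
print(is_embedded(("a", [("c", []), ("e", [])]), tree))  # c is reached through b
```

This check runs in exponential time in the worst case, which is the kind of cost a compact occurrence structure with join operations is meant to avoid.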
A numerical method for frequent pattern mining
Frequent pattern mining is one of the active research themes in data mining. It plays an important role in all data mining tasks such as clustering, classification, prediction, and association analysis. Identifying all frequent patterns is the most time-consuming step because of the massive number of patterns generated. A reasonable solution is to identify the maximal frequent patterns, which form the smallest representative set from which all frequent patterns can be generated. In this paper, an efficient numerical method for mining frequent patterns is proposed. The method is based on prime number characteristics and generates all frequent patterns from the maximal frequent ones. It introduces two new components: a novel tree structure called PC_Tree and an algorithm called PC_Miner. The PC_Tree is a simple tree structure, yet it captures all of the transaction information through an efficient data transformation technique based on prime number theory. The PC_Miner algorithm traverses the PC_Tree using an efficient pruning technique. The experimental results verify the compactness of the structure and the efficiency of mining with the proposed method.
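The abstract does not spell out the data transformation, so the following is only a sketch of one common way prime numbers are used for this purpose, not the PC_Tree itself. It assumes each item is assigned a distinct prime and each transaction is stored as the product of its items' primes, so that a pattern is contained in a transaction exactly when the pattern's product divides the transaction's product. The function names are hypothetical.

```python
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def encode(transactions):
    """Assign one prime per item and turn each transaction into one integer."""
    items = sorted({item for t in transactions for item in t})
    prime_of = dict(zip(items, first_primes(len(items))))
    codes = [prod(prime_of[i] for i in set(t)) for t in transactions]
    return prime_of, codes


def support(pattern, prime_of, codes):
    """Count transactions whose code is divisible by the pattern's code."""
    if any(item not in prime_of for item in pattern):
        return 0
    p = prod(prime_of[i] for i in set(pattern))
    return sum(1 for c in codes if c % p == 0)


transactions = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
prime_of, codes = encode(transactions)
print(support({"a", "c"}, prime_of, codes))
```

By unique factorization, the divisibility test is equivalent to a subset test, which is why a single integer can stand in for a whole transaction.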
Compact structure representation in discovering frequent patterns for association rules
Frequent pattern mining is a key problem in important data mining applications, such as the discovery of association rules, strong rules and episodes. The structures used in typical algorithms for this problem require several database scans and generate a large number of candidates. This paper presents a compact structure representation called Flex-tree for discovering frequent patterns for association rules. The Flex-tree is a lexicographic tree that finds frequent patterns using a depth-first search strategy. Mining is efficient because it needs only one scan of the database, instead of the repeated database passes of other methods, and because it avoids the costly generation of large numbers of candidate sets, which dramatically reduces the search space.
iWAP: A Single Pass Approach for Web Access Sequential Pattern Mining
With the explosive growth of data available on the World Wide Web, web usage mining has become essential for improving website design, analyzing system performance and network communications, understanding user reaction and motivation, and building adaptive websites. Web Access Pattern mining (WAP-mine) is a sequential pattern mining technique for discovering frequent web log access sequences. It first stores the frequent part of the original web access sequence database in a prefix tree called the WAP-tree and then mines the frequent sequences from that tree according to a user-given minimum support threshold. Therefore, this method is not applicable to incremental and interactive mining. In this paper, we propose an algorithm, improved Web Access Pattern (iWAP) mining, to find web access patterns from web logs more efficiently than the WAP-mine algorithm. Our proposed approach can discover all web access sequential patterns with a single pass over the web log database. Moreover, it is applicable to interactive and incremental mining, which the earlier algorithm does not support. The experimental and performance studies show that the proposed algorithm is in general an order of magnitude faster than the existing WAP-mine algorithm.
An algorithm for fast mining top-rank-k frequent patterns based on node-list data structure
Frequent pattern mining usually requires considerable run time and memory. In some applications, only the patterns with the top frequency ranks are needed. Because the number of patterns is limited, the quality of the results is even more important than time and memory consumption. A frequent pattern algorithm for mining top-rank-k patterns, FP_TopK, is proposed. It is based on a Node-list data structure extracted from the FTPP-tree. Each node carries one or more triples, which contain the support and the pre-order and post-order traversal numbers used for candidate pattern generation and top-rank-k frequent pattern mining. FP_TopK uses the minimal support threshold as a pruning strategy to guarantee that each pattern in the top-rank-k table is really frequent, which further improves efficiency. Experiments compare FP_TopK with iNTK and BTK on four datasets. The results show that FP_TopK achieves better performance.
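The role of the pre-order and post-order numbers can be shown with a small sketch. It does not reproduce the FTPP-tree or FP_TopK; it only illustrates the standard property that node A is an ancestor of node B exactly when A has the smaller pre-order number and the larger post-order number. The tree format `(label, [children])` and the function names are assumptions made for the example.

```python
def number_nodes(tree):
    """Return (label, pre, post) triples for every node, sorted by pre-order."""
    triples = []
    counters = {"pre": 0, "post": 0}

    def visit(node):
        label, children = node
        pre = counters["pre"]
        counters["pre"] += 1
        for child in children:
            visit(child)
        post = counters["post"]
        counters["post"] += 1
        triples.append((label, pre, post))

    visit(tree)
    return sorted(triples, key=lambda t: t[1])


def is_ancestor(a, b):
    """True if triple a belongs to a proper ancestor of the node of triple b."""
    return a[1] < b[1] and a[2] > b[2]


tree = ("root", [("a", [("b", [])]), ("c", [])])
root, a, b, c = number_nodes(tree)
print(is_ancestor(a, b), is_ancestor(a, c))
```

Because the test needs only two integer comparisons, ancestor relationships can be checked from the stored numbers without walking the tree again.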
Frequent Item Set Mining In Data Mining: A survey
Data mining is the process of analyzing data from different perspectives and extracting useful information. Frequent itemset mining is widely used in the financial, retail and telecommunication industries. The major concern of these industries is faster processing of very large amounts of data. Frequent itemsets are sets of items that occur together frequently, and different types of algorithms can be used to find them. Frequent itemset mining can be performed with the Apriori, FP-tree, Eclat, and RARM algorithms. For the work in this paper, we have analyzed widely used algorithms for finding frequent patterns, with the purpose of discovering how these algorithms can be used to obtain frequent patterns over large transactional databases. This is presented as a comparative study of the following frequent pattern mining algorithms: Apriori, Frequent Pattern (FP) Growth, Rapid Association Rule Mining (RARM) and ECLAT. The study also covers each algorithm's advantages, disadvantages and limitations for finding patterns among large itemsets in database systems.
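As an illustration of the first algorithm named in this survey, here is a minimal level-wise Apriori sketch. It assumes transactions are given as Python sets and that `min_support` is an absolute count; it is a teaching sketch, not the survey's implementation, and it makes no attempt at the optimizations used in practice.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    singletons = {frozenset([item]) for t in transactions for item in t}
    level = {c: s for c, s in count(singletons).items() if s >= min_support}
    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: s for c, s in count(candidates).items() if s >= min_support}
    return frequent


baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
for itemset, sup in sorted(apriori(baskets, 2).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), sup)
```

Each pass over `transactions` inside `count` corresponds to one database scan, which is the cost that FP-Growth and Eclat style methods try to reduce.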
Unified Framework for Data Mining using Frequent Model Tree
Abstract: Data mining is the science of discovering hidden patterns in data. Over the past years, a plethora of data mining algorithms has been developed to carry out various data mining tasks such as classification, clustering, association mining and regression. All of these methods are ad hoc in nature, and there exists no unifying framework that unites all the data mining tasks. This study proposes such a framework, which describes a data modelling technique for modelling data in a manner that can be used to accomplish all kinds of data mining tasks. The study proposes a novel algorithm known as Frequent Model (FM)-Growth, based on the Frequent Pattern (FP)-Growth algorithm. The algorithm is used to find frequent patterns, or models, in data. These models are then used to carry out various data mining tasks such as classification and clustering. The advantage of these frequent models is that they can be used as is with any data mining task, irrespective of the nature of the task. The algorithm is carried out in two stages. In the first stage, we grow the FM-tree from the data, and in the second stage, we extract the frequent models from the FM-tree. The accuracy of the proposed algorithm is high. However, the algorithm is computationally expensive when searching for frequent models in high-volume, high-dimensional data, because it needs to traverse all the nodes of a tree. The study suggests measures to improve the efficiency of the overall process using a dictionary data structure. Keywords: Data Mining, Frequent Pattern Recognition Unified Framework, Classification, Clustering, FPGrowth tree
New approaches to weighted frequent pattern mining
Researchers have proposed frequent pattern mining algorithms that are more
efficient than previous algorithms and generate fewer but more important patterns. Many
techniques such as depth first/breadth first search, use of tree/other data structures, top
down/bottom up traversal and vertical/horizontal formats for frequent pattern mining
have been developed. Most frequent pattern mining algorithms use a support measure to
prune the combinatorial search space. However, support-based pruning is not enough
when taking into consideration the characteristics of real datasets. Additionally, after
mining datasets to obtain the frequent patterns, there is no way to adjust the number of
frequent patterns through user feedback, except for changing the minimum support.
Alternative measures for mining frequent patterns have been suggested to address these
issues. One of the main limitations of the traditional approach for mining frequent
patterns is that all items are treated uniformly when, in reality, items have different
importance. For this reason, weighted frequent pattern mining algorithms have been
suggested that give different weights to items according to their significance. The main
focus in weighted frequent pattern mining concerns satisfying the downward closure
property. In this research, frequent pattern mining approaches with weight constraints are
suggested. Our main approach is to push weight constraints into the pattern growth
algorithm while maintaining the downward closure property. We develop WFIM
(Weighted Frequent Itemset Mining with a weight range and a minimum weight),
WLPMiner (Weighted frequent Pattern Mining with length decreasing constraints), WIP
(Weighted Interesting Pattern mining with a strong weight and/or support affinity),
WSpan (Weighted Sequential pattern mining with a weight range and a minimum
weight) and WIS (Weighted Interesting Sequential pattern mining with a similar level of
support and/or weight affinity).
The extensive performance analysis shows that the suggested approaches are
efficient and scalable in weighted frequent pattern mining.
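The downward closure issue mentioned in this abstract can be made concrete with a short sketch. The abstract does not give its formulas, so the example assumes one common definition: the weighted support of a pattern is its support count multiplied by the average weight of its items. Under that assumption a superset can score higher than its subset, so the sketch prunes with an upper bound that uses the largest item weight instead. All names and the sample weights are hypothetical.

```python
def support(pattern, transactions):
    """Number of transactions that contain every item of the pattern."""
    pattern = set(pattern)
    return sum(1 for t in transactions if pattern <= set(t))


def weighted_support(pattern, transactions, weights):
    """Assumed definition: support count times average item weight."""
    avg_weight = sum(weights[i] for i in pattern) / len(pattern)
    return avg_weight * support(pattern, transactions)


def can_prune(pattern, transactions, weights, min_wsup):
    """Safe pruning test: no superset can exceed max weight times this support,
    because support never grows and the average weight never exceeds the max."""
    return max(weights.values()) * support(pattern, transactions) < min_wsup


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
weights = {"a": 0.5, "b": 1.0, "c": 0.2}
print(weighted_support({"a"}, transactions, weights))
print(weighted_support({"a", "b"}, transactions, weights))
print(can_prune({"c"}, transactions, weights, min_wsup=2.5))
```

The bound is anti-monotone because it depends only on the support count, so pruning with it cannot discard a pattern whose extensions would still qualify.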
- …