7 research outputs found
Mining Rooted Ordered Trees under Subtree Homeomorphism
Mining frequent tree patterns has many applications in different areas such
as XML data, bioinformatics and World Wide Web. The crucial step in frequent
pattern mining is frequency counting, which involves a matching operator to
find occurrences (instances) of a tree pattern in a given collection of trees.
A widely used matching operator for tree-structured data is subtree
homeomorphism, where an edge in the tree pattern is mapped onto an
ancestor-descendant relationship in the given tree. Tree patterns that are
frequent under subtree homeomorphism are usually called embedded patterns. In
this paper, we present an efficient algorithm for subtree homeomorphism with
application to frequent pattern mining. We propose a compact data-structure,
called occ, which stores only information about the rightmost paths of
occurrences and hence can encode and represent several occurrences of a tree
pattern. We then define efficient join operations on the occ data-structure,
which help us count occurrences of tree patterns according to occurrences of
their proper subtrees. Based on the proposed subtree homeomorphism method, we
develop an effective pattern mining algorithm, called TPMiner. We evaluate the
efficiency of TPMiner on several real-world and synthetic datasets. Our
extensive experiments confirm that TPMiner always outperforms well-known
existing algorithms, and in several cases the improvement with respect to
existing algorithms is significant.Comment: This paper is accepted in the Data Mining and Knowledge Discovery
journal
(http://www.springer.com/computer/database+management+%26+information+retrieval/journal/10618