30 research outputs found

    Matching in frequent tree discovery

    No full text
    Various definitions and frameworks for discovering frequent trees in forests have been developed recently. At the heart of these frameworks lies the notion of matching, which determines when a pattern tree matches a tree in a data set. We introduce a novel notion of tree matching for use in frequent tree mining and show that it generalizes the framework of Zaki while still being more specific than that of Termier et al. Furthermore, we show how Zaki's TreeMinerV algorithm can be adapted to our notion of tree matching. Experiments show the promise of the approach.
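    The core question a matching notion answers is when a pattern tree "occurs in" a data tree. As a minimal illustration (not the paper's matching notion, nor Zaki's embedded-subtree matching), the following sketch checks for ordered rooted induced-subtree occurrence; the `Node` class and function names are assumptions for this example.

```python
# Illustrative sketch: ordered induced-subtree matching.
# Pattern parent-child edges must map onto data parent-child edges,
# preserving the left-to-right order of children.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def matches_at(pattern, data):
    """True if `pattern` maps onto `data` rooted at `data` itself."""
    if pattern.label != data.label:
        return False
    # Greedy left-to-right assignment of pattern children to data children;
    # greedy-earliest is safe because each child match is independent.
    i = 0
    for pc in pattern.children:
        while i < len(data.children) and not matches_at(pc, data.children[i]):
            i += 1
        if i == len(data.children):
            return False
        i += 1
    return True

def occurs(pattern, tree):
    """True if the pattern matches at `tree` or at any descendant."""
    return matches_at(pattern, tree) or any(occurs(pattern, c) for c in tree.children)
```

    Stricter or looser choices at exactly these points (may children be skipped? must order be preserved? may edges map to ancestor-descendant paths?) are what distinguish matching notions such as Zaki's from that of Termier et al.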

    Tree 2 - decision trees for tree structured data

    No full text
    We present Tree 2, a new approach to structural classification. This integrated approach induces decision trees that test for pattern occurrence in their inner nodes. It combines state-of-the-art tree mining with sophisticated pruning techniques to find the most discriminative pattern for each node. In contrast to existing methods, Tree 2 uses no heuristics, and only a single, statistically well-founded parameter has to be chosen by the user. The experiments show that Tree 2 classifiers achieve good accuracies while the induced models are smaller than those of existing approaches, facilitating better comprehensibility.
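    The structural idea, a decision tree whose inner nodes branch on whether a mined pattern occurs in the example, can be sketched as follows. This is not the Tree 2 implementation; the toy `pattern_occurs` predicate (subset test on edge-label sets) stands in for a real tree-pattern occurrence check.

```python
# Sketch: decision nodes that test for pattern occurrence.
# In Tree 2 the pattern would be a mined tree pattern; here an
# example is just a set of edge labels and occurrence is subset test.

def pattern_occurs(pattern, example):
    return pattern <= example  # toy stand-in predicate

class PatternNode:
    def __init__(self, pattern, present, absent):
        self.pattern, self.present, self.absent = pattern, present, absent

    def classify(self, example):
        branch = self.present if pattern_occurs(self.pattern, example) else self.absent
        return branch.classify(example)

class Leaf:
    def __init__(self, label):
        self.label = label

    def classify(self, example):
        return self.label

# A one-node tree: examples containing edge "a-b" are labelled "pos".
tree = PatternNode({"a-b"}, Leaf("pos"), Leaf("neg"))
```

    Induction then amounts to choosing, at each inner node, the most discriminative pattern for the examples reaching that node.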

    One in a million: Picking the right patterns

    No full text
    Constrained pattern mining extracts patterns based on their individual merit. Usually this results in far more patterns than a human expert or a machine learning technique could make use of. Often different patterns or combinations of patterns cover a similar subset of the examples, making them redundant and carrying no new information. To remove the redundant information contained in such pattern sets, we propose two general heuristic algorithms, Bouncer and Picker, for selecting a small subset of patterns. We identify several selection techniques for use in these general algorithms and evaluate them on several data sets. The results show that both techniques severely reduce the number of patterns while apparently retaining much of the original information. Additionally, the experiments show that reducing the pattern set indeed improves the quality of classification results. Both findings show that the developed solutions are well suited to our goals.
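    The redundancy criterion above is coverage-based: a pattern adds little if the examples it covers are already covered. A minimal greedy sketch of that idea (an illustration of the general scheme, not the exact selection criteria used by Bouncer or Picker):

```python
# Greedy coverage-based pattern selection (set-cover-style heuristic):
# repeatedly pick the pattern covering the most not-yet-covered examples.

def select_patterns(coverage, k):
    """coverage: dict mapping pattern id -> set of covered example ids."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max(coverage, key=lambda p: len(coverage[p] - covered), default=None)
        if best is None or not (coverage[best] - covered):
            break  # nothing adds new information; stop early
        chosen.append(best)
        covered |= coverage[best]
    return chosen
```

    Swapping the `key` function changes the selection technique while the surrounding loop, the "general algorithm", stays the same.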

    Aggregated subset mining

    No full text
    The usual data mining setting uses the full amount of data to derive patterns for different purposes. Taking cues from machine learning techniques, we explore ways to divide the data into subsets, mine patterns on them, and use post-processing techniques to acquire the result set. Using the patterns as features for a classification task to evaluate their quality, we compare the different subset compositions and selection techniques. The two main results, that small independent sets are better suited than large amounts of data and that uninformed selection techniques perform well, can to a certain degree be explained by quantitative characteristics of the derived pattern sets.
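    The pipeline described, split the data, mine each subset independently, then merge the results, can be sketched with a toy frequent-item miner in place of a real pattern miner (the paper's miners and selection techniques are more sophisticated; names here are illustrative).

```python
# Sketch of aggregated subset mining: partition the data, mine each
# subset, then merge with an uninformed post-processing step (union).

from collections import Counter

def mine_frequent(transactions, minsup):
    """Toy miner: items occurring in at least `minsup` transactions."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, c in counts.items() if c >= minsup}

def aggregated_mining(data, n_subsets, minsup):
    subsets = [data[i::n_subsets] for i in range(n_subsets)]  # small independent sets
    mined = [mine_frequent(s, minsup) for s in subsets]
    return set().union(*mined)  # uninformed selection: keep everything mined
```

    Replacing the final union with an informed selection step is exactly the design axis the experiments compare.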