30 research outputs found

    Matching in frequent tree discovery

    No full text
    Various definitions and frameworks for discovering frequent trees in forests have been developed recently. At the heart of these frameworks lies the notion of matching, which determines when a pattern tree matches a tree in a data set. We introduce a novel notion of tree matching for use in frequent tree mining and show that it generalizes the framework of Zaki while still being more specific than that of Termier et al. Furthermore, we show how Zaki's TreeMinerV algorithm can be adapted to our notion of tree matching. Experiments show the promise of the approach.
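    The core question a matching notion answers is when a pattern tree "occurs in" a data tree. As a minimal illustration (not the paper's matching notion, nor Zaki's embedded-subtree matching), the following sketch checks for ordered rooted induced-subtree occurrence; the `Node` class and function names are assumptions for this example.

```python
# Illustrative sketch: ordered induced-subtree matching.
# Pattern parent-child edges must map onto data parent-child edges,
# preserving the left-to-right order of children.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def matches_at(pattern, data):
    """True if `pattern` maps onto `data` rooted at `data` itself."""
    if pattern.label != data.label:
        return False
    # Greedy left-to-right assignment of pattern children to data children;
    # greedy-earliest is safe because each child match is independent.
    i = 0
    for pc in pattern.children:
        while i < len(data.children) and not matches_at(pc, data.children[i]):
            i += 1
        if i == len(data.children):
            return False
        i += 1
    return True

def occurs(pattern, tree):
    """True if the pattern matches at `tree` or at any descendant."""
    return matches_at(pattern, tree) or any(occurs(pattern, c) for c in tree.children)
```

    Stricter or looser choices at exactly these points (may children be skipped? must order be preserved? may edges map to ancestor-descendant paths?) are what distinguish matching notions such as Zaki's from that of Termier et al.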

    Tree 2 - decision trees for tree structured data

    No full text
    We present Tree 2, a new approach to structural classification. This integrated approach induces decision trees that test for pattern occurrence in their inner nodes. It combines state-of-the-art tree mining with sophisticated pruning techniques to find the most discriminative pattern for each node. In contrast to existing methods, Tree 2 uses no heuristics, and only a single, statistically well-founded parameter has to be chosen by the user. The experiments show that Tree 2 classifiers achieve good accuracies while the induced models are smaller than those of existing approaches, facilitating better comprehensibility.
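    The structural idea, a decision tree whose inner nodes branch on whether a mined pattern occurs in the example, can be sketched as follows. This is not the Tree 2 implementation; the toy `pattern_occurs` predicate (subset test on edge-label sets) stands in for a real tree-pattern occurrence check.

```python
# Sketch: decision nodes that test for pattern occurrence.
# In Tree 2 the pattern would be a mined tree pattern; here an
# example is just a set of edge labels and occurrence is subset test.

def pattern_occurs(pattern, example):
    return pattern <= example  # toy stand-in predicate

class PatternNode:
    def __init__(self, pattern, present, absent):
        self.pattern, self.present, self.absent = pattern, present, absent

    def classify(self, example):
        branch = self.present if pattern_occurs(self.pattern, example) else self.absent
        return branch.classify(example)

class Leaf:
    def __init__(self, label):
        self.label = label

    def classify(self, example):
        return self.label

# A one-node tree: examples containing edge "a-b" are labelled "pos".
tree = PatternNode({"a-b"}, Leaf("pos"), Leaf("neg"))
```

    Induction then amounts to choosing, at each inner node, the most discriminative pattern for the examples reaching that node.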

    One in a million: Picking the right patterns

    No full text
    Constrained pattern mining extracts patterns based on their individual merit. Usually this results in far more patterns than a human expert or a machine learning technique could make use of. Often different patterns or combinations of patterns cover a similar subset of the examples, making them redundant and carrying no new information. To remove the redundant information contained in such pattern sets, we propose two general heuristic algorithms, Bouncer and Picker, for selecting a small subset of patterns. We identify several selection techniques for use in these general algorithms and evaluate them on several data sets. The results show that both techniques severely reduce the number of patterns while apparently retaining much of the original information. Additionally, the experiments show that reducing the pattern set indeed improves the quality of classification results. Both findings show that the developed solutions are well suited to our goals.
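    The redundancy criterion above is coverage-based: a pattern adds little if the examples it covers are already covered. A minimal greedy sketch of that idea (an illustration of the general scheme, not the exact selection criteria used by Bouncer or Picker):

```python
# Greedy coverage-based pattern selection (set-cover-style heuristic):
# repeatedly pick the pattern covering the most not-yet-covered examples.

def select_patterns(coverage, k):
    """coverage: dict mapping pattern id -> set of covered example ids."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max(coverage, key=lambda p: len(coverage[p] - covered), default=None)
        if best is None or not (coverage[best] - covered):
            break  # nothing adds new information; stop early
        chosen.append(best)
        covered |= coverage[best]
    return chosen
```

    Swapping the `key` function changes the selection technique while the surrounding loop, the "general algorithm", stays the same.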

    Aggregated subset mining

    No full text
    The usual data mining setting uses the full amount of data to derive patterns for different purposes. Taking cues from machine learning techniques, we explore ways to divide the data into subsets, mine patterns on them, and use post-processing techniques to acquire the result set. Using the patterns as features for a classification task to evaluate their quality, we compare the different subset compositions and selection techniques. The two main results, that small independent sets are better suited than large amounts of data and that uninformed selection techniques perform well, can to a certain degree be explained by quantitative characteristics of the derived pattern sets.
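    The pipeline described, split the data, mine each subset independently, then merge the results, can be sketched with a toy frequent-item miner in place of a real pattern miner (the paper's miners and selection techniques are more sophisticated; names here are illustrative).

```python
# Sketch of aggregated subset mining: partition the data, mine each
# subset, then merge with an uninformed post-processing step (union).

from collections import Counter

def mine_frequent(transactions, minsup):
    """Toy miner: items occurring in at least `minsup` transactions."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, c in counts.items() if c >= minsup}

def aggregated_mining(data, n_subsets, minsup):
    subsets = [data[i::n_subsets] for i in range(n_subsets)]  # small independent sets
    mined = [mine_frequent(s, minsup) for s in subsets]
    return set().union(*mined)  # uninformed selection: keep everything mined
```

    Replacing the final union with an informed selection step is exactly the design axis the experiments compare.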